A consistent approach to authoring articles/news provides the reader with a uniform experience when browsing the site. This includes quickly identifying the language, technology, and intended audience of the article, as well as some basic structure of the article, such as a table of contents, content structure, a conclusion, and a change history log.
Article Template
The following is a suggested template for authoring an article. The suggested template for the body of the article depends on whether the article discusses a tool or component (such as a third-party tool like Ivy or Drools, or an API), a language or framework feature (such as delegates or thread pooling), or a solution to a problem (like "Assert Eventually" or a recovery service library).
None of this is set in stone; it is merely a template that gives the article some basic structure, and the author is free to deviate from it. Here is the suggested template and outline.
Article title
One line article description
Categorization
Language (C#, Java, VB.NET, SQL, XML, XSD, etc)
Technology
third party tool/framework/component?
language/framework feature?
solution to a problem?
Audience
Developer
Architect
Manager
CEO/CTO
Table of Contents
Content template:
Abstract
Introduction
Body
If discussing a tool/component:
Brief description of tool/component
Why is this tool useful?
What other tools/components does it work with and/or require?
Installation experience
Use case (one or more)
Configuration
Testing
If discussing a language or framework feature:
Brief description of the feature
Why is this important to know?
Use case (one or more)
What does the use case demonstrate?
What are the "take away" points?
If discussing a solution to a problem:
What is the problem?
Why is this an important problem to solve?
Use case (one or more)
Solution proposal
Technical issues that need to be addressed
Taking this further: additional research, features, etc.
Conclusion
Concluding statement about this article
What have we learned?
What was the author's experience?
Additional recommendations
Further reading
Statement about article content with regards to Risk Focus
Ex: Risk Focus is interested in leveraging this tool/component/technology to...
Ex: Risk Focus considers that a deep understanding of this technology is important in solving [x] problems
Article History
Changes to the article itself
Changes to code examples, etc.
Conclusion
Providing a uniform experience for the reader enhances the professionalism of the website, as well as facilitating searching and browsing, especially as the article count increases. It also provides a specific branding to the content, the goal of which is to identify the article as produced by Risk Focus.
Saxon is an integrated XML processor that offers tree-based XML representation, schema validation, XPath and XQuery evaluation, and XSLT processing. Saxon provides several APIs to aid in programming and give developers maximum flexibility. The options available to the developer are listed below:
For XML parsing, the developer can choose from the following:
Working directly with Configuration object to parse and build documents
Using Saxon's new s9api interface
For XSLT processing, the developer can choose from the following:
Using JAXP Transformation API
Using Saxon's new s9api interface
For schema validation, the developer can choose from the following:
Using JAXP Schema Processing
Using Saxon's new s9api interface
For XQuery evaluation, the developer can choose from the following:
The legacy XQuery API (no longer recommended)
Invoking XQuery using the XQJ API
Using Saxon's new s9api interface
For XPath evaluation, the developer can choose from the following:
JAXP 1.3 API (extended to support XPath 2.0)
Using Saxon's new s9api interface
The primary motivation for using standardized interfaces like JAXP and XQJ is that it makes your code portable.
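To make the portability point concrete, the sketch below evaluates an XPath expression using only the standard javax.xml.xpath interfaces that ship with the JDK. Nothing in it names a particular processor, so the same code should run against whichever JAXP XPath implementation is configured. The class name JaxpXPathSketch, the method name titles, and the inline sample XML are assumptions made for illustration; only the ITEM and TITLE element names are taken from the examples in this article.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: evaluates //ITEM/TITLE through the JAXP XPath API only.
final class JaxpXPathSketch {

    static List<String> titles(String xml) {
        try {
            // The factory lookup is what makes the code portable across implementations.
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                    "//ITEM/TITLE",
                    new InputSource(new StringReader(xml)),
                    XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(nodes.item(i).getTextContent());
            }
            return result;
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        // Sample document invented for this sketch.
        String xml = "<BOOKS><ITEM><TITLE>First</TITLE></ITEM>"
                   + "<ITEM><TITLE>Second</TITLE></ITEM></BOOKS>";
        for (String title : titles(xml)) {
            System.out.println(title);
        }
    }
}
```

The price of this portability is visible in the cast to NodeList: the standard API returns an untyped Object, and the caller has to know what to expect.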
However, using Saxon's new s9api interface has several advantages. The s9api interface is designed to provide a uniform, integrated approach to XML processing capabilities supported by Saxon, taking advantage of the type safety offered by generics in Java 5. Unlike JAXP, it includes support for XSLT 2.0 capabilities. The API is simple to use and is intuitive.
The following code snippets illustrate how the s9api interface can be used for XML processing.
The example below evaluates an XQuery that takes a parameter and returns the result as an XdmValue.
Example 1
Processor proc = new Processor(false);
XQueryCompiler comp = proc.newXQueryCompiler();
XQueryExecutable exp = comp.compile("declare variable $n external; for $i in 1 to $n return $i*$i");
XQueryEvaluator qe = exp.load();
qe.setExternalVariable(new QName("n"), new XdmAtomicValue(10));
XdmValue result = qe.evaluate();
...
The example below evaluates an XQuery against the data contained in the file books.xml. The result is printed to the console.
Example 2
Processor proc = new Processor(false);
XQueryCompiler comp = proc.newXQueryCompiler();
XQueryExecutable exp = comp.compile("<copy>{//ITEM[1]}</copy>");
XQueryEvaluator qe = exp.load();
// SAXSource wraps an org.xml.sax.InputSource, not a raw stream
SAXSource source = new SAXSource(new InputSource(new FileInputStream("data/books.xml")));
qe.setSource(source);
Serializer out = new Serializer();
out.setOutputStream(System.out);
// Serializer is itself a Destination, so it is passed to run() directly
qe.run(out);
...
The example below shows the XML file books.xml being parsed and loaded into memory. An XPath expression is then evaluated and the result sequence is iterated in order to print data to the console. This example also illustrates the API for accessing Saxon's in-memory XML tree.
Example 3
public static void main(String[] args) {
    Processor proc = new Processor(false);
    DocumentBuilder builder = proc.newDocumentBuilder();
    builder.setLineNumbering(true);
    builder.setWhitespaceStrippingPolicy(WhitespaceStrippingPolicy.ALL);
    XdmNode booksDoc = builder.build(new File("data/books.xml"));
    XPathCompiler xpath = proc.newXPathCompiler();
    xpath.declareNamespace("saxon", "http://saxon.sf.net/");
    // find all the ITEM elements, and for each one display the TITLE child
    XPathSelector selector = xpath.compile("//ITEM").load();
    selector.setContextItem(booksDoc);
    QName titleName = new QName("TITLE");
    for (XdmItem item : selector) {
        XdmNode title = getChild((XdmNode) item, titleName);
        System.out.println(title.getNodeName() + "(" + title.getLineNumber() + "): " + title.getStringValue());
    }
    ...
}
// Helper method to get the first child of an element having a given name.
// If there is no child with the given name it returns null
private static XdmNode getChild(XdmNode parent, QName childName) {
    XdmSequenceIterator iter = parent.axisIterator(Axis.CHILD, childName);
    if (iter.hasNext())
        return (XdmNode) iter.next();
    else
        return null;
}
As the above examples illustrate, the starting point when using the s9api interface is the creation of a Processor object. The Processor class encapsulates the configuration environment for XML processing. Note also that Example 3 uses the same Processor instance for parsing the XML document and for XPath evaluation.
Saxon actually requires that the same Processor instance be used for parsing an XML document and for any subsequent processing such as schema validation, XPath or XQuery evaluation, or XSLT processing. If you try to use a different Processor instance for subsequent processing of the XML document, Saxon will throw an exception. The Processor class is thread-safe, allowing a single instance to be shared across multiple threads.
For most software systems, a single instance of the Processor class representing the XML environment settings is all that is needed. The instance can be created at startup and then reused. There are several ways in which this can be achieved.
If you are using Spring, you can define a Processor bean that can be used elsewhere.
On application startup, you could create a Processor instance and add it to a cache. Subsequently the application should only access the cache to get an instance of the Processor.
To conclude, the ease, elegance, and uniformity of its approach to Saxon's XML processing capabilities make the s9api interface very appealing. Developers should definitely consider its use.
Using XML in any software system requires you to parse and build documents. We have frequently used parsers like Xerces (the default XML parser in the current JDK distribution) or Crimson to load XML documents into memory.
When we use any of the above-mentioned parsers to load documents into memory, the XML content is represented in memory by a Document Object Model (DOM). DOM is a cross-platform and language-independent convention, standardized by the World Wide Web Consortium (W3C), for representing and interacting with objects in HTML, XHTML and XML documents.
Although standardization allows for interoperability, Java developers and developers in other high-level object-oriented languages have found the W3C DOM and its API unnecessarily complicated and difficult to use. This has led to the development of several competing document object models.
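As a rough illustration of that complexity, the sketch below uses the JDK's built-in W3C DOM support to collect the text of the TITLE child of each ITEM element. The class name W3cDomSketch, the method name, and the ITEM/TITLE document shape are assumptions chosen for this example. Note the index-based NodeList loops and the explicit node-type check, which is needed because whitespace between elements shows up as text nodes among the children.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper showing plain W3C DOM navigation with JDK classes only.
final class W3cDomSketch {

    static List<String> titlesOfItems(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            List<String> titles = new ArrayList<>();
            // NodeList is not Iterable, so an index loop is required.
            NodeList items = doc.getElementsByTagName("ITEM");
            for (int i = 0; i < items.getLength(); i++) {
                NodeList children = items.item(i).getChildNodes();
                for (int j = 0; j < children.getLength(); j++) {
                    Node child = children.item(j);
                    // Skip whitespace text nodes and any non-TITLE elements.
                    if (child.getNodeType() == Node.ELEMENT_NODE
                            && "TITLE".equals(child.getNodeName())) {
                        titles.add(child.getTextContent());
                        break; // only the first TITLE child of each ITEM
                    }
                }
            }
            return titles;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Sample document invented for this sketch.
        String xml = "<BOOKS>\n  <ITEM>\n    <TITLE>First</TITLE>\n  </ITEM>\n</BOOKS>";
        System.out.println(titlesOfItems(xml));
    }
}
```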
Java developers can choose from the following alternatives to W3C DOM:
JDOM was the first major alternative to W3C DOM. JDOM is not a wrapper around W3C DOM but an independent Java-based "document object model". It provides a way to represent an XML document for easy and efficient reading, manipulation, and writing. It has a straightforward API, is lightweight and fast, is optimized for the Java programmer, and is much easier to use than W3C DOM.
It is important to note that JDOM is NOT an XML parser like Xerces or Crimson. It is a document object model that uses XML parsers to build documents. JDOM's SAXBuilder class uses the SAX events generated by an XML parser (like Xerces) to build a JDOM tree. The default XML parser used by JDOM is the JAXP-selected parser (Xerces is the default JAXP-selected parser in the current JDK version). However, JDOM can use nearly any parser.
Both dom4j and XOM were inspired by JDOM. Like JDOM, they are independent document object models that use parsers like Xerces to build their respective XML tree structures. Although dom4j and XOM share some features with JDOM, their APIs have each evolved along an independent trajectory.
dom4j was created by developers who were dissatisfied with JDOM and wanted to add to it. They added features that they thought were good in W3C DOM. In particular, the API is based on Java interfaces, which allows plug-and-play document object model implementations. Although dom4j implements various DOM interfaces, it is not fully compliant with the W3C specification (as expected, since it is not W3C DOM). Most developers find the API more involved than JDOM's.
XOM was created by Elliotte Rusty Harold, a developer who was dissatisfied with JDOM and wanted to subtract from it. XOM emphasises correctness, simplicity, and performance in its tree-based API for XML representation. Its API retains ideas that were good in JDOM and also includes ideas believed to be elegant in W3C DOM and XPath.
Saxon was born out of the efforts of Michael Kay. It is an integrated XML processor that offers tree-based XML representation, schema validation, XPath and XQuery evaluation, and XSLT processing. Earlier versions of Saxon (prior to 7.2) had a built-in XML parser (called the Aelfred XML Parser). Subsequent versions removed the bundled parser, and Saxon (like JDOM, dom4j and XOM) now relies on the JAXP-selected parser for XML parsing. Saxon represents XML in its own tree structure, which is very simple to use and is optimized for XPath, XQuery and XSLT.
Java developers should consider the above alternatives to W3C DOM when they want to parse and load documents into memory. Depending upon the software needs, the alternatives can potentially speed up development and result in smaller and easier to maintain code.
A number of times I have had the same conversation, which usually starts with my partner in the conversation stating "With the addition of the 1.5 concurrency package, there is no need to use intrinsic locks anymore and its use should be deprecated". Sadly, they fail to understand the nuances of the two mechanisms. First, a quick look at intrinsic vs. explicit locking.
Intrinsic locking is the use of the synchronized keyword. It can take a number of familiar forms, such as synchronized methods and synchronized blocks.
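As a reminder of what those forms look like, here is a minimal sketch. The class IntrinsicCounter and all of its members are hypothetical names invented for this illustration; the hammer method exists only to exercise the counter from several threads.

```java
// Hypothetical class showing three common forms of intrinsic locking.
final class IntrinsicCounter {
    private static int instances;
    private final Object guard = new Object();
    private int count;
    private int audited;

    // Form 1: synchronized instance method, locks on "this".
    public synchronized void incrementCount() {
        count++;
    }

    public synchronized int getCount() {
        return count;
    }

    // Form 2: synchronized block on a private lock object.
    public void recordAudit() {
        synchronized (guard) {
            audited++;
        }
    }

    public int getAudited() {
        synchronized (guard) {
            return audited;
        }
    }

    // Form 3: synchronized static method, locks on IntrinsicCounter.class.
    public static synchronized int registerInstance() {
        return ++instances;
    }

    // Increments one counter from several threads and returns the final count.
    static int hammer(int threads, int perThread) {
        IntrinsicCounter counter = new IntrinsicCounter();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int k = 0; k < perThread; k++) {
                    counter.incrementCount();
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counter.getCount();
    }
}
```

In every form the lock is released automatically when the method or block exits, whether normally or by exception.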
Explicit locking, by contrast, relies on the new Lock interface:
private final Lock lock = new ReentrantLock();
public void incrementCount() {
    lock.lock();
    try {
        ...
    } finally {
        lock.unlock();
    }
}
Looking at this code, one drawback of using an explicit lock is obvious: it requires a try/finally block. This can be the source of insidious and annoying bugs when a developer forgets to release the lock.
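One way to reduce that risk, offered here only as a sketch and not as something the original code does, is to keep the lock/try/finally sequence in a single helper so that callers cannot forget the unlock. The names LockHelper, withLock, GuardedCounter, and failingUpdate are all hypothetical.

```java
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Supplier;

// Hypothetical helper that owns the lock/unlock pairing in one place.
final class LockHelper {
    private LockHelper() {
    }

    // Runs the action while holding the lock; the unlock always runs in finally.
    static <T> T withLock(Lock lock, Supplier<T> action) {
        lock.lock();
        try {
            return action.get();
        } finally {
            lock.unlock();
        }
    }
}

// Hypothetical counter whose methods never touch lock()/unlock() directly.
final class GuardedCounter {
    private final ReentrantLock lock = new ReentrantLock();
    private int count;

    int incrementCount() {
        return LockHelper.withLock(lock, () -> ++count);
    }

    // Simulates a failure inside the critical section.
    int failingUpdate() {
        return LockHelper.withLock(lock, () -> {
            throw new IllegalStateException("update failed");
        });
    }

    boolean isLocked() {
        return lock.isLocked();
    }
}
```

Even when the action throws, the lock is released, which is the behaviour a synchronized block gives for free.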
Setting the syntactic difference aside, the JVM treats the two mechanisms quite differently when it comes to thread states and optimizations.
Thread States
When profiling the JVM for performance issues, it's quite common to check for liveness issues by doing a full thread dump: Ctrl-Break in the console on Windows, or kill -3 on Linux.
The deadlock detection algorithms are quite robust and will explicitly report deadlocks caused by intrinsic locking. However, the same cannot be said for explicit locks. Not even 1.6_12 has the ability to detect a deadlock using explicit locks.
Looking at thread states, you can see that a thread stuck on the synchronized keyword is reported as BLOCKED, while a thread stuck on a Lock is reported as WAITING.
It's not impossible to deduce that there is a deadlock, but we are on our own. So, when looking at liveness issues, it's important to know whether the code base depends on explicit locks.
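The difference in reported state can be observed directly with Thread.getState(). The sketch below is an illustration under simple assumptions, not a diagnostic tool: the calling thread holds a lock, a daemon thread contends for it, and the contender's state is sampled once it has stopped running. LockStateSketch and its method names are hypothetical.

```java
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical demo: samples the state of a thread stuck behind each kind of lock.
final class LockStateSketch {

    // Polls until the thread is BLOCKED or WAITING, or roughly five seconds pass.
    private static Thread.State awaitContention(Thread thread) {
        long deadline = System.currentTimeMillis() + 5000;
        Thread.State state = thread.getState();
        while (state != Thread.State.BLOCKED && state != Thread.State.WAITING
                && System.currentTimeMillis() < deadline) {
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
            state = thread.getState();
        }
        return state;
    }

    // The contender tries to enter a synchronized block the caller already holds.
    static Thread.State intrinsicContenderState() {
        final Object monitor = new Object();
        Thread contender = new Thread(() -> {
            synchronized (monitor) {
                // entered only after the caller leaves its synchronized block
            }
        });
        contender.setDaemon(true);
        synchronized (monitor) {
            contender.start();
            return awaitContention(contender);
        }
    }

    // The contender calls lock() on a ReentrantLock the caller already holds.
    static Thread.State explicitContenderState() {
        final Lock lock = new ReentrantLock();
        Thread contender = new Thread(() -> {
            lock.lock();
            lock.unlock();
        });
        contender.setDaemon(true);
        lock.lock();
        try {
            contender.start();
            return awaitContention(contender);
        } finally {
            lock.unlock();
        }
    }

    public static void main(String[] args) {
        System.out.println("synchronized:  " + intrinsicContenderState());
        System.out.println("ReentrantLock: " + explicitContenderState());
    }
}
```

For programmatic checks it may also be worth looking at java.lang.management.ThreadMXBean, whose findDeadlockedThreads() method (added in Java 6) is documented to cover ownable synchronizers such as ReentrantLock as well as monitors.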
Locking Optimizations
Another difference between the two mechanisms is how much control you give to the JVM. If you use an explicit lock, you take away any ability of the JVM to optimize the locking strategies. The following 3 optimizations have been introduced for intrinsic locks:
Lock Elision - The JVM can opt to remove synchronization when its analysis shows that the locked object is never accessed by another thread.
Adaptive Locking - Spin wait vs. thread suspension
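A typical candidate for lock elision is a synchronized object that never leaves the method that created it. In the sketch below (the class and method names are hypothetical), every StringBuffer.append call is synchronized, yet the buffer is a local variable that no other thread can reach. Whether the locks are actually removed depends on the JVM version, its settings, and what the JIT compiler decides at run time, so treat this as an illustration of the pattern and not a guarantee.

```java
// Hypothetical example of code whose locking the JVM may be able to elide.
final class ElisionCandidate {
    private ElisionCandidate() {
    }

    static String join(String first, String second) {
        // StringBuffer methods are synchronized, but "buffer" never escapes
        // this method, so no other thread can contend for its monitor.
        StringBuffer buffer = new StringBuffer();
        buffer.append(first);
        buffer.append(second);
        return buffer.toString();
    }

    public static void main(String[] args) {
        System.out.println(join("lock ", "elision"));
    }
}
```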
MarkLogic is an industry leading XML Server with capabilities to store, manage, enrich, search, navigate, and dynamically deliver content. The server, designed and optimized for XML content, indexes XML structure and text and provides a wide range of query capabilities, such as
XML and Full Text Search (Boolean, phrase, range, proximity, wild card, stemming etc)
Complete W3C standard XQuery implementation
XQuery Extensions for inserts, updates, profiling, debugging etc.
MarkLogic also features flexible content management. This includes support for
Transforming and reformatting content for different uses (html, sms, xsl-fo for pdf)
Assembling content from different sources
Delivering content over multiple channels, feeds, devices and protocols
Basic MarkLogic Training
We recently had the opportunity to attend a 3-day basic training course on MarkLogic. Familiarity with XML and XML Schema was a prerequisite. No prior familiarity with XQuery or MarkLogic was expected, although such a background is very helpful. The training was split evenly between XQuery and MarkLogic-specific material and emphasized a hands-on philosophy, with equal time for lectures and labs.
Topics covered in Training
Server Overview and Core Architectural Components
Installation and Basic Configuration to get started
XPath Expressions and Syntax
XQuery Basics
Data Model and Types
Functions and Operators
FLWOR expressions and Modules
Basic MarkLogic XQuery Extensions
Loading, deleting and updating documents and document collections
Exception handling and logging
Utility functions for nodes and atomic types like dates and strings
Database and Application Server Configuration
Setting up databases for security, schema, modules and triggers
Managing forests and XML Data fragmentation
Indexing
This included a brief discussion on the following
Text controls for case sensitive, stemmed search, fast phrase/word search, wild card search, position search
Phrase Controls for selectively omitting or ignoring markup for phrasing purposes
Range Indexes for ordered values
Setting up XDBC, HTTP and WebDAV servers
MarkLogic XQuery Extensions for Search (relevance-based search heuristics)
Basic Administrative Tasks
Discussion of content storage in MarkLogic and associated settings
Merge tasks and policies for optimizing on disk space and speed
Doing backups and restores, canceling queries
Our first impressions of MarkLogic from the training were very positive, and we plan to investigate the product's features further and see how we can use them in the financial services domain.
NOTE: This was a beginner's training and did not cover everything about MarkLogic. Numerous advanced topics were not discussed. These included
Advanced Search and Indexing
Fields
Word Lexicons (database, element, attribute)
Value lexicons (element or attribute)
Document-URI and Collection-URI lexicons
Thesaurus API and Spelling API
Geo-spatial Indexes and Diacritic sensitive indexes
Query Parsing, Rewriting and Reverse Queries
Content Processing Framework (for updates and transformation of content)
Most software systems have a number of services running on multiple hosts, or they may have numerous instances of a service running as part of a cluster. This results in multiple log files, with each service or service instance writing to its own log file.
Under these circumstances, debugging an error or tracing how a request has been serviced by a system may require the developer to check and monitor several log files. This can make debugging and analyzing request handling a very time-consuming task.
Most developers quickly realize that the ability to view and query a central log file that contains log events from all system services greatly helps in debugging and in tracing how requests get handled. A graphical UI tool that displays log events as they occur can also be very useful.
Technical Solution
So how do we quickly achieve both centralization of logs and a convenient UI for viewing log events?
Fortunately, most of the hard work is already done if you use the Apache Log4j logging framework and its companion application Chainsaw (we will assume that you have Log4j version 1.2.15 and Chainsaw v2). There are two approaches that can be taken:
All the service nodes log to a shared file. The shared log file is thus an aggregated log file, and Chainsaw clients tail it.
The service nodes publish log events to a central log server. The central log server aggregates the logs and also forwards them to Chainsaw clients.
The following sections will illustrate and discuss both of these approaches.
Approach 1 (Logging to a shared file)
This approach involves configuring all the service nodes so that they log to a shared log file. Chainsaw can then tail the shared log file to display the events. The idea is illustrated in the following diagram:
Chainsaw clients that are tailing the shared log file will render the log events in Chainsaw UI. The user will see a layout similar to one shown below:
We can achieve the setup illustrated above by utilizing the following Log4j and Chainsaw features:
The standard Log4j RollingFileAppender for formatting log events and writing them out to a file.
The Chainsaw UI application graphically renders log events. It has various receivers for sourcing those events.
Chainsaw provides a FileReceiver. This tails a specified file and converts every new line into a log event using the specified format pattern.
Application/Service Node Log4j Setup
# Set root logger level to INFO and its only appender to A1.
log4j.rootLogger=INFO, A1
# A1 is set to be a RollingFileAppender.
log4j.appender.A1=org.apache.log4j.RollingFileAppender
log4j.appender.A1.File=Z:/log/trade-server.log
log4j.appender.A1.layout=org.apache.log4j.PatternLayout
log4j.appender.A1.layout.ConversionPattern=%d{yyyyMMdd HH:mm:ss,SSS} [%t] %p %c %x - %m%n
The above log4j setup creates a Rolling File Appender that logs to a shared file (Z: is a mapped drive representing a shared location). All the service nodes will log to this shared log file.
Chainsaw Setup
Chainsaw can be utilized to view system log events as follows:
Start the Chainsaw UI by executing the .bat file.
Create receivers manually as follows.
Using Receivers panel, create a FileReceiver with the following options:
The Chainsaw client will then start tailing the specified shared file. It reads every new line and parses it according to the specified log format to produce a log event. The events are displayed in a new event log tab. Multiple Chainsaw clients can tail the shared file and receive events.
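To make the "parse each line according to the log format" step concrete, here is a minimal sketch in plain Java. It is not Chainsaw's parser; it is a hand-written regular expression that mirrors the ConversionPattern used above, and the class name, the sample logger name com.example.TradeProcessor and the second sample line are assumptions made for illustration only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Simplified stand-in for pattern-based log parsing; this is NOT Chainsaw's code.
final class LogLineParserSketch {

    // Hand-written regex counterpart of the layout
    // "%d{yyyyMMdd HH:mm:ss,SSS} [%t] %p %c %x - %m%n".
    // The NDC (%x) is often empty, so that group may match an empty string.
    private static final Pattern LINE = Pattern.compile(
        "^(\\d{8} \\d{2}:\\d{2}:\\d{2},\\d{3}) \\[([^\\]]*)\\] (\\S+) (\\S+) (.*?) ?- (.*)$");

    // Returns {timestamp, thread, level, logger, ndc, message},
    // or null if the line does not follow the expected layout.
    static String[] parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            return null;
        }
        return new String[] {
            m.group(1), m.group(2), m.group(3), m.group(4), m.group(5), m.group(6)
        };
    }

    public static void main(String[] args) {
        // A line written with the shared layout (hypothetical logger name).
        String matching = "20090415 10:44:33,607 [main] INFO com.example.TradeProcessor - Loaded Restricted Parties";
        // A line written by a node that uses a different layout (relative time instead of a date).
        String different = "607 [main] INFO com.example.TradeProcessor - Loaded Restricted Parties";

        System.out.println(String.join(" | ", parse(matching)));
        System.out.println(parse(different) == null ? "unparseable line" : "parsed");
    }
}
```

The second sample line shows why a single layout matters: a line produced with any other pattern simply fails to parse.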
Limitations
There are several issues and limitations with this approach:
Every service node has to use the SAME layout pattern in its Log4j configuration. A consistent log line format is needed so that Chainsaw can parse lines in the log file into log events.
All service nodes MUST have the same system time; otherwise the aggregated shared log file will display inconsistent timestamps. Most production systems are set up so that system times and time zones are the same on all boxes, so this should not be a big issue, but if it is not the case then you have a problem.
Log files typically get rolled over. In Log4j we can roll over files on a daily basis or after they reach a specific size, depending on the configuration. If all service nodes are logging to the same shared file, this configuration MUST be the same for all service nodes. Otherwise, one node may roll over to a new log file while another is still logging to the old one, so log events end up split across two files. One way to ensure consistency is to have a common Log4j config file that all nodes share.
Even if you ensure the above three, your shared log file will have timestamps that are out of order. That is because the messages are being logged by different processes. The Log4j file appender inside a JVM process can only guarantee that logs generated by that process are written in the right order. It does not know that other processes are logging to the same file and cannot coordinate with them. Seeing timestamps out of order can be very confusing. It is possible to distinguish messages from different processes and reduce confusion by including a service node identifier in the log message. For example, the following snippet shows logging by two server nodes, TradeServer1 and TradeServer2:
20090415 10:44:33,607 TradeServer1 [main] INFO com.riskfocusinc.process.xpath.TradeProcessorImpl - Loaded Restricted Parties
20090415 10:44:33,609 TradeServer1 [main] INFO com.riskfocusinc.process.xpath.TradeProcessorImpl - Loaded Entity Hierarchy
20090415 10:44:33,607 TradeServer2 [main] INFO com.riskfocusinc.process.xpath.TradeProcessorImpl - Loaded Restricted Parties
The Chainsaw FileReceiver is configured by specifying the complete log file URL. The receiver then takes a read lock on the file. As long as a Chainsaw client with a FileReceiver is running, you won't be able to delete, move or rename the log file because of the lock. This is a big irritant: if a user leaves Chainsaw open, the system admin can't delete, move or rename the log file.
As mentioned in the previous points, log files typically get rolled over, and a Chainsaw FileReceiver is configured by specifying the complete log file URL. If you have a client with a FileReceiver running to monitor events and the log file gets rolled over, you will stop receiving events. You won't realize what has happened unless you check whether the log file has been rolled over. You will then have to shut down the receiver and create a new one pointing to the new log file in order to receive events again. Most often, log file rolling is based on size. If all service nodes log to the same file, it is likely that the log file will reach its maximum size and get rolled over quite frequently. In that case, Chainsaw users will have to reset or restart their receivers frequently.
The final issue is the performance factor. If you have multiple applications logging to the same shared file, and the shared file is being tailed and monitored by Chainsaw clients, then there will be overheads related to:
Accessing the file located on shared drive or file server.
File IO Synchronization and Blocking, as multiple service nodes update the file by appending logs and multiple clients read by tailing the file.
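As a side note on the out-of-order timestamps described in the fourth limitation, the sketch below shows one way a reader could re-order a copy of the aggregated file for offline analysis. It is only an illustration: the class and method names are made up, and it assumes that every line starts with the fixed-width yyyyMMdd HH:mm:ss,SSS timestamp (so multi-line entries such as stack traces are not handled).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Illustrative helper for re-ordering lines of a shared log file; names are hypothetical.
final class LogOrderSketch {

    // "yyyyMMdd HH:mm:ss,SSS" is fixed-width and zero-padded (21 characters),
    // so comparing the prefixes as plain strings orders them chronologically.
    private static final int TIMESTAMP_LENGTH = 21;

    static List<String> sortByTimestamp(List<String> lines) {
        List<String> sorted = new ArrayList<>(lines);
        // List.sort is stable: lines with equal timestamps keep their original file order.
        sorted.sort(Comparator.comparing((String line) -> line.substring(0, TIMESTAMP_LENGTH)));
        return sorted;
    }

    public static void main(String[] args) {
        List<String> fileOrder = Arrays.asList(
            "20090415 10:44:33,607 TradeServer1 [main] INFO com.riskfocusinc.process.xpath.TradeProcessorImpl - Loaded Restricted Parties",
            "20090415 10:44:33,609 TradeServer1 [main] INFO com.riskfocusinc.process.xpath.TradeProcessorImpl - Loaded Entity Hierarchy",
            "20090415 10:44:33,607 TradeServer2 [main] INFO com.riskfocusinc.process.xpath.TradeProcessorImpl - Loaded Restricted Parties");
        for (String line : sortByTimestamp(fileOrder)) {
            System.out.println(line);
        }
    }
}
```

Sorting after the fact only helps when reading the file later; it does nothing for a client that is tailing the file live.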
Due to the above issues and limitations, we do not recommend this approach. The first six issues do not exist in the alternative approach (involving the use of a central log server) that is discussed in the next section.
Regarding the performance factor mentioned in the last point, it is very hard to measure since it depends on the hardware/OS setup, the number of service nodes and Chainsaw clients, and the volume of concurrent logging. For instance, using a NAS drive (or a file server) will be more efficient than a mapped network drive in Windows. However, we believe that for most applications it's best to avoid concurrent file I/O and that it is more efficient for service nodes to publish log events to a central server over a socket.
Hence, the desired technical solution is the approach discussed in the following section.
Approach 2 (Using a central Log Server)
This approach involves setting up the logging system such that multiple service nodes publish events to a central log server. The central log server aggregates the logs and forwards log events to any connected Chainsaw clients. Multiple Chainsaw clients (and hence several users) are able to connect and subscribe to log events. The idea is illustrated in the following diagram:
The Chainsaw application clients connected to the log server will receive log events and will render the events in the Chainsaw UI. The user will see a layout displaying the log events similar to the one shown below:
We can achieve the above illustrated setup by utilizing the following Log4j and Chainsaw features:
Log4j has a class called SimpleSocketServer. This class is capable of receiving log events over a socket and processing them through any of the supported Log4j Appenders.
Log4j provides a SocketAppender. This appender opens a socket connection to a log server and then publishes log events over the connection.
Log4j also provides a SocketHubAppender. This creates a server socket to accept client connections and then publishes events to clients over accepted client connections.
The Chainsaw UI application graphically renders log events. It has various receivers for sourcing those events.
Chainsaw provides a SocketHubReceiver. It opens a connection to the server socket and reads log events over the connection.
Application/Service Node Log4j Setup
log4j.rootLogger=INFO, A1, Socket
# A1 is set to be a RollingFileAppender.
log4j.appender.A1=org.apache.log4j.RollingFileAppender
log4j.appender.A1.File=log/trade-server.log
log4j.appender.A1.layout=org.apache.log4j.PatternLayout
log4j.appender.A1.layout.ConversionPattern=%-4r [%t] %-5p %c %x - %m%n
# The Socket Appender
log4j.appender.Socket=org.apache.log4j.net.SocketAppender
log4j.appender.Socket.Threshold=INFO
log4j.appender.Socket.RemoteHost=<Log Server Host>
log4j.appender.Socket.Port=<Log Server Port>
log4j.appender.Socket.ReconnectionDelay=5000
log4j.appender.Socket.LocationInfo=true
log4j.appender.Socket.Application=<Service Node Identifier>
The Log4j setup described above creates a Socket Appender that publishes log events of level INFO and above to a log server running at the specified host and port. The host and port identify the single central log server, so these values should be the same for all the service nodes. In addition, the Application property holds the service node identifier. Each service node should have a unique value for this property, identifying the node.
Log Server Setup
On the <Log Server Host> machine, we need to start the log server so that it listens on port <Log Server Port> for log events published by the various service nodes. With the Log4j jar on the classpath, this can be done using the following command:
java org.apache.log4j.net.SimpleSocketServer <Log Server Port> config/logserver.txt
The second argument in the above command specifies the log4j config file that defines how the log events received by the server will be processed. The contents of the logserver.txt file should be as follows:
# Set root logger level to INFO
log4j.rootLogger=INFO, F1, SocketHub
# F1 is set to be a RollingFileAppender (optional; used only for producing an aggregated log file)
log4j.appender.F1=org.apache.log4j.RollingFileAppender
log4j.appender.F1.File=log/log-server.log
log4j.appender.F1.layout=org.apache.log4j.PatternLayout
log4j.appender.F1.layout.ConversionPattern=%-4r [%t] %X{application} %-5p %c %x - %m%n
# The Socket Hub Appender (Used for servicing Chainsaw clients)
log4j.appender.SocketHub=org.apache.log4j.net.SocketHubAppender
log4j.appender.SocketHub.Threshold=INFO
log4j.appender.SocketHub.Port=<Chainsaw Port>
The above log4j file has two appenders for processing log events:
The first appender, F1 is a file appender that writes out all the log events received by the log server to a file. Effectively, it creates an aggregated log containing all the log messages published by the various service nodes. For each log message, the pattern prints the source service node id.
The second appender is a SocketHubAppender. It creates a server socket listening at <Chainsaw Port> to service Chainsaw clients. Chainsaw clients can connect to the log server and the log server will forward all events it receives to the connected clients.
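The flow configured above (service nodes publish, the server aggregates to a file, and connected clients receive forwarded events) can be modelled in a few lines of plain Java. The sketch below is an in-memory analogy only: it uses no sockets and none of the Log4j classes, and every name in it (LogHubSketch, Event, connect, publish) is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

// In-memory analogy of the central log server; not based on Log4j's implementation.
final class LogHubSketch {

    // Hypothetical event type: "application" plays the role of the service node identifier.
    static final class Event {
        final String application;
        final String message;

        Event(String application, String message) {
            this.application = application;
            this.message = message;
        }

        @Override
        public String toString() {
            return application + " - " + message;
        }
    }

    // Stands in for the aggregated log file written by appender F1.
    private final List<String> aggregatedLog = new ArrayList<>();
    // Stands in for the clients connected to the hub port.
    private final List<Consumer<Event>> clients = new CopyOnWriteArrayList<>();

    void connect(Consumer<Event> client) {
        clients.add(client);
    }

    // Called once per event received from a service node.
    synchronized void publish(Event event) {
        aggregatedLog.add(event.toString());   // aggregate
        for (Consumer<Event> client : clients) {
            client.accept(event);              // forward to every connected client
        }
    }

    synchronized List<String> aggregatedLog() {
        return new ArrayList<>(aggregatedLog);
    }

    public static void main(String[] args) {
        LogHubSketch hub = new LogHubSketch();
        hub.connect(e -> System.out.println("client A received: " + e));
        hub.publish(new Event("TradeServer1", "Loaded Restricted Parties"));
        hub.connect(e -> System.out.println("client B received: " + e));
        hub.publish(new Event("TradeServer2", "Loaded Entity Hierarchy"));
        System.out.println("aggregated log: " + hub.aggregatedLog());
    }
}
```

In this model a client that connects late only sees events published after it connected, while the aggregated log holds everything.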
Chainsaw Setup
Chainsaw can be utilized to view system log events as follows:
Start the Chainsaw UI by executing the .bat file.
Create receivers manually as follows.
Using the Receivers panel, create a SocketHubReceiver with the following options:
name = LogServerReceiver
host = <Log Server Host>
port = <Chainsaw Port>
The Chainsaw client will now start receiving events forwarded by the log server. The events are displayed in a new event log tab. Multiple Chainsaw clients can connect to the log server and receive events.
According to the Chainsaw documentation, the application property is used as an identifier for the event log tab. So in our setup, events from different service nodes should go to different tabs, since they have different values for the application property. This is highly desirable since it tells the user the source of each log event.
However, when we tested this setup we noticed that Chainsaw was not routing events to different tabs. All the events were being sent to the same tab. We have not been able to identify why this is happening. As always, we are open to suggestions and explanations.
Conclusion
This post discussed how Log4j and Chainsaw can be used to create a centralized, aggregated log file with messages from all service nodes. The log events can be viewed in a graphical UI using Chainsaw.
There are two ways to achieve this, and both were discussed. The recommended approach is to use a central log server.
However, even the recommended solution has one drawback: Chainsaw routes events from all the service nodes to the same tab. This behavior is unexpected, since events from different service nodes have different values for the application property. Why it is happening needs further investigation.
When unit testing code with JMS, you'll typically want to avoid the overhead of running separate processes. You'll also want to keep startup time as short as possible, since you tend to run unit tests often and want immediate feedback.
The ability to run code with JMS without starting a separate JMS server is also very useful when writing demos. You typically want potential users to be able to quickly set up and execute demos. If the demo requires installing additional messaging software and running a separate process, then setting it up can be a big hassle for users.
This issue can be overcome by using Apache ActiveMQ with an embedded broker. There are several ways to do this
The procedure indicated in the above link involves creating a message broker inside your program. This requires a code modification, which is not a good approach. Ideally you would want to define all JMS connection settings in a config file. If you are using Spring, you would want to define all the JMS resources inside the Spring context.
The above link mentions two ways of creating an embedded broker using Spring
Using BrokerFactoryBean and defining the JMS settings in a separate XML configuration file that is referenced by the BrokerFactoryBean
Embedding the settings inside a regular Spring XML file without requiring the factory bean. This approach, however, needs an additional jar dependency, xbean-spring 2.6
If your settings are fairly simple, then there is another way to configure the embedded broker using Spring. In this approach, we are essentially simulating the same API calls that were mentioned in the very first link in this blog. This can be done by the following Spring XML configuration code
Developers can use any one of the Spring-based approaches to define the JMS settings and create an embedded broker. No code changes need to be made, and code can be tested for any release.
The Drools Rule Language allows you to access nested properties directly while declaring rule conditions in the rule file (drl file). However, there are a few things about nested properties and the domain object model that a Drools developer should understand.
In Drools, accessing nested properties in rules has some performance overhead. That's because Drools internally uses indexes for its pattern matching, and these indexes are built on direct properties, not nested ones.
Also, when you modify a nested property, Drools will not know that the containing object has changed, even though it relies on tracking such changes for its optimizations.
So should you be accessing nested properties?
And is there a way to avoid accessing nested properties altogether?
There is no simple answer to these questions. The performance overhead really depends upon the complexity of rules and the domain object model being used. It may be negligible or it may be significant. Only stress testing your system can confirm whether performance is satisfactory or not.
What is important is that developers and users conceptually understand what Drools is best designed for.
Drools has been designed to work best when there is a large number of objects whose relationships are expressed via references, rather than a few composite objects. Let's understand this through an example.
Let's say we have a Trade object that can have multiple allocations. For the sake of illustration, let's assume that there is a Trade object instance that has 5 allocations.
There are two approaches to modeling this relationship. I will describe them first and then compare them and discuss their suitability for use with Drools.
Approach 1
Here Trade is a composite object containing a list of Allocations. The size of that list will be 5. We add the Trade object to the working memory and run rules on it.
The rules will have to access nested properties in order to get the Allocation details.
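As a rough illustration of the composite model described above, the following minimal Java sketch builds one Trade that contains its five Allocations. The field names (id, account, quantity) and the CompositeModelDemo helper are assumptions made for this example, not part of any real domain model.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical Allocation class; account and quantity are illustrative fields.
class Allocation {
    private final String account;
    private final int quantity;

    Allocation(String account, int quantity) {
        this.account = account;
        this.quantity = quantity;
    }

    String getAccount() { return account; }
    int getQuantity() { return quantity; }
}

// Composite Trade: it owns the list of its allocations.
class Trade {
    private final String id;
    private final List<Allocation> allocations = new ArrayList<>();

    Trade(String id) { this.id = id; }

    String getId() { return id; }
    List<Allocation> getAllocations() { return allocations; }
    void addAllocation(Allocation allocation) { allocations.add(allocation); }
}

class CompositeModelDemo {
    // Builds a single Trade holding five nested Allocations.
    static Trade buildTrade() {
        Trade trade = new Trade("T1");
        for (int i = 1; i <= 5; i++) {
            trade.addAllocation(new Allocation("ACC" + i, 100 * i));
        }
        return trade;
    }

    public static void main(String[] args) {
        Trade trade = buildTrade();
        // Only the Trade would be added to the working memory as a fact;
        // allocation details are reachable only through the nested list.
        System.out.println("Nested allocations: " + trade.getAllocations().size());
    }
}
```

With this shape, any rule that needs an allocation's account or quantity has to navigate from the Trade through getAllocations(), which is exactly the nested access discussed earlier.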
Approach 2
The Trade class does not have an Allocation list. Instead, the Allocation class has a reference to its parent trade (through the parent property).
Please note that we could also have designed the Allocation class to hold the parent trade ID instead of a reference to the Trade object.
In either case, the effect is the same. There will be 1 Trade object and 5 Allocation objects (referencing the trade). A total of 6 objects will be added to the working memory and rules will be evaluated against them.
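The flat model can be sketched in Java as follows. This is only an illustration under assumed names: the fields id, account and quantity and the FlatModelDemo helper are hypothetical, and the list returned by buildFacts() simply stands in for the objects that would each be added to the working memory.

```java
import java.util.ArrayList;
import java.util.List;

// Trade no longer holds its allocations.
class Trade {
    private final String id;

    Trade(String id) { this.id = id; }

    String getId() { return id; }
}

// Each Allocation points back to its parent Trade through the parent property.
// A variant could store only the parent trade id (a String) instead.
class Allocation {
    private final Trade parent;
    private final String account;
    private final int quantity;

    Allocation(Trade parent, String account, int quantity) {
        this.parent = parent;
        this.account = account;
        this.quantity = quantity;
    }

    Trade getParent() { return parent; }
    String getAccount() { return account; }
    int getQuantity() { return quantity; }
}

class FlatModelDemo {
    // Returns the six objects (1 Trade + 5 Allocations) that would each
    // become a separate fact in the working memory.
    static List<Object> buildFacts() {
        Trade trade = new Trade("T1");
        List<Object> facts = new ArrayList<>();
        facts.add(trade);
        for (int i = 1; i <= 5; i++) {
            facts.add(new Allocation(trade, "ACC" + i, 100 * i));
        }
        return facts;
    }

    public static void main(String[] args) {
        System.out.println("Facts for working memory: " + buildFacts().size());
    }
}
```

Because every Allocation is its own fact, rule conditions can match on its direct properties (account, quantity, parent) without navigating through a containing object.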
Comparison of 2 Approaches
Drools documentation and examples suggest that it is designed to work best if Approach 2 is used for object modeling. The pattern matching algorithm works fast using internal property indexes, and rules can be written crisply and concisely (avoiding the use of Java code for casting).
That doesn't mean Approach 1 is not supported. Rules can still be written even if Approach 1 is used for object modeling. You pay in efficiency and in the cleanliness with which rules can be written in the drl file.
In real-world object design, we tend to favor containment relationships (similar to Approach 1). In many business systems, the domain model has a deep containment hierarchy and many levels of nesting.
This is certainly true if one were to use FpML along with JAXB. The resulting object schema has a deep containment hierarchy, and using Drools with it can potentially result in a non-negligible performance overhead. Also, business rules in the rules file don't look clean, since it becomes necessary to embed Java in order to do object casting.