Saxon is an integrated XML processor that offers tree-based XML representation, schema validation, XPath and XQuery evaluation, and XSLT processing. Saxon provides several APIs to aid programming and give developers maximum flexibility. The options available to the developer are listed below:
For XML parsing, the developer can choose from the following:
- Working directly with the Configuration object to parse and build documents
- Using Saxon's new s9api interface
For XSLT processing, the developer can choose from the following:
- Using JAXP Transformation API
- Using Saxon's new s9api interface
For schema validation, the developer can choose from the following:
- Using JAXP Schema Processing
- Using Saxon's new s9api interface
For XQuery evaluation, the developer can choose from the following:
- The legacy XQuery API (no longer recommended)
- Invoking XQuery using the XQJ API
- Using Saxon's new s9api interface
For XPath evaluation, the developer can choose from the following:
- JAXP 1.3 API (extended to support XPath 2.0)
- Using Saxon's new s9api interface
The primary motivation for using standardized interfaces like JAXP and XQJ is that they make your code portable.
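For comparison, here is a minimal sketch of the portable JAXP route for XPath. It uses only `javax.xml.xpath` from the JDK, so no Saxon classes are involved. The class name and the small inline document, shaped like `books.xml` with ITEM elements that have TITLE children, are hypothetical. The JDK's built-in engine implements XPath 1.0, so XPath 2.0 expressions would need Saxon's extended JAXP implementation to be selected instead.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class JaxpXPathTitles {
    // Evaluates //ITEM/TITLE through the standard javax.xml.xpath API and
    // returns the text of each matching TITLE element in document order.
    public static List<String> titles(String xml) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            NodeList nodes = (NodeList) xpath.evaluate(
                    "//ITEM/TITLE", new InputSource(new StringReader(xml)), XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(nodes.item(i).getTextContent());
            }
            return result;
        } catch (XPathExpressionException e) {
            // Wrap the checked exception to keep the sketch compact
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<BOOKLIST><ITEM><TITLE>Pride and Prejudice</TITLE></ITEM>"
                + "<ITEM><TITLE>Emma</TITLE></ITEM></BOOKLIST>";
        System.out.println(titles(xml)); // [Pride and Prejudice, Emma]
    }
}
```

The same code runs unchanged against any JAXP-compliant XPath engine, which is exactly the portability argument made for the standard interfaces.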
However, using Saxon's new s9api interface has several advantages. The s9api interface is designed to provide a uniform, integrated approach to the XML processing capabilities supported by Saxon, taking advantage of the type safety offered by generics in Java 5. Unlike JAXP, whose transformation API was designed around XSLT 1.0, it exposes XSLT 2.0 capabilities. The API is simple and intuitive to use.
The following code snippets illustrate how the s9api interface can be used for XML processing.
The example below evaluates an XQuery that takes a parameter ($n) and returns the result sequence, the squares of the integers 1 to $n, as an XdmValue.
import net.sf.saxon.s9api.*;

Processor proc = new Processor(false);
XQueryCompiler comp = proc.newXQueryCompiler();
XQueryExecutable exp = comp.compile("declare variable $n external; for $i in 1 to $n return $i*$i");
XQueryEvaluator qe = exp.load();
// bind the external variable $n to the integer 10
qe.setExternalVariable(new QName("n"), new XdmAtomicValue(10));
XdmValue result = qe.evaluate();
...
The example below evaluates an XQuery against the data in the file books.xml and prints the result to the console.
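import java.io.FileInputStream;
import javax.xml.transform.sax.SAXSource;
import net.sf.saxon.s9api.*;
import org.xml.sax.InputSource;

Processor proc = new Processor(false);
XQueryCompiler comp = proc.newXQueryCompiler();
XQueryExecutable exp = comp.compile("<copy>{//ITEM[1]}</copy>");
XQueryEvaluator qe = exp.load();
// SAXSource wraps an org.xml.sax.InputSource, not a raw InputStream
SAXSource source = new SAXSource(new InputSource(new FileInputStream("data/books.xml")));
qe.setSource(source);
Serializer out = new Serializer();
out.setOutputStream(System.out);
// Serializer is itself a Destination, so it is passed to run() directly
qe.run(out);
...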
The example below shows the XML file books.xml being parsed and loaded into memory. An XPath expression is then evaluated, and the result sequence is iterated to print data to the console. This example also illustrates the API for accessing Saxon's in-memory XML tree.
import java.io.File;
import net.sf.saxon.s9api.*;

public static void main(String[] args) throws SaxonApiException {
    Processor proc = new Processor(false);
    DocumentBuilder builder = proc.newDocumentBuilder();
    builder.setLineNumbering(true);
    builder.setWhitespaceStrippingPolicy(WhitespaceStrippingPolicy.ALL);
    XdmNode booksDoc = builder.build(new File("data/books.xml"));

    XPathCompiler xpath = proc.newXPathCompiler();
    xpath.declareNamespace("saxon", "http://saxon.sf.net/");

    // find all the ITEM elements, and for each one display the TITLE child
    XPathSelector selector = xpath.compile("//ITEM").load();
    selector.setContextItem(booksDoc);
    QName titleName = new QName("TITLE");
    for (XdmItem item : selector) {
        XdmNode title = getChild((XdmNode) item, titleName);
        System.out.println(title.getNodeName() + "(" + title.getLineNumber() + "): " + title.getStringValue());
    }
    ...
}

// Helper method to get the first child of an element having a given name.
// If there is no child with the given name it returns null
private static XdmNode getChild(XdmNode parent, QName childName) {
    XdmSequenceIterator iter = parent.axisIterator(Axis.CHILD, childName);
    if (iter.hasNext()) {
        return (XdmNode) iter.next();
    } else {
        return null;
    }
}
As the above examples illustrate, the starting point when using the s9api interface is creating a Processor object. The Processor class encapsulates the configuration environment for XML processing. Note also that the third example uses the same Processor instance both for parsing the XML document and for XPath evaluation.
Saxon actually requires that the same Processor instance be used for parsing an XML document and for any subsequent processing, such as schema validation, XPath or XQuery evaluation, or XSLT processing. If you try to use a different Processor instance for subsequent processing of the document, Saxon will throw an exception. The Processor class is thread-safe, allowing a single instance to be shared across multiple threads.
For most software systems, a single instance of the Processor class, representing the XML environment settings, is all that is needed. The instance can be created at startup and then reused. There are several ways to achieve this:
- If you are using Spring, you can define a Processor bean that can be used elsewhere.
- On application startup, you could create a Processor instance and add it to a cache. Subsequently the application should only access the cache to get an instance of the Processor.
To conclude, the ease, elegance and uniform approach to accessing Saxon's XML processing capabilities make Saxon's s9api interface very appealing. Developers should definitely consider using it.
Using XML in any software system requires you to parse and build documents. We frequently use parsers like Xerces (the default XML parser in the current JDK distribution) or Crimson to load XML documents into memory.
When we use either of these parsers to load documents into memory, the XML content is represented in memory by a Document Object Model (DOM). The DOM is a cross-platform, language-independent convention, standardized by the World Wide Web Consortium (W3C), for representing and interacting with objects in HTML, XHTML and XML documents.
Although standardization allows for interoperability, Java developers and developers working in other high-level object-oriented languages have found the W3C DOM and its API unnecessarily complicated and difficult to use. This has led to the development of several competing document object models.
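The following sketch gives a feel for that ceremony. It lists the TITLE of every ITEM element using only the JDK's W3C DOM API through JAXP. The class name and the sample XML are hypothetical. Notice the index-based NodeList loops and the explicit node-type checks needed to skip whitespace text nodes.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class DomTitles {
    // Returns the text of the first TITLE child of every ITEM element.
    public static List<String> itemTitles(String xml) {
        Document doc;
        try {
            // JAXP selects the parser implementation (the JDK ships an internal Xerces)
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            doc = builder.parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
        List<String> titles = new ArrayList<>();
        NodeList items = doc.getElementsByTagName("ITEM");
        for (int i = 0; i < items.getLength(); i++) {
            // getChildNodes() also returns whitespace text nodes, so check type and name
            NodeList children = items.item(i).getChildNodes();
            for (int j = 0; j < children.getLength(); j++) {
                Node child = children.item(j);
                if (child.getNodeType() == Node.ELEMENT_NODE && "TITLE".equals(child.getNodeName())) {
                    titles.add(child.getTextContent());
                    break;
                }
            }
        }
        return titles;
    }

    public static void main(String[] args) {
        String xml = "<BOOKLIST>\n <ITEM>\n  <AUTHOR>Jane Austen</AUTHOR>\n  <TITLE>Emma</TITLE>\n </ITEM>\n</BOOKLIST>";
        System.out.println(itemTitles(xml)); // [Emma]
    }
}
```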
Java developers can choose from the following alternatives to the W3C DOM:
JDOM was the first major alternative to the W3C DOM. JDOM is not a wrapper around the W3C DOM but an independent, Java-based "document object model". It provides a way to represent an XML document for easy and efficient reading, manipulation, and writing. It has a straightforward API, is lightweight and fast, is optimized for the Java programmer, and is much easier to use than the W3C DOM.
It is important to note that JDOM is NOT an XML parser like Xerces or Crimson. It is a document object model that uses XML parsers to build documents. JDOM's SAXBuilder class uses the SAX events generated by an XML parser (like Xerces) to build a JDOM tree. By default, JDOM uses the JAXP-selected parser (Xerces is the default JAXP-selected parser in the current JDK version), but it can use nearly any parser.
Both dom4j and XOM were inspired by JDOM. Like JDOM, they are independent document object models that use parsers like Xerces to build their respective XML tree structures. Although dom4j and XOM share some features with JDOM, their APIs have evolved along their own independent trajectories.
dom4j was created by developers who were dissatisfied with JDOM and wanted to add to it. They added features that they considered good in the W3C DOM. In particular, the API is based on Java interfaces, which allows plug-and-play document object model implementations. Although dom4j implements various DOM interfaces, it is not fully W3C-compliant (as expected, since it is not the W3C DOM). Most developers find its API more involved than JDOM's.
XOM was created by Elliotte Rusty Harold, who was dissatisfied with JDOM and wanted to subtract from it. XOM emphasizes correctness, simplicity, and performance in its tree-based API for XML representation. Its API retains the ideas that were good in JDOM and also includes ideas believed to be elegant in the W3C DOM and XPath.
Saxon was created by Michael Kay. It is an integrated XML processor that offers tree-based XML representation, schema validation, XPath and XQuery evaluation, and XSLT processing. Earlier versions of Saxon (prior to 7.2) had a built-in XML parser (the Aelfred XML Parser) that was used for XML parsing. Subsequent versions removed the bundled parser, and Saxon (like JDOM, dom4j and XOM) now relies on the JAXP-selected parser for XML parsing. Saxon represents XML in its own tree structure, which is very simple to use and is optimized for XPath, XQuery and XSLT.
Java developers should consider the above alternatives to the W3C DOM when they need to parse and load documents into memory. Depending on the software's needs, these alternatives can speed up development and result in smaller, easier-to-maintain code.
A number of times I have had the same conversation, which usually starts with my conversation partner stating, "With the addition of the Java 1.5 concurrency package, there is no need to use intrinsic locks anymore, and their use should be deprecated." Sadly, they fail to understand the nuances of the two mechanisms. First, a quick look at intrinsic vs. explicit locking.
Intrinsic locking is the use of the synchronized keyword. It can take a number of familiar forms, such as:
public synchronized int getCount() { ... }

public void incrementCount() {
    synchronized (this) {
        ...
    }
}
Explicit locking, by contrast, relies on the new Lock interface:
private final Lock lock = new ReentrantLock();

public void incrementCount() {
    lock.lock();
    try {
        ...
    } finally {
        lock.unlock();
    }
}
Looking at this code, one drawback of explicit locks is obvious: the mandatory try/finally block. If a developer forgets to release the lock in the finally block, the result can be insidious and annoying bugs.
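The two snippets above are fragments. The following runnable sketch puts both styles side by side and drives each counter from several threads. The names Counters, SyncCounter, LockCounter and hammer are hypothetical. Note that every access in the explicit version, including the read, needs its own lock()/try/finally/unlock() sequence.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

public class Counters {
    // Intrinsic locking: the monitor of 'this' is acquired and released implicitly.
    public static class SyncCounter {
        private int count;
        public synchronized int getCount() { return count; }
        public synchronized void incrementCount() { count++; }
    }

    // Explicit locking: every access must release the lock in a finally block.
    public static class LockCounter {
        private final Lock lock = new ReentrantLock();
        private int count;
        public int getCount() {
            lock.lock();
            try { return count; } finally { lock.unlock(); }
        }
        public void incrementCount() {
            lock.lock();
            try { count++; } finally { lock.unlock(); }
        }
    }

    // Runs 'increment' perThread times on each of 'threads' threads and waits for completion.
    public static void hammer(Runnable increment, int threads, int perThread) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> { for (int i = 0; i < perThread; i++) increment.run(); });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("timed out");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        SyncCounter s = new SyncCounter();
        LockCounter l = new LockCounter();
        hammer(s::incrementCount, 4, 10_000);
        hammer(l::incrementCount, 4, 10_000);
        System.out.println(s.getCount() + " " + l.getCount()); // 40000 40000
    }
}
```

Both counters end at the same total. The difference lies in who is responsible for releasing the lock, not in the result.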
Setting the syntactic difference aside, the JVM treats the two mechanisms quite differently when it comes to thread states and optimizations.
Thread States
When profiling the JVM for performance issues, it's quite common to check for liveness issues by taking a full thread dump:
- Ctrl-Break on Windows, in the console running the JVM
- kill -3 on Linux
The deadlock detection in the thread dump is quite robust and explicitly reports deadlocks caused by intrinsic locking. The same cannot be said for explicit locks: not even Java 6 update 12 (1.6.0_12) can detect a deadlock involving explicit locks.
Looking at thread states, you can see that threads deadlocked through the synchronized keyword are reported as BLOCKED, while threads deadlocked on Locks are reported as WAITING.
It's not impossible to deduce that there is a deadlock, but we are on our own. So, when looking at liveness issues, it's important to know whether the code base depends on explicit locks.
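The state difference can be reproduced with a small experiment. The main thread holds either a monitor or a ReentrantLock while a second thread tries to acquire it, and we read the waiter's Thread.getState(). This sketch shows plain contention rather than a true deadlock, but the waiting thread reports the same states a thread dump shows. The class and method names are hypothetical, and the polling loop is only there to give the waiter time to reach the lock.

```java
import java.util.concurrent.locks.ReentrantLock;

public class LockStates {
    private static final Object MONITOR = new Object();
    private static final ReentrantLock LOCK = new ReentrantLock();

    // Polls (up to ~2 s) until the thread is BLOCKED or WAITING, then returns its state.
    static Thread.State settle(Thread t) {
        for (int i = 0; i < 200; i++) {
            Thread.State s = t.getState();
            if (s == Thread.State.BLOCKED || s == Thread.State.WAITING) return s;
            try { Thread.sleep(10); } catch (InterruptedException e) { Thread.currentThread().interrupt(); break; }
        }
        return t.getState();
    }

    // State of a thread contending for a monitor held by the current thread.
    public static Thread.State intrinsicContention() {
        Thread waiter = new Thread(() -> { synchronized (MONITOR) { } });
        Thread.State state;
        synchronized (MONITOR) {
            waiter.start();
            state = settle(waiter);
        }
        join(waiter);
        return state;
    }

    // State of a thread contending for a ReentrantLock held by the current thread.
    public static Thread.State explicitContention() {
        Thread waiter = new Thread(() -> { LOCK.lock(); LOCK.unlock(); });
        Thread.State state;
        LOCK.lock();
        try {
            waiter.start();
            state = settle(waiter);
        } finally {
            LOCK.unlock();
        }
        join(waiter);
        return state;
    }

    private static void join(Thread t) {
        try { t.join(); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
    }

    public static void main(String[] args) {
        System.out.println("synchronized:  " + intrinsicContention()); // BLOCKED
        System.out.println("ReentrantLock: " + explicitContention());  // WAITING
    }
}
```

A ReentrantLock parks waiting threads through LockSupport.park, which the JVM reports as WAITING. That is the same state used for threads that are idle by design, such as pool workers waiting for tasks, so such threads do not stand out in a dump.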
Locking Optimizations
Another difference between the two mechanisms is how much control you give to the JVM. If you use an explicit lock, you take away the JVM's ability to optimize the locking strategy. The following three optimizations have been introduced for intrinsic locks:
- Lock elision: the JVM can remove synchronization when escape analysis shows that the locked object is never accessed by another thread.
- Adaptive locking: the JVM chooses between spin-waiting and suspending the thread.
- Lock coarsening: adjacent synchronized blocks on the same lock are merged together.
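The following sketch shows the kind of code these optimizations target. Whether HotSpot actually elides or coarsens these locks is decided by the JIT at run time and cannot be observed from the program's output. The program prints the same result either way, and confirming that an optimization was applied requires JIT diagnostics. Adaptive locking concerns how a contended lock is waited on, so it has no distinctive code shape to show here. The class and method names are hypothetical.

```java
public class LockOptimizationCandidates {
    private final Object lock = new Object();
    private int a;
    private int b;

    // Lock elision candidate: every StringBuffer.append is synchronized, but the
    // buffer never escapes this method, so escape analysis MAY let the JIT drop those locks.
    public static String elisionCandidate(int n) {
        StringBuffer sb = new StringBuffer();
        for (int i = 0; i < n; i++) {
            sb.append(i);
        }
        return sb.toString();
    }

    // Lock coarsening candidate: two adjacent blocks guarded by the same monitor,
    // which the JIT MAY merge into a single acquire/release.
    public void coarseningCandidate() {
        synchronized (lock) { a++; }
        synchronized (lock) { b++; }
    }

    public int sum() {
        synchronized (lock) { return a + b; }
    }

    public static void main(String[] args) {
        System.out.println(elisionCandidate(5)); // 01234
        LockOptimizationCandidates c = new LockOptimizationCandidates();
        c.coarseningCandidate();
        System.out.println(c.sum()); // 2
    }
}
```

Rewriting the coarsening example with a ReentrantLock would keep the same behavior, but the JVM would then see only ordinary method calls rather than monitor operations it knows how to merge.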
Great resources on these topics are Brian Goetz's article "Java theory and practice: Synchronization optimizations in Mustang" and the book Java Concurrency in Practice.
What is MarkLogic
MarkLogic is an industry-leading XML server with capabilities to store, manage, enrich, search, navigate, and dynamically deliver content. The server, designed and optimized for XML content, indexes XML structure and text and provides a wide range of query capabilities, such as:
- XML and full-text search (Boolean, phrase, range, proximity, wildcard, stemming, etc.)
- Complete W3C standard XQuery implementation
- XQuery extensions for inserts, updates, profiling, debugging, etc.
MarkLogic also features flexible content management. This includes support for:
- Transforming and reformatting content for different uses (HTML, SMS, XSL-FO for PDF)
- Assembling content from different sources
- Delivering content over multiple channels, feeds, devices and protocols
Basic MarkLogic Training
We recently had the opportunity to attend a 3-day basic training course on MarkLogic. The course required familiarity with XML and XML Schema as a prerequisite. No prior familiarity with XQuery or MarkLogic was expected, although such a background is very helpful. The training was split evenly between XQuery and MarkLogic-specific material and emphasized a hands-on philosophy, with equal time for lectures and labs. The topics covered included:
- Server Overview and Core Architectural Components
- Installation and Basic Configuration to get started
- XPath Expressions and Syntax
- XQuery Basics
- Data Model and Types
- Functions and Operators
- FLWOR expressions and Modules
- Basic MarkLogic XQuery Extensions
- Loading, deleting and updating documents and document collections
- Exception handling and logging
- Utility functions for nodes and atomic types like dates and strings
- Database and Application Server Configuration
- Setting up databases for security, schema, modules and triggers
- Managing forests and XML Data fragmentation
- Indexing, including a brief discussion of the following:
  - Text controls for case-sensitive search, stemmed search, fast phrase/word search, wildcard search, and position search
  - Phrase Controls for selectively omitting or ignoring markup for phrasing purposes
  - Range Indexes for ordered values
- Setting up XDBC, HTTP and WebDAV servers
- MarkLogic XQuery Extensions for Search (relevance-based search heuristics)
- Basic Administrative Tasks
- Discussion of content storage in MarkLogic and associated settings
- Merge tasks and policies for optimizing disk space and speed
- Doing backups and restores, canceling queries
Our first impressions of MarkLogic from the training were very positive, and we plan to investigate the product features further and see how we can use them in the financial services domain.
NOTE: Being a beginner's course, the training did not cover everything about MarkLogic. Numerous advanced topics were not discussed, including:
- Advanced Search and Indexing
- Fields
- Word Lexicons (database, element, attribute)
- Value lexicons (element or attribute)
- Document-URI and Collection-URI lexicons
- Thesaurus API and Spelling API
- Geo-spatial Indexes and Diacritic sensitive indexes
- Query Parsing, Rewriting and Reverse Queries
- Content Processing Framework (for updates and transformation of content)
- Transactions and Locking
- Triggers
Further Links
For W3C specifications on XPath, please click
For the latest W3C specifications on XQuery, please click W3C XQuery
For comprehensive MarkLogic documentation, please go to the MarkLogic Developers Site
If you are interested in MarkLogic training (either basic or on a specific advanced topic), please go to MarkLogic Training
Technical Problem/Challenge
Most software systems have a number of services running on multiple hosts, or they may have numerous instances of a service running as part of a cluster. This results in multiple log files, with each service or service instance writing to its own log file.
Under these circumstances, debugging an error or tracing how a request has been serviced by a system may require the developer to check and monitor several log files. This can make debugging and analyzing request handling very time-consuming.
Most developers quickly realize that the ability to view and query a central log file containing log events from all system services greatly helps in debugging and tracing how requests are handled. A graphical UI tool that displays log events as they occur can also be very useful.
Technical Solution
So how do we quickly achieve both of these goals: centralized logs and a convenient UI for viewing log events?
Fortunately, most of the hard work is already done if you use the Apache Log4j logging framework and its companion application Chainsaw (we will assume that you have Log4j version 1.2.15 and Chainsaw v2). There are two approaches that can be taken:
- All the service nodes log to a shared file. The shared log file is thus an aggregated log file, and Chainsaw clients tail it.
- The service nodes publish log events to a central log server. The central log server aggregates the logs and also forwards them to Chainsaw clients.
The following sections will illustrate and discuss both of these approaches.
Approach 1 (Logging to a shared file)
This approach involves configuring all the service nodes so that they log to a shared log file. Chainsaw can then tail the shared log file to display the events. The idea is illustrated in the following diagram:

Chainsaw clients that are tailing the shared log file will render the log events in the Chainsaw UI. The user will see a layout similar to the one shown below:

We can achieve the setup illustrated above by utilizing the following Log4j and Chainsaw features:
- The standard Log4j RollingFileAppender for formatting log events and writing them out to a file.
- The Chainsaw UI application, which graphically renders log events. It has various receivers for sourcing those events.
- Chainsaw's FileReceiver. This tails a specified file and parses every new line into a log event using the specified format pattern.
Application/Service Node Log4j Setup
# Set root logger level to INFO and its only appender to A1.
log4j.rootLogger=INFO, A1
# A1 is set to be a RollingFileAppender.
log4j.appender.A1=org.apache.log4j.RollingFileAppender
log4j.appender.A1.File=Z:/log/trade-server.log
log4j.appender.A1.layout=org.apache.log4j.PatternLayout
log4j.appender.A1.layout.ConversionPattern=%d{yyyyMMdd HH:mm:ss,SSS} [%t] %p %c %x - %m%n
The above log4j setup creates a RollingFileAppender that logs to a shared file (Z: is a mapped drive representing a shared location). All the service nodes will log to this shared log file.
Chainsaw Setup
Chainsaw can be utilized to view system log events as follows:
- Start the Chainsaw UI by executing its .bat file.
- Create receivers manually as follows.
- Using the Receivers panel, create a FileReceiver with the following options:
- name = LogFileReceiver
- fileURL = file:///Z:/log/trade-server.log
- logFormat = TIMESTAMP [THREAD] LEVEL LOGGER - MESSAGE
- tailing = true
- timestampFormat = yyyyMMdd HH:mm:ss,SSS
The Chainsaw client will start tailing the specified shared file. It will read every new line and parse it according to the specified log format to get a log event. The events will be displayed in a new event log tab. Multiple Chainsaw clients can tail the shared file and receive events.
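To make the parsing step concrete, here is a minimal sketch that splits a line written with the ConversionPattern above back into fields using a regular expression. This is not Chainsaw's implementation. Chainsaw derives its parser from the logFormat string, and the regex and Event class here are hypothetical. A line written with a different layout simply fails to parse, which is why all nodes must share one layout.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LogLineParser {
    // Mirrors %d{yyyyMMdd HH:mm:ss,SSS} [%t] %p %c %x - %m
    // (an empty NDC from %x leaves an extra space before " - ", which ".*?" absorbs).
    private static final Pattern LINE = Pattern.compile(
            "(\\d{8} \\d{2}:\\d{2}:\\d{2},\\d{3}) \\[(.*?)\\] (\\S+) (\\S+) .*?- (.*)");

    // One parsed log event; field names are illustrative.
    public static final class Event {
        public final String timestamp, thread, level, logger, message;
        Event(String timestamp, String thread, String level, String logger, String message) {
            this.timestamp = timestamp;
            this.thread = thread;
            this.level = level;
            this.logger = logger;
            this.message = message;
        }
    }

    // Returns the event, or empty if the line was written with a different layout.
    public static Optional<Event> parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) return Optional.empty();
        return Optional.of(new Event(m.group(1), m.group(2), m.group(3), m.group(4), m.group(5)));
    }

    public static void main(String[] args) {
        String good = "20090415 10:44:33,607 [main] INFO com.riskfocusinc.process.xpath.TradeProcessorImpl  - Loaded Restricted Parties";
        String other = "20090415 10:44:33,607 TradeServer1 [main] INFO com.riskfocusinc.process.xpath.TradeProcessorImpl - Loaded Restricted Parties";
        System.out.println(parse(good).map(e -> e.level + " | " + e.message).orElse("unparseable")); // INFO | Loaded Restricted Parties
        System.out.println(parse(other).isPresent()); // false
    }
}
```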
Limitations
There are several issues and limitations with this approach:
- Every service node has to use the SAME layout pattern in its log4j configuration. A consistent log file pattern is needed so that Chainsaw can parse lines in the log file into log events.
- All service nodes MUST have the same system time; otherwise the aggregated shared log file will display inconsistent timestamps. Most production systems are set up so that system times and time zones are the same on all boxes, so this should not be a big issue. If that is not the case, you have a problem.
- Log files typically get rolled over. In Log4j we can roll over files daily or after they reach a specific size, depending on the configuration. If all service nodes log to the same shared file, this configuration MUST be the same for all of them. Otherwise one node may roll over to a new log file while another is still logging to the old one, so log events end up split between two files. One way to ensure consistency is to use a common log4j config file shared by all nodes.
- Even if you ensure the above three, your shared log file will have timestamps that are out of order. That's because messages are being logged by different processes. The log4j file appender inside a JVM process can only guarantee that logs generated by that process are written in the right order. It doesn't know that other processes are logging to the same file and can't coordinate with them. Seeing timestamps out of order can be very confusing. You can distinguish messages from different processes, and reduce the confusion, by including a service node identifier in each log message. For example, the following snippet shows logging by two server nodes, TradeServer1 and TradeServer2:
20090415 10:44:33,607 TradeServer1 [main] INFO com.riskfocusinc.process.xpath.TradeProcessorImpl - Loaded Restricted Parties
20090415 10:44:33,609 TradeServer1 [main] INFO com.riskfocusinc.process.xpath.TradeProcessorImpl - Loaded Entity Hierarchy
20090415 10:44:33,607 TradeServer2 [main] INFO com.riskfocusinc.process.xpath.TradeProcessorImpl - Loaded Restricted Parties
- The Chainsaw FileReceiver is configured by specifying the complete log file URL. The receiver then holds a read lock on the file. As long as a Chainsaw client with a running FileReceiver exists, you won't be able to delete, move or rename the log file because of the lock. This is a big irritant: if a user leaves Chainsaw open, the system admin can't delete, move or rename the log file.
- As mentioned in the previous points, log files typically get rolled over, and a Chainsaw FileReceiver is configured with the complete log file URL. If a client's FileReceiver is monitoring events and the log file gets rolled over, you will stop receiving events. You won't realize what has happened unless you check whether the log file has been rolled over. You will have to shut down the receiver and create a new one pointing to the new log file to receive events again. Log file rolling is most often size-based. If all service nodes log to the same file, it is likely to reach its maximum size and get rolled over quite frequently. In that case, Chainsaw users will have to reset or restart their receivers frequently.
- The final issue is the performance factor. If multiple applications log to the same shared file while Chainsaw clients tail and monitor it, there will be overheads related to:
- Accessing the file located on shared drive or file server.
- File IO Synchronization and Blocking, as multiple service nodes update the file by appending logs and multiple clients read by tailing the file.
Due to the above issues/limitations, we do not recommend this approach. The first six issues do not exist in the alternative approach (involving the use of a central log server) that is discussed in the next section.
The performance factor mentioned in the last point is very hard to measure, since it depends on the hardware/OS setup, the number of service nodes and Chainsaw clients, and the volume of concurrent logging. For instance, using a NAS drive (or a file server) will be more efficient than a mapped network drive in Windows. However, we believe that for most applications it's best to avoid concurrent file I/O, and that it is more efficient for service nodes to publish log events to a central server over a socket.
Hence, the desired technical solution is the approach discussed in the following section.
Approach 2 (Using a central Log Server)
This approach involves setting up the logging system so that multiple service nodes publish events to a central log server. The central log server aggregates the logs and forwards log events to any connected Chainsaw clients. Multiple Chainsaw clients (and hence several users) can connect and subscribe to log events. The idea is illustrated in the following diagram:

The Chainsaw application clients connected to the log server will receive log events and will render the events in the Chainsaw UI. The user will see a layout displaying the log events similar to the one shown below:

We can achieve the above illustrated setup by utilizing the following Log4j and Chainsaw features:
- Log4j has a class called SimpleSocketServer. This class is capable of receiving log events over a socket and processing them through any of the supported Log4j Appenders.
- Log4j provides a SocketAppender. This appender opens a socket connection to a log server and then publishes log events over the connection.
- Log4j also provides a SocketHubAppender. This creates a server socket to accept client connections and then publishes events to clients over accepted client connections.
- The Chainsaw UI application graphically renders log events. It has various receivers for sourcing those events.
- Chainsaw provides a SocketHubReceiver. It opens a connection to a server socket and reads log events over the connection.
Application/Service Node Log4j Setup
log4j.rootLogger=INFO, A1, Socket
# A1 is set to be a RollingFileAppender.
log4j.appender.A1=org.apache.log4j.RollingFileAppender
log4j.appender.A1.File=log/trade-server.log
log4j.appender.A1.layout=org.apache.log4j.PatternLayout
log4j.appender.A1.layout.ConversionPattern=%-4r [%t] %-5p %c %x - %m%n
# The Socket Appender
log4j.appender.Socket=org.apache.log4j.net.SocketAppender
log4j.appender.Socket.Threshold=INFO
log4j.appender.Socket.RemoteHost=<Log Server Host>
log4j.appender.Socket.Port=<Log Server Port>
log4j.appender.Socket.ReconnectionDelay=5000
log4j.appender.Socket.LocationInfo=true
log4j.appender.Socket.Application=<Service Node Identifier>
The Log4j setup described above creates a SocketAppender that publishes log events at level INFO and above to a log server running at the specified host and port. These values identify the single central log server and should be the same for all service nodes. In addition, the Application property holds the service node identifier; each service node should have a unique value for this property.
Log Server Setup
On the <Log Server Host> machine, we need to start the log server, listening on port <Log Server Port>, to receive the log events published by the various service nodes. This can be done using the following command:
java org.apache.log4j.net.SimpleSocketServer <Log Server Port> config/logserver.txt
The second argument in the above command specifies the log4j config file that defines how the log events received by the server will be processed. The contents of the logserver.txt file should be as follows:
# Set root logger level to INFO
log4j.rootLogger=INFO, F1, SocketHub
# F1 is set to be a RollingFileAppender (optional, used only for producing an aggregated log file)
log4j.appender.F1=org.apache.log4j.RollingFileAppender
log4j.appender.F1.File=log/log-server.log
log4j.appender.F1.layout=org.apache.log4j.PatternLayout
log4j.appender.F1.layout.ConversionPattern=%-4r [%t] %X{application} %-5p %c %x - %m%n
# The Socket Hub Appender (used for servicing Chainsaw clients)
log4j.appender.SocketHub=org.apache.log4j.net.SocketHubAppender
log4j.appender.SocketHub.Threshold=INFO
log4j.appender.SocketHub.Port=<Chainsaw Port>
The above log4j file has two appenders for processing log events:
- The first appender, F1, is a file appender that writes all the log events received by the log server to a file. Effectively, it creates an aggregated log containing all the log messages published by the various service nodes. For each log message, the %X{application} conversion in the pattern prints the source service node id.
- The second appender is a SocketHubAppender. It creates a server socket listening at <Chainsaw Port> to service Chainsaw clients. Chainsaw clients can connect to the log server and the log server will forward all events it receives to the connected clients.
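The division of labor between the two appenders can be modelled in memory. The following sketch is hypothetical and uses no sockets and no log4j classes. receive() stands in for SimpleSocketServer handing over an event sent by a node's SocketAppender. The aggregated list plays the part of F1. Each connected client plays the part of a Chainsaw client attached to the SocketHubAppender. TabbedViewer files events under their application property, which models the per-node tabs that the Chainsaw documentation leads one to expect.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.BiConsumer;

public class LogServerSketch {
    // Plays the role of the F1 RollingFileAppender: one aggregated log for all nodes.
    private final List<String> aggregatedLog = new ArrayList<>();
    // Plays the role of the SocketHubAppender's connected clients.
    private final List<BiConsumer<String, String>> clients = new CopyOnWriteArrayList<>();

    public void connect(BiConsumer<String, String> client) { clients.add(client); }

    // Plays the role of the log server receiving one event from a node's SocketAppender.
    public synchronized void receive(String application, String message) {
        aggregatedLog.add(application + " - " + message);     // like %X{application} in F1's pattern
        for (BiConsumer<String, String> client : clients) {
            client.accept(application, message);             // forwarded to every connected viewer
        }
    }

    public synchronized List<String> aggregatedLog() { return new ArrayList<>(aggregatedLog); }

    // A viewer that files each event under a tab named after its application property.
    public static final class TabbedViewer implements BiConsumer<String, String> {
        public final Map<String, List<String>> tabs = new TreeMap<>();
        @Override
        public synchronized void accept(String application, String message) {
            tabs.computeIfAbsent(application, k -> new ArrayList<>()).add(message);
        }
    }

    public static void main(String[] args) {
        LogServerSketch server = new LogServerSketch();
        TabbedViewer viewer = new TabbedViewer();
        server.connect(viewer);
        server.receive("TradeServer1", "Loaded Restricted Parties");
        server.receive("TradeServer2", "Loaded Restricted Parties");
        server.receive("TradeServer1", "Loaded Entity Hierarchy");
        System.out.println(server.aggregatedLog().size()); // 3
        System.out.println(viewer.tabs);
    }
}
```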
Chainsaw Setup
Chainsaw can be utilized to view system log events as follows:
- Start the Chainsaw UI by executing its .bat file.
- Create receivers manually as follows.
- Using the Receivers panel, create a SocketHubReceiver with the following options:
- name = LogServerReceiver
- host = <Log Server Host>
- port = <Chainsaw Port>
The Chainsaw client will now start receiving events forwarded by the log server. The events will be displayed in a new event log tab. Multiple Chainsaw clients can connect to the log server and receive events.
According to the Chainsaw documentation, the application property is used as an identifier for the event log tab. So, intuitively, in our setup events from different service nodes should go to different tabs, since they have different values for the application property. This is highly desirable, because it tells the user the source of each log event.
However, when we tested this setup we noticed that Chainsaw was not routing events to different tabs. All the events were being sent to the same tab. We have not been able to identify why this is happening. As always we are open to suggestions and explanations.
Conclusion
This post discussed how Log4j and Chainsaw can be used to create a centralized, aggregated log file with messages from all service nodes, and how the log events can be viewed in a graphical UI using Chainsaw.
There are two ways to achieve this, and both were discussed. The recommended approach is to use a central log server.
However, even the recommended solution has one drawback: Chainsaw routes events from all the service nodes to the same tab. This behavior is unexpected, since events from different service nodes have different values for the application property. Why this happens needs further investigation.
When unit testing code that uses JMS, you'll typically want to avoid the overhead of running separate processes. You'll also want startup to be as fast as possible, since you tend to run unit tests often and want immediate feedback.
The ability to run JMS code without starting a separate JMS server is also very useful when writing demos. You typically want potential users to be able to set up and run demos quickly. If a demo requires installing additional messaging software and running a separate process, setting it up can be a big hassle for users.
This issue can be overcome by using Apache ActiveMQ with an embedded broker. There are several ways to do this.
The following link explains the API that can be used to set up an embedded broker: http://activemq.apache.org/how-to-unit-test-jms-code.html
The procedure described at that link involves creating a message broker inside your program, which requires code changes. That is not a good approach. Ideally you would want to define all JMS connection settings in a configuration file, and if you are using Spring, you would want to define all the JMS resources inside the Spring context.
The following link explains how you can create an embedded broker using Spring. http://activemq.apache.org/spring-support.html
The above link mentions two ways of creating an embedded broker using Spring:
- Using BrokerFactoryBean and defining the JMS settings in a separate XML configuration file that is referenced by the BrokerFactoryBean
- Embedding the settings inside a regular Spring XML file without requiring the factory bean. This approach, however, needs an additional jar dependency, xbean-spring 2.6
If your settings are fairly simple, there is another way to configure the embedded broker using Spring. In this approach, we essentially simulate the same API calls described at the first link mentioned in this post. This can be done with the following Spring XML configuration:
<!-- create an embedded ActiveMQ broker (not yet started) -->
<bean id="mqBroker" class="org.springframework.beans.factory.config.MethodInvokingFactoryBean">
    <property name="staticMethod" value="org.apache.activemq.broker.BrokerFactory.createBroker"/>
    <property name="arguments">
        <list>
            <!-- the ampersand must be escaped as &amp; inside XML -->
            <value>broker:(tcp://localhost:61616)?persistent=false&amp;useJmx=false</value>
        </list>
    </property>
</bean>
<!-- start the broker created above by invoking its start() method -->
<bean id="mqBrokerStart" class="org.springframework.beans.factory.config.MethodInvokingFactoryBean">
    <property name="targetObject" ref="mqBroker"/>
    <property name="targetMethod" value="start"/>
</bean>
<!-- JMS connection factory, created only after the broker has started -->
<bean id="mqConnectionFactory" class="org.apache.activemq.ActiveMQConnectionFactory" depends-on="mqBrokerStart">
    <property name="brokerURL" value="tcp://localhost:61616"/>
</bean>
Developers can use any one of these Spring approaches to define the JMS settings and create an embedded broker. No code changes need to be made, and the code can be tested for any release.
The Drools Rule Language allows you to access nested properties directly while declaring rule conditions in the rule file (drl file). However, there are a few things about nested properties and the domain object model that a Drools developer should understand.
In Drools, accessing nested properties in rules has some performance overhead. That's because Drools internally uses indexes for its pattern matching, and these indexes are built on direct properties, not nested ones.
Also, when you modify a nested property, Drools will not know that the object changed. This matters because Drools tracks changes to objects in order to optimize rule evaluation.
So should you be accessing nested properties?
And is there a way to avoid accessing nested properties altogether?
There is no simple answer to these questions. The performance overhead really depends upon the complexity of rules and the domain object model being used. It may be negligible or it may be significant. Only stress testing your system can confirm whether performance is satisfactory or not.
What is important is that developers and users conceptually understand what Drools is best designed for.
Drools has been designed to work best when there is a large number of objects whose relationships are expressed via references, rather than a few composite objects. Let's understand this through an example.
Let's say we have a Trade object that can have multiple allocations. For the sake of illustration, let's assume that there is a Trade object instance that has 5 allocations.
There are two approaches to modeling this relationship. I will describe them first and then compare and discuss their suitability for use with Drools.
Approach 1
Trade
{
String tradeId;
List<Allocation> getAllocations();
...
}
Allocation
{
String allocationId;
String getAllocProperty();
...
}
Here Trade is a composite object containing a list of Allocations. The size of that list will be 5. We add the Trade object to the working memory and run rules on it.
The rules will have to access nested properties in order to get Allocation details.
Approach 2
Trade
{
String tradeId;
...
}
Allocation
{
String allocationId;
Trade parent;
String getAllocProperty();
...
}
The Trade class does not have an Allocation list. Instead, the Allocation class holds a reference to its parent trade (through the parent property).
Please note that we could also have designed the Allocation class to hold the parent trade id, as follows:
Allocation
{
String allocationId;
String parentTradeId;
String getAllocProperty();
...
}
In either case, the effect is the same. There will be 1 Trade object and 5 Allocation objects (referencing the trade). A total of 6 objects will be added to the working memory and rules will be evaluated against them.
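The difference in what ends up in working memory can be sketched in plain Java. In this sketch a List<Object> stands in for the Drools working memory (no Drools API is used), and the Approach 1 classes are renamed CompositeTrade and CompositeAllocation only so that both models can live in one file; everything beyond the Trade/Allocation/parent structure above is illustrative.

```java
import java.util.ArrayList;
import java.util.List;

public class TradeModels {

    // Approach 1: the trade is a composite that contains its allocations.
    public static class CompositeTrade {
        final String tradeId;
        final List<CompositeAllocation> allocations = new ArrayList<>();
        CompositeTrade(String tradeId) { this.tradeId = tradeId; }
        public List<CompositeAllocation> getAllocations() { return allocations; }
    }

    public static class CompositeAllocation {
        final String allocationId;
        CompositeAllocation(String allocationId) { this.allocationId = allocationId; }
    }

    // Approach 2: Trade holds no list; each Allocation references its parent Trade.
    public static class Trade {
        final String tradeId;
        Trade(String tradeId) { this.tradeId = tradeId; }
    }

    public static class Allocation {
        final String allocationId;
        final Trade parent;
        Allocation(String allocationId, Trade parent) {
            this.allocationId = allocationId;
            this.parent = parent;
        }
    }

    // Approach 1: only the composite trade is inserted; allocations are nested properties.
    static List<Object> factsForApproach1(int allocations) {
        CompositeTrade trade = new CompositeTrade("T1");
        for (int i = 1; i <= allocations; i++) {
            trade.getAllocations().add(new CompositeAllocation("A" + i));
        }
        List<Object> workingMemory = new ArrayList<>();
        workingMemory.add(trade);
        return workingMemory;
    }

    // Approach 2: the trade and every allocation are inserted as separate facts.
    static List<Object> factsForApproach2(int allocations) {
        Trade trade = new Trade("T1");
        List<Object> workingMemory = new ArrayList<>();
        workingMemory.add(trade);
        for (int i = 1; i <= allocations; i++) {
            workingMemory.add(new Allocation("A" + i, trade));
        }
        return workingMemory;
    }

    public static void main(String[] args) {
        System.out.println(factsForApproach1(5).size()); // 1
        System.out.println(factsForApproach2(5).size()); // 6
    }
}
```

With Approach 2, each Allocation is a top-level fact, so its direct properties are what the rule engine indexes on.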
Comparison of 2 Approaches
Drools documentation and examples suggest that it is designed to work best if Approach 2 is used for object modeling. The pattern matching algorithm works fast using its internal property indexes, and rules can be written in a crisp and concise way (avoiding the use of Java code for casting).
That doesn't mean Approach 1 is not supported. Rules can still be written if Approach 1 is used for object modeling, but you pay in efficiency and in the cleanliness with which rules can be written in the drl file.
In our real-world object design, we tend to favor containment relationships (similar to Approach 1). In many business systems, the domain model has a deep containment hierarchy with many levels of nesting.
This is certainly true if one uses FpML along with JAXB. The resulting object schema has a deep containment hierarchy, and using Drools with it can potentially result in a non-negligible performance overhead. Also, business rules in the rules file don't look clean, since it becomes necessary to embed Java in order to do object casting.
In Drools, the data against which rules are evaluated are JavaBean instances. The Rule Language syntax allows you to access properties of these beans while declaring rule conditions in the rule file (drl file).
There are a few things about the way bean properties are accessed that a Drools developer needs to be aware of:
- Drools uses reflection at runtime to access properties. It uses the bean property getters and setters for accessing the properties.
- Drools reflection is based on the declared return type of the getters. That is, Drools assumes the object returned by a getter is of the declared type, even though the actual object returned at runtime may be a subclass of the declared type. If you try accessing the properties of the returned object, only those properties defined by the declared type will be directly available.
This has the following important implications:
- In the case of Generics, the element type of a collection is erased at runtime, so Drools sees only raw types. Hence, objects obtained from a collection are assumed to be of type Object, and only the properties of class Object are available for further reading.
- While accessing nested properties, only the properties of the declared type are accessible. Imagine that you have a class with a method getProduct that returns type Product. Let's further assume that the actual object returned at runtime is a CDSProduct, a subclass of Product. Although the returned object is an instance of CDSProduct, only the properties declared in Product are directly available.
In order to access the properties of the actual returned object, the developer will have to do an explicit cast. This requires embedding Java code either in an eval expression or in combination with a function.
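The gap between the declared and the runtime type can be observed with plain Java reflection. The sketch below is not Drools code; it only shows what a reflection-based tool sees when it inspects a getter's declared return type without calling it. Product and CDSProduct come from the example above, while Holder, getReferenceEntity and getParties are hypothetical names added for illustration.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

public class DeclaredTypeDemo {

    public static class Product {
        public String getName() { return "generic product"; }
    }

    // Hypothetical subclass property that the declared type Product does not have.
    public static class CDSProduct extends Product {
        public String getReferenceEntity() { return "ACME Corp"; }
    }

    // Hypothetical bean: getProduct() is declared to return Product.
    public static class Holder {
        public Product getProduct() { return new CDSProduct(); }
        public List<String> getParties() { return new ArrayList<>(); }
    }

    // True if the class (or a supertype) has a public method with the given name.
    static boolean hasMethod(Class<?> type, String name) {
        for (Method m : type.getMethods()) {
            if (m.getName().equals(name)) return true;
        }
        return false;
    }

    public static void main(String[] args) throws ReflectiveOperationException {
        Method getter = Holder.class.getMethod("getProduct");
        // Declared return type of the getter.
        System.out.println(getter.getReturnType().getSimpleName()); // Product
        // Actual class of the object returned at runtime.
        System.out.println(new Holder().getProduct().getClass().getSimpleName()); // CDSProduct
        // The subclass property is not visible through the declared type.
        System.out.println(hasMethod(getter.getReturnType(), "getReferenceEntity")); // false

        // For a collection, getReturnType() reports only the raw class...
        System.out.println(Holder.class.getMethod("getParties").getReturnType().getSimpleName()); // List
        // ...and List.get(int) is declared (after erasure) to return Object.
        System.out.println(List.class.getMethod("get", int.class).getReturnType().getSimpleName()); // Object
    }
}
```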
The following drl code snippet illustrates the use of an eval expression. The code casts the object returned by the get method of an ArrayList property to type Party, in order to access properties specific to the Party class.
rule "Restricted Counterparty Test"
when
    $tc : TradeCreated($parties : party)
    RuleParams($restricted : restrictedParties)
    eval($restricted.contains(((Party) $parties.get(0)).getPartyName()))
then
    System.out.println("Restricted Counterparty Test");
    TradeActions.doRejectRestrictedParty($tc);
end
The following drl code snippet illustrates the use of a function. The code uses eval to invoke a function called isRestricted, which contains Java code. Note that the Java code does not use Generics: the rule syntax does not allow the use of Generics in the drl file.
function boolean isRestricted(List restricted, List parties) {
    boolean result = false;
    for (Object obj : parties) {
        String test = ((Party) obj).getPartyName();
        if (restricted.contains(test))
            result = true;
    }
    return result;
}

rule "Restricted Counterparty TestAll"
when
    $tc : TradeCreated($parties : party)
    RuleParams($restricted : restrictedParties)
    eval(isRestricted($restricted, $parties))
then
    System.out.println("Restricted Counterparty TestAll");
    TradeActions.doRejectRestrictedParties($tc);
end
Using either of the above approaches, a developer should be able to work around the limitation imposed by the way Drools accesses bean properties. However, it does make the rules file more verbose.
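For comparison, the same check written as ordinary Java outside the drl file can use generics and needs no casts. This is only a sketch: Party is reduced to a minimal hypothetical class holding a party name, and RestrictionChecks is an illustrative class name, not part of the original code.

```java
import java.util.Arrays;
import java.util.List;

public class RestrictionChecks {

    // Minimal hypothetical stand-in for the Party class used in the rules.
    public static class Party {
        private final String partyName;
        public Party(String partyName) { this.partyName = partyName; }
        public String getPartyName() { return partyName; }
    }

    // Same logic as the drl isRestricted function, but typed: no (Party) cast needed.
    public static boolean isRestricted(List<String> restricted, List<Party> parties) {
        for (Party party : parties) {
            if (restricted.contains(party.getPartyName())) {
                return true; // stop at the first restricted counterparty
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> restricted = Arrays.asList("Acme");
        List<Party> parties = Arrays.asList(new Party("Globex"), new Party("Acme"));
        System.out.println(isRestricted(restricted, parties)); // true
    }
}
```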
While using Maven, it's often necessary to unpack dependencies in order to execute a program or to run some test cases. Typically, it's because some resource or data files need to be extracted to the file system.
Maven downloads all the dependency jars declared in the POM file to the local repository. These dependency jars can contain classes and resource files.
Maven constructs a project classpath by including all these dependency jars. This classpath is used for doing builds, running tests and running other programs. For instance, the following command executes a Java program (the main class, TradeServer, is searched for in the Maven-constructed project classpath):
mvn exec:java -Dexec.mainClass=com.riskfocusinc.process.xpath.TradeServer -Druleparams.file=.\target\resources\data\ruleparams.dat
It is very common for a Java program to access resource files. The two most common approaches for accessing resource files are
- Look up resource files in the classpath
- Look up resource files on the file system
From a packaging and deployment point of view, resource files should be included in some dependency jar or perhaps the source jar.
If the program uses the classpath to access resource files, you will be able to execute the program since dependency jars are included in the Maven constructed classpath.
However if the program does a file system lookup for accessing resources, the resources will have to be extracted from the dependency jar to the appropriate location on the file system, before trying to run the program.
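The difference between the two lookup styles can be sketched in plain Java. The resource name data/ruleparams.dat and the target/resources directory are taken from the exec command above; the class and method names are hypothetical. The classpath lookup finds the file inside any jar on the classpath, while the file system lookup only succeeds once the file has been extracted to disk.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

public class ResourceLookup {

    // Classpath lookup: finds the resource inside any jar on the (Maven-built) classpath.
    static InputStream fromClasspath(String name) {
        return ResourceLookup.class.getClassLoader().getResourceAsStream(name);
    }

    // File system lookup: succeeds only if the file has already been extracted to disk.
    static InputStream fromFileSystem(String path) throws IOException {
        File file = new File(path);
        if (!file.isFile()) {
            return null; // not unpacked yet
        }
        return new FileInputStream(file);
    }

    public static void main(String[] args) throws IOException {
        String name = "data/ruleparams.dat";
        // try-with-resources skips close() when the resource is null.
        try (InputStream cp = fromClasspath(name)) {
            System.out.println("classpath: " + (cp != null));
        }
        try (InputStream fs = fromFileSystem("target/resources/" + name)) {
            System.out.println("file system: " + (fs != null));
        }
    }
}
```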
Extracting the resources can be done by unpacking the dependency jar using the dependency plugin. The following is a snippet from a Maven POM file for extracting resources from a dependency.
<build>
  <plugins>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-dependency-plugin</artifactId>
      <executions>
        <execution>
          <id>unpack</id>
          <phase>process-resources</phase>
          <goals>
            <goal>unpack</goal>
          </goals>
          <configuration>
            <artifactItems>
              <artifactItem>
                <groupId>com.rfi</groupId>
                <artifactId>drools-demo</artifactId>
                <version>1.0</version>
                <type>jar</type>
                <classifier>sources</classifier>
                <overWrite>false</overWrite>
                <outputDirectory>${project.build.directory}/resources</outputDirectory>
                <includes>rules/**,config/**,data/**</includes>
              </artifactItem>
            </artifactItems>
          </configuration>
        </execution>
      </executions>
    </plugin>
  </plugins>
</build>
The above snippet extracts resources from the source jar as part of the process-resources phase. The unpacking can be invoked with the following command:
mvn process-resources
Now that the needed resources have been unpacked, the program can be executed using the exec plugin, as demonstrated in the very first code snippet.
JUnit 4.4 has a number of annotations that can be used to tag methods and control the order in which they are executed.
Check out the documentation:
- @Before - causes the method to be run before the Test method
- @BeforeClass - causes the method to be run once before any of the tests have run
- @After - causes the method to be run after the Test method
- @AfterClass - causes the method to be run once after all the tests have run
- @Ignore - causes the test to be ignored. Use this annotation instead of commenting out the method.
- @Test - instructs JUnit to execute the method as a test.
I noticed that when using the @Before annotation to initialize resources in my test case, there were no issues from within IntelliJ. However, when executing the test case from the command line using Maven, it failed:
mvn test -Dtest=ApplicationContextManagerUnitTest
My first guess was that the @Before method was not being executed, as if the annotation was not understood. My registration of JUnit in the POM looks good:
<dependency>
    <groupId>junit</groupId>
    <artifactId>junit</artifactId>
    <version>4.4</version>
    <!-- Test scope dependency -->
    <scope>test</scope>
</dependency>
Even though the maven-surefire-plugin is automatically linked from the super POM, I have added it to the plugins section:
<build>
    <plugins>
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-surefire-plugin</artifactId>
        </plugin>
        ...
    </plugins>
    ...
</build>
The actual POM being used can be determined by executing the command
>mvn help:effective-pom
My guess was correct: the @Before method was not being executed. I validated this by adding a fail() call to the annotated method.
@Before
public void setUp() {
    /* Test the @Before under Maven */
    fail();
    System.setProperty(ContextStart.CONTEXT_XML, "ContextStartup.xml");
}

@Test
public void testApplicationContext() {
    ConfigurableApplicationContext context = (ConfigurableApplicationContext) ApplicationContextManager.getContext();
    assertTrue(context.isActive());
}
The next idea was to determine the dependencies the test case runs with, by executing the command
>mvn dependency:resolve
Sadly, it doesn't look like any JUnit version other than 4 is being included. I'll try commenting out the dependency on jmock, as it has an odd JUnit-esque artifact name: jmock-junit4.
There have been a number of posts on this topic on the net. The surefire plugin is actually running the test case, meaning that it has to interpret the annotations itself. I am not certain which version I have been using, but once I upgraded the version explicitly to 2.4, the annotations were processed correctly.
One last thing I should mention. It has to do with the fact that the surefire plugin needs to be told which test patterns to include and exclude, since it doesn't really care about the @Test annotation. Since I have designated a pattern of *UnitTest.java for all my test cases, I include the following in the surefire plugin's configuration:
<includes> <include>**/*UnitTest.java</include> </includes>
I just tried upgrading from Log4J 1.2.14 to 1.2.15, and I was amazed at the dependencies included in the pom.xml file. It now requires jms, jmxtools, jmxri and a few others. While I can certainly see the added value of having JMS and JMX support included in Log4J, one of the attractive features of Log4J was the fact that it was lightweight.
I have to wonder if it's worth the upgrade. I will play around with it in the coming weeks and report back.
While I'm not certain when Spring first supported factory-method, it's certainly present in version 2.5. It's a handy way to inject the desired object into the target class, as opposed to injecting the factory that manages it.
Regarding the Singleton JAXBContext article, there was something bothering me about the implementation.
To get a (un)marshaller, I would need to do one of the following:
- call the ApplicationContextManager to get the bean DefaultJaxbContextProvider
- inject the DefaultJaxbContextProvider into the target class.
Neither is really appealing, as there is no significant motivation to have the managing class referenced from within the executable code. Instead, we can use factory-method to inject the required (un)marshaller.
I declared the instance factory methods in the interface JaxbContextProvider:
public Marshaller newMarshaller() throws Exception; public Unmarshaller newUnMarshaller() throws Exception;
The concrete implementation of DefaultJaxbContextProvider.newMarshaller() appears as:
public Marshaller newMarshaller() throws Exception { return jaxbContext.createMarshaller(); }
Follow it up with a registration in context.xml, and we can have a bean declaration that calls the factory method each time the bean is requested. Note that this requires prototype scope; with the default singleton scope, Spring would call newMarshaller() only once and share that single instance.
<bean id="contextProvider" class="com.riskfocusinc.core.jaxb.DefaultJaxbContextProvider"/>
<bean id="marshaller" factory-bean="contextProvider" factory-method="newMarshaller" scope="prototype"/>
Anytime we want a new marshaller, the bean with id "marshaller" should be wired in.
In hindsight, it seems like a particularly painful experience. Whenever I would need to mock or stub out an interface, I would do it myself. Either using dynamic proxies or an anonymous inner class in the form of
ApplicationContext context = new ApplicationContext() { /* Implement empty methods for all */ };
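The dynamic-proxy variant of this hand-rolled stubbing looks roughly like the sketch below. BeanLookup is a hypothetical, trimmed-down stand-in for Spring's ApplicationContext so the example runs with the JDK alone; the handler has to special-case every method by name, which is exactly the kind of boilerplate that becomes tedious.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.Collections;
import java.util.Map;

public class ProxyStubSketch {

    // Hypothetical, trimmed-down stand-in for Spring's ApplicationContext.
    public interface BeanLookup {
        Map<String, Object> getBeansOfType(Class<?> type);
        String getDisplayName();
        boolean isActive();
    }

    // Stub that returns a canned map for getBeansOfType and handles other methods by name.
    static BeanLookup stub(Map<String, Object> beans) {
        InvocationHandler handler = (proxy, method, args) -> {
            switch (method.getName()) {
                case "getBeansOfType": return beans;
                case "isActive": return false; // primitive return types must not get null
                default: return null;          // fine only for reference return types
            }
        };
        return (BeanLookup) Proxy.newProxyInstance(
                BeanLookup.class.getClassLoader(), new Class<?>[] {BeanLookup.class}, handler);
    }

    public static void main(String[] args) {
        BeanLookup context = stub(Collections.singletonMap("initializer", (Object) "bean"));
        System.out.println(context.getBeansOfType(Object.class).size()); // 1
        System.out.println(context.getDisplayName()); // null
    }
}
```

Calling an Object method such as hashCode() on this stub would hit the default branch and fail with a NullPointerException, another detail a hand-written stub has to remember to handle.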
Painful and worse; ugly. So, I finally broke down and kicked the tires on jmock. Awesome.
For now, I have used the most basic functionality: call a method on the mock and expect a result. Check out the article on Spring Context Lifecycle to understand the full context of what is being tested. For now, it's enough to know that to finish the initialization of the context, an ApplicationEvent message will be published and the ApplicationListener will consume it. As in many Observable implementations, the event will contain the source: the ApplicationContext.
The ContextStartupListener will require the ApplicationContext to look up the beans of type ApplicationInitializable:
for(Object obj : context.getBeansOfType(ApplicationInitializable.class).values())
It would be really ugly to stub out the context in the "traditional" way. JMock to the rescue.
Taking advantage of the dynamic proxy and scripting expectations mechanism, I am able to craft the interaction points for the contained object.
mockContext = new JUnit4Mockery();
applicationContext = mockContext.mock(ApplicationContext.class);
mockContext.checking(new Expectations() {{
    oneOf(applicationContext).getBeansOfType(ApplicationInitializable.class);
    will(returnValue(map));
}});
ContextRefreshedEvent event = new ContextRefreshedEvent(applicationContext);
ContextStartupListener listener = new ContextStartupListener();
listener.onApplicationEvent(event);
mockContext.assertIsSatisfied();
Quite powerful and fairly low barrier to entry.
Having looked at NMaven as a build and dependency management tool for .NET projects, I found that it was highly unstable and built against an older version of NMaven. I then stumbled upon Ivy, which provides transitive dependency management for Java projects that use Ant. It turns out that Ivy can also be used with NAnt. So I decided to investigate how I could use Ivy + NAnt; however, as I found out, it wasn't as straightforward as it is for Java projects. To run Ivy from NAnt, Ivy needs to be called in standalone mode, and the documentation on using Ivy in standalone mode was insufficient, nor did I find any tutorial on this subject on the web. For this reason, I have written a tutorial on how to start using Ivy + NAnt in a .NET environment.
