Coffea Works

Fast and Lightweight Object Model for XML

An Introduction to AXIOM, the Open Source API for Working with XML

by Eran Chinthaka

Any application that has to handle high-volume XML processing runs into memory and performance barriers, and the main culprit is the memory-intensive object model used inside the application. AXIOM grew out of the Axis2 effort as a new lightweight and efficient object model for representing XML. It has been specifically engineered to be less memory-intensive by using deferred building.

Why and What is AXIOM?
Current approaches to XML processing fall into two broad categories:
  • Tree-based approach - in this approach, which is followed by DOM-like APIs, the whole XML file is loaded into memory, and the resulting object model tends to be larger than the source XML. This method is therefore not appropriate for memory-constrained environments (J2ME, for example) or for systems handling large documents (a Web Services engine, for instance).
  • Event-based approach - this approach processes the source XML in chunks and does not need to build complex memory structures. It can start working on receipt of the first byte of the source. But developer control is minimal, as event-based APIs push the content to the application as and when they encounter it in the document, regardless of whether the application is ready to receive data or not.
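
The contrast between the two approaches can be sketched with the DOM and SAX parsers bundled in the JDK. This is only an illustration: the class name ApproachComparison and the small sample document are assumptions made for this sketch and have nothing to do with AXIOM itself.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class; it uses only the DOM and SAX parsers shipped with the JDK.
class ApproachComparison {
    static final String XML =
            "<Location><Country>Sri Lanka</Country><City>Colombo</City></Location>";

    private static ByteArrayInputStream input() {
        return new ByteArrayInputStream(XML.getBytes(StandardCharsets.UTF_8));
    }

    // Tree-based: the whole document is built in memory before any question can be asked,
    // but afterwards the application can jump to any node in any order.
    static String cityViaDom() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(input());
        return doc.getElementsByTagName("City").item(0).getTextContent();
    }

    // Event-based: the parser pushes events at the handler in document order;
    // the handler can only record what passes by.
    static List<String> elementNamesViaSax() throws Exception {
        List<String> names = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                names.add(qName);
            }
        };
        SAXParserFactory.newInstance().newSAXParser().parse(input(), handler);
        return names;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(cityViaDom());
        System.out.println(elementNamesViaSax());
    }
}
```

In the DOM variant the memory cost is paid up front for the whole tree; in the SAX variant nothing is retained unless the handler stores it.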

One of the aims of the newly introduced Streaming API for XML (StAX API - JSR 173) was to overcome this developer control problem. But you still cannot easily navigate forward and backward using an event-based approach. That is why developers prefer the tree-based approach over the event-based approach, even though it is memory inefficient.
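
A minimal sketch of the pull style that StAX offers, using the javax.xml.stream API from the JDK. The class name PullDemo and the sample input are assumptions for illustration. The point is that the application calls next() when it is ready and can simply stop once it has what it needs.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical demo class; only the javax.xml.stream calls are real StAX API.
class PullDemo {
    // Returns the text of the first element with the given local name, or null.
    // Events are pulled one at a time, and the loop ends as soon as a match is found,
    // so later parts of the document are never processed by the application.
    static String firstText(String xml, String localName) throws XMLStreamException {
        XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        try {
            while (reader.hasNext()) {
                int event = reader.next(); // the caller decides when to advance
                if (event == XMLStreamConstants.START_ELEMENT
                        && reader.getLocalName().equals(localName)) {
                    return reader.getElementText();
                }
            }
            return null;
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<Contributors><Name>Eran Chinthaka</Name>"
                + "<Name>Ajith Ranabahu</Name></Contributors>";
        System.out.println(firstText(xml, "Name"));
    }
}
```

Note that the reader only moves forward: once an event has been consumed there is no way to go back to it, which is the navigation limitation described above.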

The AXIS Object Model, a.k.a. OM (AXIOM), was introduced to get the best of both worlds. AXIOM depends on the StAX API to input and output data. The important point is that it supports deferred building, which none of the existing object models do. That is, the model will not build the document until the application actually requires it. The object model contains only the parts that have already been built; the rest is still kept in the stream. Moreover, AXIOM can provide StAX events from any given point of the received document, whether that part has already been built or not. Further, AXIOM has the option to give these StAX events with or without building the memory model for later access. This is called caching in AXIOM.
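
The idea of deferred building can be illustrated with a small sketch on top of a plain StAX reader. This is not how AXIOM is implemented; the class DeferredChildren is hypothetical and handles only a root element with text-only children. It shows the principle: nothing is built until a caller asks for it, and only as much as the request needs.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical sketch of deferred building; assumes a root with text-only child elements.
class DeferredChildren {
    private final XMLStreamReader reader;
    private final List<String> built = new ArrayList<>(); // the part of the model built so far
    private boolean exhausted;

    DeferredChildren(String xml) throws XMLStreamException {
        reader = XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        reader.nextTag(); // position on the root start tag; no child has been read yet
    }

    // Returns the text of the i-th child, pulling from the stream only as far as needed.
    String get(int i) throws XMLStreamException {
        while (built.size() <= i && !exhausted) {
            if (reader.nextTag() == XMLStreamConstants.START_ELEMENT) {
                built.add(reader.getElementText());
            } else {
                exhausted = true; // reached the end tag of the root element
            }
        }
        return i < built.size() ? built.get(i) : null;
    }

    // Number of children that have been materialized in memory so far.
    int builtCount() {
        return built.size();
    }

    public static void main(String[] args) throws XMLStreamException {
        DeferredChildren names = new DeferredChildren("<r><n>a</n><n>b</n><n>c</n></r>");
        System.out.println(names.builtCount()); // nothing built yet
        System.out.println(names.get(0));
        System.out.println(names.builtCount()); // only the first child was built
    }
}
```

Children that were already built are served from memory on later calls, while the remainder stays in the stream until it is requested.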

Even though AXIOM provides "flavors" from both ends, care has been taken not to compromise performance due to this enhancement.

The AXIOM API has been designed to be straightforward for Java programmers.

One of the major requirements of Axis 2, the next generation of the world-renowned Web service engine, was an object model with a low memory footprint that is nevertheless very fast. AXIOM is the baby born to cater to that requirement.

AXIOM Architecture


Fig. 1: AXIOM Accessing the XML Stream


As can be seen in Figure 1, AXIOM accesses the XML stream through the StAX interface. AXIOM doesn't come with a StAX parser. (See Resources for downloading a StAX implementation.) Since most data binding tools support SAX-based interfaces, AXIOM comes with an adapter to be used between StAX and SAX.
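
To give a feel for what an adapter between StAX and SAX has to do, here is a simplified sketch that replays the events of an XMLStreamReader on a SAX ContentHandler. It is not AXIOM's adapter: the class StaxToSaxAdapter is hypothetical, and prefix mappings, comments, and processing instructions are deliberately left out.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical, simplified StAX-to-SAX bridge (elements, attributes, and text only).
class StaxToSaxAdapter {
    static void replay(XMLStreamReader r, ContentHandler h)
            throws XMLStreamException, SAXException {
        h.startDocument();
        while (r.hasNext()) {
            switch (r.next()) {
                case XMLStreamConstants.START_ELEMENT: {
                    AttributesImpl atts = new AttributesImpl();
                    for (int i = 0; i < r.getAttributeCount(); i++) {
                        atts.addAttribute(uri(r.getAttributeNamespace(i)),
                                r.getAttributeLocalName(i),
                                qName(r.getAttributePrefix(i), r.getAttributeLocalName(i)),
                                "CDATA", r.getAttributeValue(i));
                    }
                    h.startElement(uri(r.getNamespaceURI()), r.getLocalName(),
                            qName(r.getPrefix(), r.getLocalName()), atts);
                    break;
                }
                case XMLStreamConstants.END_ELEMENT:
                    h.endElement(uri(r.getNamespaceURI()), r.getLocalName(),
                            qName(r.getPrefix(), r.getLocalName()));
                    break;
                case XMLStreamConstants.CHARACTERS:
                case XMLStreamConstants.CDATA:
                    h.characters(r.getTextCharacters(), r.getTextStart(), r.getTextLength());
                    break;
                default:
                    break; // other event types are ignored in this sketch
            }
        }
        h.endDocument();
    }

    private static String uri(String ns) {
        return ns == null ? "" : ns;
    }

    private static String qName(String prefix, String local) {
        return (prefix == null || prefix.isEmpty()) ? local : prefix + ":" + local;
    }

    // Small helper for demonstration: records the SAX callbacks produced for a document.
    static List<String> trace(String xml) throws XMLStreamException, SAXException {
        List<String> events = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String u, String l, String q, Attributes a) {
                events.add("start " + q);
            }
            @Override
            public void endElement(String u, String l, String q) {
                events.add("end " + q);
            }
            @Override
            public void characters(char[] ch, int start, int length) {
                events.add("text " + new String(ch, start, length));
            }
        };
        replay(XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml)), handler);
        return events;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(trace("<a x='1'><ns:b xmlns:ns='urn:n'>hi</ns:b></a>"));
    }
}
```

A SAX-based data binding tool would take the place of the recording handler used in trace.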

Figure 2 provides a deeper look into AXIOM.


Fig. 2: Object Model Architecture


AXIOM "sees" the XML input stream through the StAX stream reader, which is wrapped by a builder interface. The current implementation has two builders, namely:
  • OM Builder - this builds a general XML model that supports the full XML infoset. The current implementation, however, lacks support for Processing Instructions and DTDs.
  • SOAP Builder - this builds a SOAP-specific object model, which contains SOAP-specific objects such as SOAPEnvelope, SOAPHeader, etc.

There is an ongoing effort to develop an MTOMBuilder to enable in-built MTOM support for AXIOM.

Each of the builders supports deferred building and caching. The user has the option of building the memory model or not, and controls this by setting the cache to ON or OFF.

The AXIOM API works on top of the builder interface and provides the user with a convenient, yet powerful, API. This gives the highest flexibility, as builders and object model implementations can be changed completely independently of one another. AXIOM has a defined set of APIs, and you can implement your own memory model based on them.

Currently, Axis 2 comes with a linked-list-based implementation of that set of APIs. (There was an effort to build another AXIOM API implementation on a table-based model. It is now on hold.) For this reason, AXIOM objects are created through a factory, which makes it possible to switch between different implementations of the object model. The factory is designed so that, if no specific AXIOM implementation is given, it automatically picks the default one from the class path.


Fig. 3: OM API and OM Factory
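
One common way to build a factory that falls back to a default implementation is to look up a class name and instantiate it reflectively. The sketch below assumes this technique; it is not taken from the AXIOM source. The interface NodeFactory, both implementations, and the system property example.om.factory are hypothetical names.

```java
// Hypothetical sketch of a factory locator with a default implementation.
class NodeFactoryLocator {
    // Stand-in for an object model factory API.
    interface NodeFactory {
        String describe();
    }

    // Default implementation, used when nothing else is configured.
    static class LinkedListNodeFactory implements NodeFactory {
        public String describe() {
            return "linked-list";
        }
    }

    // Alternative implementation that can be selected by configuration.
    static class TableNodeFactory implements NodeFactory {
        public String describe() {
            return "table";
        }
    }

    // Hypothetical system property naming the implementation class.
    static final String PROPERTY = "example.om.factory";

    // Returns the configured factory, or the default one if none is configured.
    static NodeFactory newInstance() {
        String className = System.getProperty(PROPERTY, LinkedListNodeFactory.class.getName());
        try {
            return (NodeFactory) Class.forName(className).getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot load factory " + className, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(newInstance().describe()); // default implementation
        System.setProperty(PROPERTY, TableNodeFactory.class.getName());
        System.out.println(newInstance().describe()); // switched without touching callers
    }
}
```

Callers depend only on the NodeFactory interface, so swapping the implementation requires no change in application code.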

Using AXIOM
The AXIOM implementation is in the M1 release together with Axis 2. At the time of writing, AXIOM doesn't support Processing Instructions, DTDs, and Comments, but efforts are underway to include these features.

To make things more convenient, AXIOM can be customized to different XML object models. For example, the StAXSOAPModelBuilder turns AXIOM into a SOAP object model builder. The current release supports all the information items needed to implement a complete SOAP object model.

Getting AXIOM binaries
The easiest way to obtain the AXIOM binary is to download the Axis2 binary distribution. The lib directory contains the axis2-om-M1.jar.

Adventurous users can build AXIOM from source. The installation guide in the User's Guide explains how to build AXIOM from source.

Let's take the following XML as our example:

<ContributorInformation xmlns:ns1="http://www.axiom.org/article/">
    <ns1:Contributor>
        <Name>Eran Chinthaka</Name>
        <Company registration="12345">Arbitrary Software (pvt) Ltd.,</Company>
        <Location>
            <Country>Sri Lanka</Country>
            <City>Ambalangoda</City>
        </Location>
    </ns1:Contributor>
    <ns1:Contributor>
        <Name>Ajith Ranabahu</Name>
        <Company registration="54321">Rane Soft</Company>
        <Location>
            <Country>Sri Lanka</Country>
            <City>Kuliyapitiya</City>
        </Location>
    </ns1:Contributor>
    <ns1:Contributor>
        <Name>Thushari Damayanthi</Name>
        <Company registration="11223344">Trans-Soft</Company>
        <Location>
            <Country>Sri Lanka</Country>
            <City>Colombo</City>
        </Location>
    </ns1:Contributor>
</ContributorInformation>

Getting a Builder
The following snippet demonstrates how to create a builder:

FileReader soapFileReader = new FileReader(fileName);
XMLStreamReader parser = XMLInputFactory.newInstance().createXMLStreamReader(soapFileReader);
StAXOMBuilder builder = new StAXOMBuilder(OMFactory.newInstance(), parser);

First, create an instance of XMLStreamReader from the input XML file or stream. Next, create an instance of either StAXSOAPModelBuilder or StAXOMBuilder. Note that we pass a reference to the OMFactory to the builder. This OMFactory makes it possible to switch between different AXIOM API implementations without changing a single line of code. Even though you have created a builder and a reader so far, no part of the object model for the received XML has been created in memory yet.

Accessing Element Information in the XML
The following snippet can be used to access element information in the XML file. First the document element has to be taken from the builder. This will be an instance of OMElement. Then you can ask for any of the children:

QName contributorQName = new QName("http://www.axiom.org/article/", "Contributor");
OMElement contributorInformation = builder.getDocumentElement();
OMElement firstContributor = (OMElement)
        contributorInformation.getChildWithName(contributorQName);

Note that AXIOM is strict about namespaces, so you have to provide a QName to retrieve a child. getChildWithName(QName) returns the first matching node, whilst getChildrenWithName(QName) returns an iterator.

Iterator contributorInfoIter =
        contributorInformation.getChildrenWithName(contributorQName);
while (contributorInfoIter.hasNext()) {
    OMElement contributor = (OMElement) contributorInfoIter.next();
    System.out.println("Contributor Name = " +
            contributor.getFirstElement().getText());
}

The beauty of this design is that the returned iterator does not hold any information until it is asked for it. The iterator asks the builder to build only when it needs information. There are many enhancements like this within AXIOM to make it as lightweight as possible without compromising performance.

We have called contributor.getFirstElement() to get the first element. The method contributor.getFirstChild(), by contrast, may return a text node if there is leading whitespace before the children of the contributor element. The getText() method returns all the text that is a direct child of an element, irrespective of its location. These two features have been introduced to preserve the full infoset, as required by most security implementations.
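
The difference between "first child" and "first element" is not specific to AXIOM. The same effect can be observed with the DOM API in the JDK, which is used here purely as an analogy; the class FirstChildDemo is a hypothetical name for this sketch.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical demo using JDK DOM as an analogy; it does not use the AXIOM API.
class FirstChildDemo {
    // Note the line break and indentation between <Contributor> and <Name>.
    static final String XML = "<Contributor>\n  <Name>Eran Chinthaka</Name>\n</Contributor>";

    static Element root() throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)))
                .getDocumentElement();
    }

    // The first child is the whitespace text node, not the Name element.
    static short firstChildType() throws Exception {
        return root().getFirstChild().getNodeType();
    }

    // Skips non-element nodes to find the first child element.
    static String firstElementName() throws Exception {
        for (Node n = root().getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE) {
                return n.getNodeName();
            }
        }
        return null;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(firstChildType() == Node.TEXT_NODE);
        System.out.println(firstElementName());
    }
}
```

Keeping the whitespace node in the model is what allows the document to be reproduced exactly as received.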

Let's see how we can play with attributes:

Iterator contributorInfoIter2 =
        contributorInformation.getChildrenWithName(contributorQName);

while (contributorInfoIter2.hasNext()) {
    OMElement contributor = (OMElement) contributorInfoIter2.next();
    OMElement company = (OMElement) contributor.getChildWithName(
            new QName("Company"));

    OMAttribute registrationAttr = company.getAttributeWithQName(
            new QName("registration"));

    System.out.println("registrationAttr.getValue() = " +
            registrationAttr.getValue());
}

Accessing an attribute is the same as accessing a child element. Here, we use company.getAttributeWithQName(QName). In addition, company.getAttributes() returns an iterator over all the attributes.

Outputting
AXIOM gets its input from the StAX API, and outputs the result using the StAX API's XMLStreamWriter class. The OMElement class has a built-in serialize method, which can emit StAX events starting from that element until the end of the same element.

XMLStreamWriter xmlStreamWriter = XMLOutputFactory.newInstance().createXMLStreamWriter(System.out);

contributorInformation.serialize(xmlStreamWriter, false);

Note that the serialize method has a Boolean argument that tells the builder whether to build the AXIOM model in memory or not. As explained earlier, this is called caching in AXIOM. If it is set to TRUE, AXIOM will build the object structure while printing the XML. If it is set to FALSE, no object model will be built, and AXIOM will only print the XML. This is a major feature of AXIOM and the reason it can combine the strengths of the tree-based and event-based approaches.

AXIOM can serialize any element, irrespective of whether that element has been fully built or not. AXIOM hides this complexity and presents the object model to users as if it had been completely built from the start.
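
Serializing without building a model is, in spirit, a pass-through from a StAX reader to a StAX writer. The following sketch shows that idea with the event API of the JDK. It is only an analogy for serializing with caching switched off, not AXIOM code, and the class name PassThrough is an assumption.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;

// Hypothetical demo: copies XML from input to output without keeping a tree in memory.
class PassThrough {
    static String copy(String xml) throws XMLStreamException {
        XMLEventReader in =
                XMLInputFactory.newInstance().createXMLEventReader(new StringReader(xml));
        StringWriter out = new StringWriter();
        XMLEventWriter writer = XMLOutputFactory.newInstance().createXMLEventWriter(out);
        writer.add(in); // forwards every event from the reader as it is read
        writer.close();
        in.close();
        return out.toString();
    }

    public static void main(String[] args) throws XMLStreamException {
        // The output may also contain an XML declaration, depending on the StAX implementation.
        System.out.println(copy("<Location><City>Colombo</City></Location>"));
    }
}
```

Each event is written out and then discarded, so memory use does not grow with the size of the document.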

Modifying XML
The XML file can be modified like this:

OMNamespace anotherNamespace = OMFactory.newInstance().createOMNamespace("http://anotherURI.org", "Prefix2");
OMElement element = OMFactory.newInstance().createOMElement("SomeElement", anotherNamespace);
OMElement elementTwo = OMFactory.newInstance().createOMElement("SomeElement", "http://anotherURI.org", "Prefix2");
parent.addChild(element);
parent.addChild(elementTwo);

You can add children to any of the elements. You have the option of giving a namespace with that. If the given namespace has already been declared, AXIOM will use that. You could also find a namespace of a given element with findInScopeNamespace(uri, prefix), or using findDeclaredNamespace(uri, prefix).

An element can be created as shown above. Note that the factory has been used everywhere to enable a smooth transition from one AXIOM API implementation to another. The contents of element and elementTwo are the same, even though they are created using two different methods.

Namespace Handling
AXIOM has its own OMNamespace class to handle namespaces. But this does not prevent you from adhering to the conventional method of using QName. You have the option of using the OMNamespace to declare a namespace or to use either createOMElement(localName,namespaceURI,namespacePrefix) or createOMElement(qname, OMElement parent) methods.

Namespaces can be declared in an OMElement using the declareNamespace(uri,prefix) or declareNamespace(OMNamespace) method. A namespace can also be created with the createOMNamespace(uri,prefix) factory method for later assignment.
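
When working with QName it helps to remember how javax.xml.namespace.QName compares names: only the namespace URI and the local part count, while the prefix is ignored. The short sketch below demonstrates this with the standard class; QNameDemo is a hypothetical name, and the URI is the one from the sample document.

```java
import javax.xml.namespace.QName;

// Hypothetical demo of javax.xml.namespace.QName equality rules.
class QNameDemo {
    static final String URI = "http://www.axiom.org/article/";

    // Same URI and local part, different prefixes: the names are equal.
    static boolean prefixIsIgnored() {
        QName withNs1 = new QName(URI, "Contributor", "ns1");
        QName withOther = new QName(URI, "Contributor", "other");
        return withNs1.equals(withOther);
    }

    // A name without a namespace never matches a namespace-qualified one.
    static boolean unqualifiedMatchesQualified() {
        return new QName("Contributor").equals(new QName(URI, "Contributor"));
    }

    public static void main(String[] args) {
        System.out.println(prefixIsIgnored());
        System.out.println(unqualifiedMatchesQualified());
    }
}
```

This is why the earlier lookup of Contributor needs the namespace URI, while Company and registration, which are in no namespace in the sample document, are looked up with a local name only.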

Customizing AXIOM
One of the main advantages of AXIOM is that it can be customized to any schema. For example, if we want AXIOM to behave like a SOAP object model, we can do that easily. The current AXIOM-M1 release contains customization to SOAP, as AXIOM is the core object model in Axis 2.

You can use the StAXSOAPModelBuilder to act as a SOAP specific builder. The AXIOM team has implemented a SOAP specific API on top of AXIOM API, to make the SOAP developer's life easier:

FileReader soapFileReader = new FileReader(fileName);
XMLStreamReader parser =
        XMLInputFactory.newInstance().createXMLStreamReader(soapFileReader);
StAXSOAPModelBuilder builder = new StAXSOAPModelBuilder(OMFactory.newInstance(),
        parser);

QName soapHeaderQName = new QName("http://schemas.xmlsoap.org/soap/envelope/",
        "Header");
SOAPEnvelope envelope = (SOAPEnvelope) builder.getDocumentElement();
SOAPHeader soapHeader = envelope.getHeader();

QName soapBodyQName = new QName("http://schemas.xmlsoap.org/soap/envelope/",
        "Body");
SOAPBody soapBody = envelope.getBody();

Iterator headerIterator =
        soapHeader.examineMustUnderstandHeaderBlocks("http://www.soap.org/someActor");

while (headerIterator.hasNext()) {
    SOAPHeaderBlock soapHeaderBlock = (SOAPHeaderBlock) headerIterator.next();
    System.out.println("soapHeaderBlock.getLocalName() = " +
            soapHeaderBlock.getLocalName());
}

You can cast the result of getDocumentElement() to SOAPEnvelope, as the StAXSOAPModelBuilder returns an instance of SOAPEnvelope, which is a subclass of OMElement.

Note the methods provided by the SOAPEnvelope class. You can directly call getHeader() or getBody() without using the more generic methods like getChildWithName(QName). Further, soapHeader.examineMustUnderstandHeaderBlocks(actorURI) returns the mustUnderstand headers for a given actor. All of these customized methods are layered on top of the base AXIOM API, without compromising performance or memory.
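
To make clear what such a SOAP-specific convenience method saves the developer from writing, here is a rough plain-StAX equivalent of looking up mustUnderstand header blocks for an actor in a SOAP 1.1 message. It is a sketch under assumptions: the class MustUnderstandScan and its matching rules (mustUnderstand equal to "1" and an exact actor match) are illustrative and may differ from what the AXIOM SOAP API actually does.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical plain-StAX scan; not part of the AXIOM SOAP API.
class MustUnderstandScan {
    static final String SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/";

    // Returns the local names of header blocks flagged mustUnderstand="1" for the given actor.
    static List<String> blocks(String xml, String actor) throws XMLStreamException {
        List<String> names = new ArrayList<>();
        XMLStreamReader r =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        int depth = 0;            // 1 = Envelope, 2 = Header or Body, 3 = header block
        boolean inHeader = false;
        while (r.hasNext()) {
            int event = r.next();
            if (event == XMLStreamConstants.START_ELEMENT) {
                depth++;
                if (depth == 2) {
                    inHeader = SOAP_NS.equals(r.getNamespaceURI())
                            && "Header".equals(r.getLocalName());
                } else if (depth == 3 && inHeader
                        && "1".equals(r.getAttributeValue(SOAP_NS, "mustUnderstand"))
                        && actor.equals(r.getAttributeValue(SOAP_NS, "actor"))) {
                    names.add(r.getLocalName());
                }
            } else if (event == XMLStreamConstants.END_ELEMENT) {
                if (depth == 2) {
                    inHeader = false; // leaving Header (or Body)
                }
                depth--;
            }
        }
        r.close();
        return names;
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<s:Envelope xmlns:s='" + SOAP_NS + "'><s:Header>"
                + "<a:Security xmlns:a='urn:a' s:mustUnderstand='1'"
                + " s:actor='http://www.soap.org/someActor'/>"
                + "<a:Trace xmlns:a='urn:a' s:mustUnderstand='0'"
                + " s:actor='http://www.soap.org/someActor'/>"
                + "</s:Header><s:Body/></s:Envelope>";
        System.out.println(blocks(xml, "http://www.soap.org/someActor"));
    }
}
```

The depth tracking and attribute checks shown here are exactly the kind of bookkeeping that a SOAP-aware object model can hide behind a single method call.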

Resources

