fleXiParse Manual

Sebastian Marsching


Table of Contents

1. Introduction
2. Parser
2.1. Parser Types
2.2. Using a Parser
2.3. Excluding Fragments of the XML Tree from the Parsing Process
3. Handler XML Configuration Format
3.1. General Syntax
3.2. Run order
3.3. Handler Dependencies
4. Object Tree
4.1. Concept
4.2. Attaching Objects
4.3. Retrieving Objects
4.4. Initial Objects
4.5. Excluding Fragments of the XML Tree from the Parsing Process
5. XML to Object Mapping
5.1. Concept
5.2. General Configuration Syntax
5.3. Element Mappings
5.4. Root Element mappings
5.5. Nested Element Mappings
5.6. Attribute mappings
5.7. Text mappings
5.8. Using Collections
5.9. Using Maps

Chapter 1. Introduction

fleXiParse is a framework for writing XML parsers. It is based on Java's DOM (Document Object Model) and XPath implementations. It uses a visitor pattern: the user defines NodeHandlers whose handleNode method is called for each DOM node matching one of the XPathExpressions provided by the NodeHandler's configuration.
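
The idea of calling a handler for every node matched by an XPath expression can be sketched with the JDK's own DOM and XPath classes. The following is a minimal illustration only: the SketchHandler interface and the dispatch method are hypothetical names invented for this example and are not part of fleXiParse, and the sketch evaluates the expression once instead of walking the tree as the framework does.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical illustration; this is NOT fleXiParse's NodeHandler API.
class XPathDispatchSketch {

    /** Hypothetical callback type standing in for a node handler. */
    interface SketchHandler {
        void handleNode(Node node);
    }

    /** Calls the handler once for every node matched by the XPath expression. */
    static void dispatch(Document document, String expression, SketchHandler handler)
            throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList matches =
                (NodeList) xpath.evaluate(expression, document, XPathConstants.NODESET);
        for (int i = 0; i < matches.getLength(); i++) {
            handler.handleNode(matches.item(i));
        }
    }

    /** Parses a small sample document and collects the text of all address elements. */
    static List<String> collectAddresses() {
        String xml = "<addressbook>"
                + "<person><address>Main Street 1</address></person>"
                + "<person><address>Side Road 2</address></person>"
                + "</addressbook>";
        try {
            Document document = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            List<String> seen = new ArrayList<>();
            dispatch(document, "/addressbook/person/address",
                    node -> seen.add(node.getTextContent()));
            return seen;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(collectAddresses());
    }
}
```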

Handlers can store objects in a tree managed by the parser framework. The framework creates a node in the object tree for each element in the source document. Different handlers can communicate with each other by storing objects in the tree and retrieving them from it. The final state of the object tree represents the result of the parsing process and is therefore returned to the code calling the parser.

Chapter 2. Parser

2.1. Parser Types

All parsers implement the com.marsching.flexiparse.parser.Parser interface. This interface contains the methods needed to parse a document and to add NodeHandlers to the parser. SimpleParser is a simple implementation that is used as a base for more complex implementations.

XMLConfiguredParser extends the SimpleParser class with methods for adding NodeHandlers that are configured in an XML file. ClasspathConfiguredParser reads these XML files from a specified location within the class path and is therefore well suited for building an extensible parser. New handlers can be added simply by placing a JAR in the class path that contains the handlers and the XML configuration file at the right location.

2.2. Using a Parser

In the following example we will show how to instantiate and use a ClasspathConfiguredParser. The process is basically the same for other parser implementations. However, handlers have to be added explicitly when using another implementation.

Parser parser = 
    new ClasspathConfiguredParser("com/example/myhandlers.xml");
ObjectTreeElement result = parser.parse(new File("test.xml"));

This code first creates an instance of ClasspathConfiguredParser using the configuration path com/example/myhandlers.xml. This means that the handler configuration is expected in a file called myhandlers.xml that is in the com.example package of the class path. If there is more than one file (e.g. same file name in same package in different JARs) all of the files found will be used. Thus the set of handlers can be extended by modules just by placing the configuration file in the right package.

Then the parser's parse method is called to parse a file called test.xml. The root node of the resulting object tree is assigned to the result variable, which can be used to retrieve the objects that the handlers attached to the tree during the parsing process.

2.3. Excluding Fragments of the XML Tree from the Parsing Process

See Section 4.5, “Excluding Fragments of the XML Tree from the Parsing Process”.

Chapter 3. Handler XML Configuration Format

3.1. General Syntax

The XML file containing the handler configuration has a very simple format:

<configuration
  xmlns="http://www.marsching.com/2008/flexiparse/configurationNS"
  xmlns:x="http://www.example.com/exampleNS"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="
    http://www.marsching.com/2008/flexiparse/configurationNS 
    http://www.marsching.com/2008/flexiparse/flexiparse-configuration.xsd
  "
>
  <handler class="com.example.MyHandler">
    <match>/x:addressbook/x:person/x:address</match>
  </handler>
</configuration>

This configuration defines a handler using the class com.example.MyHandler that will be invoked for each element matched by the XPath expression. As you can see, the XPath expression uses namespace prefixes that are defined in the context of the match tag.

3.2. Run order

When the parser walks through the DOM tree, there are two points at which the handlers for a node can be called: before the node's child nodes are processed, or after they have been processed. This matters when handlers are attached to the child nodes: a child handler might expect data that the parent node's handler has attached to the object tree, or it might itself attach data that one of the parent node's handlers uses later.

Therefore you can define a run level for each handler. Valid run levels are start, end and both (start is the default). Handlers with run level start are called before the child nodes are processed. Handlers with run level end are called after the child nodes have been processed. Handlers with run level both are called before as well as after the child nodes have been processed.

<configuration
  xmlns="http://www.marsching.com/2008/flexiparse/configurationNS"
  xmlns:x="http://www.example.com/exampleNS"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="
    http://www.marsching.com/2008/flexiparse/configurationNS 
    http://www.marsching.com/2008/flexiparse/flexiparse-configuration.xsd
  "
>
  <handler class="com.example.PersonHandler" run-level="end">
    <match>/x:addressbook/x:person</match>
  </handler>
  
  <handler class="com.example.AddressHandler">
    <match>/x:addressbook/x:person/x:address</match>
  </handler>
</configuration>

In this example the handler com.example.PersonHandler will be called after the child nodes have been processed and might therefore collect objects from the object tree that have been created by the com.example.AddressHandler.
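
The two call points described in this section can be pictured as a depth-first walk that reports each element once before and once after its children. The sketch below uses only the JDK's DOM classes; the RunLevelSketch class and its log format are assumptions made for illustration and do not show how fleXiParse implements the walk.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical illustration of the "start" and "end" call points.
class RunLevelSketch {

    /** Records where handlers with run level start and end would be invoked. */
    static void walk(Element element, List<String> log) {
        log.add("start:" + element.getTagName()); // run level "start"
        for (Node child = element.getFirstChild(); child != null;
                child = child.getNextSibling()) {
            if (child instanceof Element) {
                walk((Element) child, log);
            }
        }
        log.add("end:" + element.getTagName()); // run level "end"
    }

    /** Parses the given XML string and returns the recorded call points. */
    static List<String> trace(String xml) {
        try {
            Document document = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            List<String> log = new ArrayList<>();
            walk(document.getDocumentElement(), log);
            return log;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(trace("<person><address/></person>"));
    }
}
```

In the trace, end:person comes after end:address, which is why a handler with run level end on the person element can rely on everything the address handlers have attached.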

3.3. Handler Dependencies

Each handler may specify an id using the id attribute. If no explicit id is specified, the handler's class name is implicitly used as the id. The id has to be unique, that is, there must not be more than one handler using the same id.

Handlers may specify run order dependencies on other handlers using the preceding-handler and following-handler tags. These dependencies are only considered if both handlers are acting on the same node. Otherwise a dependency is silently ignored.

<configuration
  xmlns="http://www.marsching.com/2008/flexiparse/configurationNS"
  xmlns:x="http://www.example.com/exampleNS"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="
    http://www.marsching.com/2008/flexiparse/configurationNS 
    http://www.marsching.com/2008/flexiparse/flexiparse-configuration.xsd
  "
>
  <handler class="com.example.PersonHandler" run-level="end">
    <match>/x:addressbook/x:person</match>
  </handler>
  
  <handler class="com.example.AddressHandler" id="com.example.SomeOtherId">
    <match>/x:addressbook/x:person/x:address</match>
  </handler>
  
  <handler class="com.example.VerificationHandler">
    <preceding-handler>com.example.PersonHandler</preceding-handler>
    <preceding-handler>com.example.SomeOtherId</preceding-handler>
    <match>/x:addressbook/x:person/x:address</match>
    <match>/x:addressbook/x:person</match>
  </handler>
</configuration>

In this example the VerificationHandler is called after the PersonHandler or the AddressHandler (depending on the node being processed) has been called. Thus the verification handler may use objects that the other handler has placed in the object tree for the same node. Placing <following-handler>com.example.VerificationHandler</following-handler> in the configurations of the PersonHandler and AddressHandler, instead of the preceding-handler declarations in the VerificationHandler configuration, would have the same effect. Both forms could even be present at the same time, because these constraints do not conflict. If there are conflicts (a circular dependency graph), the parser throws an exception.
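
Ordering handlers under such constraints amounts to a topological sort. The following sketch uses Kahn's algorithm as one possible way to do this; it is not taken from the fleXiParse sources. The class name, the method signature and the choice of IllegalStateException are assumptions made for the example. Dependencies on handlers that are not active for the current node are ignored, as described above.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration; fleXiParse's actual ordering code may differ.
class HandlerOrderSketch {

    /**
     * @param ids       ids of the handlers acting on the current node
     * @param runsAfter maps a handler id to the ids that must run before it
     * @return the ids in an order that satisfies all applicable constraints
     */
    static List<String> order(List<String> ids, Map<String, List<String>> runsAfter) {
        Map<String, Integer> pending = new LinkedHashMap<>();
        Map<String, List<String>> dependents = new LinkedHashMap<>();
        for (String id : ids) {
            pending.put(id, 0);
            dependents.put(id, new ArrayList<>());
        }
        for (String id : ids) {
            for (String before : runsAfter.getOrDefault(id, Collections.emptyList())) {
                if (!pending.containsKey(before)) {
                    continue; // other handler does not act on this node: ignore
                }
                pending.merge(id, 1, Integer::sum);
                dependents.get(before).add(id);
            }
        }
        Deque<String> ready = new ArrayDeque<>();
        for (String id : ids) {
            if (pending.get(id) == 0) {
                ready.add(id);
            }
        }
        List<String> result = new ArrayList<>();
        while (!ready.isEmpty()) {
            String id = ready.remove();
            result.add(id);
            for (String dependent : dependents.get(id)) {
                if (pending.merge(dependent, -1, Integer::sum) == 0) {
                    ready.add(dependent);
                }
            }
        }
        if (result.size() != ids.size()) {
            // Some handlers never became ready, so the constraints form a cycle.
            throw new IllegalStateException("Circular handler dependency");
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, List<String>> runsAfter = new LinkedHashMap<>();
        runsAfter.put("VerificationHandler", Collections.singletonList("PersonHandler"));
        List<String> ids = new ArrayList<>();
        ids.add("VerificationHandler");
        ids.add("PersonHandler");
        System.out.println(order(ids, runsAfter));
    }
}
```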

Chapter 4. Object Tree

4.1. Concept

The object tree is used to store the result of the parsing process and to share data between different handlers (or several invocations of the same handler). The object tree has one root node for the document being parsed and one child node for each XML element in the document (including the root element). The tree structure of these nodes reflects the tree structure of the parsed document.

Each time a handler is invoked, a reference to an object tree element is passed. This object tree element corresponds to the XML element (or the parent XML element for non-element nodes) being processed at this time.

The handler can use this object tree element to attach or retrieve Java objects created in the context of the current node, or to navigate through the object tree and retrieve Java objects attached to other object tree elements.

4.2. Attaching Objects

A handler can attach an arbitrary Java object to an object tree element by invoking the addObject(Object object) method. By convention, a handler should usually only attach objects to the object tree element corresponding to the current context XML element, although there is no technical restriction requiring or enforcing this behavior.

4.3. Retrieving Objects

The method getObjects() returns all objects attached to the object tree element regardless of their type. The objects are returned in the order they have been attached to the object tree element.

The method getObjectsOfType(Class type) returns all objects attached to the object tree element that are sub-types of the type specified by the parameter type. The objects are returned in the order they have been attached to the object tree element.

The method getObjectsOfTypeFromSubTree(Class type) returns objects of the given type (or a sub-type) that are attached to the object tree element or one of its descendant elements. The order is: parent elements before child elements; child elements on the same level in the order of the corresponding XML elements in the source document; and objects attached to the same element in the order they were added to the element.

The method getObjectsOfTypeFromTopTree(Class type) returns objects of the given type (or a sub-type) that are attached to the object tree element or one of its ancestor elements. The order is: child elements before parent elements, and objects attached to the same element in the order they were added to the element.
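
The retrieval orders described in this section can be made concrete with a small stand-alone tree. The TreeNode class below is a hypothetical stand-in written for this example, not the framework's object tree element, and its method names are shortened on purpose. It also assumes that the sub-tree order is a depth-first, pre-order traversal, which is one reading of the description above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an object tree; not fleXiParse's implementation.
class ObjectTreeSketch {

    static final class TreeNode {
        final TreeNode parent;
        final List<TreeNode> children = new ArrayList<>();
        final List<Object> objects = new ArrayList<>();

        TreeNode(TreeNode parent) {
            this.parent = parent;
            if (parent != null) {
                parent.children.add(this);
            }
        }

        void addObject(Object object) {
            objects.add(object);
        }

        /** Objects of the given type on this node, in attachment order. */
        <T> List<T> ofType(Class<T> type) {
            List<T> result = new ArrayList<>();
            for (Object object : objects) {
                if (type.isInstance(object)) {
                    result.add(type.cast(object));
                }
            }
            return result;
        }

        /** This node first, then its descendants in document order (assumed pre-order). */
        <T> List<T> fromSubTree(Class<T> type) {
            List<T> result = ofType(type);
            for (TreeNode child : children) {
                result.addAll(child.fromSubTree(type));
            }
            return result;
        }

        /** This node first, then its ancestors up to the root. */
        <T> List<T> fromTopTree(Class<T> type) {
            List<T> result = new ArrayList<>();
            for (TreeNode node = this; node != null; node = node.parent) {
                result.addAll(node.ofType(type));
            }
            return result;
        }
    }

    public static void main(String[] args) {
        TreeNode root = new TreeNode(null);
        TreeNode person = new TreeNode(root);
        root.addObject("book");
        person.addObject("alice");
        System.out.println(root.fromSubTree(String.class));
        System.out.println(person.fromTopTree(String.class));
    }
}
```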

4.4. Initial Objects

When invoking the parse method of the Parser interface, arbitrary objects can be passed as parameters. These objects are attached to the root element of the object tree before the parsing process begins. In this way parameters can be passed from the code invoking the parser to the parsing handlers.

4.5. Excluding Fragments of the XML Tree from the Parsing Process

If an instance of com.marsching.flexiparse.objecttree.DisableParsingFlag is attached to an XML element's object tree element before the child nodes of the XML element have been processed, these child nodes are excluded from the parsing process. This mechanism can be used in order to exclude certain parts of the XML tree from the parsing process based on runtime parameters.
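
The effect of such a flag on the tree walk can be illustrated as follows. This is a sketch under stated assumptions: SkipFlag is a hypothetical stand-in for DisableParsingFlag, the set of element names plays the role of a runtime parameter, and the list named attached stands in for the objects that a handler with run level start has attached to the current element.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical illustration; SkipFlag is NOT the framework's DisableParsingFlag.
class SkipSubtreeSketch {

    static final class SkipFlag { }

    static void walk(Element element, Set<String> pruned, List<String> visited) {
        visited.add(element.getTagName());
        // Objects attached for this element before its children are processed.
        List<Object> attached = new ArrayList<>();
        if (pruned.contains(element.getTagName())) {
            attached.add(new SkipFlag());
        }
        // If a flag is present, the children are not visited at all.
        if (attached.stream().anyMatch(SkipFlag.class::isInstance)) {
            return;
        }
        for (Node child = element.getFirstChild(); child != null;
                child = child.getNextSibling()) {
            if (child instanceof Element) {
                walk((Element) child, pruned, visited);
            }
        }
    }

    /** Returns the names of the visited elements in visiting order. */
    static List<String> trace(String xml, Set<String> pruned) {
        try {
            Document document = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            List<String> visited = new ArrayList<>();
            walk(document.getDocumentElement(), pruned, visited);
            return visited;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(trace("<a><b><c/></b><d/></a>",
                java.util.Collections.singleton("b")));
    }
}
```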

Chapter 5. XML to Object Mapping

5.1. Concept

While fleXiParse's handler concept provides a maximum of flexibility and extensibility, writing a handler for every tag can be a time-consuming task. Therefore fleXiParse provides a facility for the automatic mapping of XML data to objects. This facility is configured using tags from the http://www.marsching.com/2008/flexiparse/xml2objectNS namespace.

5.2. General Configuration Syntax

<configuration
  xmlns="http://www.marsching.com/2008/flexiparse/configurationNS"
  xmlns:t="http://www.example.com/exampleNS"
  xmlns:xo="http://www.marsching.com/2008/flexiparse/xml2objectNS"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="
    http://www.marsching.com/2008/flexiparse/configurationNS 
    http://www.marsching.com/2008/flexiparse/flexiparse-configuration.xsd
    http://www.marsching.com/2008/flexiparse/xml2objectNS
    http://www.marsching.com/2008/flexiparse/flexiparse-xml2object.xsd
  "
>
  <xo:element name="t:test" target-type="com.example.ExampleObjectA">
    <xo:attribute name="a" target-attribute="a"/>
    <xo:attribute name="b" target-attribute="b" target-type="com.example.ExampleObjectB" occurrence="1" />
  </xo:element>
</configuration>

XML to object mapping configurations use the same configuration files as handlers but a different namespace. There are three kinds of mappings: mappings for elements, attributes, and text nodes.

5.3. Element Mappings

The mapping for an XML element is defined using the element tag. The name attribute takes a local or qualified name. If a local name is given, the mapping matches elements with this name using no namespace. If the qualified form is used, the mapping matches elements with this name and the namespace bound to the prefix in the context of the configuration element.

The target-type attribute specifies the name of the Java type the XML element is mapped to. This has to be either a Java primitive (e.g. int, boolean) or a fully qualified type name (e.g. java.lang.String, which is the default). The type has to have a default constructor.

The target-attribute attribute specifies the name of the attribute in the parent Java object in which the object created for this mapping should be saved. This attribute is only used if the corresponding tag is encountered within another tag handled by this XML to Object facility. There has to be a setter method (adhering to the JavaBeans convention) for the specified attribute in the parent object. If the special name !mapentry is used, the parent object has to implement java.util.Map and the object has to implement java.util.Map.Entry. In this case the object will be added to the parent map. If the special name !collectionentry is used, the parent object has to implement java.util.Collection and the object will be added to the parent collection. If the special name !parent is used, the parent object will be replaced by this object.

If the deep-search attribute is set to true, the XML to Object mapper will not restrict the search to direct child elements, but will use all descendant elements when looking for elements matching the child mapping configurations.

5.4. Root Element mappings

Root element mappings have no parent mappings. This kind of mapping is used for every element that matches the specified namespace and local name. When processing the child nodes of a mapped element, the mappings nested inside the parent definition will be used first. However, if no matching mapping is found, the root mappings with the target-attribute set will be used, too.

The occurrence attribute can be set to either 0..1, allowing a maximum of one instance per context, or 0..n, allowing an unlimited number of instances per context.

5.5. Nested Element Mappings

Nested element mappings are children of other nested or root element mappings. The target-attribute attribute is mandatory for this kind of mapping.

The occurrence attribute may be set to either 0..1 (default), 0..n, 1 or 1..n.

5.6. Attribute mappings

Attribute mappings basically support the settings described in Section 5.3, “Element Mappings”. However, the target Java type has to have a constructor taking a single parameter of type java.lang.String instead of a default constructor. The deep-search attribute does not exist for attribute mappings.

The occurrence attribute can be set to either 0..1 (default) or 1.
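
The two requirements named for attribute mappings, a constructor taking a single java.lang.String on the target type and a JavaBeans setter on the parent object, can be illustrated with plain reflection. The sketch below is an assumption about how such a mapping could be applied, not the framework's code. The Target class, its amount property and the apply method are hypothetical, and Java primitives are not handled.

```java
import java.lang.reflect.Method;
import java.math.BigDecimal;

// Hypothetical illustration; not taken from fleXiParse.
class AttributeMappingSketch {

    /** Hypothetical parent object with a JavaBeans-style property "amount". */
    public static class Target {
        private BigDecimal amount;

        public void setAmount(BigDecimal amount) {
            this.amount = amount;
        }

        public BigDecimal getAmount() {
            return amount;
        }
    }

    /**
     * Builds a value from the raw attribute text using the target type's
     * String constructor and stores it through the parent's setter.
     */
    static void apply(Object parent, String targetAttribute, Class<?> targetType,
            String rawValue) {
        try {
            Object value = targetType.getConstructor(String.class).newInstance(rawValue);
            String setterName = "set"
                    + Character.toUpperCase(targetAttribute.charAt(0))
                    + targetAttribute.substring(1);
            Method setter = parent.getClass().getMethod(setterName, targetType);
            setter.invoke(parent, value);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Target target = new Target();
        apply(target, "amount", BigDecimal.class, "42.50");
        System.out.println(target.getAmount());
    }
}
```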

5.7. Text mappings

Text mappings have the same options as attribute mappings, however there is no name attribute and the occurrence attribute may be set to either 0..1 (default), 0..n, 1 or 1..n.

If the append attribute is set to true, all text nodes in a given context are concatenated creating one single string.

If the ignore-white-space attribute is set to true, text nodes that contain white space only are ignored.

5.8. Using Collections

Objects can be added to collections using the special target attribute !collectionentry.

<xo:element name="addressbook" target-type="java.util.HashSet">
  <xo:element name="person" target-attribute="!collectionentry" target-type="com.example.Person">
    ...
  </xo:element>
</xo:element>

5.9. Using Maps

Map entries can be added to a map using the special target attribute !mapentry. The entries have to implement java.util.Map.Entry. If this interface is specified as the target type of a mapping, fleXiParse will use an internal implementation which has a key and value attribute.

<xo:element name="parameters" target-type="java.util.HashMap">
  <xo:element name="parameter" target-attribute="!mapentry" target-type="java.util.Map.Entry">
    <xo:attribute name="name" target-attribute="key"/>
    <xo:text target-attribute="value"/>
  </xo:element>
</xo:element>
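
An entry type with settable key and value attributes might look like the following sketch. MutableEntry is a hypothetical class written for this example; the internal implementation used by fleXiParse is not shown in this manual and may differ. The build method mimics by hand what the mapping above would do for a parameter element with the name timeout and the text 30, both of which are made-up sample values.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical illustration; not fleXiParse's internal entry implementation.
class MapEntrySketch {

    /** Map entry whose key and value can both be set through setters. */
    static final class MutableEntry<K, V> implements Map.Entry<K, V> {
        private K key;
        private V value;

        @Override
        public K getKey() {
            return key;
        }

        public void setKey(K key) {
            this.key = key;
        }

        @Override
        public V getValue() {
            return value;
        }

        @Override
        public V setValue(V value) {
            V old = this.value;
            this.value = value;
            return old;
        }

        @Override
        public boolean equals(Object other) {
            if (!(other instanceof Map.Entry)) {
                return false;
            }
            Map.Entry<?, ?> entry = (Map.Entry<?, ?>) other;
            return Objects.equals(key, entry.getKey())
                    && Objects.equals(value, entry.getValue());
        }

        @Override
        public int hashCode() {
            return Objects.hashCode(key) ^ Objects.hashCode(value);
        }
    }

    /** Fills an entry from sample values and adds it to the parent map. */
    static Map<String, String> build() {
        Map<String, String> parameters = new HashMap<>();
        MutableEntry<String, String> entry = new MutableEntry<>();
        entry.setKey("timeout"); // would come from the name attribute
        entry.setValue("30");    // would come from the element's text
        parameters.put(entry.getKey(), entry.getValue());
        return parameters;
    }

    public static void main(String[] args) {
        System.out.println(build());
    }
}
```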