RelaxNG - XML Schema language - SysAdmin
Overview
Defining the structure (schema) of a document and validating against it play an important role when working with XML documents. censhare 4 uses its own schema concept, which is based on the capabilities and syntax of the DTD (Document Type Definition). This makes it harder to pass XML documents on outside censhare. In addition, the DTD and its means of structuring documents are no longer state of the art. The development of censhare 5 would have made it necessary to reimplement the censhare-4-Schema. This prompted the search for a different solution.
RelaxNG (REgular LAnguage for XML Next Generation) and the W3C XML Schema, also known as XSD (XML Schema Definition), were the candidates for the replacement.
The reasons for RelaxNG in a nutshell
XSD shows its strengths in the exchange of data between programs. RelaxNG, in contrast, is a very powerful language for describing documents. In this area RelaxNG is also the better solution because its schemas are much more readable than XSD schemas. Both standards have proven themselves in practice.
The main difference between XSD and RelaxNG lies in validation. Unlike an XSD-based tool, a RelaxNG-based tool can suggest permitted elements at any position in a document. This was one of the main requirements in the evaluation because it is mostly users, not programs, who work with the XML documents.
For this reason censhare relies on the RelaxNG standard from version 5 onward. The censhare-RelaxNG-Schema still conforms to the standard even though it contains censhare's own extensions. With this schema customers can validate XML documents, distribute them easily and adapt them to their needs. As a well-accepted standard, RelaxNG contributes to the future viability of censhare.
RelaxNG - The better choice for documents
Introduction
First came the Internet and then mobile clients. As a result, companies reach fewer and fewer of their customers through a single channel. The goal is therefore to serve several output channels in parallel with the least possible effort. This is why documents are created in XML and then prepared for the different channels. An XML schema defines the required structure of these XML documents, and several schema languages are available for this purpose. The capabilities of a schema language determine how precisely the structure of the content can be described. For this reason, the choice of the XML schema language is a key decision for censhare.
The initial situation
So far, censhare has used the DTD (Document Type Definition) to describe the structure of XML texts in the content editor. Because the content editor needs additional information for styles, templates and localization, the DTD structure was extended. Unfortunately, the DTD standard offers no way to add such extensions, for instance for the localization of element names. For this reason a syntax with supplements was developed that deviates from the standard. The censhare-4-Schema therefore differs from the DTD standard, and it can only be validated within censhare. This makes it harder to distribute XML documents that use the censhare-4-Schema.
The opportunity for reorientation
Version 5 of censhare required a new implementation of the censhare-4-Schema and the associated validation engine. A validator checks whether an XML document complies with an XML schema. In view of this new implementation, censhare AG decided to look for alternatives to the existing DTD-based solution. Three candidates were available: DTD, XSD (XML Schema (W3C)) and RelaxNG (Regular Language for XML Next Generation).
The requirements for a new solution
XML schemas have two major application areas: describing the structure of documents on the one hand, and defining data exchange formats for programs and databases on the other. The requirements of the two areas are very different. Consequently, an XML schema language that suits the description of an exchange format is not automatically suited to the description of documents, and vice versa. With censhare, the focus is on the description of documents.
A validator for a new XML schema language has to allow an in-depth inspection of the structure of an XML document. This includes tracing an error to the element in the document that caused it, so that a program can show the user exactly which place in the document does not comply with the structure. In addition, it should be possible to offer the user a list of the XML elements allowed in the current context while working on the document.
It is also important that the XML schema language can be extended with user-defined information such as styles, templates or localization. At the same time, external XML editors must still be able to read the XML schema and the documents that use it.
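The extensibility requirement can be illustrated with RelaxNG's annotation mechanism: attributes and elements from a foreign namespace may be placed in a schema, and a standard RelaxNG validator ignores them. The following is only a sketch. The namespace URI, the "ext" prefix and the names "style" and "label" are hypothetical and do not represent censhare's actual extension vocabulary.

```xml
<element name="headline"
         xmlns="http://relaxng.org/ns/structure/1.0"
         xmlns:ext="http://example.com/ns/editor-extensions"
         ext:style="heading-1">
  <!-- Hypothetical localized labels in a foreign namespace;
       a RelaxNG validator skips them as annotations -->
  <ext:label xml:lang="en">Headline</ext:label>
  <ext:label xml:lang="de">Überschrift</ext:label>
  <!-- The actual content model: plain text -->
  <text/>
</element>
```

An external XML editor that knows nothing about the "ext" namespace can still load this schema and validate documents against it, while an editor that understands the annotations can use them for styles or localized element names.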
DTD as the oldest of three XML schema languages
Like RelaxNG and XSD, the DTD separates the layout, the structure and the text itself. It originates from SGML and is comparatively old. This may be one reason why the DTD, unlike XSD and RelaxNG, no longer reflects the current state of XML schema languages. In contrast to the younger standards XSD and RelaxNG, the DTD syntax is not itself well-formed XML that could be validated.
The DTD has no native support for namespaces. For the content of elements, the DTD standard knows only one data type, the text string, and it cannot be extended with data types of your own. It is therefore not possible to define or check the type of an individual element.
XSD for exchanging data
XSD is better known than RelaxNG because it is used intensively, especially in web programming. XSD is very useful for describing formats for exchanging data between applications. Suitable tools generate schemas for exchange formats automatically, for example from databases. Other tools create XSD schemas from annotated code. The annotations themselves are not part of the actual code; they are evaluated, for example, by compilers or XSD tools.
Unlike the DTD and RelaxNG, XSD has an extensive set of built-in data types. However, XSD data type definitions can be incorporated into RelaxNG. With version 1.1 (adopted in 2012), XSD gained the option of using assertions to formulate conditions and verify compliance with them. Such a condition can look like this: an element A can have either an attribute B or a child element C, but not both. RelaxNG has offered this possibility from the beginning, and such a condition is much easier to write and to read there than in XSD.
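How XSD data types are incorporated into RelaxNG can be sketched as follows. The "datatypeLibrary" attribute selects the XSD data types, and "data" patterns then refer to them by name. The element and attribute names ("wine", "vintage", "price") are invented for this illustration.

```xml
<element name="wine"
         xmlns="http://relaxng.org/ns/structure/1.0"
         datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
  <!-- The attribute value must be a valid XSD gYear, for example 2012 -->
  <attribute name="vintage">
    <data type="gYear"/>
  </attribute>
  <!-- The element content must be a non-negative XSD decimal -->
  <element name="price">
    <data type="decimal">
      <param name="minInclusive">0</param>
    </data>
  </element>
</element>
```

The "param" element passes a facet of the XSD data type, here a lower bound, so the familiar XSD restrictions remain available inside a RelaxNG schema.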
This leads to a further disadvantage of XSD: schema descriptions are comparatively hard for humans to read. For the typical use of XSD in programming, readability is not a particular requirement, because mainly programs, not people, work with the schemas. For reliable data exchange it is important to describe the structures as unambiguously as possible. Ambiguities, which RelaxNG expressly allows, are not helpful there, although they offer more flexibility.
XSD has yet another drawback: validators can identify a faulty point in an XML text, but under certain circumstances they cannot provide the list of elements allowed at that point. For a program this is negligible; it will reject the document and request it again. For a user, however, it is much easier to correct the error if a list of allowed elements is available at the faulty point.
RelaxNG for working with documents
While XSD is the natural tool for data-centric applications, RelaxNG has its strengths in working with documents. RelaxNG is significantly more powerful in describing structures. RelaxNG schemas are easy to read, which makes it easy for users to create or adapt them.
The power of RelaxNG lies in its flexibility in describing complex structures. XSD can describe certain RelaxNG constructs only with difficulty or not at all. For instance, RelaxNG allows elements to appear in any order ("interleave"). For certain applications this freedom is neither necessary nor desirable. XSD, in contrast, largely requires an explicit order, which is important for data applications. Another example is so-called "mixed content": the schema defines a certain order of elements, and any running text can appear in between. For example, the brief description of a wine should always mention certain properties, such as grape variety or region, in the same order. This is also possible with XSD, but it takes more effort and is subject to certain restrictions.
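Both constructs can be sketched in a small RelaxNG grammar that picks up the wine example. All element names are invented for this illustration.

```xml
<grammar xmlns="http://relaxng.org/ns/structure/1.0">
  <start>
    <element name="wines">
      <zeroOrMore>
        <ref name="wine"/>
      </zeroOrMore>
    </element>
  </start>

  <define name="wine">
    <element name="wine">
      <!-- interleave: "name" and "vintage" may appear in either order -->
      <interleave>
        <element name="name"><text/></element>
        <element name="vintage"><text/></element>
      </interleave>
      <!-- mixed: running text may surround "grape" and "region",
           but the two elements must keep this order -->
      <element name="description">
        <mixed>
          <element name="grape"><text/></element>
          <element name="region"><text/></element>
        </mixed>
      </element>
    </element>
  </define>
</grammar>
```

A description such as "A dry <grape>Riesling</grape> from the <region>Mosel</region> valley." fits the "mixed" pattern, whereas a description that names the region before the grape does not.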
Another strength of RelaxNG is modularization. RelaxNG schemas can be extended or adapted for various application scenarios. Modularization also brings more clarity to large RelaxNG structures. For example, a company can design schemas for different areas and then combine them into a single structure, which makes it easier to create an overall schema. Alternatively, there is a base schema that is always used, and definitions are added to it or overridden in it depending on the customer or the use case.
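The base schema scenario can be sketched with the RelaxNG "include" element and the "combine" attribute. The file name "base.rng" and the definition names "teaser" and "inline" are assumptions made for this illustration; the base schema is assumed to contain a start pattern and definitions with these names.

```xml
<!-- Hypothetical customer-specific schema built on top of a base schema -->
<grammar xmlns="http://relaxng.org/ns/structure/1.0">
  <include href="base.rng">
    <!-- Override: replaces the definition of "teaser" from base.rng -->
    <define name="teaser">
      <element name="teaser"><text/></element>
    </define>
  </include>

  <!-- Extension: adds one more alternative to "inline" from base.rng -->
  <define name="inline" combine="choice">
    <element name="product-ref"><text/></element>
  </define>
</grammar>
```

The base schema itself stays untouched, so several customer-specific schemas can share it and differ only in their overrides and additions.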
Unlike XSD, RelaxNG is based on a formal mathematical specification. This makes it possible to determine formally whether a change to a schema is backward compatible with the initial schema. The combination of two RelaxNG schemas is again a RelaxNG schema, which is not always true of XSD. RelaxNG schemas can also be converted into other schema languages.
Unlike a RelaxNG validator, a validator for XSD cannot always pinpoint incorrect elements or suggest allowed elements if the XML document is invalid.
Although RelaxNG does not have the name recognition of XSD, it has proven itself in practice. Evidence of this is its use in DocBook, the Adobe InDesign/InCopy markup languages IDML/ICML, DITA (Darwin Information Typing Architecture) and EPUB. oXygen from Syncro Soft, a widely used XML editor, supports the RelaxNG standard.
Because RelaxNG offers more options for defining schemas, validation is somewhat more complicated than with XSD. But this is a challenge only for the developers. Users do not experience this complexity, for instance when they edit a text that is based on a RelaxNG schema.
Appendices
Appendix 1: FAQs
Why do you use the relatively unknown RelaxNG to describe XML schemas?
XSD is better known mainly because it is widely used in web programming. In document-oriented applications such as DocBook, by contrast, RelaxNG is common practice. It offers significantly more possibilities than XSD to describe structures clearly, flexibly and understandably. These are the applications censhare is made for. The strength of XSD lies in data applications. Another advantage of RelaxNG over XSD in document-oriented applications is that RelaxNG can validate at element level. It is therefore always possible to point directly to faulty elements and to propose permitted elements in context. XSD has some difficulties doing that. In addition, RelaxNG schemas are easier to read than XSD schemas.
With RelaxNG, did censhare AG choose a language that is much more complex than XSD?
The opposite is true. RelaxNG schemas are much easier to read and to understand. Only the technical implementation is more complicated.
Can I store my own RelaxNG schemas in censhare and use them?
Yes, this functionality is planned, analogous to the Content Editor in censhare 4.
Can I convert standard DTDs and XSDs into RelaxNG?
Suitable XML tools such as oXygen from Syncro Soft can do that or can assist with it.
In addition to the XML syntax, there is a compact syntax for RelaxNG schemas. What does the support for it look like?
The compact syntax is a more concise representation of a RelaxNG schema than the XML syntax. However, since the compact syntax is not XML, it cannot be processed by an XML parser. For this reason, censhare does not support the compact syntax. It is always possible, however, to convert the XML syntax into the compact syntax and back with the help of suitable tools such as oXygen from Syncro Soft.
Appendix 2: Examples of the flexibility of RelaxNG
Flexibility in the order of attributes and elements
In RelaxNG you can easily define dependencies between attributes and elements. Such a dependency can look like this: an element may have either an attribute or specific child elements, but not both at the same time.
<element name="Addresses" xmlns="http://relaxng.org/ns/structure/1.0">
  <zeroOrMore>
    <element name="Person">
      <element name="Name">
        <text/>
      </element>
      <choice>
        <attribute name="Contact">
          <!-- the value refers to another "Person", for example by name -->
          <text/>
        </attribute>
        <group>
          <element name="Street">
            <text/>
          </element>
          <element name="Location">
            <text/>
          </element>
        </group>
      </choice>
    </element>
  </zeroOrMore>
</element>
Example 1: Definition of dependencies in RelaxNG
In Example 1 a "Person" may have either an attribute "Contact" with a reference to another "Person" or the two elements "Street" and "Location". Having both is not allowed; the "choice" pattern ensures this. The "group" pattern combines the elements "Street" and "Location", so the two can only occur together.
The example also shows that RelaxNG treats attributes and elements almost identically in its syntax. This simplifies the description, whereas XSD requires a clear separation of the two.
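As an illustration, a document of the following shape should satisfy the schema in Example 1, assuming the value of "Contact" is plain text that names another person. The names and addresses are invented for this sketch.

```xml
<Addresses>
  <!-- Variant 1: only the "Contact" attribute, no address elements -->
  <Person Contact="Jane Doe">
    <Name>John Doe</Name>
  </Person>
  <!-- Variant 2: "Street" and "Location" together, no "Contact" attribute -->
  <Person>
    <Name>Jane Doe</Name>
    <Street>1 Main Street</Street>
    <Location>Springfield</Location>
  </Person>
</Addresses>
```

A "Person" that carried both the "Contact" attribute and the address elements, or only "Street" without "Location", would be rejected by the "choice" and "group" patterns.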
Readability of schemas in RelaxNG and XSD
A schema description in RelaxNG is easier to read than one in XSD, as the following example illustrates. Each person in a list of persons can have either an attribute "Name" or an attribute "Alias", but not both. For better readability the schema defines a "Person_object", to which the list definition then refers.
<grammar xmlns="http://relaxng.org/ns/structure/1.0">
  <define name="Person_object">
    <choice>
      <attribute name="Name"/>
      <attribute name="Alias"/>
    </choice>
  </define>
  <start>
    <element name="Persons">
      <zeroOrMore>
        <element name="Person">
          <ref name="Person_object"/>
        </element>
      </zeroOrMore>
    </element>
  </start>
</grammar>
Example 2: Schema description for a list of persons in RelaxNG
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:complexType name="Person_object">
    <xs:attribute name="Name" type="xs:string"/>
    <xs:attribute name="Alias" type="xs:string"/>
    <!-- XSD 1.1 assertion: exactly one of the two attributes is present -->
    <xs:assert test="count(@Name | @Alias) eq 1"/>
  </xs:complexType>
  <xs:element name="Persons">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="Person" type="Person_object" minOccurs="0" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
Example 3: Schema description for a list of persons in XSD
With <xs:assert test="count(@Name | @Alias) eq 1"/>, XSD ensures in Example 3 that either "Name" or "Alias" is used. This is a so-called assertion in XSD. RelaxNG uses the element pair <choice> ... </choice> in Example 2 to guarantee the same constraint.
Assertions are also a problem for validation in XSD. The validator treats an assertion like a mathematical equation that it evaluates: the assertion is either true or false. It cannot, however, work backward from the assertion to infer which elements or attributes are allowed at a faulty place in the document. An application therefore cannot offer a list of allowed elements or attributes.
Ambiguities in XSD and RelaxNG
Because the goal of XSD is to describe data for exchange between programs, unambiguity of the structure is an important principle in XSD. Each element in a document must therefore be attributable to exactly one place in the XSD schema, regardless of its attributes or its content. The XSD standard calls this rule "Unique Particle Attribution". It leads to problems when describing structures for texts, as the following simple example illustrates. It concerns an XML schema for a book, which consists of a sequence of alternating odd and even pages and ends, if necessary, with an odd page.
<xs:group name="pages">
  <xs:sequence>
    <xs:sequence minOccurs="0" maxOccurs="unbounded">
      <xs:element ref="odd-page"/>
      <xs:element ref="even-page"/>
    </xs:sequence>
    <xs:element ref="odd-page" minOccurs="0"/>
  </xs:sequence>
</xs:group>
Example 4: This XSD schema definition is not valid because it allows an ambiguity.
The XSD schema definition in Example 4 leads to the error "non-deterministic content model"; it is not a valid schema. In many cases a valid XSD schema can be obtained by modifying the original one, although this leads to a more complex expression and reduces readability. For the book task in Example 4 this approach is not possible. Another way to arrive at a valid definition is to insert an "anyElement" wildcard (xs:any in XSD syntax).
<xs:group name="pages">
  <xs:sequence>
    <xs:sequence minOccurs="0" maxOccurs="unbounded">
      <xs:element ref="odd-page"/>
      <xs:element ref="even-page"/>
    </xs:sequence>
    <!-- the "anyElement" wildcard takes the place of the optional odd page -->
    <xs:any minOccurs="0"/>
  </xs:sequence>
</xs:group>
Example 5: Using an "anyElement" to obtain a valid XML schema for the book task
With the "anyElement" the XSD schema becomes valid. However, this solution moves away from the goal of specifying the book structure as exactly as possible. The "anyElement" achieves the opposite: the code in Example 5 allows almost any structure.
<zeroOrMore>
  <ref name="odd-page"/>
  <ref name="even-page"/>
</zeroOrMore>
<optional>
  <ref name="odd-page"/>
</optional>
Example 6: Valid RelaxNG schema for the specified book structure
In contrast to XSD, RelaxNG can express the book structure briefly and elegantly, as shown in Example 6.
Appendix 3: Sources
Title: An algorithm for RELAX NG validation
Author: James Clark
Link: www.thaiopensource.com
Title: Documents vs. Data, Schemas vs. Schemas
Author: Bob DuCharme
Link: www.snee.com
Title: How to define an XSD element with either one of the two attributes?
Website: IBM developerWorks, Forums, Q & A
Link: www.ibm.com
Title: OASIS - RELAX NG Tutorial
Author: James Clark, Makoto Murata
Link: relaxng.org
Title: Relax NG
Website: Wikipedia
Link: en.wikipedia.org
Title: RelaxNG-Book
Author: Eric van der Vlist
Link: books.xmlschemata.org
Title: RELAX NG home page
Author: Makoto Murata
Link: relaxng.org
Title: Taxonomy of XML Schema Languages using Formal Language Theory
Author: Makoto Murata, Dongwon Lee, Murali Mani, Kohsuke Kawaguchi
Title: XML Schema: how to declare complexType that has either attribute or child with the same name
Website: Stack Overflow, Q & A
Link: stackoverflow.com
Title: XML schema languages
Website: Wikipedia
Link: en.wikipedia.org
Title: XML Schema (W3C)
Website: Wikipedia
Link: en.wikipedia.org
Title: XML Schema Part 1: Structures Second Edition
Author: Henry S. Thompson, David Beech, Murray Maloney, Noah Mendelsohn
Website: W3C
Link: www.w3.org