Data in Motion

Wednesday, March 24, 2010

XML Schemas and Changing XML Technologies

In the past, XML schemas were used to define guidelines for the structure of data interchanged between parties. These guidelines were flexible enough to allow significant latitude in adoption. Developers used them to design systems that exchanged XML messages conforming to both the guidelines and their partner agreements. The developer then bound the message syntax to application code, typically using either a transform or a Document Object Model (DOM) or Simple API for XML (SAX) parser.
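To make the "by hand" binding concrete, here is a minimal sketch using the JDK's DOM parser. The Booking message and its PassengerName and FlightNumber elements are hypothetical and not taken from any real standard; the point is only that the developer writes the code that maps each element onto an application field.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

// Hypothetical application class, written by hand independently of any schema.
final class Booking {
    final String passengerName;
    final String flightNumber;

    Booking(String passengerName, String flightNumber) {
        this.passengerName = passengerName;
        this.flightNumber = flightNumber;
    }
}

final class DomBinding {
    // Parses the message into a DOM tree, then copies the values the
    // application cares about into a Booking, one element at a time.
    static Booking parse(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Element root = doc.getDocumentElement();
            return new Booking(text(root, "PassengerName"), text(root, "FlightNumber"));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("cannot parse message", e);
        }
    }

    // Returns the trimmed text of the first descendant element with the
    // given tag name, or null if the message does not contain one.
    private static String text(Element parent, String name) {
        NodeList nodes = parent.getElementsByTagName(name);
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent().trim();
    }

    public static void main(String[] args) {
        Booking b = parse("<Booking><PassengerName>Jane Doe</PassengerName>"
                + "<FlightNumber>XX123</FlightNumber></Booking>");
        System.out.println(b.passengerName + " " + b.flightNumber);
    }
}
```

Every element added to or renamed in the guidelines means another hand edit to calls like `text(root, "...")`, which is why loose, long-lived schemas suited this style of development.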

Today, it is common to use XML technology to develop the messaging portions of applications directly. XML schemas are “bound” to software classes using binding tools such as Java Architecture for XML Binding (JAXB) and JiBX, and messages are validated directly against the schemas using standardized XML software and hardware. XML binding technology allows developers to quickly generate classes from the guidelines and to easily update those classes. This direct relationship between the standard schemas and application code creates new challenges for schema developers.
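The validation step can be sketched with the JDK's standard `javax.xml.validation` API. This is an illustration under assumptions: the schema below is a hypothetical Booking schema, not one from any real standard, and JAXB code generation itself is left out because JAXB is no longer bundled with recent JDKs (it was removed in Java 11).

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

final class SchemaValidation {
    // Hypothetical schema: a Booking must contain PassengerName followed by FlightNumber.
    static final String BOOKING_XSD =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='Booking'><xs:complexType><xs:sequence>"
        + "<xs:element name='PassengerName' type='xs:string'/>"
        + "<xs:element name='FlightNumber' type='xs:string'/>"
        + "</xs:sequence></xs:complexType></xs:element>"
        + "</xs:schema>";

    // Compiles the schema text once; a Schema object can be shared and reused.
    static Schema compile(String xsd) {
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            return factory.newSchema(new StreamSource(new StringReader(xsd)));
        } catch (SAXException e) {
            throw new IllegalArgumentException("invalid schema", e);
        }
    }

    // Returns true if the message conforms to the schema. With no error
    // handler set, the validator throws SAXException on the first error.
    static boolean isValid(Schema schema, String xml) {
        try {
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Schema schema = compile(BOOKING_XSD);
        String complete = "<Booking><PassengerName>Jane Doe</PassengerName>"
                + "<FlightNumber>XX123</FlightNumber></Booking>";
        String missingFlight = "<Booking><PassengerName>Jane Doe</PassengerName></Booking>";
        System.out.println(isValid(schema, complete));      // true
        System.out.println(isValid(schema, missingFlight)); // false
    }
}
```

Because the same schema both drives class generation and gates incoming messages, every choice the schema developer makes shows up directly in application behavior.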

The XML standards have always given schema developers considerable latitude to increase adoption by balancing accuracy and precision against flexibility and longevity. In the past, system developers preferred longevity because it minimized the work needed to build and maintain the message-handling portion of their applications. Today, however, developers prefer precision because it minimizes the effort needed to bind the schemas to their applications.

The use of technologies that work directly from XML Schemas to generate portions of software applications and to pre-process messages has changed the metrics by which schema-based standards are judged. If IATA XML technology, including its XML Schemas, Best Practice Guides and Dictionary, is to be leveraged by industry as fully as possible, it must be judged by the following characteristics:

• How clearly and consistently the schemas represent the vocabulary.
• How reusable the schema components that describe the vocabulary terms are.
• How easily precise messages can be created from those components.
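The trade-off between flexibility and precision can be illustrated by validating the same message against a loose and a precise schema. Both schemas and the flight-number pattern (two letters or digits followed by one to four digits) are hypothetical, chosen only for illustration; the code uses the JDK's `javax.xml.validation` API.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

final class LooseVersusPrecise {
    // Hypothetical loose schema: any string is accepted as a flight number.
    static final String LOOSE_XSD =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='FlightNumber' type='xs:string'/>"
        + "</xs:schema>";

    // Hypothetical precise schema: the value must match the pattern below.
    // XSD patterns are implicitly anchored to the whole value.
    static final String PRECISE_XSD =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='FlightNumber'><xs:simpleType>"
        + "<xs:restriction base='xs:string'>"
        + "<xs:pattern value='[A-Z0-9]{2}[0-9]{1,4}'/>"
        + "</xs:restriction></xs:simpleType></xs:element>"
        + "</xs:schema>";

    // Returns true if xml conforms to xsd; a malformed schema is reported separately.
    static boolean accepts(String xsd, String xml) {
        Schema schema;
        try {
            schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(xsd)));
        } catch (SAXException e) {
            throw new IllegalArgumentException("invalid schema", e);
        }
        try {
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String bad = "<FlightNumber>not a flight</FlightNumber>";
        System.out.println(accepts(LOOSE_XSD, bad));   // true: the application must check it
        System.out.println(accepts(PRECISE_XSD, bad)); // false: rejected before binding
    }
}
```

With the loose schema, the generated class still receives a bad value and the application must check it; with the precise schema, the validator rejects the message before it reaches application code. The cost is that the precise schema is less tolerant of future change.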

Monday, March 22, 2010

Schemas for 2010 and beyond

Should the way we design XML Schemas change because of how developers use them, and if so, how? Does the broad acceptance of JAXB, JiBX and other binding tools mean we should change?

I am trying to change how standards bodies produce their schemas so that they take direct binding into account, and I would welcome feedback, suggestions, etc.

Wednesday, December 2, 2009

Welcome - Data in Motion Blog

I often refer to "data in motion" to distinguish the content of messages from data in databases ("data at rest"). SOA and EDA share the characteristic that they free data from its containing application or database resource, format it (hopefully in XML), and put it in motion between processing contexts. Freeing data and putting it in motion gives us a great opportunity to do things that we can't do with data at rest.

First, a couple of things this blog is not trying to cover:

the details of XML - while I am deeply entrenched in XML, that subject is covered elsewhere
the details of infrastructure - that is also well covered elsewhere
the pros and cons of SOA and EDA
semantic technologies and the semantic web

This blog is intended to explore what we can do with flowing data that we can't do with data entombed in applications. Topics of interest:

how can validation improve a SOA/EDA architecture
when is the right time to canonicalize data
how can SOA/EDA impact data proliferation issues ("data puddling")
when are standard schemas useful and when do they get in the way
relationships between the service provider, the system of reference, and the system of record, and how SOA/EDA can improve those relationships
semantic veracity - in SOA that which we call a rose by any other name would not smell as sweet because we would not find it.
the interplay of context, data and behavior - data in motion carries more context with it than normalized data at rest. How do we take advantage of this?

I am sure there are many more topics, but this should be a good starting list.