Overview of the XML and binary converter

This is a companion project to the XML form editor. In fact, it was envisioned before the form editor. I figured that the form editor would be useful to this project, so I wrote that first. It turned out that the editor took up so much time that only now am I working on the XML and binary conversion.

The idea is not to have a binary representation of XML data; rather, it is to have an XML representation of binary data. The reason is that there are lots of legacy files out there that need special programs to read their content. If we have an XML representation, then the data can be read without special software. In order to map the binary data to XML, we need a description of the data so it can be converted. This is known as a Data Format Description Language (DFDL), and the specification is being worked on by the DFDL-WG. However, I have not seen much progress. A partial implementation of a variant that tries to stay close to the latest DFDL draft can be found in "Virtual XML Garden". Microsoft, however, has its own specification in BizTalk. A few other companies are probably doing it their own way.
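To make the idea of a description-driven mapping concrete, here is a minimal sketch in Python. The description format (a list of element names paired with `struct` format codes), the field names, and the `binary_to_xml` helper are all assumptions made up for illustration; this is neither DFDL nor the storage format commands this converter uses.

```python
import struct
import xml.etree.ElementTree as ET

# Hypothetical description of a 5-byte record: (element name, struct code).
# This is an illustration only, not DFDL and not the converter's own syntax.
DESCRIPTION = [
    ("width", "H"),   # unsigned 16-bit integer
    ("height", "H"),  # unsigned 16-bit integer
    ("depth", "B"),   # unsigned 8-bit integer
]


def binary_to_xml(data, description, root_name="record", byte_order=">"):
    """Walk the description and emit one child element per binary field."""
    root = ET.Element(root_name)
    offset = 0
    for name, code in description:
        fmt = byte_order + code  # explicit byte order, so no padding is added
        (value,) = struct.unpack_from(fmt, data, offset)
        offset += struct.calcsize(fmt)
        ET.SubElement(root, name).text = str(value)
    return ET.tostring(root, encoding="unicode")


# Big-endian sample bytes: 0x0280 = 640, 0x01E0 = 480, 0x08 = 8.
sample = bytes([0x02, 0x80, 0x01, 0xE0, 0x08])
print(binary_to_xml(sample, DESCRIPTION))
```

The point of the sketch is that the description, not the program, carries the knowledge of the file layout: a different binary file only needs a different description.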

My own interest did not start from this DFDL work. Rather, I got my inspiration from the Macintosh world, where binary resource data are described in a somewhat similar manner. However, there are lots of problems. If you want to work in batch mode, you have Rez/DeRez, and you have to learn a new language. If you want to work interactively, you used to have the Mac OS 8 ResEdit program, which has a completely different template system.

Years ago I wrote some software that can be used to map binary data into scripting language data structures. So I have been thinking that if I can map binary data to XML, the result will be in a language familiar to many people and supported by many tools, including XML-specific editors. By combining XML binary conversion with an interactive XML editor, you get an easy way to edit a binary file. So before doing a web-based XML binary conversion tool, I had to do the web-based XML editor. Now that the editor is running for the most part, I am turning my attention to the data conversion.

However, there is one problem. With the editor, I simply accepted XML Schema as the standard. It is a very complicated standard and not easy to implement in full, but at least it is an obvious standard. With data conversion, there is no standard that I am willing to commit to at this time. This is one reason I was reluctant to start the project. Finally I decided to use my own rules so that I could proceed. The goal is not to establish yet another standard. Rather, it is to establish what I consider to be the basic features; then I can see whether they fit inside any of the existing standards.

Most of the work out there seems to be aimed at legacy business data, mostly written by mainframes using COBOL. Obviously that is where the money is. However, this is not my target. Would any company want to post its customer records to my web site to convert them to XML? This is not very likely. And would I want to have megabytes of data posted to me? I would abort as soon as anyone tried to do so. I am mainly interested in converting and editing small configuration files. Therefore I will concentrate on certain types of data files. The converter will not work well, or at all, with many other types of data files, at least for the immediate future. For example, I do not deal with EBCDIC, and support for decimal types is very weak. So this is not the place to deal with business data files. The target audience is people who occasionally need to convert small amounts of data. They do not want to download a converter. They want a converter that gives instant feedback so they can experiment with it.

One should keep in mind that this is about an XML representation of binary data, not the other way around. This means that we try to have a design in which most binary files can have an XML representation. A failure there means that the design needs improvement. However, we do not strive for a binary representation of every XML file. It is perfectly acceptable that there are some XML data that we fail to convert to binary. If the purpose were only to have an XML representation of legacy data, then conversion in one direction would be enough. However, since my goal is to be able to edit binary files by XML editing, we need conversion in the other direction as well. So the rule is that any XML data generated from a binary file must be convertible back to binary. For any other XML data there can be no guarantee. Having said that, I will try to have a binary representation for most XML files, even if the generated binary is unlikely to correspond to any legacy file.
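The round-trip rule can be sketched as a small test in Python. The field layout, the element names, and the `to_xml` and `to_binary` helpers below are hypothetical and exist only to illustrate the rule; they do not reflect how this converter is implemented.

```python
import struct
import xml.etree.ElementTree as ET

# Hypothetical layout of a small configuration file (illustration only).
FIELDS = [("version", "B"), ("flags", "B"), ("timeout", "I")]
ORDER = "<"  # little-endian, no padding


def to_xml(data):
    """Binary -> XML: one element per field, in layout order."""
    root = ET.Element("config")
    offset = 0
    for name, code in FIELDS:
        (value,) = struct.unpack_from(ORDER + code, data, offset)
        offset += struct.calcsize(ORDER + code)
        ET.SubElement(root, name).text = str(value)
    return ET.tostring(root, encoding="unicode")


def to_binary(xml_text):
    """XML -> binary: only guaranteed for XML that to_xml() could produce."""
    root = ET.fromstring(xml_text)
    out = b""
    for name, code in FIELDS:
        text = root.findtext(name)
        if text is None:
            # Arbitrary XML may not fit the layout; no guarantee is made.
            raise ValueError("missing element: " + name)
        out += struct.pack(ORDER + code, int(text))
    return out


original = bytes([1, 3, 0x2C, 0x01, 0x00, 0x00])  # timeout = 300
xml_text = to_xml(original)

# The rule: XML generated from a binary file converts back to the same bytes.
assert to_binary(xml_text) == original

# Editing the XML and converting back yields the edited binary file.
edited = xml_text.replace("<timeout>300</timeout>", "<timeout>600</timeout>")
new_binary = to_binary(edited)
```

XML that was not generated from a binary file, such as a document with a missing element, is rejected here with an error, which matches the statement that no guarantee is made for other XML data.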

I also believe that the generated XML data should look like XML written by someone knowledgeable in XML. It should carry little baggage from the original binary file. An analogy is that an English essay should look like a written essay, and not like what you see when you hit "translate this page" in Google search. Here is an example of the sort of translation we want to avoid. It is a goal of this project to make it possible to convert to XML data with good style.

When we are converting XML data, we assume that the XML data are valid. We do some checking of the data, but it is minimal. Therefore the user should make sure the XML data are correct, perhaps by validating them with some other tool; there are quite a few validation tools out there. More data checking will be added in future implementations, but that is not a goal at this design stage.

Anyway, this is version 0.1. There is a lot that is unimplemented, and there are quite a few bugs. The design is not yet stable. It is meant to be a platform to experiment with. While the data converter would be a great working partner for the form editor, the converter is in a state of flux at this stage, so no attempt will be made to integrate the two tightly for now. Still, there is enough to see what this project is about.

Binary data can also be edited by using both the converter and the form editor. The process would go like this:

converter -> form generator -> editor -> converter -> newly edited binary data.

First you have the binary data in the converter, and you choose the binary to form generator option. The generated XML and the schema are put into the form generator. You can then choose the options in the form generator, although most likely you can just accept the defaults. Then you can generate the editor and start editing the XML data. When you are done with the editing, you click generate XML. Instead of giving you the XML data directly, the editor sends the XML data to the converter. From there you can choose the options and get back the newly edited binary data. See here for more details.

A tighter integration would eliminate two of the steps and look like this:

converter -> editor -> newly edited binary data.

That would require more work, and then there is the issue of how to choose options that differ from the defaults. This is to be solved in the future.

As we are still in the design stage, any suggestions are appreciated.

Send feedback to feedback AT datamech DOT com

Send bug reports to bugs AT datamech DOT com


Change Log

2006/04/21:

Allows the conversion result to go back to the converter textbox. Note that 64-bit integers do not work because my ISP is still using Perl 5.6.1 while my test machine uses 5.8. I am looking into a workaround.

2006/04/05:

Allows multiple occurrences of elements in the last element of a sequence stored as CSV.

2006/04/01:

Allows transfer of data between converter and form editor generator.

2006/03/20:

Pre-release 1.