Mapping between HTML form and XML data


Introduction

It is possible to design an HTML form structure that embeds all the content and structure of some XML data, so that the XML data are displayed in the fields of the HTML form. The HTML form and the XML data are then logically equivalent in terms of the contents of the XML data, and a single program can convert automatically between them from a web browser for any XML data. In other words, given some data-centric XML data, we can generate an HTML form that acts as a data entry form or a simple low-end web-based XML editor. It is not meant to be a professional XML editor; the design goal is an editor that is good for quick entry of a small amount of XML data.

Currently the implementation is not complete. After all, there are a lot of features in XML Schema and it takes time to do all of them. However, enough has been done that we can get a feel for the central idea, and it should be a useful tool for a lot of people even though it is not yet complete.

Data Entry and Advanced Editing

There are two different ways to do the XML editing. One way is to treat it as simple data entry. Only the data are modified; the structure of the XML remains unchanged. All the data fields are fixed and there is no need to insert or delete data fields. No JavaScript is used, so there cannot be any data verification. Done this way, we have a simple HTML form that makes very little demand on the browser. It is just a web form that should be usable in almost any browser.

For more advanced editing, we want to be able to change the structure of the XML document. In our approach, the HTML form is a representation of the XML document, so to change the XML document we need to make the corresponding change to the HTML form. This is implemented with heavy use of JavaScript and the DOM. This is the reverse of the data entry case: we need the latest browser, and sometimes even that is not good enough because of browser bugs. What we have is a web application that is an XML editor.

Even though these two approaches are very different, we can use the same HTML form element structure for both of them. We shall discuss the data entry mode first because the information is valid for both modes of editing. Afterwards you can go and read about editing the structure of the XML document.

First let us look at the web form for XML data entry.

A Quick Tutorial on Web Form for XML Data Entry

As an example, we take the PO XML data from W3C's XML Schema Part 0: Primer.

<?xml version="1.0"?>
<!-- Sample data from W3C XML Schema Part 0: Primer -->
<purchaseOrder orderDate="1999-10-20">
  <shipTo country="US">
    <name>Alice Smith</name>
    <street>123 Maple Street</street>
    <city>Mill Valley</city>
    <state>CA</state>
    <zip>90952</zip>
  </shipTo>
  <billTo country="US">
    <name>Robert Smith</name>
    <street>8 Oak Avenue</street>
    <city>Old Town</city>
    <state>PA</state>
    <zip>95819</zip>
  </billTo>
  <comment>Hurry, my lawn is going wild!</comment>
  <items>
    <item partNum="872-AA">
      <productName>Lawnmower</productName>
      <quantity>1</quantity>
      <USPrice>148.95</USPrice>
      <comment>Confirm this is electric</comment>
    </item>
    <item partNum="926-AA">
      <productName>Baby Monitor</productName>
      <quantity>1</quantity>
      <USPrice>39.98</USPrice>
      <shipDate>1999-05-21</shipDate>
    </item>
  </items>
</purchaseOrder>


Now we put the data into the form generator. The schema part is left empty because it is not needed if we are not doing any structure modification or data verification. We also reproduce the generator screen here in case you are reading from a hard copy. This is an actual form (minus some features) so you can use it directly.

Automatic generation of HTML Form from XML data

Title:

Style (or enter file name if you want an external style sheet):

Put the XML data here (or the URL of your XML file):


Put the XML Schema here (or the URL of your schema):

Form output will be in popup window.
Form will output [XML | plain text | HTML], using script on [server | client]
Label will be [same as tag | separate words]
Form fields will be in outline form to reflect the XML structure.
Form will be used in older browsers that do not support fieldset, legend and label.
Allow insertion and deletion, require browser supporting DOM.
Check Schema during editing.
Always conform to Schema during editing.


Now click on the generator button. If you choose the outline option you will get

purchaseOrder
  shipTo
  billTo
  items
    item
    item

And if you choose the no-outline option, you get the following form. This is more suitable for someone who just wants to do data entry and does not care about the XML data structure. The two forms will diverge further in future when we add insertion/deletion.



























Anyway if you edit the data with either form and click the Generate XML button, you will get the modified XML data.
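
The data entry round trip described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the generator's actual code: the function names (iter_slots, xml_to_fields, fields_to_xml) and the field labels are hypothetical, mixed content is ignored, and each attribute and each leaf element is treated as one form field in document order.

```python
import xml.etree.ElementTree as ET


def iter_slots(root):
    """Yield (element, attribute name or None) for each editable value, in document order."""
    for elem in root.iter():
        for attr in elem.attrib:
            yield elem, attr
        if len(elem) == 0:  # leaf element: its text is one data field
            yield elem, None


def xml_to_fields(xml_text):
    """Return the ordered (label, value) pairs that a data entry form would display."""
    root = ET.fromstring(xml_text)
    fields = []
    for elem, attr in iter_slots(root):
        if attr is None:
            fields.append((elem.tag, elem.text or ""))
        else:
            fields.append((f"{elem.tag}/@{attr}", elem.attrib[attr]))
    return fields


def fields_to_xml(xml_text, values):
    """Write edited values back in the same order; the structure itself is untouched."""
    root = ET.fromstring(xml_text)
    slots = list(iter_slots(root))
    if len(slots) != len(values):
        raise ValueError("the number of values must match the number of fields")
    for (elem, attr), value in zip(slots, values):
        if attr is None:
            elem.text = value
        else:
            elem.set(attr, value)
    return ET.tostring(root, encoding="unicode")


sample = '<shipTo country="US"><name>Alice Smith</name><zip>90952</zip></shipTo>'
fields = xml_to_fields(sample)
edited = fields_to_xml(sample, ["US", "Bob Jones", "12345"])
```

Because the values are matched to the document purely by position, the sketch also shows why the order of the fields in the form matters.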

More User Friendly Labels

Since no space is allowed in an XML tag, multiple words are written together. In the example there are tags like billTo and shipDate. In the form generator, if you pick the option to break tags down into words in labels, the editor will try its best to do so, and here is what the editing document will look like.

purchase order
  ship to
  bill to
  items
    item
    item

You can compare with the previous form to see the difference.

The algorithm is roughly like this. If "_" is in the label then it is changed to a space. If not, we try the same with ".", and then with "-". Failing that, we use case changes to separate the words. So shipTo becomes "ship to" because the upper-case T follows the lower-case p. The capitalization of each word follows the first letter of the tag. This is not perfect, and that is why USPrice becomes "US Price". In future we may need more options for finer control.
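
A minimal Python sketch of this label algorithm follows. It is an approximation, not the generator's code: the function name tag_to_label is hypothetical, and the rule that splits a run of capitals before the last one (needed to reproduce "US Price") is an assumption, since the text above only mentions the lower-to-upper case change. How capitalization is handled when a separator character is found is not stated, so the sketch leaves the words unchanged in that case.

```python
import re


def tag_to_label(tag):
    """Turn an XML tag name into a multi-word label (hypothetical helper)."""
    # Try the separator characters in the stated order: "_", then ".", then "-".
    for sep in ("_", ".", "-"):
        if sep in tag:
            return tag.replace(sep, " ")
    # Otherwise split where an upper-case letter follows a lower-case letter or digit.
    spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", tag)
    # Assumption: also split a run of capitals before its last letter (USPrice -> US Price).
    spaced = re.sub(r"(?<=[A-Z])(?=[A-Z][a-z])", " ", spaced)
    words = spaced.split()
    # The capitalization of each word follows the first letter of the tag.
    if tag[0].isupper():
        words = [w[0].upper() + w[1:] for w in words]
    else:
        words = [w.lower() for w in words]
    return " ".join(words)


labels = [tag_to_label(t) for t in ("shipTo", "purchaseOrder", "USPrice", "first_name")]
```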

Other Options in the Form Generator

Instead of entering XML data directly, you may also enter the URL of your data file.

If an error is detected while parsing your XML data, you will often be told the line number where the error occurs. If you have XML data in the text field, you may hit the line number button to help locate that line.

The XML output can be in a separate window, and it can be in XML format, HTML format or just plain text. XML is the better choice because Mozilla and IE display the XML structure. However, other browsers may not support XML.

If you are using a really old browser that does not support fieldset, you may choose not to use it.

The bottom options in the form generator are for modification of the XML data structure, and they are discussed in a different chapter.

XML Form and CSS

The above forms look really ugly. No attempt has been made to produce an aesthetically pleasing form. The styles are just the browser default. However you can customize the form to your taste by using CSS. You can put the style information in the style box of the generator form. In this example, the only style used is

.XMLattribute { margin-left: 10px; font-style: italic; }

This sets all the attribute labels to be offset by 10 pixels and displayed in italics.

The classes are XMLcomplexContent, XMLmixedContent, XMLsimpleContent, XMLsimpleType, XMLattribute and you can give them style attributes. Each element or attribute name is also a class, so in this example the classes are purchaseOrder, shipTo, billTo, items, item, name, street, city, state, zip, orderDate, country, comment, partNum, productName, quantity, USPrice, shipDate.

So if you want to highlight the productName fields with the color yellow, you can use

.productName { background-color: yellow; }

If you do not want to show the productName fields, you can use

.productName { display: none; }

So you can see that you can have a lot of control over the display of the form. The style information can be stored in an external file so that you can reuse it every time.

Modify the Input Fields of the Form

You can save the source of the form and then you can change it yourself for further customization. You can certainly change the labels. You can change the size of the text field and even change it to a textarea. Actually if there is a lot of text in a single text field, the form generator will make a textarea instead. However you should never change the order of the fields because that would affect the XML data generated. Here is another form that is generated and which we are going to modify later.

family
  member
  member
  member

Here is the same form after extensive modification, yet it generates the same XML result. Note that with schema information, it is possible to generate these non-text fields automatically. Radio buttons have a special issue. If all the radio buttons have the same name then they are all considered to be in the same radio group, so you have to append a comma and a number to the name so that the groups are all distinct. Here is the new modified form.

family
  member
    male female
    coffee tea milk juice
  member
    male female
    coffee tea milk juice
  member
    male female
    coffee tea milk juice

Another idiosyncrasy of checkboxes and multiple selects is that if the child prefers both milk and juice, multiple drink elements will be generated, as shown below.

<family familyname="Wong">
  <member>
    <name role="father" sex="male">Alex</name>
    <drink>coffee</drink>
  </member>
  <member>
    <name role="mother" sex="female">Mary</name>
    <drink>tea</drink>
  </member>
  <member>
    <name role="child" sex="male">John</name>
    <drink>milk</drink>
    <drink>juice</drink>
  </member>
</family>
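
The two conventions above, distinct radio group names and one element per checked box, can be sketched in Python as follows. The helper names are hypothetical and the exact "name,number" spelling is an assumption based on the description; the real generator may differ in detail.

```python
import re
from xml.sax.saxutils import escape


def radio_group_name(field, group_number):
    """Build a distinct radio group name by appending a comma and a number."""
    return f"{field},{group_number}"


def base_field_name(input_name):
    """Strip the ',number' suffix to recover the XML element or attribute name."""
    return re.sub(r",\d+$", "", input_name)


def checked_values_to_elements(tag, values):
    """Emit one element per checked box, so two checked drinks give two drink elements."""
    return "".join(f"<{tag}>{escape(value)}</{tag}>" for value in values)


group = radio_group_name("sex", 3)
drinks = checked_values_to_elements("drink", ["milk", "juice"])
```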

Client Side or Server Side Operation

In the examples so far, the transformations between XML data and HTML forms are all done by server-side CGI. However, you can do both from the client side too. When we do it in JavaScript, we can only write the result out in an HTML document. However, it has the advantage over CGI that we get around the bug in non-IE browsers that requires generation to be done from a snapshot.

If every user who wants to edit some XML data has to talk to my server each time, I can never provide the bandwidth. If you are editing some XML financial data, you do not want to send it to my server just to generate the data. So client-side JavaScript is the obvious way to go for XML data generation. Furthermore, generating XML data is usually only the first step of your operation. For example, you may want the form to be the front end of a web service. And as we showed earlier, once you generate the form, you can save it and then change it to do whatever cool stuff you come up with. It should be pointed out that if you choose client-side operation in your editor, your editor will from that point on be independent of the server. The server can be down and it will have no effect on you. You can do all the editing and XML generation even if you are offline.

As for the process of generating forms, you may want the server to do it for you. The server CGI offers more options and the latest software, and once you generate the form you can save it and no longer need the server, so the load is not too heavy. However, this is not the only way to do it. If you want, you can do it by hand, following some simple guidelines. Another interesting option is XSLT. I have just started working on it and have a preliminary version that generates a subset of the CGI-generated form. I will finish it when the design is finalized because I don't want to keep synchronizing the XSLT program with the CGI Perl script. If you want to try it out now, test.xml is the same XML sample data used in the examples above. If you open it with IE6 (Mozilla will not work because it does not support disable-output-escaping), you will see the data in the HTML form fields. This means that in future we can take any XML file, add a reference to the XSL stylesheet and then start editing when we open it in the browser. However, this is based on an earlier design and I have not yet updated it to reflect the latest design.

Limitation

This is not an editor designed for the professional. It is designed for quick and simple editing of XML data that can be done from a web browser.

A large XML file translates to a form with many input elements, and that really pushes the browser. Editing the structure of the XML document is done with JavaScript, which is not exactly suited to large amounts of computation. So it is not surprising that it does not work well except for smaller XML files.

I suppose that if we only bring a slice of the XML file into the XML form at a time, then we can handle larger files. That would be a completely different animal, requiring a lot of work, and there is no immediate plan to do it.

So use this only for what it is designed for: quick editing of a small amount of XML data.

Browser Compatibility

Since DOM is used heavily, you should try to use a recent version of the browser.

IE: IE6 or IE5.5 (Windows only, not the Macintosh version) usually works fine, with a lot of extra work in JavaScript.

Opera: Opera7 mostly works fine except that it will not display the generated XML file, so you have to output it as HTML. Opera6 does not work.

Mozilla: mostly tested with 1.2, 1.6 and Firefox 0.8. It works fine with minor problems here and there. However, if you try to generate the XML data, it often fails due to a bug in Mozilla. If you take a snapshot and then generate the XML data from the snapshot, it should generate the XML data correctly. You will not have this problem if you generate the data with client-side JavaScript.

Safari: has the same problems as Mozilla and more. Safari will not display an XML file, so you have to output it as HTML. Taking a snapshot does not work, so you cannot use the workaround, and saving the file does not help either. It cannot handle radio buttons. There are enough problems that currently you should avoid using Safari with this editor. On the Mac, the best browser to use for this editor is Mozilla/Firefox.

See also the section on keyboard navigation for more incompatibilities.

The Future

This project is only at the beginning stage. A lot of work remains to be done. Feedback is welcome and would be very helpful.

With prerelease 2, we include editing the structure of the XML document.

With prerelease 3, we include support of DTD.

With prerelease 4, we include partial support of XML schema. Data verification is now fully exposed.

With prerelease 5, we include support of parameter entity in DTD.

If you have any comment and feed back, send it to

If you want to report bugs, send it to


Change Log

2006/3/28:

There is a new prerelease 6. However, it is only to be used together with the binary to XML converter. For now it is safer to use prerelease 5.

2006/3/3:

A companion project to convert binary data to/from XML has been released. In future there will be some integration between the two projects.

2006/2/11:

For an attribute, the default value of use should be optional. There was a bug so that this only worked if the attribute was declared globally, and the attribute appeared as required if declared locally. This is now fixed. In particular, the orderDate attribute in the purchase order example should now show up as optional.

2005/4/30:

Allow generation of XML data from client-side JavaScript.

2004/9/30:

Pre-release 5. Add support of parameter entity, include and ignore in DTD. This allows support of DocBook. Use a cache for public DTDs to keep performance at a more reasonable level.

2004/11/21:

Fixed bug that the new element command and the clear nil element command fail to generate nested elements.

With a large schema, new elements may have too many nested child elements, which take too long to generate and for the browser to handle. In such cases we restrict the child elements to avoid the problem. In future we may use better heuristics to generate a better new element instance.

2004/11/11:

Provide fixed menu for IE

Remove menu and submit button on printing.

2004/11/05:

Menu is now always on screen, so it can be accessed even if the document has been scrolled to the bottom.

Substitute group in W3C XML Schema is now supported.

Nillable element in W3C XML Schema is now supported.

When multiple languages are available in documentation, the best language is picked rather than listing them all.

Fixed problem that wrong XML is generated when all buttons in a radio group are deselected.

2004/10/21:

Fixed problem that going from prerelease 3 to 4, verification of content model no longer works on IE.

Label may now be multi-worded by breaking down the tag into words.

A more user friendly message in the display of content model no longer uses regular expression.

In IE the F1 help key would show information about the selected element.

Rearrange the menu item so that "verify Against Schema" can be reached without scrolling in IE.

2004/10/17:

Fixed problem that annotation will break schema parsing. Use documentation in annotation to provide information about elements in the editor.

When an input field has keyboard focus, the corresponding element is considered to be selected.

Fixed problem that non-global new element cannot be created. This is a partial fix because if a name is defined in multiple places, we need the user to pick which element with that name will be created.

Fixed problem that complex content attributes not in the restricted type do not show up.

2004/10/4:

Fixed verification of float/double. Reports unsupported datatype. Add verification of ID/IDREF/IDREFS.

2004/9/30:

Pre-release 4. Add partial support of XML schema. Data verification from pre-release 3 can now be specified using schema. Schema can also be guessed from the data.

2004/6/4:

Pre-release 3. Add support of DTD. DTD can be used to verify the data entry. Blank document can be generated from the DTD. Selection list can be made from enumeration.

Add support of most of the predefined datatype and facets in XML schema.

Add preference dialog to turn on and off DTD data verification.

2004/4/29:

Add snap shot command to make HTML source reflecting the current content of the web page.

2004/4/21:

Add navigation using the keyboard.

2004/4/18:

Fixed problem that select list or radio buttons cannot be cleared to no selection.

Fixed problem that when an attribute is pasted into an element, it becomes the first attribute rather than the last attribute.

The Submit button is now also on the menu, the reset button is removed because it is not done right.

Repeating elements can now be sorted.

2004/4/13:

Allows operation to change text field into select list or radio buttons.

2004/4/7:

Pre-release 2. Allows modification of XML document structure through insertion and deletion.

2004/1/26:

Pre-release 1. Allows editing of values in a XML document.


