Recently, Mike Herrick asked a question on his blog about which formats should be supported for Form Data and for storing Content for archival. James McGovern appears to have misinterpreted the question and asked which of these formats ECM Vendors should support and why they can’t convert between formats out of the box. His question has merit, so I am going to address both.
File Format Support
James first. Every ECM system that I have worked with not only provided built-in support for well over 100 formats, but also allowed you to define custom formats. I once defined a Windows Internet Shortcut as a format, as a cheap and easy way to provide links to external websites without customizing the user interface.
All that is really needed for any file format is an easy way to read it once it is defined. In the example above, I just defined it with the same MIME type as HTML and the user’s default browser took it from there. It took only a few minutes using the Administration interface. Using this approach, most ECM systems can support any format that you can think of.
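To make the idea concrete, here is a minimal sketch in Python of registering a custom format by mapping its file extension to an existing MIME type. This is not any ECM product's administration API; it only uses the standard library's `mimetypes` module as a stand-in for a format registry, and the function name `content_type_for` is my own invention for illustration.

```python
import mimetypes

# Assumption for illustration: treat Windows Internet Shortcut files (.url)
# as HTML, so that whatever opens HTML is asked to open them as well.
mimetypes.add_type("text/html", ".url")


def content_type_for(filename: str) -> str:
    """Return the registered MIME type, or a generic binary fallback."""
    guessed, _encoding = mimetypes.guess_type(filename)
    return guessed or "application/octet-stream"


if __name__ == "__main__":
    print(content_type_for("vendor-site.url"))
```

In a real ECM system the same mapping would live in the repository's format definitions rather than in code, but the principle is the same: once the format resolves to a MIME type that the client already knows how to open, nothing else needs customizing.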
As for converting between formats, James should check out Content Transformation Services from Documentum. They also support XML transformations with XSLT directly in the DFC, so you can write other transformations to use within the system. I’m sure several other vendors have similar products, but I’m unfamiliar with them. If anyone out there knows of one, add a comment so we can all learn.
Archiving Documents and Form Data
Now, the real question: how do we store archived material and form data? I’ll start with archived information, as it is easy to answer. I’m a big fan of PDF/A. It is defined in the ISO standard 19005-1. To make it easy, Adobe Acrobat and other tools can create the format, and there is a free way for anyone to view it with Adobe Reader. Even the National Archives accepts it as a valid electronic format. They do have some rules and guidelines for using PDF, such as embedding fonts, but it is accepted, even without the original format.
So that leaves us with Forms. Mike is correct: there are several formats out there. I’ve always been a fan of saving the form data outside of the actual form. Documentum works with Adobe to publish forms in either HTML or PDF and store the data as XML or as structured data in a database. Adobe also works with IBM’s FileNet for this functionality.
However, Adobe aside, I have typically used forms just to collect the data. The data is stored and preserved separately. I keep a copy of the form that was in use when the data was created to give it context. I’ve had clients that needed the data burned into the form to meet their Records Management requirements. In that situation, I’ve burned it into a PDF document, which takes us back to the PDF/A standard.
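As a rough illustration of keeping the data separate from the form, here is a small Python sketch that serializes submitted values to XML and records which form and version collected them. The element and attribute names (`formData`, `formId`, `formVersion`, `field`) and both function names are hypothetical; they are not taken from XFA, Documentum, or any Adobe schema.

```python
import xml.etree.ElementTree as ET


def form_data_to_xml(form_id: str, form_version: str, fields: dict) -> str:
    """Serialize submitted values, tagged with the form that collected them."""
    root = ET.Element("formData", {"formId": form_id, "formVersion": form_version})
    for name, value in fields.items():
        field = ET.SubElement(root, "field", {"name": name})
        field.text = str(value)
    return ET.tostring(root, encoding="unicode")


def xml_to_form_data(xml_text: str):
    """Recover the form reference and the field values from stored XML."""
    root = ET.fromstring(xml_text)
    fields = {f.get("name"): (f.text or "") for f in root.findall("field")}
    return root.get("formId"), root.get("formVersion"), fields


if __name__ == "__main__":
    stored = form_data_to_xml("expense-claim", "3", {"employee": "J. Smith", "amount": "125.40"})
    print(stored)
    print(xml_to_form_data(stored))
```

The form reference is what gives the data its context later: as long as the archived copy of that form version is retained alongside the data, the two can be put back together, or burned into a PDF when a client's records requirements call for it.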
The real question with Form Data is, “How are you going to use it once captured?” That will dictate what you do. Of the standards listed by Mike, XFA is the most interesting. Not because of what it says, but because JetForm proposed it, changed their name to Accelio, and then Adobe acquired them. Support for that standard appeared in Acrobat 6.0.
So, a couple of questions for Mike. When you talk about Form Data, what kind of forms are you talking about? What do you really need to store?