Storing Form Data and Records for the next 100 Years

Recently, Mike Herrick asked on his blog which formats should be used for Form Data and for storing content for archival purposes. James McGovern appears to have misinterpreted the question and asked which of these formats ECM vendors should support and why they can’t convert between formats out of the box. His question has merit, so I am going to address both.

File Format Support

James first. Every ECM system I have worked with not only provided built-in support for well over 100 formats but also allowed you to define custom formats. I once defined the Windows Internet Shortcut as a format, as a cheap and easy way to provide links to external websites without customizing the user interface.

All that is really needed for any file format, once it is defined, is an easy way to read it. In the example above, I simply gave it the same MIME type as HTML, and the user’s default browser took it from there. It took only a few minutes in the administration interface. With this approach, most ECM systems can support almost any format you can think of.
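The same idea can be sketched outside any ECM product with Python’s standard `mimetypes` module: register an extension against an existing MIME type and let whatever already handles that type take over. This is only an illustration under that assumption; an ECM administration interface does this through its own format definitions, not through this code, and the file names are made up.

```python
import mimetypes

# Assumption for illustration: register Windows Internet Shortcut files (.url)
# under the HTML MIME type, the same trick described above, so that whatever
# handles text/html (here, the user's browser) takes it from there.
mimetypes.add_type("text/html", ".url")


def mime_for(filename: str) -> str:
    """Return the MIME type registered for a file name, or a generic default."""
    mime_type, _encoding = mimetypes.guess_type(filename)
    return mime_type or "application/octet-stream"


print(mime_for("intranet-home.url"))   # text/html
print(mime_for("annual-report.pdf"))   # application/pdf
```

Anything not registered falls back to `application/octet-stream`, the generic binary type, so an unknown format can still be stored and downloaded even if nothing knows how to display it.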

As for converting between formats, James should check out Documentum’s Content Transformation Services. Documentum also supports XML transformations with XSLT directly in the DFC (Documentum Foundation Classes), so you can write your own transformations for use within the system. I’m sure other vendors offer similar products, but I’m not familiar with them. If you know of any, please leave a comment so we can all benefit.

Archiving Documents and Form Data

Now for the real question: how do we store archived material and form data? I’ll start with archived material, since it is the easier one to answer. I’m a big fan of PDF/A, which is defined in the ISO standard 19005-1. Adobe Acrobat and other tools can create it, and anyone can view it for free with Adobe Reader. Even the National Archives accepts it as a valid electronic format. They do have rules and guidelines for using PDF, such as embedding fonts, but it is accepted even when the document’s original format is not retained.

That leaves us with forms. Mike is correct that there are several formats out there. I’ve always been a fan of saving the form data separately from the form itself. Documentum works with Adobe to publish forms as either HTML or PDF and to store the captured data as XML or as structured data in a database. Adobe also works with IBM’s FileNet to provide this functionality.
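Keeping the data outside the form can be as simple as writing the captured field values to their own XML document. The sketch below uses Python’s standard `xml.etree.ElementTree`; the element and attribute names (`formData`, `formId`, `formVersion`, `field`) are hypothetical and do not reflect the schema that Adobe’s or Documentum’s products actually produce.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def form_data_to_xml(form_id: str, form_version: str, fields: Dict[str, str]) -> str:
    """Serialize captured field values to XML, independent of the form's layout.

    The element and attribute names are hypothetical; a real deployment would
    use whatever schema its forms server produces.
    """
    root = ET.Element("formData", {"formId": form_id, "formVersion": form_version})
    for name, value in fields.items():
        field = ET.SubElement(root, "field", {"name": name})
        field.text = value
    return ET.tostring(root, encoding="unicode")


xml_text = form_data_to_xml("claim-form", "2.1", {"name": "Jane Doe", "amount": "125.00"})
print(xml_text)
```

Recording the form identifier and version next to the values is what allows the data to be shown again, later, in the form it was captured with.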

Adobe aside, I have typically used forms just to collect the data, which is then stored and preserved separately. I keep a copy of the form version used when the data was captured, to give the data context. I’ve had clients who needed the data burned into the form to meet their Records Management requirements. In that situation, I’ve burned it into a PDF document, which takes us back to the PDF/A standard.
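The arrangement described above (data stored on its own, with a copy of the form version used at creation kept for context) can be sketched as a small relational schema. This uses Python’s built-in `sqlite3` purely for illustration; the table and column names are hypothetical, and a production ECM repository would model this with its own object types rather than SQLite.

```python
import sqlite3
from typing import Dict, Tuple

# Hypothetical schema (names are illustrative, not from any ECM product):
# templates are versioned, and each submission records the template version
# it was captured with. Field values live in their own table, outside the form.
SCHEMA = """
CREATE TABLE form_template (
    form_id TEXT NOT NULL,
    version TEXT NOT NULL,
    layout  BLOB NOT NULL,   -- copy of the form as published, kept for context
    PRIMARY KEY (form_id, version)
);
CREATE TABLE submission (
    id      INTEGER PRIMARY KEY,
    form_id TEXT NOT NULL,
    version TEXT NOT NULL,
    FOREIGN KEY (form_id, version) REFERENCES form_template (form_id, version)
);
CREATE TABLE submission_field (
    submission_id INTEGER NOT NULL REFERENCES submission (id),
    name          TEXT NOT NULL,
    value         TEXT,
    PRIMARY KEY (submission_id, name)
);
"""


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
    conn.executescript(SCHEMA)
    return conn


def save_template(conn: sqlite3.Connection, form_id: str, version: str, layout: bytes) -> None:
    conn.execute("INSERT INTO form_template VALUES (?, ?, ?)", (form_id, version, layout))


def save_submission(conn: sqlite3.Connection, form_id: str, version: str,
                    fields: Dict[str, str]) -> int:
    # Raises sqlite3.IntegrityError if that template version was never stored.
    cur = conn.execute(
        "INSERT INTO submission (form_id, version) VALUES (?, ?)", (form_id, version)
    )
    conn.executemany(
        "INSERT INTO submission_field VALUES (?, ?, ?)",
        [(cur.lastrowid, name, value) for name, value in fields.items()],
    )
    return cur.lastrowid


def load_submission(conn: sqlite3.Connection,
                    submission_id: int) -> Tuple[Dict[str, str], str, bytes]:
    # Fetch the template version and layout the submission was captured with.
    version, layout = conn.execute(
        """SELECT t.version, t.layout
           FROM submission s
           JOIN form_template t ON t.form_id = s.form_id AND t.version = s.version
           WHERE s.id = ?""",
        (submission_id,),
    ).fetchone()
    fields = dict(conn.execute(
        "SELECT name, value FROM submission_field WHERE submission_id = ?",
        (submission_id,),
    ))
    return fields, version, layout
```

The foreign key means a submission cannot point at a form version that was never archived, which is the property that keeps the data interpretable years later.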

The real question with Form Data is, “How are you going to use it once it is captured?” The answer will dictate what you do. Of the standards Mike listed, XFA is the most interesting, not because of what it says, but because of its history: JetForm proposed it, then changed its name to Accelio, and then Adobe acquired the company. Support for the standard appeared in Acrobat 6.0.

So, a couple of questions for Mike: When you talk about Form Data, what kind of forms are you talking about? What do you need to really store?

8 thoughts on “Storing Form Data and Records for the next 100 Years”

  1. Mark says:

    I wouldn’t call Content Transformation Services out of the box, though. Isn’t it a Documentum module?




  2. You are correct: it is a module, or, to be precise, a set of modules. You license and install the one(s) needed to perform the transformations you require. It all depends on your definition of “out of the box.” Content Transformation Services installs easily and can go straight to work.

    Is it core functionality? –> No.

    Is it something we can install that works without writing code or making changes to it? –> Yes.

    I used the second definition. Sorry for any confusion that this may have caused.

    – Laurence


  3. I did see that question. My gut says “yes.” However, I’m researching it in more detail so I can give an answer I feel confident about. Look for it within the next week.



  4. Hey, thanks for the post.

    To answer your questions:
    When you talk about Form Data, what kind of forms are you talking about? What do you need to really store?

    I’m talking about static forms and variable forms (i.e., forms merged with data). The form data is likely XML, but could be other things. Ideally, we’ll store that data and a history of all form versions separately. But due to other requirements, we will also have to store an image of what is mailed; in general, we need to store everything that is mailed. Today we do this in various formats (TIFF, Metacode, etc.). As we look toward the future, the format we store things in seems important, and PDF/A seems like the right choice to me.

    I also agree that XFA is the most interesting. Adobe is doing some interesting things with it. I think it has a way to go before it is widely supported by vendors other than Adobe, but it seems to be on a good trajectory. One thing that interests me about XFA, and about what Adobe is doing with it, is that you can render an XFA form as PDF, as HTML, or within Flash. The Flash option may be useful when you can fill 80% of a form with data from a system of record but the remaining 20% requires a human to fill in.

