Recently, Sumanth Molakala sent me a comment regarding the best place to store metadata.
Here is the relevant portion of his comment:
…one of the key items I am trying to address in it is managing meta-data. I am sure you are aware of the two schools of thought – One – save content in content server and meta-data outside of the content server in a “custom” meta-data repository (assuming that the world doesn’t revolve around Documentum). Two – the traditional approach to save content and meta-data in the content server…
My quick answer…It depends! Before you shoot me, read on…
Why Have Metadata?
Before we answer this question, it is important to remember why we have metadata in an ECM system. Originally, metadata in the ECM world was used primarily to find information in document management systems. Part of the whole Knowledge Management problem is finding information. Metadata helped to address that problem.
As those document management systems evolved into the current crop of ECM systems, metadata began storing other information. Business applications were built on top of ECM systems. Sometimes they stood alone; other times they were integrated, often tightly coupled, with another system. Metadata began to store information about the business functions in which the content was involved. Sometimes it simply replicated information from another system so that users could work effectively in both systems.
Of course, the world is changing. So the question remains: where should metadata reside?
Metadata in the ECM 2.0 World
In the ECM 2.0 world, the ECM system is the backbone of an Enterprise Architecture. It stores content for multiple systems and provides content-specific services such as Records Management. In this world, there are several questions to ask.
- Will users access the content directly from the ECM system? If no, then you only need the information required to support access to the content. This isn't as simple as a case number or object ID. If users are going to search from the business application, then it is important to automatically populate the relevant fields from the application into the ECM system to support that search. If yes, then you need more information, depending on what tasks users will perform when they access the ECM system directly.
- Are you going to enforce any compliance rules on the content? If yes, then you need all the information the retention policies require to determine how the content is to be treated. If no, get a good lawyer. 😉
- Are you going to just use the ECM vendor for every business solution? If yes, then it needs everything, for performance and CYA if nothing else. This is also ECM 1.0 thinking. That still works, and is necessary for a while longer, but isn’t the world that people are moving towards. Oh, you will also need some good ECM consultants, so drop me a note.
In the ideal world, the answers are No, Yes, and No. The nice thing about that approach is that a lot of the metadata is automatic. It comes from the business application or is derived from the environment. The user selects the type of document and enters the name, and the system takes it from there.
I do have one system where we are going to store all of the metadata in the ECM system. We have a Web Service that receives metadata for every case and stores it in the repository. Then, as documents are added to the system, they are associated with a case, and all the relevant metadata is there, read-only. This allows users to research multiple cases in a stand-alone ECM portal with all the data available to them. Unlike our ECM system, the case system is in flux and isn't ready to surface the content directly in its application, so we took this approach. While it is not the long-term approach, it works well for now.
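To make the flow above concrete, here is a minimal Python sketch, assuming an in-memory stand-in for the repository. The class names, the `receive_case_metadata`, `add_document`, and `search` methods, and the sample case and field values are all hypothetical and not part of any ECM product's API. The point it illustrates is that case metadata is pushed in once per case, and each document sees it as a read-only view rather than a copy users can edit.

```python
from dataclasses import dataclass
from types import MappingProxyType
from typing import Mapping


@dataclass(frozen=True)
class Document:
    doc_id: str
    case_id: str
    doc_type: str
    name: str
    # Read-only view of the case metadata; it reflects later updates from the case system.
    case_metadata: Mapping[str, str]


class Repository:
    """Hypothetical in-memory stand-in for the ECM repository."""

    def __init__(self) -> None:
        self._cases: dict[str, dict[str, str]] = {}
        self._documents: dict[str, Document] = {}

    def receive_case_metadata(self, case_id: str, fields: Mapping[str, str]) -> None:
        # Entry point the (hypothetical) web service calls for every case update.
        # Upsert: newer values for a case overwrite older ones.
        self._cases.setdefault(case_id, {}).update(fields)

    def add_document(self, doc_id: str, case_id: str, doc_type: str, name: str) -> Document:
        # A document can only be filed against a case the repository already knows about.
        if case_id not in self._cases:
            raise KeyError(f"unknown case {case_id!r}")
        doc = Document(doc_id, case_id, doc_type, name, MappingProxyType(self._cases[case_id]))
        self._documents[doc_id] = doc
        return doc

    def search(self, **criteria: str) -> list[Document]:
        # Portal-style search across cases using the replicated case metadata.
        return [
            d for d in self._documents.values()
            if all(d.case_metadata.get(k) == v for k, v in criteria.items())
        ]


repo = Repository()
repo.receive_case_metadata("C-1001", {"status": "open", "region": "east"})
doc = repo.add_document("D-1", "C-1001", "Interview Notes", "notes-1")
repo.receive_case_metadata("C-1001", {"status": "closed"})
print(doc.case_metadata["status"])                     # closed
print([d.doc_id for d in repo.search(region="east")])  # ['D-1']
```

Using `MappingProxyType` over the live case record means users in the portal can read the case data but cannot change it, while updates from the case system still show up on every associated document.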
Back to the Question
My answer is simple. I am firmly against a stand-alone metadata repository. I have no problem with metadata being stored outside of the ECM system as described above. My point is that metadata about the business context should be stored in the business application, and document-specific data should be stored in the ECM system. What is that document-specific data?
- Audit information (dates, users)
- Security (the access controls that keep the content secure)
- Source information (for example, if scanned the where, who, and original location)
- Business Application link (may not be necessary, but is always useful)
- Searchable metadata fields (optional, but allows for better searching within the repository)
- Retention metadata fields (any fields that could impact the retention or records policy)
That may seem like a lot, but consider a background investigation system. The system would have lots of information on the person being investigated, interviews conducted, forms submitted, approvals, reviews, and notes from the investigator. The submitted forms, such as the fingerprints, might be scanned and stored in an ECM system. The ECM system needs to know that they are fingerprints, the investigation number, and when the investigation is completed. The name doesn't matter and could be generated automatically. The investigation completion date triggers the start of the retention policy. Users should do their work in the business application, not in the ECM system.
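The fingerprint example can be sketched as a small metadata record. This is a minimal Python illustration, assuming the document-specific fields from the list above; the field names, the `mark_investigation_completed` method, and the seven-year `RETENTION_PERIOD` are hypothetical placeholders, since a real records schedule would supply the actual period. It shows the name being generated automatically and the retention clock starting only once the business application reports the investigation as completed.

```python
import uuid
from dataclasses import dataclass
from datetime import date, datetime, timedelta
from typing import Optional

# Hypothetical retention period (roughly seven years); the real value comes from the records schedule.
RETENTION_PERIOD = timedelta(days=7 * 365)


@dataclass
class EcmDocumentMetadata:
    doc_type: str                                   # e.g. "Fingerprints"
    investigation_number: str                       # link back to the business application
    created_by: str                                 # audit information
    created_at: datetime                            # audit information
    scan_location: str                              # source information
    investigation_completed: Optional[date] = None  # retention trigger
    name: str = ""

    def __post_init__(self) -> None:
        # The name doesn't matter to users, so generate one when none is supplied.
        if not self.name:
            self.name = f"{self.doc_type}-{self.investigation_number}-{uuid.uuid4().hex[:8]}"

    def mark_investigation_completed(self, completed: date) -> None:
        # Hypothetical hook: the business application pushes the completion date.
        self.investigation_completed = completed

    def disposition_date(self, period: timedelta = RETENTION_PERIOD) -> Optional[date]:
        # No retention clock until the investigation is completed.
        if self.investigation_completed is None:
            return None
        return self.investigation_completed + period


meta = EcmDocumentMetadata("Fingerprints", "INV-2041", "scanner01",
                           datetime(2024, 3, 1, 9, 30), "Mailroom scanner 3")
print(meta.disposition_date())                    # None
meta.mark_investigation_completed(date(2024, 6, 30))
print(meta.disposition_date(timedelta(days=30)))  # 2024-07-30
```

Everything else about the investigation (interviews, approvals, investigator notes) stays in the business application; the ECM record carries only what it needs for audit, source, search, and retention.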
This was a little rambling, but those are my thoughts on the subject. Sumanth and I would love to hear about your experiences.