I talked last week about how the first, and more critical, step of any Content Management System (CMS) is to actually capture information in the system. After it has been captured, information needs to be organized and categorized, which leads to the proper application of Information Governance principles.
The question is, How do we organize and categorize information today?
The simple answer is, Poorly.
This isn’t an indictment of the newer cloud vendors like Box and Google that don’t have a way for people to categorize information outside of tags and a folder hierarchy. The problem goes deeper. This is a problem born from complex metadata schemes and the desire to categorize something into a system that can appear as complex as the Dewey Decimal system.
This is as much about process as technology.
Ease the Decisions
Most of the ways we can ease the burden of organization have been discussed in depth, and most industry leaders agree with them to varying degrees.
One part is to reduce the metadata requirement. People don’t want to fill it out. It takes too long and is viewed as more work. One or two fields, in addition to the title, is usually all that people will gladly fill out. People can visualize how those fields will help them find the content later. As long as people see the benefit of doing something for them, they will gladly do it. Once that benefit fades, it becomes work.
The other part is to reduce the categories, also known as the big bucket approach. There has been a lot of talk about the big bucket approach for years. Simply put, it reduces the decision process for both people and systems.
The following is just an EXAMPLE of how things could break down. Reality will vary from industry to industry and organization to organization. The longer the period of retention, the easier it is to categorize the content.
- 3 months: Junk. I am not a big fan of this outside of email. Likely best for those items that people delete shortly after creation.
- 2 years: Default duration for items of business value. Most items have already been sucked dry for re-use at this point. If someone wants to keep something longer, they can request it, but they should have to make the request.
- 7 years: For client projects and financial information, you should keep items for a longer period to be safe.
- 10+ years: There may be a special regulation around specific types of content. These are usually well understood in every company and are rarely a problem to identify and enforce.
Of course, simplifying the task of categorization is not enough. “Declaration” is still considered work.
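The example buckets above are small enough to write down as a lookup table. The following is only a sketch of that idea: the bucket names and the `dispose_after` helper are hypothetical, and "10+ years" is treated as a 10-year minimum purely for illustration.

```python
from calendar import monthrange
from datetime import date

# Hypothetical bucket names; durations follow the example list above.
RETENTION_MONTHS = {
    "junk": 3,
    "default": 24,                # 2 years
    "client_or_financial": 84,    # 7 years
    "regulated": 120,             # "10+ years", treated here as a minimum
}

def dispose_after(created: date, bucket: str = "default") -> date:
    """Earliest date an item in the given bucket could be disposed of."""
    months = RETENTION_MONTHS[bucket]
    total = created.year * 12 + (created.month - 1) + months
    year, month_index = divmod(total, 12)
    month = month_index + 1
    # Clamp the day so that 31 Jan + 3 months lands on 30 Apr.
    day = min(created.day, monthrange(year, month)[1])
    return date(year, month, day)
```

Anything that is not explicitly placed in another bucket falls into the 2-year default, which is the point of the big bucket approach: there are only four choices, and one of them is automatic.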
Categorizing the Universe
Okay, that is a little bold, but that is our goal. There is no single way to make sure that everything is categorized correctly. It will take a multi-layered approach with everything getting categorized into a default level.
- Workflow: This is the oldest way to auto-categorize information. If you have a process, it is easy to categorize things in the course of the defined workflow. How much this will catch in an organization depends on the nature of its business. To be conservative, let’s say 20-25% of content in the average organization.
- Business Rules: This can be readily set up based upon metadata, object type, and location. For instance, you could keep everything in the Finance area for 7 years by default. Client project information could be placed on “hold” until the close date, at which point a 5-year retention would be applied. When properly used, this should be able to catch an additional 25-30% of content in an organization.
- Content Analytics: This is an emerging technology. Fresh on the heels of becoming a viable, legally accepted solution for eDiscovery, content analytics vendors are looking to categorize content starting at its creation. I’ve seen software that can, with proper training, readily hit accuracy rates greater than 90%.
- Manual Declaration: We have it, so keep it. This is perfect for exception handling or for generic CYA operations. The point is that it is no longer needed for normal business operations.
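The layered approach can be sketched as a chain of checks where the first layer with an answer wins and everything else lands in the default level. This is an illustration only: the `Item` fields, workflow name, folder path, and bucket names are all hypothetical, the keyword check is a placeholder for a trained analytics product, and letting a manual declaration override the other layers is my assumption about how exception handling would work.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

@dataclass
class Item:
    title: str
    location: str = ""               # e.g. a folder or site area
    workflow: Optional[str] = None   # workflow that produced the item, if any
    text: str = ""
    declared: Optional[str] = None   # bucket picked by a person, if any

def manual_declaration(item: Item) -> Optional[str]:
    # Exception handling: an explicit human choice (assumed to win).
    return item.declared

def workflow(item: Item) -> Optional[str]:
    # Hypothetical mapping from workflow name to retention bucket.
    return {"invoice_approval": "7_years"}.get(item.workflow or "")

def business_rules(item: Item) -> Optional[str]:
    # Rule from the text: keep everything in the Finance area for 7 years.
    if item.location.startswith("/Finance/"):
        return "7_years"
    return None

def content_analytics(item: Item) -> Optional[str]:
    # Placeholder only: a real product would use a trained classifier.
    return "7_years" if "contract" in item.text.lower() else None

LAYERS: List[Tuple[str, Callable[[Item], Optional[str]]]] = [
    ("manual_declaration", manual_declaration),
    ("workflow", workflow),
    ("business_rules", business_rules),
    ("content_analytics", content_analytics),
]

def categorize(item: Item) -> Tuple[str, str]:
    """Return (bucket, name of the layer that decided)."""
    for name, layer in LAYERS:
        bucket = layer(item)
        if bucket is not None:
            return bucket, name
    return "2_years", "default"      # nothing matched: default level
```

Returning the name of the deciding layer alongside the bucket is a deliberate choice in this sketch, since knowing which layer made each decision is what lets you measure how much content each one actually catches.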
The first two methods, workflow and rules, will handle 50% of the content in an organization with a high level of confidence. When you apply content analytics at 90% accuracy to the remaining 50%, another 45% of content is categorized correctly, so you hit a 95% accuracy rate overall, all without humans making a decision.
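The arithmetic behind that 95% is worth spelling out. The sketch below assumes, as the estimate above implicitly does, that workflow and rules categorize their share correctly every time; the function name and parameters are mine, and the percentages are the rough figures from this post, not measurements.

```python
def overall_accuracy(auto_share: float,
                     analytics_accuracy: float,
                     auto_accuracy: float = 1.0) -> float:
    """Share of all content categorized correctly.

    auto_share: fraction handled by workflow plus business rules.
    analytics_accuracy: accuracy of content analytics on the remainder.
    auto_accuracy: accuracy of workflow and rules (assumed perfect here).
    """
    remainder = 1.0 - auto_share
    return auto_share * auto_accuracy + remainder * analytics_accuracy

# 50% handled by workflow and rules, 90% accuracy on the rest.
print(f"{overall_accuracy(0.50, 0.90):.1%}")

# The low and high ends of the earlier estimates (20% + 25%, 25% + 30%).
print(f"{overall_accuracy(0.45, 0.90):.1%}")
print(f"{overall_accuracy(0.55, 0.90):.1%}")
```

Even at the low end of the coverage estimates the result stays within about half a point of 95%, so the conclusion does not hinge on the exact split between workflow and rules.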
That is consistent and defensible. As content analytics become more advanced, people can re-process old content and increase the accuracy.
How many organizations can claim that 95% of their content is categorized correctly? When mixed with the increased capture rates from before, you are looking at an organization that no longer has a compliance problem.
There are a lot of other benefits as well. For now, ask yourself this: Do you think you can do better?
If so, tell us how.