Taxonomies, Good, Bad, or Ugly?

Sumanth Molakala posted a great look at determining the amount of effort that should go into creating a taxonomy for a new Enterprise Content Management solution. This brings up a debate that I have had and observed among ECM professionals for years: do we make Search the primary access method or the secondary one? I find that every professional has a leaning, and I have yet to find a solid predictor of any practitioner’s preference.

I prefer a good hierarchy, while Sumanth appears to favor searching. I find that the creation of a hierarchy helps me organize my thoughts and determine what is important about any given piece of content. Also, while Google may be trying to take the world over via the Internet, most users are more intimate with their old-fashioned network file structures. The ability to browse to a piece of content adds to user acceptance of their first Document Management application. Over time, many users transition into Search-first users. Until that happens and ECM becomes transparent, I believe that a good taxonomy is important.

Rules for a Good Taxonomy

Let’s assume that you are going to create a taxonomy. What rules do you need to follow to create one? There are several viewpoints out there, and while none are definitive, there are some rules that I find necessary.

  • Intuitive. If the users can’t easily make sense of it, then it is worthless. Different groups of users may have different views on the same data. Most ECM systems allow you to link folders in multiple locations, but this requires some extra effort, depending on the setup.
  • Learn from the Past. Those network folders are a pain in the behind. However, they are also a nice insight into how users think. While replicating their structure is a bad idea, looking at the structure and getting the users’ opinions on its usability can lead to some time-saving measures.
  • Keep it Small. Too much navigation time leads to dissatisfied users. A widely used rule of thumb is a maximum depth of log x levels (base 10, where x is the number of documents), though I find log (x/2) slightly better in current ECM systems. For 2,000 documents, this is 3 levels max. Keep in mind that this assumes documents on the base level. In Documentum, when you dive into a Cabinet, you are in level 2. If a user can only see 10% of the documents due to access restrictions, your hierarchy should reflect that. When you hit a million documents, that is 5-6 levels, which is quite a few, so an alternate approach may be required.
  • Lock the Top. This is an approach that I have used in several places. The first few levels are set by the administrators. The levels below are controlled by the group of users that primarily store content in that location and can best create and adjust the structure over time as the environment changes. Remember that they will create at least one level, so take that into account when determining the number of fixed levels.
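
The Keep it Small arithmetic can be checked with a short sketch. The function name and the visible_fraction parameter are my own illustration, not part of any ECM product. Scaling the document count by the share a user can see is only one possible reading of the access remark above.

```python
import math


def suggested_depth(doc_count, revised=True, visible_fraction=1.0):
    """Rule-of-thumb maximum folder depth for a repository.

    revised=False applies log10(x); revised=True applies log10(x / 2).
    visible_fraction is an assumption: it scales x down to the share of
    documents a typical user can actually see.
    """
    if doc_count < 1 or not 0 < visible_fraction <= 1:
        raise ValueError("need doc_count >= 1 and 0 < visible_fraction <= 1")
    x = doc_count * visible_fraction
    if revised:
        x /= 2
    # Never suggest less than a single level, even for tiny repositories.
    return max(1.0, math.log10(x))


for count in (2_000, 1_000_000):
    print(count,
          round(suggested_depth(count, revised=False), 1),
          round(suggested_depth(count), 1))
```

The result is left as a fraction so that you can decide how to round it. A million documents lands between 5 and 6 levels with the revised formula, and at 6 with the plain one.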

In a Web Content Management application, it is simple. Just mirror the structure of your website. If that proves difficult, then maybe your website needs to be restructured.

Virtual Folders

In TaskSpace, coming out this fall as part of the D6 rollout, Documentum uses the concept of Virtual Folders. A Virtual Folder is just a canned, optimized search. Say you have a million or more documents, but they all have a common piece of meta-data, like account number. A Virtual Folder’s name would be the account number. When accessed, the system runs a query in the background on that attribute and retrieves the appropriate content. This gives the user the comfort of a folder without implementing one. This is best used for transactional content.
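
To make the idea concrete, here is a minimal sketch of a folder that is really a saved search. This is not the TaskSpace or Documentum API. The VirtualFolder class, the in-memory repository, and the account_number attribute are all hypothetical stand-ins for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical document model: attribute name -> value.
Document = Dict[str, str]


@dataclass(frozen=True)
class VirtualFolder:
    """A 'folder' that stores no content, only a canned query."""
    attribute: str
    value: str

    @property
    def name(self) -> str:
        # The folder is named after the attribute value it selects on.
        return self.value

    def contents(self, repository: List[Document]) -> List[Document]:
        # The query runs each time the folder is opened.
        return [doc for doc in repository
                if doc.get(self.attribute) == self.value]


repository = [
    {"title": "Statement 2007-06", "account_number": "123456"},
    {"title": "Statement 2007-07", "account_number": "123456"},
    {"title": "Welcome letter", "account_number": "654321"},
]

folder = VirtualFolder(attribute="account_number", value="123456")
titles = [doc["title"] for doc in folder.contents(repository)]
print(folder.name, titles)
```

A real system would push the filter down to an indexed query rather than scanning a list, but the user-facing behavior is the same: open the folder, get the matching content.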

However, in the logical back-end, at least in Documentum, the content must officially reside in a folder. All of the content can reside in the same folder, but that becomes unworkable if you ever decide to browse for something. One could argue that you never will, but what if your Search Engine is down? Also, keep in mind that even on the physical storage devices, Documentum limits the number of objects in a given folder to 256.

In this situation, I find that some automatically generated taxonomy makes life easier for administrators. One simple approach is to take every two characters of the ID in question and make that a level when imported (12 -> 1234 -> 123456…). This structure is created behind the scenes and allows simple browsing when necessary. It requires very little effort to create.
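
The two-characters-per-level scheme could be generated along these lines. This is a sketch, not import code for any particular system: the function names are made up, and I am assuming that a trailing odd character does not get a level of its own.

```python
from typing import List


def prefix_levels(identifier: str, chunk: int = 2) -> List[str]:
    """Return cumulative prefixes of the ID, one per folder level.

    Each level extends the previous one by `chunk` characters, so
    "123456" becomes ["12", "1234", "123456"].
    """
    if chunk < 1:
        raise ValueError("chunk must be at least 1")
    return [identifier[:end]
            for end in range(chunk, len(identifier) + 1, chunk)]


def folder_path(identifier: str, chunk: int = 2) -> str:
    """Join the levels into a browsable path for the import routine."""
    return "/".join(prefix_levels(identifier, chunk))


print(folder_path("123456"))
```

For purely numeric IDs, two characters per level means that no folder ever holds more than 100 subfolders, which keeps every level browsable.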

An Innovative Approach

A recent article in the July issue of Communications of the ACM, Collaborative structuring: organizing document repositories effectively and efficiently, talked about generating dynamic hierarchies. It was very interesting and worth a read if you can get your hands on a copy.

It proposes allowing each user to create a local hierarchy for the content that they use and contribute. They can share their hierarchies with other users, allowing others to see how they classify content, leading to the discovery of new content. Experts in different areas tend to spend time developing a solid hierarchy in order to develop and share their expertise with the user community. Users may look at Johnny Gee’s or my hierarchy to see how we classify data about the world of ECM.

The system looks at all of the local hierarchies and generates an enterprise-wide consensus hierarchy. As local hierarchies change over time, the consensus hierarchy can evolve as well. This can limit the chance of a hierarchy becoming stale or out-of-date over time.

There is a lot more to this methodology, including the linking of content together, so I encourage you to read the article if you are interested in more details.

The Ugly Truth

The unfortunate fact is that no one-answer-fits-all statement can be made. The best that can be done is to collect best practices and to keep your eyes and ears open for exceptions. Every organization is different, and it is crucial to take the pulse of each organization to determine the optimal approach.

6 thoughts on “Taxonomies, Good, Bad, or Ugly?”

  1. Hi,
    What do you think about folksonomy?
    It’s another way to organize data. Users can collaborate by tagging documents for themselves or for the community. The sum of the users’ tags will help organize the content of the company.


  2. I’ve never had direct experience with folksonomy outside of simple ones like WordPress. The only problem I’ve ever had with similar approaches is getting all of the users to apply the appropriate tags. Getting just a few meta-data tags applied has proven challenging at times, depending on the user community.

    However, I can see its value for more of a KM type system. That would be a pretty good approach if the users were committed to making it work.


  3. I really like your “rule of thumb” about folder depth — log base 10 of the number of documents. It feels right, and I believe applies to hierarchical structures besides folders — facets in search, site navigation, …

    However, here is my question: Where did you find this “widely used” rule of thumb? I’ve looked all over and can’t find the source (or other users of this rule).

    Thanks! -Bob


    • I have trouble finding it myself. I learned it a long time ago, and when I saw it in the cited article, it wasn’t new information. The article does not cite its source for it, so I’m at a dead-end.

      It roughly equates to 8-13 per folder per level. My revised formula is 9-16 items per level. There is a lot of averaging here as some will fall under that range and others will fall above. The average works out usually.

      So if it isn’t widely-used and I was incorrect, then maybe it should be widely-used. 🙂


