One of the highlights of the D6.5 release was the new features around enabling high volumes of content in the repository. There are two standard problems around storing lots and lots of content in an ECM system.
- Every object in an ECM system has overhead. In Documentum, it takes 2-3K of database storage per object. That can add up quickly. Just think of the emails in your organization or the images that a financial institution might generate from scanned documents.
- The very act of adding content into the Content Server takes several round trips to the database. Is this a valid ACL to assign? Does the containing folder exist? Those are just two of the questions asked during the process.
After learning about these features in more detail in discussions with Victor (see a recording of his presentation on the Developer Network) and Ed at EMC World 2008, I started making plans to use them on one of my projects. I later learned that there is a catch. Before we get to the catch, let’s review the highlights.
The Shiny New Features
There are several cool features in the High Volume Server (HVS). I’m going to cover them quickly, but they are outlined in the High Volume Server Development Guide (and in the presentations above), which is part of the D6.5 download. There was an updated version with the Service Pack 1 release, so be sure you look at the latest version.
- Batching: This is what it sounds like. You group a bunch of actions into a batch, like the contents of a transaction, and then it is all submitted to the Content Server at once. Say I am ingesting 20 documents: the client collects all of the information and submits it to the Content Server to be processed together.
- Scoping: This is something you would probably use with Batching. It turns off currency checking for some actions. For instance, if you know that the ACL you are using will exist, it won’t check. This saves a call to the Content Server, which in turn doesn’t issue a query against the database.
- Data Partitioning: Allows you to break a table over multiple physical partitions, letting you manage your database storage in much the same way you might manage your content storage. Older records can be placed on cheaper storage. In addition, by querying specific data partitions, you can also improve system performance. Partition Exchange will even allow you to ingest content into one partition while avoiding all the mucking around with TBOs and audit trails.
- Lightweight SysObjects: This is the feature that everyone had been drooling over. It allows objects to share a majority of that 2-3K of database storage, keeping distinguishing metadata values unique to an object. They would all have the same version, ACL, and retention, but they would have different names. If one had to become different, you could assign it to a different parent or convert it to a full SysObject. When the situation reversed, you could undo it. Pretty sweet.
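To make the batching and scoping ideas concrete, here is a toy sketch in Python. The real API is Java/DFC and is covered in the Development Guide; the FakeServer, its counters, and the ingest functions below are all made up to illustrate the round-trip math, not Documentum code:

```python
# Toy model of why batching and scoping cut round trips.
# Nothing here is Documentum code; it only illustrates the pattern.

class FakeServer:
    """Stands in for the Content Server; counts round trips."""
    def __init__(self):
        self.round_trips = 0

    def validate_acl(self, acl):
        self.round_trips += 1  # one query against the database

    def save(self, objs):
        self.round_trips += 1  # one call, regardless of batch size

def ingest_naive(server, docs, acl):
    for doc in docs:
        server.validate_acl(acl)  # currency check on every save
        server.save([doc])

def ingest_batched(server, docs, acl, batch_size=100, scoped=True):
    if scoped:
        server.validate_acl(acl)  # checked once per scope, then trusted
    for i in range(0, len(docs), batch_size):
        batch = docs[i:i + batch_size]
        if not scoped:
            server.validate_acl(acl)
        server.save(batch)  # whole batch submitted in one call

docs = [f"doc{i}" for i in range(200)]
naive, batched = FakeServer(), FakeServer()
ingest_naive(naive, docs, "acl_1")
ingest_batched(batched, docs, "acl_1")
print(naive.round_trips)    # 400
print(batched.round_trips)  # 3
```

With 200 documents, the naive loop makes 400 round trips; one scoped ACL check plus two batches of 100 makes three.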
That is all very high-level, and hopefully my shortening of the wording didn’t distort anything. All-in-all, very cool stuff.
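To put the lightweight SysObject savings in rough numbers, here is a back-of-envelope sketch. The 2-3K overhead figure comes from above; the per-object unique metadata size, the sharing ratio, and the object count are all invented for illustration:

```python
# Back-of-envelope: database overhead saved by lightweight SysObjects.
# Only the 2-3K per-object figure comes from the text; the rest
# (unique-metadata size, sharing ratio, counts) is made up.

OVERHEAD_BYTES = 2_500  # midpoint of the 2-3K per-object figure
LIGHT_BYTES = 500       # assumed unique metadata per lightweight object

def full_sysobject_storage(n_objects):
    return n_objects * OVERHEAD_BYTES

def lightweight_storage(n_objects, objects_per_parent):
    parents = -(-n_objects // objects_per_parent)  # ceiling division
    return parents * OVERHEAD_BYTES + n_objects * LIGHT_BYTES

n = 1_000_000  # e.g. a year of archived email
print(full_sysobject_storage(n) / 1e9)     # 2.5  (GB)
print(lightweight_storage(n, 1000) / 1e9)  # 0.5025  (GB)
```

Under those made-up assumptions, a million objects sharing a thousand parents drops the metadata footprint from roughly 2.5 GB to roughly 0.5 GB.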
Some quick tips: batching performance peaks at around 100 objects, though there are a lot of variables that will adjust that number. Also, some of the performance gains decrease as the content gets larger. This makes perfect sense, as the larger the content, the longer the transfer time. That is something for the network engineers. Keep in mind, though, that when you see performance measurements on these features, they may have been done with small documents, like email, and not large ones, like scanned images.
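That intuition about content size can be sketched with a simple model: total ingestion time is roughly round trips times per-call latency, plus total bytes over bandwidth. The latency and bandwidth figures below are invented for illustration:

```python
# Rough model: why batching gains shrink as content gets larger.
# The latency and bandwidth numbers are invented for illustration.

LATENCY_S = 0.010          # assumed cost of one round trip
BANDWIDTH_BPS = 100e6 / 8  # assumed 100 Mbit/s link, in bytes/sec

def ingest_time(n_docs, doc_bytes, batch_size):
    round_trips = -(-n_docs // batch_size)  # ceiling division
    transfer = n_docs * doc_bytes / BANDWIDTH_BPS
    return round_trips * LATENCY_S + transfer

# 10,000 small emails (50 KB) vs 10,000 scanned images (5 MB)
for size in (50_000, 5_000_000):
    unbatched = ingest_time(10_000, size, 1)
    batched = ingest_time(10_000, size, 100)
    print(f"{size} bytes: {unbatched:.0f}s -> {batched:.0f}s")
```

Under those assumptions, batching cuts 50 KB email ingestion from about 140s to 41s, while for 5 MB scans transfer time dominates and batching barely moves the needle (about 4100s to 4001s).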
There is also a white paper, D6.5 HVS Archiving Performance & Scalability, by Ed Bueche from September 1, 2008, as well as a High Volume Server Tuning Guide. All three documents are on Powerlink, but I think I am going to see if Alan would be willing to publish them on the Developer Network. I’d post them myself, but I’m not sure what rules they have about making them freely available (though documentation should be free to all!).
Now that your mouth is watering, let me present…
I like the features. After a recent detailed briefing and discussion, I liked them even more and had a great idea on how best to utilize them for my existing clients. One catch: HVS IS A SEPARATELY LICENSED PRODUCT!!!
Like Retention Policy Services and the Collaboration Edition, the feature is activated simply by entering a license key. There is no separate installation. There are just additional fees that are server (CPU) based. The license gives you the ability to use the features and have read-only access for your entire user base (see an account rep. for exact details). If you want users to have read/write access to content, you have to pay per user AND per CPU.
This just pisses me off. I’m all for a CPU-based license model as an optional way to license the product. It can be a more economical way to license an enterprise application when compared to per-user pricing for a large user base. The problem here is that it isn’t either/or, it is BOTH! If you have an existing system and want to add this functionality, you have to buy it. You can’t just enter the license key and start making changes to your code or object model.
Now, there may be deals being cut out there, but I’m talking strictly about the standard pricing model. This is new functionality! Maintenance fees help pay for that. The list of line items needed to support even a basic installation is getting too long.
The Captiva product line recently started simplifying their pricing model, bundling in many things that should be, and now are, core features. This seems sensible. From what I understand, it costs a little more, but existing users don’t pay that, and the market will deal with the new price.
EMC is not the only company that does this, but some of their competitors are doing the opposite. EMC needs to take a minute to think things through.
I have to leave you now so I can see if I can equate the potential improvements in performance and decrease in administration costs with the additional licensing for HVS.