I was reading a post by Lopataru on his blog. For those that haven’t read his blog, Lopataru is working on his PhD research, focusing on Content Management. He is trying to determine what makes a Content Management system high-performance. I’m not going to analyze his thoughts, but I am going to add some independent thought to the issue.
Supporting Components
There are two components that act in a supporting role that should be taken out of the equation. They are the database and the full-text index engine. Why? Those items shouldn’t be provided by the content management vendor. In most products, they aren’t. Exceptions include SharePoint and Oracle’s Stellent. However, as those components are separate products and licensed independently, they don’t count.
Of course, there is technically the possibility of an exception. A company could create its own database and/or indexing engine for its repository. In that situation, you have to take the database speed into account. However, I doubt that such a company could match the vendors that use third-party databases like SQL Server and Oracle, except maybe in the price/performance equation.
There is a need to eliminate the variable performance that can be introduced by the databases. There are two basic approaches:
- Use the recommended database vendor. This has the benefit of using the database for which the system is best designed.
- Average the results. Using the same database for each one isn’t fair as one may be optimized for Oracle while another may be optimized against SQL Server. I would take the performance measured against each and average it out. In theory, that will give you a comparative measure between different systems.
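The averaging approach can be sketched in a few lines. This is only an illustration: the system names and timings below are made up, and `average_by_system` is a hypothetical helper, not part of any product.

```python
from statistics import mean

def average_by_system(timings):
    """Average each system's measured time across the databases it was run on.

    timings maps system name -> {database name -> seconds for the same test}.
    """
    return {system: mean(per_db.values()) for system, per_db in timings.items()}

# Hypothetical measurements, in seconds, for one identical test run.
timings = {
    "System A": {"Oracle": 12.0, "SQL Server": 18.0},
    "System B": {"Oracle": 20.0, "SQL Server": 14.0},
}

averages = average_by_system(timings)
for system, seconds in sorted(averages.items()):
    print(f"{system}: {seconds:.1f}s average")
```

A plain mean treats every database equally. If one database matters more to the target organization, a weighted mean would be the obvious variation.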
Index engines are trickier. Most systems support only one, so you have to use it. Tests can’t be designed to minimize full-text search when judging performance, because search is one of the key features. So I think one should split the search capabilities into a separate set of tests to isolate their performance. One day all index engines will be plug-and-play and we will truly be able to measure the system.
The User Interface
Here is another problem. Are we measuring the performance of an ECM platform or a suite? If a suite, you need to test the standard User Interface. If platforms, then try backing them onto a custom UI hosted on Liferay or some other common platform.
Some people can design the most impressive engines; it is the web interface where they fall apart.
So What is Left?
Well, first, load it with content. I’m thinking at least 100,000 different pieces of content: half stored throughout a deep, layered hierarchy and half in a small number of folders. Don’t forget to add several pieces of metadata to each one. This will set things up for the tests.
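Here is a rough sketch of how that layout could be planned before loading it into each system. Everything in it is an assumption for illustration: the folder depth, the branching factor, the number of flat folders, and the metadata fields are arbitrary choices, and actually importing the items depends on each product's own API.

```python
import random

def plan_corpus(total=100_000, depth=6, branching=4, flat_folders=5, seed=0):
    """Return a list of (path, metadata) pairs describing the test content.

    The first half goes into a deep folder hierarchy, the second half into
    a small number of flat folders.
    """
    rng = random.Random(seed)  # fixed seed so every system gets the same layout
    half = total // 2
    items = []
    for i in range(total):
        if i < half:
            # Pick one folder per level to build a path `depth` levels deep.
            folders = [f"level{n}_{rng.randrange(branching)}" for n in range(depth)]
            path = "/deep/" + "/".join(folders) + f"/doc{i}.txt"
        else:
            path = f"/flat/folder{i % flat_folders}/doc{i}.txt"
        metadata = {
            "title": f"Document {i}",
            "author": f"user{i % 50}",
            "department": rng.choice(["legal", "finance", "hr"]),
        }
        items.append((path, metadata))
    return items

corpus = plan_corpus(total=1000)  # small run; use the default for a real test
print(corpus[0][0])
print(corpus[-1][0])
```

Using a fixed seed means every system under test receives exactly the same content layout, which keeps the comparison fair.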
What do you measure? Well, several things:
- Move large folder hierarchies around.
- Mass meta-data updates.
- Process query result sets.
- Anything else defined in the system requirements (if a system does it, test it).
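Whatever operations end up on the list, each one should be timed the same way on every system. A minimal timing harness might look like the sketch below. The `fake_metadata_update` function is a stand-in I made up; in a real test it would call the product's API to move a folder, update metadata, or page through a result set.

```python
import time
from statistics import median

def time_operation(operation, repeats=5):
    """Run `operation` several times and return the median wall-clock seconds."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        operation()
        samples.append(time.perf_counter() - start)
    return median(samples)

def fake_metadata_update():
    # Placeholder workload standing in for a real mass metadata update.
    records = [{"id": i, "status": "draft"} for i in range(10_000)]
    for record in records:
        record["status"] = "approved"

elapsed = time_operation(fake_metadata_update, repeats=5)
print(f"median time: {elapsed:.4f}s")
```

The median is used instead of the mean so that one slow run, caused by a cold cache for example, does not skew the result.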
When all is said and done, add at least 50% more content (or remove half of it) and repeat. Performance can suffer as a repository grows, so it is important to test it large. When doing this, be sure to size all the systems as recommended for the amount of content. If a system tells you that it needs X and you give it X/2, of course it isn’t going to work well. If you throw everything on the same size box, you are beginning to compare efficiency, not just performance.
Remember Our Purpose
Lopataru is designing a new system, so telling him how to compare existing systems doesn’t help him much. Contrary to the above indications, databases and index engines are critical in the design. A poor database design can cripple a system. Comparing two systems running on SQL Server 2005 will reveal those design flaws.
Performance also isn’t everything. If you can’t protect content, track it, or manage it through its lifecycle, it doesn’t matter how fast it is. That is another consideration. When designing a system for high performance, it is critical to design it to handle all of the functionality that any target organization would need. In addition, don’t forget to design for all the features that you don’t know about yet.
That is the real trick. The older ECM platforms are burdened by legacy design concepts that are either dated or that never panned out. The need for an easy upgrade from one version to another limits the ability to remove these design characteristics. Alfresco has the benefit of not being limited by its own history. However, if they are successful, they will be burdened by their own design when new ideas and features come forth.
Now, hit me with your best shot. I know you want to do so.
Speaking of ECM I think performance is indicated mainly by:
1. user experience (ui+speed+relevance)
2. reliability
3. scalability
My experience showed me that pure data manipulation performance means a lot of read queries and some small number of updates. Even in transactional CM.
I fully agree that current top ECMs are burdened with legacies and/or mixture of overlapping products.
I’d love to see the evolution of startups.