The Documentum Audit Trail. It is one of the un-tamed beasts in many Documentum deployments. The requirements for auditing often come into conflict with the reality of the out-of-the-box functionality. To makes things useful for the average user, organizations tend to set earlier than desired audit purges and/or severely restrict what they audit which can lead to important actions being overlooked.
On the other hand, keeping everything makes finding what you want an expensive proposition.
Attack of the Audit Trail
The audit trail is very useful. It keeps just about everything you need in a few tables, with most of of the information residing in the dm_audittrail object. The problem is that as your audited object count grows into the millions, your audit trail grows into the tens of millions. A 10:1 ratio of rows to audited objects is not uncommon.
The problem is many layered. The larger the table, the longer it takes to add a new audit record. This is a slow growing problem. The faster growing problem is the delay in getting information out of the audit trail. To facilitate this, you can add an additional index or two to the underlying dbo.dm_audittrail_s table. I often have indexes on the following columns:
- r_object_id: The primary key that uniquely identifies every audit record. This is created automatically.
- audited_obj_id: This is the object id of the item that has generated the particular audit record. This is created automatically.
- id_2: This is an object id that is dependent on the event in question. This is created automatically.
- user_name or user_id: This is the user name and user object id, respectively, of the user that performed the audited action. Manually create this on the columns, picking the one that I am most likely to be utilizing when retrieving records. For simple queries, user_name is better, but for more intense processing, user_id works best.
- time_stamp or time_stamp_utc: The time stamp of the action in question in server local time and UTC time respectively. This is also manually created. This is important when checking for time periods which dramatically improves the performance.
Of course, adding an index increases both the database storage and degrades insert performance as the index has to be updated with each insert. Meanwhile, the users want to know why the application hands whenever they click on the History tab, regardless of the indexes created.
There reaches a point when either the original auditing requirements need to be compromised as mentioned above or a custom solution needs to be implemented.
Enter the Data Shack
The most common approach to solving this problem is to depend on backups of the tables to retrieve old information. This allows the Audit Purge job to run and remove items that are older than a reasonable time frame, such as 90 days. This works as long as the backups are solid and trustworthy and there is not an immediate need for information.
When that doesn’t work, try a Data Shack. That is a term that my teams use, informally, to describe where we put the data. The term comes from the fact that the new system holding the audit information isn’t big enough to be a Data Warehouse.
Our first thought was to create a job that was almost a mirror image of the Audit Purge job. It took the same parameters and would move the rows from the Documentum audit trail into a mirror set of tables residing on a separate system. Flaws appeared in this approach as the overhead we were trying to get rid of hampered the performance of the job.
The more aggressive we made the Audit Purge job in deleting records, the faster the new job ran, at the sacrifice of the usefulness of the information in the live system. The History tab is great but the more aggressive the purges, the less utility it, and other built-in audit retrieval methods in Documentum, provided.
Another problem was that the data, while technically complete, wasn’t enough.
We then decided to use stored procedures to move the data. This was dramatically faster. With the new speed, we could bring in the user tables and some basic object information to provide a broader picture to the user.
Finally, because we were feeling cocky, we add logs from the web service and user layers to reveal more detail about the overall use of the system and not just the ECM component.
Data Shack Implementation
Want to do a Data Shack of your own? It is a little more complicated than first glance. Some things to consider:
- Data changes, so plan to take into account updates and deletes to non-audit trail/logging information.
- Use the object ids. They are incremental, chronological, and the primary key. They work faster than dates.
- When pulling information from the live system, bring it all into the Data Shack and then sort/filter the data.
- Use streamlined tables for reporting purposes. If a full table is slow in the live system, it won’t be much better in the Data Shack. Move the desired data into one or more tables to make querying easier. These tables should only have the columns that are needed and be filtered to only the desired rows. This could be by object type.
There is a lot of work under all of that, but those thoughts will point you in the right direction.
Remember that all of the important data should stay in the unmodified copy of the audit trail tables. This will allow you to expand and refine your data model for reporting over time. Everything doesn’t have to be done at once. Just getting most of it out of the live system will go a long way to making things faster.