I recently ran into a situation that challenged one of my basic beliefs in the setup of the Documentum repository, object ids may not be sequential!!!
What you say? Impossible you say? Yet it happened.
What I encountered isn’t a widespread phenomenon, but it could happen to you.
What Happened
I designed a reporting system, our little Data Shack, that was culling the audit trail, and some relevant object information, on a nightly basis into a second database for reporting. There are a few hundred million audit trail records, so querying that live was a challenge. If a user clicked on “History” in DA or Webtop, you could go get lunch, in another state. The goal was to put it into its own database designed for efficient reporting. If a bad query was issued, nobody suffered.
To streamline the transfer process, the system used the r_object_id field as a key field for querying records for retrieval. It is the Primary key on every table and no other approach would be as fast. I checked. The concept was simple, grab the maximum r_object_id already archived and then grab every row with a bigger r_object_id and move it over.
What could be simpler? Nothing.
Well, after a while, the users were asking for a query regularly and it was decided to craft a Data Shack version of the query and add it to the standard reporting in the Data Shack. The initial results were different from the live data results. After some digging, it was determined that a very small percentage of audit trail records weren’t making it over every night. After some more research, it was determined that if you sorted by the time_stamp field, you got a different order of objects.
This was not good.
Three Times the Fun
What appeared to be happening is quite simple. We have three Content Servers supporting the same repository in Production. Apparently, each server would grab a range of potential IDs and dish them out as needed. This is an accepted Design Pattern, that I can’t find a reference to right now, which prevents duplicate IDs from being created or different processes from blocking each other. Grabbing blocks of ids is an important aspect when it is expected that a lot of them will be needed. It is a solid approach.
The only problem is that I wasn’t aware of this little feature when the Data Shack was designed. So every night, a few random rows were missed.
Now it is possible that different applications, like the Indexer and our Web Interface, which happen to reside on different machines but use the same Content Server, grabbed the IDs and not the Content Servers. I suspect that is not the case, but I can’t rule it out without some mind-numbing digging (or some luck).
Regardless of the culprit, the solution was fixed. Everything now works on the time_stamp field in the Audit Trail and not r_object_id. The Data Shack is up and humming again and all the numbers match.
So the tip is to depend on the dates for determining the order of events, not the object ids. They may be tempting to use as they are the primary keys on every table, but they are not the arbiter of succession.
Hi,
I know that ObjectId’s are not necessarily a hexadecimal representation of a number. The Object ID can contain any ASCII printable character (I learned that in the first Documentum training some long time ago… i don’t expect it changed).
Since I never seen that in real life… Did you?
Anyway, this might also pose problems sometimes when assuming ObjectID’s are something else but just a string of characters (eg. should not create filenames with objectid’s stored exactly inside… might get a illegal character there).
LikeLike
I must say, I’ve seen a lot of object ids, but they were all hexadecimal-ish. Not saying that it couldn’t/doesn’t happen, I just haven’t seen it. I wouldn’t want to use them in the file-name anyway, just ugly looking.
LikeLike
When I first saw this issue, I was also confused. You can try it by:
select max(r_object_id) from dm_audittrail
and
select r_object_id from dm_audittrail order by r_object_id desc enable(return_top 1)
are typically not returning the same. This is because the ordering logic: by default if you sort values descending then on the last character you can see that values are going from 9,8,7,…2,1,0,F,E,D,C,B,A and not as you would expect: F,E,D,C,B,A,9,8,…2,1
If I remember well you can control the sort order by NLS_SORT environment variable (at least with Oracle), but that would be too radical… So the select max(r_object_id) works fine for this situation.
Mike
LikeLike
It isn’t a database sort issue. There were large ranges (100+ items) of numbers that were sequential in their subset, but not in the overall scheme of things. Plus, when every comparison is using SQL Server sort orders, then the vast gaps are not a real issue.
LikeLike
We have seen this on old docbases that we migrated.
When we tried to reconstruct the versions based on the (highest)r_object_id.
LikeLike
So just to be clear and to check whether I understood well: there were bigger gaps in the r_object_id’s and this was caused by multiple content servers?
We are running into a similar situation: the History tab on the properties page is too sloooow. Pie, why did you decided to migrate the audit trail into another database? The history tab’s select statement cannot be optimized by an index? What did you do then? Hide the history from properties and created another application for “data mining”?
Thanks for your feedback…
LikeLike
We archived all but the last 90 days. Our table has a few hundred million rows, and the linking that takes place just takes time. We were able to get it down for most of our queries to a smaller subset by moving to another database, PLUS, we didn’t worry about security at that point because if you had access to the 2nd database, you were authorized to see the information.
Oh, and yes, the gap was larger that a handful. Once we started grabbing by dates, while slower, we had everything we needed.
As to why…this goes back to the Data Shack post that I linked to earlier. Things got slow. Someone clicking on the History tab could cause the whole system to slow to a crawl as the database made all the links and processed the conditionals. Taking away the tab didn’t help as the information was still needed. The only solution was to move it out to a new database. Indexes helped, but they still hurt performance.
LikeLike