Tips: Contentless Objects as Lookups, Even Less than Content

So the other day I was looking at why one of the lookups in my system was always providing me a blank value. The keyword DISTINCT seemed to keep it down to one, but it didn’t do more than limit the amount of blanks to one. It is a required field, so the user can’t save it with that value, but even with a valid default value, it is something that annoys me. So I began trying to banish it from my sight.

Populating Lookup Values

There are several ways to have lookup values for a piece of metadata in Content Server. One is a typed list. Simple, efficient, and fool-proof. However, the results sit in the data dictionary, making the lookup not very dynamic. The ability to make it conditional is also limited by the capabilities of Docbasic.

The second is to use a registered table. This is quite flexible, but there is no easy way to transfer this from one repository to another without using native database interfaces.

The third is to use a DQL Query against objects in the repository. This option seems to have everything. It is dynamic, so that as the data changes, the lookup values change. It is native to Documentum, which allows users to manipulate the data without knowing DQL or SQL. It can also be transferred quite readily from one Repository to another. However, there is one little problem, NULLS.

For comparisons on the pros and cons of each approach, read the article on the Developer Network and the related discussion.

How Object Data is Stored

Experienced Documentum professionals can skip this part.

Documentum is an object-oriented ECM platform. However, it relies on a relational database to store data, such as Oracle or SQL Server. Each “object” consists of two database tables, one for normal attributes and one for repeating attributes. In addition, for each object, data is stored in the tables for the parent type as well. So if I create a new document, rows are created in dm_sysobject_s, dm_sysobject_r, and dm_document_s. There are no repeating attributes specific to dm_document, so it doesn’t have a repeating attribute table. Actually, dm_document doesn’t have any specific attributes, so the dm_document_s table is just a list of Object Ids (r_object_id).

To allow users to query data, EMC created DQL and an invisible layer to translate any DQL statement into SQL. When I issue a DQL query for all documents, it performs all the database joins for me behind the scenes. Quite useful. This does lead to a lot of joins as your object model grows in depth, but I have heard that for those on SQL Server, the upgrade to SQL Server 2005 (supported as of Content Server 5.3 sp4) is well worth the upgrade from a searching performance perspective.

Queries Against Repeating Attributes

So, back to my dilemma. The original design created a custom object type, foo_lookup, that had one custom attribute that was repeating. The object, when created, would be named after a valid user group in LDAP and the repeating attribute would have the valid values that members of this group could select. Users could be in more than one group and the groups where hierarchical. As each higher group had more options and selected them, the lower groups lost their ability to edit the object’s metadata. However, everyone was seeing a blank in the lookup.

This appears to be an inherent problem with querying against repeating attributes. You go to the boards and you see people talk about it. I worked on how to get rid of it to no avail. Then I realized something. That wasn’t a blank, it was a NULL.

This table illustrates the default values for a standard dm_document in the dm_sysobject_r table assuming no optional metadata is stored. Note that there are two rows by default.

r_object_id i_position authors i_folder_id r_version_label [13 other fields]
08000d0580000121 -2 <NULL> <NULL> CURRENT <NULL>
08000d0580000121 -1 <NULL> 0c000d0580000106 1.0 <NULL>

For every object, there is at least one repeating attribute table, dm_sysobject_r. For my custom object, I had a second repeating table, foo_lookup_r. If I had four values in foo_attribute, then I had four rows in both of those table for that object. The repeating attributes with less than four values had a NULL stored in the value. When Documentum retrieved the values through the DFC, it ignored the NULLs and brought back only the valid values.

If you run a DQL query against a repeating attribute that has less values than the max repeating attribute count, the NULLs are returned. However, why was I having problems when I only had one repeating attribute? Simple. One group, to which everyone belonged, had only one valid value. However, there were two rows in my repeating tables. I finally found the culprit, r_version_label. As shown above, every object has two version labels by default, the numeric label (1.0) and the label CURRENT. That gave me two rows from the start. So any query returning a repeating attribute only one entered value returned a NULL value as well.

This table illustrates the values for a my foo_lookup object the foo_lookup_r table.

r_object_id i_position foo_attribute
08000d0580000121 -2 <NULL>
08000d0580000121 -1 Value1

You can see the NULL value that was the source of my consternation.

The Fix(?)

I cheated. I went to my single-valued lookup and duplicated the single entry. Instead of Value1 being in there once, I had Value1, Value1,replacing the NULL. When I queried with my lookup and used DISTINCT, I didn’t have any problems. A registered table would work more cleanly, and I think I have another way that will work better for my specific circumstance, but duplicating a valid value should work for everyone in this situation. It is simple, easy, and doesn’t require any design changes.

The lesson, nothings perfect, so remember to improvise.

[Edit: Be sure to check the comments for alternate approaches.]

10 thoughts on “Tips: Contentless Objects as Lookups, Even Less than Content

  1. Hi..i like ur blog very much.But this article of yours needed an illustration in a simpler manner.somethings are better explained by screen shots or diagrams.

    Like

  2. Greg Hand says:

    Here is a support note that describes workaround for this issue.
    https://solutions.emc.com/nsepn/webapps/stqv768481dmts46655278/emcsolutionview.asp?id=esg655

    Why do DQL queries sometime return blank duplicate rows?

    This may happen if you select from a repeating attribute (which is stored in the same underlying database table) and use the ORDER BY clause on the repeating column. Therefore, if one repeating attribute does not repeat as many times as another, it will have a “null” value for all rows where it has no value until it reaches the maximum number of rows for all repeating attributes of the object type.

    Note: Even if you specify “DISTINCT” in the query, there may appear to be duplicate rows, containing blank values as part of the repeating attribute column. However, this display is not the result of an error with the query. “DISTINCT” operates on all of the columns you specify within the “select” action; each of those columns in the result set, therefore is unique for the values in each row returned.

    When executing a DQL query, sorting on a repeating value (e.g. “r_folder_path”) may cause “duplicates” of the objects to be returned such as those returned by the sample query below:

    select object_name, r_folder_path from dm_folder ORDER BY r_folder_path or

    select object_name, r_version_label from dm_document ORDER BY object_name

    Returns “duplicate” rows of results in WorkSpace. The first half of the return values have the path field filled in. The second half of the returned rows do not. However, as explained above, the second half of the return can also be considered unique rows if the values in their corresponding columns are compared (e.g. a blank value is not the same as an actual value).

    This is a side effect of the way DQL/SQL joins or maps the single and repeating-value tables when you employ the “ORDER BY” clause on the repeating field. The way DQL was intended is that every object in the Docbase has both an explicit “r_version_label” (e.g.: ‘draft’, ‘approved’, ‘new’) and an implicit “r_version_label” (e.g.: 1.0, 2.0, etc.) When the single and repeating tables are joined in a one-to-many relationship, at least two rows will result for each object.

    To avoid this side effect, include in your “select” statement the “r_object_id” attribute and “order by r_object_id” clause. This pairing will force the master-detail relationship to be invoked since this attribute is common to both database tables and will result in only one row per object being returned. Notice that “r_object_id” in the example code below does not need to be first in either the “select” statement or the “order by” clause, but only be present somewhere in both elements:

    select object_name, r_object_id, r_version_label from dm_document where owner_name = ‘charless’ order by object_name, r_object_id

    This returns the object names, all sorted, with one object per row (no duplicates) and the “r_version” label information all concatenated onto one line. Instead of a row with “1.0” and another row with “CURRENT”, we now have “1.0,CURRENT” This is a feature of WorkSpace, which intelligently combines the results of the result set to remove duplicates if “r_object_id” is specified as an “order by” object.

    Note: This only works if a single-valued attribute or “r_object_id” is first in the “order by” clause. If the repeating value is first, then all the values, including duplicates or blanks, are displayed in the result set.

    Like

  3. Thanks Greg. I remember reading this a few months ago. Unfortunately, this didn’t work for my situation (Value Assistance), but it does help those working in code.

    Like

  4. Well, there’s a couple of different issues people are having here.

    First, how do you avoid getting the NULL values for the repeating attributes when doing a SELECT DISTINCT? It might be too simple, but you can just add the criteria ‘foo_attribute is not NULL / NULLSTRING / NULLDATE’ to the WHERE clause. However, you probably have to use the hint enable(row_based) to make Documentum treat the repeating attributes correctly. Anyone argue with this approach?

    Secondly, how do you use contentless objects as lookups, avoiding the problems described above? Also a simple answer, imagine that each custom object is a row in a regular relational database, and add custom attributes for all the columns you want to use for lookups and add custom attributes for the selection criteria.

    Take the example from this article:

    Create a custom object type called eg foo_lookup. This could either be a type without a supertype, or be a direct subtype of dm_sysobject. This custom object type would have one attribute for specifying which group it refers to (eg foo_group) and one for the dropdown values for that group (eg foo_value). It’s then very easy to create a query that always works, and you don’t have to add a criteria to not show empty values (NULLs).

    The query would look like:
    SELECT DISTINCT foo_value FROM foo_lookup WHERE foo_group=’group_name’ ORDER BY foo_value

    Another useful feature you can add: you can create a custom lifecycle with FUTURE, CURRENT and OBSOLETE states, and hereby control which values will show up in the dropdowns, just by promoting or demoting the objects that meet certain criteria. When you create the custom object type, you define this lifecycle as the default lifecycle for the object type, thereby, each time you create an object of that object type, it’ll get the lifecycle attached and the lifecycle will be in the base state (0=FUTURE).

    The query would then look like:

    SELECT DISTINCT foo_value FROM foo_lookup WHERE foo_group=’group_name’ AND r_current_state = 1 ORDER BY foo_value

    This would select all dropdown values for a specific group that are in the CURRENT lifecycle state.

    You can then easily keep a complete table of values that were once valid, current valid ones and ones that will be implemented in the future.

    To make a certain subset of values (eg fulfilling $CRITERIA$) CURRENT that are currently FUTURE, you would use a query like:

    UPDATE foo_value OBJECTS SET r_current_state = 1 WHERE $CRITERIA$ AND r_current_state = 0

    To make a subset of values OBSOLETE that are currently CURRENT, the query would be:

    UPDATE foo_value OBJECTS SET r_current_state = 2 WHERE $CRITERIA$ AND r_current_state = 1

    We have currently implemented the first point in our 3 production environments, even using the custom object types for cascading (linked) dropdown values. and it works perfectly. We will probably implement the lifecycles shortly.

    Does anyone have any comments on these approaches?

    /Per

    Like

  5. Per, meant to add this earlier. I have found that your approach of adding “NULL / NULLSTRING / NULLDATE’ to the WHERE clause” does not solve the problem. By having one value, the attribute is not NULL and you still get the NULL/blank. Obviously if you have a DQL that works in this, I will gladly try it out.

    Thanks for the details on implementing the lookup. I’m sure it will help others.

    Like

  6. I’ve used all three methods with lots of variations based on differing requirements and resources. A fourth approach (touched on by a commenter and used inside Documentum for things like inbox items–a list of sorts) uses objects themselves as items in the list, and these items get related to some aggregator object via dm_relation, virtual document, folder containment, query on a custom attribute, etc. Relations and virtual documents understand order and binding which can be useful. dm_relation objects also have type (the real innovation over dmr_containment) which can be useful too.

    I suppose you could also get cute with an XML application if your users are hooked up with XML editing software and are responsible for maintaining the data dictionaries themselves. A smart XML Schema could do all kinds of domain control and validation. Fun stuff; I wish more people used it.

    List items as dm_sysobjects can be a performance hit, but it’s worth considering if unsophisticated users need to manage the lists and if the items are especially interesting–requiring metadata, versioning, security, lifecycles, or are already objects in the docbase maybe due to ldap syncing.

    Here are some general ideas about custom objects as a data dictionaries:

    * Are you creating a custom type per drop-down? Hard to say from above, but I’d recommend against it. Create a type expressing the data dictionary as a concept, apply a naming or id scheme, and let the instances be specific to each list. That will make reuse and expansion easier. (I prefer a similar approach when I use registered tables with group, key, value, description, status as the most common columns. You can index the hell out of it or even cache the table in some RDBMSes. Nest them by having one group’s value be the next group’s key. Think perl hashs in table form.)

    * If ordering the query is causing the problem, can you drop the “order by” in the query and let the application do it? In the Web 2.0 world, pulling down a data island and slicing/dicing it client-side is all the rage.

    * Why extract the values with a query if they’re all in a single object? I use queries to harvest attributes when I know I’m hitting more than one object to avoid per-object fetch overhead. If my target is a single object that I may reuse, it’s easier to query for its r_object_id and fetch it as an object. Let the DMCL or DFC cache it for you–assuming you have something like a singleton or a persistent process containing the DMCL. If you don’t, consider it. Performance on subsequent requests in the DMCL will make up for the initial cost of the first fetch. Even easier in DFC with getObjectByQualification() or whatever it’s called. (Eclipse makes me so lazy.)

    * Consider going whole hog on objects as data dictionaries. Build a client-side singleton that manages cached versions of your custom data dictionary objects and that runs a quick query to find any data dictionary objects that have changed since the last update request. Feed it lots of settings about preloading certain dictionaries, garbage collection, change detection frequency, and all that fun stuff in an ini or properties file. Give the data dictionary type some of those attributes so you can override them on a per-object basis. Use symbolic version labels with lifecycles to add greater flexibility in testing changed data dictionary objects in prod. A data dictionary based on dm_document has every single feature available in Documentum, so think big (since the object itself is going to be honking huge). The drop-down/data dictionary thing is such a common, well-known problem/solution you should get plenty of reuse from such a beastie.

    * They don’t have to be contentless objects actually. You could make the content a document telling people about this data dictionary object. Flip things on their heads by making the attributes the content and the files the metadata. If you have users going into the objects and editing them, put the documentation right there so there’s no excuse for them doing the wrong thing when they edit them.

    * Given that repeating attributes are ordered, consider dropping sort altogether. Let the maintainer decide the order by how they actually arrange things in the attribute screen. Depending on your client, you could still let users reorder them with some kind of GUI feature.

    A few final comments:

    Non-current versions with no symbolic version labels will have only one r_version_label. Not sure if you care about non-current objects, but you need to be really careful when circumventing the DMCL and looking directly at the underlying tables. Frankly, avoid it unless there’s no better way to do something.

    Finally, don’t look down on contentless objects. There are times I find Documentum more useful as a scaleable object-based database than a document management system. Attributes assume the role of metadata in mundane document management, but they can be the real content when you start getting into interesting server-side or agent-based applications.

    Like

  7. I had the same problem querying repeating attributes for value assitance. my fix was:
    – register the repeating view _rv as a readonly public registered table and query the registered table with not null and distinct.

    Like

  8. Harry says:

    I am using queries like below to add value assistance from lookup objects and a repating attribute. These work for me perfectly.
    —————————————————————–
    select repeating_attribute_name from lookup_object_type where object_name=’lookup_object_name’ and repeating_attribute_name is not null order by repeating_attribute_name enable(ROW_BASED)

    Like

Comments are closed.