EMC World 2008: Implementing DFS Search Services

My Day Two thoughts will be coming up in a subsequent post. Right now, I am listening to Pierre-Yves Chevalier give some examples and demos of the Search Service in action. This is Marc Brette’s presentation, but he apparently canceled at the last minute, so the Q&A may be a little weak (though Pierre-Yves knows this stuff well from what I can tell).

  • Presentation and source-code will be available on the Developer Network
  • D6 DFS has federated search to search against both Documentum repositories and external repositories
  • D6.5 adds:
    • Non-blocking search
    • Clustering of Search Results
    • Saved Searches
    • Classification Service
  • The basic Search Service has execute and getRepository as capabilities (structured as methods in the client library)
    • Can pass structured queries and DQL pass-through
    • Result includes Query Status and a Data Package
    • Results are stateless and cached on the server
  • The code for the first search is pretty simple. This is the client code making a DFS call.
    • All examples use the client libraries that Documentum provides, not something built directly from the WSDL. (Would prefer more WSDL interaction, but to be fair, a large chunk of developers in this room will either use the client libraries for developing with DFS or not use DFS at all.)
    • Called from a servlet in the user interface that is initiated from JSON in the client’s browser (I think I got that right)
    • The servlet formats the response for JSON
  • Federated Search to external repositories requires Enterprise Content Integration (ECI) on the Documentum back-end
  • Nonblocking search is asynchronous and supports multiple calls to get results
    • Each query has an id used as a key in the cache
    • Cache policy is size and time based
    • Each call must contain the full query definition so that the query can be re-executed if its cache entry is gone
    • Cache is configured in DFS directly
    • Each call will tell you in Query Status if it has completed executing and if there are more results to be returned. Also includes status of the different search sources. (Ah, XML)
  • Structured Queries are abstract and independent of the Indexer and Content Server syntax and features
    • Full Text expressions and Property Expressions, combined in an Expression set in a boolean fashion
  • Clustering helps to organize search results
    • Dynamic grouping of results into clusters
    • Based on result properties, not content
    • Linguistic rules
    • Requires an SBO that comes with Webtop Extended Search, not any other product
    • Supports hierarchical Clustering
    • Can retrieve cluster and sub clusters
    • Retrieve results by cluster
    • Topics can be defined in code as the clustering strategy (Go Ed! Infodata alumni strike again!). Date is a pre-defined strategy
    • Clusters can use multiple strategies
  • Can retrieve a list of the saved queries
  • Classification uses Content Intelligence Services (CIS), or a pre-built taxonomy, to return results based upon the taxonomy in the system
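
The non-blocking search notes above (a query id used as the cache key, a size- and time-based cache policy, and the full query definition sent on every call) can be sketched as a toy model. This is plain Python, not the DFS client library or its API: every name here (`SearchCache`, `ToySearchService`, `QueryStatus`, `fetch_all`) is hypothetical, and the toy runs each query synchronously, so it only illustrates the caching and paging contract, not real asynchronous execution.

```python
import time
from collections import OrderedDict
from dataclasses import dataclass


@dataclass
class QueryStatus:
    completed: bool  # has the query finished executing?
    has_more: bool   # are there results the caller has not fetched yet?


class SearchCache:
    """Hypothetical result cache keyed by query id, bounded by size and age."""

    def __init__(self, max_entries=2, ttl_seconds=60.0, clock=time.monotonic):
        self._entries = OrderedDict()  # query_id -> (created_at, results)
        self._max_entries = max_entries
        self._ttl = ttl_seconds
        self._clock = clock

    def get(self, query_id):
        entry = self._entries.get(query_id)
        if entry is None:
            return None
        created_at, results = entry
        if self._clock() - created_at > self._ttl:
            del self._entries[query_id]  # time-based eviction
            return None
        self._entries.move_to_end(query_id)  # mark as recently used
        return results

    def put(self, query_id, results):
        self._entries[query_id] = (self._clock(), results)
        self._entries.move_to_end(query_id)
        while len(self._entries) > self._max_entries:
            self._entries.popitem(last=False)  # size-based eviction, oldest first


class ToySearchService:
    """Stand-in for a search service; not the real DFS Search Service."""

    def __init__(self, corpus, cache):
        self._corpus = corpus
        self._cache = cache
        self.executions = 0  # how many times a query was actually run

    def execute(self, query_id, query_text, start, page_size):
        results = self._cache.get(query_id)
        if results is None:
            # Cache miss: the call carries the full query definition,
            # so the query can simply be run again.
            needle = query_text.lower()
            results = [doc for doc in self._corpus if needle in doc.lower()]
            self.executions += 1
            self._cache.put(query_id, results)
        page = results[start:start + page_size]
        has_more = start + page_size < len(results)
        return QueryStatus(completed=True, has_more=has_more), page


def fetch_all(service, query_id, query_text, page_size=2):
    """Call the service repeatedly until the status reports nothing is left."""
    collected, start = [], 0
    while True:
        status, page = service.execute(query_id, query_text, start, page_size)
        collected.extend(page)
        start += page_size
        if status.completed and not status.has_more:
            return collected
```

Calling `fetch_all(service, "q1", "search")` issues several `execute` calls but runs the query only once while the cache entry survives. If the entry is evicted between calls, the next call transparently re-executes the query, which is why each call has to repeat the full definition.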

Ran late this morning, so I am off in search of food and, wait for it….COFFEE!!! Debating on the next session, but I know I’m not doing an 11:30 as I have some demos to attend. Going to spend a lot of time in the booths this afternoon. Find me on the floor if you like.

Disclaimer

All information in this post was gathered from the presenters and presentation. It does not reflect my opinion unless clearly indicated (Italics in parentheses). Any errors are most likely from my misunderstanding a statement or imperfectly recording the information. Updates to correct information are reflected in red, but will not be otherwise indicated.

All statements about the future of EMC products and strategy are subject to change due to a large variety of factors.

5 thoughts on “EMC World 2008: Implementing DFS Search Services”

  1. Did anyone talk about integrating DFS search into a Google appliance? I know that EMC almost never talks about integration unless it is with another EMC product. Would love to know why the Documentum community doesn’t keep them honest.

  2. Mike Hancock says:

    Hi Pie,

    JSON is an alternative to XML. Many JavaScript developers prefer it for exchanging data/messages across the net because it is much less verbose.

    http://www.json.org/fatfree.html

    My personal preference is contract-based development using WSDL/XML for the backend. And for client code I’m fine with any of WSDL, REST/POX, or REST/JSON.

  3. Thanks Mike. I am familiar with JSON, but I’m sure not everyone is. I didn’t take time to define it while I was busy typing that day.

    If you were referring to my comment on using WSDLs instead of the client libraries, that stems from the fact that my teams are usually interacting with Web Services from multiple systems. It also refers to the code in the servlet, not the client. I prefer my teams to be consistent across the board since they are conversant with the technology.

    -Pie
