Documentum xPlore, The New “FAST” Search

I’m taking a break from talking about the economics and future of Content Management to look at the reality we are facing today.  For all of the need to start moving to the future, we still have problems that have to be solved today.

One of my most challenging problems in Documentum is full-text search.  While fine for the average system, FAST is a beast.  It requires a large cage, is barely tamed, and takes a lot of work to train for larger tasks.  When you invest in it, life is good, but it is an investment of time and effort.  Like many, I have wanted a scalable, highly-available, reliable, and EASY full-text search as part of the system.  It is something we’ve been missing in Documentum and most Content Management systems.

Well, that is changing.  At the end of October, the new Documentum xPlore search engine is being released.  Known during development as Documentum Search Server, xPlore promises to make life much easier for people.

Before I add any more details, the usual disclaimers apply.  Anything in this post talking about things not released, including dates, is subject to change.  If anything in this post fails to become reality, don’t yell at EMC, yell at me….and I’ll yell at EMC. 😉

Why Kill FAST?

Let’s cover the obvious.  FAST is a resource consumer and is not simple to scale beyond a single node.  It will do it, but you need to do a lot of manual planning and work.  It doesn’t work well on NAS (trust me) and is dangerous to run in production on VMware.  It does work and performs better than its predecessor, but it is in no way ideal.

Then there are the business issues.  Microsoft bought FAST.  Microsoft has ended support for OEM versions of FAST.  EMC negotiated support through the end of 2011 for FAST, but it is clearly NOT an option for the future.  As a result, the FAST engine has been sitting in maintenance since D6.5, allowing effort to be focused on the next engine.

So a replacement is needed.  How about using X-Hive to store the index and Lucene to create it?  One is mature, proven, and owned by EMC.  The other is a solid open source engine that is also proven.  We’ve used it on some of our projects and it works great.

Sounds great.  So let’s look at what we are going to get in the next month.

Coming Soon

It’ll be time to start downloading Documentum xPlore 1.0 in November.  I’ve heard the end of October repeatedly, but a few extra days seems like a realistic timeline for planning.  If you want to use it, you’ll need to be on Documentum 6.5 SP2 or higher.

Most importantly, it is free to those with Content Server (which should be everyone).  This is a replacement for a core feature of every Content Management system since the 90s, so free is the right choice.  Starting with D6.7 (Q1 2011), xPlore will be the only option for search.  The version of xPlore to be packaged with D6.7 will be tagged xPlore 1.1.

Now for feature facts/notes from the call I was tuned in to on Friday:

  • Architecture details:
    • Runs on Windows or Linux, not Solaris or other UNIX flavors
      • Solaris not in roadmap
      • Feedback from customers says that VMware and the Windows/Linux options have been suitable.  If you NEED UNIX, let them know.  They aren’t opposed, they just haven’t heard the absolute demand.
    • Supports all Content Server platforms
    • VMware support
    • NAS Support (guidance/tips to come for disk configurations)
    • High Availability options
      • Active/Active (was the only option with FAST)
      • Active/Passive with Clusters
      • N+1 Sparing (JOY!!!!)
    • Sizing spreadsheet being refined. More efficient than FAST, so existing hardware can support current installations.
    • Less transient disk space needed.
    • Online backup (hot backup!)
  • Language/Internationalization Support:
    • xPlore 1.0 supports English, German, and Chinese
    • xPlore 1.1 supports French, Italian, Spanish, Japanese, and Korean
    • Works with other languages.  Those languages are not “supported” as grammar nuances have not been tested.
    • Brazilian Portuguese support is planned.
  • All DQL/FTDQL automatically translated into XQuery, so no need to change code
  • Zone search (VQL) is being replaced with XQuery
  • Integrated Documentum Security in search (This is big. No more taking results back into the database to validate against the security model.)
  • Native Facet computation
    • Exposed within CenterStage 1.1
    • Requires 6.6 DFC to utilize
    • It can be exposed in Webtop, but it is not out-of-the-box
  • Query and Ingestion Analytics (tune those queries!)
  • There is an xPlore Administrator interface
  • Federated Search Server 2 supports xPlore
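
To make the High Availability options a little more concrete, here is a rough back-of-the-envelope sketch of how many search nodes each mode might need.  Everything in it is my own assumption, not EMC sizing guidance: the function name, the mode labels, and the rule that Active/Active and Active/Passive each double the node count while N+1 adds a single shared spare.  Check the real sizing spreadsheet before buying hardware.

```python
# Hypothetical sizing sketch. The node-count rules below are assumptions
# made for illustration; they are not taken from EMC documentation.

def nodes_required(primary_nodes: int, mode: str) -> int:
    """Return the total number of search nodes for a given HA mode."""
    if primary_nodes < 1:
        raise ValueError("need at least one primary node")
    if mode == "active_active":
        # Assumed: two complete, live copies of the index.
        return 2 * primary_nodes
    if mode == "active_passive":
        # Assumed: one idle standby for every primary node.
        return 2 * primary_nodes
    if mode == "n_plus_1":
        # Assumed: one shared spare that can stand in for any single failed node.
        return primary_nodes + 1
    raise ValueError(f"unknown HA mode: {mode}")


if __name__ == "__main__":
    # Compare the three modes for an index that needs four primary nodes.
    for mode in ("active_active", "active_passive", "n_plus_1"):
        print(mode, nodes_required(4, mode))
```

Under those assumptions the gap between doubling and adding one spare grows with every node you need just to hold the base index, which is why N+1 sparing gets the JOY.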

Some good stuff if it all comes to pass. It has been in controlled release since July.  With Ed Bueche running the whole thing on top of XDB technology, I’m optimistic.

Talking ‘Bout my Migration!

Okay, let’s think about the migration issue.  Full-text indexes of large systems aren’t created overnight.  I’m also optimistic on this front.  Let me tell you why.

The migration to FAST wasn’t that bad.  FAST was an issue, but not the migration process.  Migrating to FAST was not that much more work than installing a new system and ingesting content.

This migration may be easier.  Aside from the same EMC engineering team planning this out, there is no change to the Indexing infrastructure within the Content Server.  That means you can technically run both FAST and xPlore at the same time.

16 thoughts on “Documentum xPlore, The New “FAST” Search”

  1. pitch says:

    Nice summary and glad to see you are looking forward to Documentum xPlore.

    For those lucky folks going to Momentum Lisbon, we will showcase CenterStage 1.1 on top of Documentum xPlore. The demo will also combine classification and text analytics for better findability of the content.

  2. Steve Bickle says:

    Probably not the best place to be posing these questions, but your post has set me thinking. On the current FAST implementation, the index agents watch queued events for document saves to pick up changes for the indexes. To keep the xPlore index up to date with the security context of the documents, the agents for xPlore would need to react to changes to ACLs. Is this the case? Also, does xPlore search accommodate the Trusted Content Services conditional access controls within the index? It’s good to see EMC using their own technology to deliver this kind of service to the Documentum stack.

    • Those are two pretty specific questions, so I have a very specific punt…..They are planning on setting up an area on the EMC Developers Network (EDN) between now and the release. That would be the perfect place to ask those questions because Ed Bueche will be watching, and Ed knows the answers.

    • Aamir says:

      Yes, Documentum xPlore will replicate/update ACLs upon any change within the Documentum Content Server repository. Regarding the question around Trusted Content Services, assuming you are referring to the MACL (Required Group, Required Group Set, Access Restrictions, etc) functionality, then yes, xPlore will honor those as well. The security enforcement will be exactly the same as the Content Server today.

      The only exception to note is Application Permits (also part of MACL), but for these, even the Content Server does not do anything.

      Hope this helps.

  3. Chris Campbell says:

    I’ve been waiting for this since I heard about it at EMC World 2009. Good to see it in 6.6 rather than 7.0. Pie, you should go back and boldface that part about xPlore being *free*. Excuse me. I… I think I have some dust in my eye.

  4. xPlore is much better than DSS, I mean in terms of naming conventions, all that DSS, FSS, Extended Search….
    I like the fact that you don’t need special permission to install multiple nodes, though that proved to be quite beneficial for DM Consulting. More HA options than FAST will definitely please a lot of current customers.
    Best of all I liked the fact that EMC put out a product based on xHive, once people get to see the wonderfulness of this product, it could possibly be a bigger play in the WCM market as well. xHive can definitely be a step in between RDBMS and NOSQL Dbs like mongodb.

    I was fortunate to be on the DSS controlled release program and we got to test DSS and FAST working together, worked like a charm, anytime you can move a customer to a new product with minimal downtime it is a very successful migration. Hopefully all issues with multiple collections get resolved. Also troubleshooting xPlore is a very different game compared to FAST, or maybe that’s just because I had become so accustomed to tracking issues in FAST.

    • I like the fact that I can type xPlore in seconds. The EMC Documentum Content Services for Microsoft SharePoint is so bad that I almost always just call it ‘Bob’.

      Also thanks for the feedback from the Controlled Release. Very happy that it went well.

    • Can’t say definitively. I would be hard-pressed to not choose active-active. By going to the plus one model, you can maintain performance and protect against downtime. It becomes even more pronounced when you need multiple instances to just handle your base index. In active-passive, you are duplicating everything. Speaking as a survivor of FAST, that can be a pain.

  5. Sudhir says:

    I am having one problem in xPlore. It is not crawling files which have “-” in their name! Any suggestion what the issue is here?
