EMC World 2011: Documentum Performance and Scalability, Part 1


This is Ed Bueche’s annual performance session, a standard must-attend at every Momentum. Coffee in hand, I am ready to listen to Ed impart his wisdom upon us.

  • Documentum xPlore is the next generation search engine, requires D6.5sp2 [For more information, see my notes from the xPlore kickoff]
  • New versions of xPlore will be compatible back to the 6.5sp2 version.
  • xPlore sizing inputs:
    • Indexing rate and document complexity
    • Total index size
    • Query complexity
    • Query rate
  • Basic xPlore sizing technique: size based upon the existing FAST implementation. Since xPlore uses fewer resources, this may oversize the system, and it does not leverage cost-saving features.
  • xPlore Sizing tool [ooo, pretty] available on Powerlink, separate from standard Documentum sizing spreadsheet
    • Document count and avg document size is on the State of the Docbase report
    • Avg metadata size is in the DB Warning report
    • Avg Percent Indexable is tricky
  • If the size of the index is widely different between xPlore and FAST, then re-jigger the Avg % Indexable values
  • For Index Size, budget for 2x the size of the “final” index.
  • Can free-up temporary working space by freezing old index collections
  • Pay attention to sizing for the back-file migration vs. the day-forward rate.
  • Index-cache ratio vs I/O load
  • Tool trades off memory vs disk I/O, index cache ratio is the major lever
  • Need to provide the percentage of the index that you want in memory
  • Memory allocation spans 4 groupings: the operating system buffer cache, CPS memory, and the xPlore instance’s xDB and Lucene caches
    • xDB Buffer cache caches query results
    • Lucene cache used to process queries
  • To learn more about xPlore memory tuning, check Ed’s memory tuning blog post.
  • Virtualization of xPlore tips:
    • Don’t assume one size fits all. Size correctly; do not just accept the IT-standard VM environment. VM sizing tools are available online. Test pre-production environments to validate allocations.
    • Look for consolidation contention. Make sure resources aren’t consumed by other VMs and that the VM capacity actually delivers the resources needed. On SANs, track % I/O utilization vs. the number of I/Os. Track denied CPU requests.
    • Ensure availability of resources. Inactivity may cause resources to be displaced. For xPlore, run the cache warm-up utility every couple of hours to retain resources.
  • xPlore, unlike FAST, supports virtual IPs
  • Tuning xPlore Document Ingest
    • Increase the final merge time; the default is a little low in 1.0 & 1.1. If merges happen too often, query response time can grow. Set in xdb.properties. Can also set finalMergingBlackout to block out merge windows.
    • Multiple collections with temporary round-robin ingest. Used to increase the ingestion rate during the backfile migration. Stop the round-robin once on day-forward ingestion.
    • Disable facet compression for unique values. Facets are normally compressed into memory, but unique values make this limiting. If you have more than 250k values, it will start to consume too much memory.
    • Use index filters…cautiously. xPlore expands the ability to skip indexing certain documents, but, for example, folder-descend() queries won’t work if dm_folder is excluded
    • 64-bit xPlore
    • Multiple instances and sub-collection adoption (xPlore 1.1). Content processing is intensive; this allows you to index in parallel and then combine collections onto one host once done.
  • Query tuning tips
    • xPlore admin reports are your friends; they allow you to diagnose performance issues.
    • Avoid consuming large (unbounded) result sets. Don’t suck in all the results and then divide by facets. Use the Search Service and xPlore Service (starting in 6.6), and use the “RETURN_TOP N” hint in DQL. Unbounded result sets can be detected in the xPlore admin reports.
    • Leverage the xPlore query warm-up utility. You can use your own sample queries to warm up the environment.
    • Pre-test using the xPlore query driver utility. It will mine the index to create sample queries that hit the index to test performance.
    • Tune wildcard query usage. You can cache more of the Lucene index using the warm-up utility. Ensure that fast_wildcard_compatible is set to false, which will lower search recall but improve response time. To optimize metadata LIKE %% searches for leading wildcards, set “leading-wildcard=true”
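
The “RETURN_TOP N” hint mentioned above goes in a DQL ENABLE clause. A minimal sketch (the attribute list and search term are my own illustration, not from the session):

```sql
-- Bound the full-text result set instead of pulling back everything
-- and faceting client-side; RETURN_TOP asks for only the best N matches.
SELECT r_object_id, object_name
FROM dm_document
SEARCH DOCUMENT CONTAINS 'performance'
ENABLE (RETURN_TOP 100)
```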

As for my “next” I am not going anywhere as Part 2 is next.

Disclaimer

All information in this post was gathered from the presenters and presentation. It does not reflect my opinion unless clearly indicated (Italics in parenthesis). Any errors are most likely from my misunderstanding a statement or imperfectly recording the information. Updates to correct information are reflected in red, but will not be otherwise indicated.