Those of you who follow Documentum’s products know that search has been a bugaboo the last few years. When 5.3 was rolled out, there was much promised around faster search. It is here, but at a price. Additional hardware is needed and the version of FAST used by Documentum isn’t VMware-safe. To be fair, dedicating a server to search is part of the reason we have better performance, but it hasn’t been the panacea that we wanted.

In 7.0, we are looking at the prospect of Lucene support for the more plug-and-play repositories, while the larger ones will still be able to leverage a larger, multi-node, FAST installation. (Works great! Seriously, I mean it.)  This is fine, but supporting two search engines, neither of which you actually own, is an issue for any vendor.

So what is the solution? Last week I read an article speculating on the prospect of EMC looking for a search company to add to their portfolio. Now the article was pure speculation, but that is what makes it fun. Let’s see if it makes sense and who EMC could acquire.

The Why

I already covered the need for it to be in their ECM platform, but this isn’t enough. That need has been around since the 90’s and there have been plenty of companies to acquire along the way. There are, however, some newer drivers.

One is Information Management. This is not just content, but structured data. It also needs to be managed as a record and found when needed. Business Intelligence, ECM, and portals are a traditional grouping of three technologies for IM, but search ties it all together. EMC wants to manage your information from cradle to grave, so search is something that would make sense.

A second driver is e-discovery. This is a strong push with the CMA group at EMC this year, with good reason. As dull as it may be, it is critical to users and represents real ROI that can easily justify purchases, even in this economy. I heard a stat that a company can expect to spend $6.5 million in discovery costs per year for each billion dollars of revenue (sorry, still looking for a published source). EMC made $14.88 billion in revenue last year. Even discounting that figure heavily, e-discovery makes sense. The kicker is that the cost doesn’t include judgements against the company that could have been avoided with proper Information Management/e-discovery.
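For the curious, here is that back-of-the-envelope math as a tiny sketch. It takes the unsourced $6.5 million-per-billion stat at face value, and the 50% discount is purely my own placeholder for "discounting heavily", not a figure from anywhere. On those assumptions it comes out to roughly $97 million a year, or about $48 million after the discount.

```python
# Rough estimate only. The per-billion rate is the unsourced stat quoted above,
# and the discount is a hypothetical placeholder, not a published number.
DISCOVERY_COST_PER_BILLION_MUSD = 6.5   # $ millions per $1 billion of revenue
EMC_REVENUE_BILLIONS = 14.88            # EMC revenue last year, in $ billions
HYPOTHETICAL_DISCOUNT = 0.5             # assumed "heavy" discount


def estimated_discovery_cost_musd(revenue_billions, rate=DISCOVERY_COST_PER_BILLION_MUSD):
    """Return the estimated yearly discovery cost in $ millions."""
    return revenue_billions * rate


full_estimate = estimated_discovery_cost_musd(EMC_REVENUE_BILLIONS)
discounted_estimate = full_estimate * (1 - HYPOTHETICAL_DISCOUNT)

print(f"Undiscounted: ${full_estimate:.2f} million per year")
print(f"Discounted:   ${discounted_estimate:.2f} million per year")
```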

A third driver is that Oracle, IBM, Microsoft (FAST), and Open Text all have some sort of search solution.  Normally if you see the product vendors, however re-branded and evolved, of Stellent, FileNet, SharePoint, Open Text, and PC Docs in a group, you would expect to see Documentum there. EMC is not on that list and it sticks out.

There is a lot of money in them-there hills. Who wouldn’t want a part of it when it already fits into your business plan? Forget aggregators and federators like AskOnce. With a search platform, they can search everything that doesn’t already have an active index, or just everything.

The Who

This is the fun part: who might they acquire? I’m going to work off of two lists: non-niche vendors from Gartner’s 2008 Magic Quadrant for Information Access Technology, and those put forward by the article that started this rampant speculation.

  • Oracle, Microsoft, IBM, Open Text: The competition. They won’t sell EMC their search component individually, and EMC either can’t afford to buy them (Oracle, MS, and IBM) or doesn’t want everything that comes with the acquisition (Open Text). Shoot, Open Text might acquire one or two search vendors.
  • Autonomy: Interesting, but unlikely.  Autonomy just bought Interwoven and aren’t likely in the mood to be bought.  They are building a great e-discovery story and if they merge the Interwoven technology well, they may move up to the previous bullet.
  • Google: No. Too expensive and Google is not the best choice for enterprise search at this time.
  • Endeca: The best position in the leaders quadrant and they seem pretty bright. They are based in Boston, so the culture would most likely be a match as well. This is my leading candidate, but that just means that it won’t happen.
  • Recommind: This one gets special attention as they focus on e-discovery. I don’t know if they could work under the Documentum covers, but they have a strength in EMC’s target solution and they are used by some competitors for their e-discovery suite. That would be a two-for-the-price-of-one benefit. I would file this as an e-discovery acquisition more than a search acquisition though. That means if they bought Recommind, they might still buy someone else.
  • ZyLAB, Vivisimo, & Exalead: Nothing exciting for these guys. ZyLAB is a Microsoft-only platform and Exalead has limited U.S. presence, though EMC hasn’t discriminated in the past. I don’t know enough about the technology for these guys, but they are possibilities.

My favorite dark-horse candidate? Lucene. Now, before you start with the whole open-source thing, EMC could take the Lucene code and start building upon it themselves, giving back to the community. They could keep some new components/extensions for themselves and offer support. If you sign up for support, you get the extensions. Linux went through this, and Alfresco has a mixed open-source and enterprise model, so don’t knock it. This is my favorite and one that allows them to keep going down the path that they are on right now. The main downside is that they would have to work to build the revenue streams as opposed to buying the streams and expertise.

Time will tell, but it should be interesting. Maybe we should start a pool.  I got 20 on Endeca.

  1. Hey Pie, EMC could do what they like with Lucene exactly because it is a project which uses an open source model. It is an Apache Software Foundation project and so uses the Apache 2.0 licence, and I believe ASF are still discussing whether their 2.0 version is compatible with the GPL.

    Don’t confuse open source licencing with a lack of a commercial product. The open source licence is about freedom as in speech; that does not mean you have to give your product away for free (as in free beer). So yes, Alfresco is an open source product, period. It’s not a mixed open source and enterprise model per se. It just so happens there is a freely downloadable version, but when you pay for the ‘enterprise subscription’ your money is paying for the support, updates, etc. It’s a popular commercial model for open source vendors.

    Should they invest in Lucene? Fine for putting inside Content Server, but what about ECIS / Federated Search Server? That could stand some serious investment too!


    • ECIS would still have a place. It reaches out to existing search engines and aggregates results. There will always be a need for that. Having an engine that can go out and search everything without an index of its own, a problem I have faced before, has significant value. Plus, we might have a consistent Index Server.

      Oh and thanks for the open-source notes. I’m not an expert in those models by any means, but it is a viable model.


  2. While I am sure a search technology acquisition is on the EMC agenda, I don’t think they will go for a Lucene related deal. Too much of a cultural difference (remember, Alfresco was born probably because of such strategic differences at the top).
    My bet is that they will not make an acquisition in this area this year. They have enough on their hands with current integration and R&D efforts.


    • Good thoughts. The Lucene thought was pie-in-the-sky stuff. I think that if they do make a buy, they will this year, because the timing will be perfect. They have cash in-flow and the market is soft. There are going to be deals to be made. Plus, the e-discovery driver is one that is very strong now and offers a better ROI for EMC the sooner they act.

      Of course, they may not buy anything.


      • If the cash is there, then EMC will probably make some acquisitions this year. I’m probably too concerned with the existing product roadmap, and I would like to see the cash going that way.
        I fully agree on eDiscovery, but recently I’ve seen some other storage devices with eDiscovery tools inside (no ECM, just storage) and they really exceed what EMC (even the CMA line) has right now (using internally the same “speedy” search technology). So they should beef it up.


    • Don’t know about that. It would be interesting. They would get MySQL in the deal and expand hardware. EMC has more cash than Sun’s market cap, but not by much, and IBM has almost twice the cash of EMC.


      • Nice post. I left a comment specific to Sun-EMC over there. I want to add that I was talking to some people that have used Documentum for years in a search-heavy application and they thought that Endeca would be a nice buy, without any prompting from me. They like the tech and think it would serve Documentum well.


