Monday, September 24, 2007

Entity Extraction & Enterprise Search

Earlier in September, Ron Friedmann posted, with permission, some Oz Benamram comments from an ILTA conversation on enterprise search.

Ron opined that the ability to zero in on the best results by finding documents with a particular business or legal context, as is possible in Recommind's MindServer Enterprise Search, is derived from entity extraction.

Certainly entity extraction is what Oz's firm uses, and it is a feature that legal enterprise search products are starting to achieve. The longtime leader in the legal space is Real Practice. I am sure that other vendors are also developing entity extraction toolkits if they do not have them already (see, for instance, page 24 of this April 2007 presentation by Thomson West VP / manager of West KM George May).

But I do not believe that context has to be provided solely through entity extraction. In a professional services firm, context can also be provided by bringing in information from other systems about the matter in which the document was generated. For instance, at a law firm, a code could be provided at matter opening that indicates the "matter industry" (shorthand for the underlying business context of the matter, rather than the legal context). This information could also be imported into a matters database at the time a deal is closed, a litigation case is publicly reported, or some other triggering event occurs. An enterprise search engine can take advantage of a "facet" like industry because both the matter and the document carry a matter number that links these pieces of information. This way, the better a job a firm does in capturing information about its matters, the better and more focused its search can be.
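As a rough sketch of the matter-number link described above, the following Python joins documents to matter records on a shared matter number so that each document inherits a "matter industry" facet. All of the names and sample data here (`Matter`, `Document`, `enrich`, the matter numbers) are hypothetical illustrations and not any vendor's API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Matter:
    number: str
    industry: str  # hypothetical "matter industry" code captured at matter opening


@dataclass(frozen=True)
class Document:
    title: str
    matter_number: str
    industry: Optional[str] = None  # filled in from the matter record


def enrich(documents, matters):
    """Copy each matter's industry onto its documents via the matter number."""
    by_number = {m.number: m for m in matters}
    enriched = []
    for doc in documents:
        matter = by_number.get(doc.matter_number)
        industry = matter.industry if matter else None
        enriched.append(Document(doc.title, doc.matter_number, industry))
    return enriched


# Made-up sample data for illustration only.
matters = [Matter("1001", "shoe manufacturing"), Matter("1002", "banking")]
documents = [
    Document("Asset purchase agreement", "1001"),
    Document("Credit facility memo", "1002"),
    Document("Unfiled research note", "9999"),  # no matching matter record
]
indexed = enrich(documents, matters)
```

Documents whose matter number has no record simply end up without an industry, which is one way of seeing why better matter data yields better search.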

I suspect that Ron is also missing a key point about relevancy. The faceted navigation provided by Recommind and, by now, the other major enterprise search vendors such as Vivisimo*, allows the user to select the facets (such as a particular industry like shoe manufacturing) that are relevant to them, instead of presenting the user with an undifferentiated list or "wall" of results. The engine does not have to predetermine the higher relevancy of a particular author or industry because the user can drill down into what interests her. (Endeca is the firm that actually pioneered what they call "guided navigation", and they have dominated the e-commerce enterprise search market as a result.)
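To make the drill-down idea concrete, here is a minimal Python sketch of faceted navigation: the engine counts the values available for each facet, and the user narrows the results by picking the ones she cares about. The function names and sample results are assumptions for illustration and say nothing about how Recommind, Vivisimo, or Endeca actually implement it.

```python
from collections import Counter

# Hypothetical search results, each carrying facet metadata.
results = [
    {"title": "Supply agreement", "industry": "shoe manufacturing", "author": "Lee"},
    {"title": "Licensing memo", "industry": "shoe manufacturing", "author": "Patel"},
    {"title": "Loan agreement", "industry": "banking", "author": "Lee"},
]


def facet_counts(hits, facet):
    """Count how many hits carry each value of the given facet."""
    return Counter(hit[facet] for hit in hits if facet in hit)


def drill_down(hits, **selected):
    """Keep only the hits matching every facet value the user selected."""
    return [
        hit for hit in hits
        if all(hit.get(facet) == value for facet, value in selected.items())
    ]


industry_counts = facet_counts(results, "industry")
shoe_hits = drill_down(results, industry="shoe manufacturing")
shoe_by_lee = drill_down(results, industry="shoe manufacturing", author="Lee")
```

Note that nothing in this sketch ranks one industry or author above another; the user does the narrowing.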

Of course, it isn't just the facets that matter. I understand that MindServer both provides facets for guided navigation and takes advantage of the metadata (in addition to the document content) to pull up the most relevant documents. Good relevancy makes the search worthwhile but is not the only criterion, since guided navigation also greatly enhances the lawyer's search experience.
