lucene-solr-user mailing list archives

From Tommaso Teofili <tommaso.teof...@gmail.com>
Subject Re: Require some advice
Date Sat, 21 Aug 2010 06:56:10 GMT
Hi Pavan,
you may want to plug UIMA in as a custom UpdateRequestProcessor [1] at
indexing time (I am working on such a use case). This way you could extract
entities and add them either as dynamic fields or as predefined (fixed) fields.
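To make the idea concrete, here is a minimal sketch in plain Java (no Solr or
UIMA dependencies) of an index-time enrichment step in the spirit of an
UpdateRequestProcessor. SimpleDoc, EntityEnrichmentStep, the "_ss" dynamic
field suffix and the fixed "city"/"zip" fields are all hypothetical, assumed
only for illustration; in a real Solr plugin you would work on the
SolrInputDocument carried by the command passed to processAdd() instead.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

// Hypothetical stand-in for a Solr input document: field name -> values.
class SimpleDoc {
    final Map<String, List<String>> fields = new LinkedHashMap<>();

    void addField(String name, String value) {
        fields.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
    }

    List<String> get(String name) {
        return fields.getOrDefault(name, List.of());
    }
}

// Hypothetical index-time step, similar in spirit to an UpdateRequestProcessor:
// it runs an entity extractor over the "text" field and adds the results to
// the document before it is indexed.
class EntityEnrichmentStep {
    // Assumed schema: a dynamic field pattern "*_ss" (multi-valued strings)...
    static final String DYNAMIC_SUFFIX = "_ss";
    // ...and/or these predefined (fixed) fields.
    static final Set<String> FIXED_FIELDS = Set.of("city", "zip");

    // entity type -> values found in one piece of text
    private final Function<String, Map<String, List<String>>> extractor;
    private final boolean useDynamicFields;

    EntityEnrichmentStep(Function<String, Map<String, List<String>>> extractor,
                         boolean useDynamicFields) {
        this.extractor = extractor;
        this.useDynamicFields = useDynamicFields;
    }

    void process(SimpleDoc doc) {
        // Copy the source values so adding fields cannot disturb the iteration.
        for (String text : new ArrayList<>(doc.get("text"))) {
            for (Map.Entry<String, List<String>> e : extractor.apply(text).entrySet()) {
                String field;
                if (useDynamicFields) {
                    field = e.getKey() + DYNAMIC_SUFFIX;   // e.g. "skill_ss"
                } else if (FIXED_FIELDS.contains(e.getKey())) {
                    field = e.getKey();                    // must exist in the schema
                } else {
                    continue;                              // no fixed field for this type
                }
                for (String value : e.getValue()) {
                    doc.addField(field, value);
                }
            }
        }
    }
}
```

The practical difference between the two modes: a dynamic field pattern
accepts whatever entity types the extractor produces, while fixed fields have
to be declared in the schema up front, so types without a field are dropped.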

2010/8/12 Michael Griffiths <mgriffiths@am-ind.com>

>
> While there are some decent open source entity extraction tools, they are
> focused on processing sentences and paragraphs. The structural differences
> in text messages mean you'd need to do a fair amount of work to get decent
> entity extraction.
>
> That said, you may want to look into simple word/phrase matching if your
> domain is sufficiently small. Use RegEx to extract ZIP, use dictionaries to
> extract city/area, skills, and names. Much simpler and cheaper.
>
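Michael's regex-plus-dictionaries approach could look roughly like the sketch
below. It assumes US ZIP codes (five digits, optionally ZIP+4), small
lower-cased phrase dictionaries, and English text; SimpleEntityExtractor and
the sample dictionary entries are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical extractor: regex for ZIP codes, dictionary lookup for the rest.
class SimpleEntityExtractor {
    // US ZIP code: five digits, optionally followed by "-" and four digits (ZIP+4).
    private static final Pattern ZIP = Pattern.compile("\\b\\d{5}(?:-\\d{4})?\\b");

    // entity type -> lower-cased phrases, e.g. "city" -> {"boston", "new york"}
    private final Map<String, Set<String>> dictionaries;
    private final int maxPhraseWords;

    SimpleEntityExtractor(Map<String, Set<String>> dictionaries) {
        this.dictionaries = dictionaries;
        int max = 1;
        for (Set<String> phrases : dictionaries.values()) {
            for (String p : phrases) {
                max = Math.max(max, p.split(" ").length);
            }
        }
        this.maxPhraseWords = max;
    }

    Map<String, List<String>> extract(String text) {
        Map<String, List<String>> found = new TreeMap<>();

        Matcher m = ZIP.matcher(text);
        while (m.find()) {
            found.computeIfAbsent("zip", k -> new ArrayList<>()).add(m.group());
        }

        // Lower-case ASCII word tokens; anything else acts as a separator.
        List<String> words = new ArrayList<>();
        for (String w : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!w.isEmpty()) {
                words.add(w);
            }
        }

        // Greedy longest match: at each position try the longest phrase first,
        // so "new york" wins over a shorter entry such as "york".
        int i = 0;
        while (i < words.size()) {
            int matched = 0;
            for (int len = Math.min(maxPhraseWords, words.size() - i); len >= 1 && matched == 0; len--) {
                String phrase = String.join(" ", words.subList(i, i + len));
                for (Map.Entry<String, Set<String>> e : dictionaries.entrySet()) {
                    if (e.getValue().contains(phrase)) {
                        found.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).add(phrase);
                        matched = len;
                    }
                }
            }
            i += Math.max(matched, 1);
        }
        return found;
    }
}
```

Note that any standalone five-digit number will be tagged as a ZIP, so on
real messages you would probably want some extra context checks.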
In UIMA there are some components that may be useful for the above cases
(DictionaryAnnotator, ConceptMapper, Tagger, RegExAnnotator [2]); however, as
Michael pointed out, you have to consider the effort needed to understand,
use and possibly customize such components. UIMA is well suited for
large-scale collections of data and lets you work on a flexible, customizable
analysis pipeline that can change and be enriched in the future, but you
have to evaluate carefully whether it is worth it for your case.
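The appeal of such a pipeline is that stages share their results and new
stages can be appended later. The sketch below imitates that idea in plain
Java; Span, Annotator, Pipeline and the stage classes are hypothetical types
for illustration, not the UIMA API (in UIMA the shared structure is the CAS
and the stages are analysis engines).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical, UIMA-inspired types (not the UIMA API).
// A Span marks a typed region of the text by character offsets.
final class Span {
    final String type;
    final int begin;
    final int end;

    Span(String type, int begin, int end) {
        this.type = type;
        this.begin = begin;
        this.end = end;
    }

    String coveredText(String text) {
        return text.substring(begin, end);
    }
}

// One analysis stage: reads the text plus the spans found so far, appends new ones.
interface Annotator {
    void process(String text, List<Span> spans);
}

// Runs stages in order over a shared list of spans.
final class Pipeline {
    private final List<Annotator> stages = new ArrayList<>();

    Pipeline add(Annotator stage) {
        stages.add(stage);
        return this;
    }

    List<Span> run(String text) {
        List<Span> spans = new ArrayList<>();
        for (Annotator stage : stages) {
            stage.process(text, spans);
        }
        return spans;
    }
}

// Marks every regex match with the given type.
final class RegexStage implements Annotator {
    private final String type;
    private final Pattern pattern;

    RegexStage(String type, String regex) {
        this.type = type;
        this.pattern = Pattern.compile(regex);
    }

    @Override
    public void process(String text, List<Span> spans) {
        Matcher m = pattern.matcher(text);
        while (m.find()) {
            spans.add(new Span(type, m.start(), m.end()));
        }
    }
}

// Builds on earlier stages: a City followed closely by a Zip becomes a Location.
final class LocationStage implements Annotator {
    private final int maxGap;

    LocationStage(int maxGap) {
        this.maxGap = maxGap;
    }

    @Override
    public void process(String text, List<Span> spans) {
        List<Span> added = new ArrayList<>();   // avoid modifying the list while iterating
        for (Span city : spans) {
            if (!city.type.equals("City")) continue;
            for (Span zip : spans) {
                int gap = zip.begin - city.end;
                if (zip.type.equals("Zip") && gap >= 0 && gap <= maxGap) {
                    added.add(new Span("Location", city.begin, zip.end));
                }
            }
        }
        spans.addAll(added);
    }
}
```

Because LocationStage only sees spans produced by earlier stages, stage order
matters: placing it before the City stage would yield no Location spans.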


2010/8/12 Nagelberg, Kallin <KNagelberg@globeandmail.com>

> Try this,
>
> http://viewer.opencalais.com/


The OpenCalais service has also been wrapped as a UIMA analysis engine, so it
can be called inside a UIMA pipeline together with other components (see
above) or services (e.g. the UIMA-wrapped AlchemyAPI service [3]).
That said, this only makes sense if your focus is strongly on searching
over text and the entities extracted from it.
My 2 cents,
Tommaso

[1] : http://wiki.apache.org/solr/UpdateRequestProcessor
[2] : http://uima.apache.org/annotators.html
[3] : http://svn.apache.org/viewvc/uima/sandbox/trunk/AlchemyAPIAnnotator/
