lucene-solr-user mailing list archives

From maheshkumar <maheshkuma...@gmail.com>
Subject Re: Solr UIMA integration
Date Thu, 07 Oct 2010 05:10:07 GMT

Hi Tommaso,

Thanks a lot. I am now able to index the content and extract the entities as
you described.
I structured the XML content like this:

<add>
<doc>
 <field name="reference">Entity.xml</field>
 <field name="text">Senator Dick Durbin (D-IL)  Chicago , March
3,2007.</field>
 <field name="title">Entity Extraction</field>
</doc>
</add> 

and it worked.

For the benefit of others, the procedure I followed is:
Step 1:

Get these dependency jars:
AlchemyAPIAnnotator.jar
commons-beanutils-1.7.0.jar
commons-digester-2.0.jar
commons-lang-2.4.jar
OpenCalaisAnnotator.jar
slf4j-api-1.5.5.jar
slf4j-jdk14-1.5.5.jar
solr-uima.jar
Tagger.jar
uima-core.jar
WhitespaceTokenizer.jar 


The sources for them are:
AlchemyAPIAnnotator:
http://svn.apache.org/repos/asf/uima/sandbox/trunk/AlchemyAPIAnnotator
OpenCalaisAnnotator:
http://svn.apache.org/repos/asf/uima/sandbox/trunk/OpenCalaisAnnotator
Tagger: http://svn.apache.org/repos/asf/uima/sandbox/trunk/Tagger
WhitespaceTokenizer:
http://svn.apache.org/repos/asf/uima/sandbox/trunk/WhitespaceTokenizer
solr-uima: http://solr-uima.googlecode.com/svn/trunk/solr-uima


Step 2:
Register at http://www.opencalais.com/apikey and
http://www.alchemyapi.com/api/register.html to get the API keys.

Step 3:
As described by Tommaso in
http://code.google.com/p/solr-uima/wiki/5MinutesTutorial
modify your schema.xml by adding the following fields:
  <field name="language" type="string" indexed="true" stored="true"
         required="false"/>
  <field name="concept" type="string" indexed="true" stored="true"
         multiValued="true" required="false"/>
  <field name="keyword" type="string" indexed="true" stored="true"
         multiValued="true" required="false"/>
  <field name="suggested_category" type="string" indexed="true"
         stored="true" multiValued="false" required="false"/>
  <field name="sentence" type="text" indexed="true" stored="true"
         multiValued="true" required="false"/>
  <dynamicField name="entity*" type="text" indexed="true" stored="true"/>

  <field name="text" type="text" indexed="true" stored="true"/>
  <field name="reference" type="string" indexed="true" stored="true"
         required="true"/>
  <field name="title" type="text" indexed="true" stored="true"
         multiValued="false"/>


Modify your solrconfig.xml by adding the following UIMA configuration:
 <uimaConfig>
  <runtimeParameters>
      <keyword_apikey>VALID_ALCHEMYAPI_KEY</keyword_apikey>
      <concept_apikey>VALID_ALCHEMYAPI_KEY</concept_apikey>
      <lang_apikey>VALID_ALCHEMYAPI_KEY</lang_apikey>
      <cat_apikey>VALID_ALCHEMYAPI_KEY</cat_apikey>
      <entities_apikey>VALID_ALCHEMYAPI_KEY</entities_apikey>
      <oc_licenseID>VALID_OPENCALAIS_KEY</oc_licenseID>
  </runtimeParameters>
</uimaConfig>

 <updateRequestProcessorChain name="uima">
    <processor class="org.apache.solr.uima.processor.UIMAProcessorFactory"/>
    <processor class="solr.LogUpdateProcessorFactory" />
    <processor class="solr.RunUpdateProcessorFactory" />
  </updateRequestProcessorChain>

Replace your existing default update request handler (<requestHandler
name="/update"...) with the following:
<requestHandler name="/update" class="solr.XmlUpdateRequestHandler">
    <lst name="defaults">
      <str name="update.processor">uima</str>
    </lst>
  </requestHandler>


Step 4:
Increase the Tomcat heap size: set JAVA_OPTS=%JAVA_OPTS% -Xmx256m on
Windows, or export JAVA_OPTS="$JAVA_OPTS -Xmx256m" on Linux.

Step 5:
Index some sample data.

File name: Entity.xml
<add>
<doc>
 <field name="reference">Entity.xml</field>
 <field name="text">Senator Dick Durbin (D-IL)  Chicago , March
3,2007.</field>
 <field name="title">Entity Extraction</field>
</doc>
</add> 

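Use curl to index the file:
curl http://127.0.0.1:8080/solr/update -F solr.body=@Entity.xml
Then commit by requesting:
http://127.0.0.1:8080/solr/update?stream.body=<commit/>
(if you send the commit with curl, URL-encode it as
stream.body=%3Ccommit/%3E)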
and you are done.
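
If you would rather script Step 5 than use curl, here is a minimal Python sketch of the same index-then-commit sequence. It is not part of the procedure I actually ran. It assumes the update URL used above and that the /update handler accepts a raw text/xml POST body, which is the usual behaviour of XmlUpdateRequestHandler. The function names (build_add_xml, post_xml, index_and_commit) are made up for illustration.

```python
import urllib.request
import xml.etree.ElementTree as ET

# Update endpoint taken from the curl command above; adjust host/port as needed.
SOLR_UPDATE_URL = "http://127.0.0.1:8080/solr/update"


def build_add_xml(fields):
    """Return an <add><doc>...</doc></add> message for a single document.

    ElementTree escapes characters such as & and < in the field values.
    """
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    for name, value in fields.items():
        field = ET.SubElement(doc, "field", name=name)
        field.text = value
    return ET.tostring(add, encoding="unicode")


def post_xml(xml_body, url=SOLR_UPDATE_URL):
    """POST an XML update message to Solr and return the response body."""
    request = urllib.request.Request(
        url,
        data=xml_body.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
    )
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")


def index_and_commit(fields, url=SOLR_UPDATE_URL):
    """Send the document through the update chain, then commit."""
    post_xml(build_add_xml(fields), url)
    return post_xml("<commit/>", url)


# Same sample document as in Entity.xml above.
sample_doc = {
    "reference": "Entity.xml",
    "text": "Senator Dick Durbin (D-IL)  Chicago , March 3,2007.",
    "title": "Entity Extraction",
}
```

With Solr running, calling index_and_commit(sample_doc) should index the sample document and make it searchable. Nothing is sent to the server until that function is called.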

Tommaso, thanks a lot once again for all your support. Please add any steps
I may have missed.

Thanks
Mahesh

