lucene-solr-user mailing list archives

From arnaud gaudinat <arnaud.gaudi...@gmail.com>
Subject boilerpipe solr tika howto please
Date Fri, 14 Jan 2011 11:57:17 GMT
Hello,

I would like to use BoilerPipe (a very good library that strips surplus 
"clutter" from HTML content).
I saw that BoilerPipe is included in Tika 0.8, so it should be accessible 
from Solr. Am I right?

How can I activate BoilerPipe in Solr? Do I need to change 
solrconfig.xml (the entry for 
org.apache.solr.handler.extraction.ExtractingRequestHandler)?

Or do I need to modify some code inside Solr?

I saw something like TikaCLI -F on the Tika forum 
(http://www.lucidimagination.com/search/document/242ce3a17f30f466/boilerpipe_integration).
Is that the right way?

Thanks in advance,

Arno.

