lucene-solr-user mailing list archives

From Abdullah Shaikh <abdullah.sha...@viithiisys.com>
Subject Re: Using SolrJ with Tika
Date Thu, 03 Sep 2009 12:31:10 GMT
Hi Laurent,

I am not sure if this is what you need, but you can extract the content from
the uploaded document (MS Office documents, PDF, etc.) yourself using Tika and
then send the extracted text to Solr for indexing.

// Extract the text with Tika first (you can use AutoDetectParser for this)
String CONTENT = ...;

and then,

SolrInputDocument doc = new SolrInputDocument();
doc.addField("DOC_CONTENT", CONTENT);

solrServer.add(doc);
solrServer.commit();
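
A fuller sketch of the same idea, in case it helps. Some assumptions here are
mine and not from your setup: the class and method names are made up, the "id"
field is hypothetical (use whatever unique key your schema defines), and the
Tika parser modules are expected on the classpath next to tika-core. It is
written against the newer SolrJ SolrClient type; older SolrJ releases call the
same thing SolrServer, with the same add() and commit() calls.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.common.SolrInputDocument;
import org.apache.tika.exception.TikaException;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.sax.BodyContentHandler;
import org.xml.sax.SAXException;

// Hypothetical helper class: extract text with Tika, then index it via SolrJ.
public class TikaSolrSketch {

    // Let Tika detect the document type and return the body as plain text.
    public static String extractText(InputStream in)
            throws IOException, SAXException, TikaException {
        AutoDetectParser parser = new AutoDetectParser();
        // -1 disables the default limit on the number of extracted characters.
        BodyContentHandler handler = new BodyContentHandler(-1);
        parser.parse(in, handler, new Metadata(), new ParseContext());
        return handler.toString();
    }

    // Extract the text of one file and send it to Solr as a single document.
    public static void index(SolrClient solr, String id, Path file)
            throws IOException, SAXException, TikaException, SolrServerException {
        String content;
        try (InputStream in = Files.newInputStream(file)) {
            content = extractText(in);
        }

        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", id);               // assumed unique key field
        doc.addField("DOC_CONTENT", content); // field name from the snippet above
        solr.add(doc);
        solr.commit();
    }
}
```

You would call index(solr, "some-id", path) once per uploaded file. Committing
after every document is fine for a test, but for many files it is probably
better to add them all and commit once at the end.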


On Wed, Sep 2, 2009 at 5:26 PM, Angel Ice <lbil_fr@yahoo.fr> wrote:

> Hi everybody.
>
> I hope it's the right place for questions, if not sorry.
>
> I'm trying to index rich documents (PDF, MS docs etc) in SolR/Lucene.
> I have seen a few examples explaining how to use tika to solve this. But
> most of these examples are using curl to send documents to Solr or an HTML
> POST with an input file.
> But i'd like to do it in full java.
> Is there a way to use Solrj to index the documents with the
> ExtractingRequestHandler of SolR or at least to get the extracted xml back
> (with the extract.only option) ?
>
> Many thanks.
>
> Laurent.
>
>
>
>
