lucene-solr-user mailing list archives

From Angel Ice <>
Subject Re : Using SolrJ with Tika
Date Wed, 02 Sep 2009 14:50:46 GMT
Hi Rajan.

As mentioned in my message, I don't want to use curl to post documents, and I can't use an HTML
form POST with a file input (the document has already been posted to my JEE webapp for other
purposes). All I can use is plain Java.

In fact, I'd like the user to post the document to my webapp with an HTML form POST (it's a
Struts 2 webapp).  --This is OK.
Then my webapp uses the document for its own purposes. --This is OK.
Finally, the webapp sends the document to Solr to have it indexed.  --This is not OK.

That's what I already do for other content I index that has no rich documents, just some
simple text fields, like daily articles.
In that case, once my webapp has finished its work on the article (creating, saving ...), I
index the title/author/... like this:
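    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("art_title", "foo");
    server.add(doc);   // server: an existing SolrServer, e.g. a CommonsHttpSolrServer (assumed)
    server.commit();   // make the new document visible to searches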

I'm looking for a way to do the same thing for rich documents, once my webapp has finished
its work with the document.
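A minimal sketch of that last step, assuming Solr 1.4's ExtractingRequestHandler is mapped at /update/extract and takes field values as literal.<field> parameters (the Solr 1.4 naming). The id and art_title fields are only examples and assume a matching schema; the class and method names (ExtractForwarder, buildExtractUrl, post) are hypothetical. It uses only the JDK (Java 11 or later): the webapp forwards the bytes it already holds, so the browser never talks to Solr directly. SolrJ 1.4 also has a ContentStreamUpdateRequest class intended for sending streams to this handler, which may be the tidier option when SolrJ is already on the classpath.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Map;

// Hypothetical helper (not part of Solr or SolrJ): forwards a rich document the
// webapp already holds to the ExtractingRequestHandler. Assumes the handler is
// mapped at /update/extract and accepts literal.<field> parameters (Solr 1.4).
class ExtractForwarder {

    // Builds .../update/extract?literal.<name>=<value>&...[&commit=true].
    static String buildExtractUrl(String solrBase, Map<String, String> literals, boolean commit) {
        StringBuilder url = new StringBuilder(solrBase).append("/update/extract");
        char sep = '?';
        for (Map.Entry<String, String> e : literals.entrySet()) {
            url.append(sep)
               .append("literal.").append(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8))
               .append('=').append(URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
            sep = '&';
        }
        if (commit) {
            url.append(sep).append("commit=true");
        }
        return url.toString();
    }

    // POSTs the raw document bytes with their MIME type; Tika runs inside Solr.
    // Returns Solr's response body (XML by default in Solr 1.4).
    static String post(String url, byte[] document, String contentType) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setDoOutput(true);
        conn.setRequestMethod("POST");
        conn.setRequestProperty("Content-Type", contentType);
        try (OutputStream out = conn.getOutputStream()) {
            out.write(document);
        }
        // getInputStream() throws an IOException if Solr answers with an error status.
        try (InputStream in = conn.getInputStream()) {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buf.write(chunk, 0, n);
            }
            return buf.toString(StandardCharsets.UTF_8);
        } finally {
            conn.disconnect();
        }
    }
}
```

In the Struts 2 action, the uploaded file's bytes and content type would go to post(), with a URL from buildExtractUrl("http://localhost:8983/solr", fields, true); that host and port are only the usual example default, not something stated in this thread.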



From: rajan chandi <>
To:
Sent: Wednesday, 2 September 2009, 16:13:22
Subject: Re: Using SolrJ with Tika


Check out Solr 1.4.

You can download the trunk and build it on your box.

Solr 1.4 does this out of the box. No configuration required.

You can send the document with an HTTP POST, using a Linux utility such as curl, and
PDF/Word/RTF/PPT/XLS files (and other formats) will be indexed. We tested this last

Tika is already included in Solr 1.4.


On Wed, Sep 2, 2009 at 5:26 PM, Angel Ice <> wrote:

> Hi everybody.
> I hope this is the right place for questions; if not, sorry.
> I'm trying to index rich documents (PDF, MS Office docs, etc.) in Solr/Lucene.
> I have seen a few examples explaining how to use Tika to solve this, but
> most of these examples use curl to send documents to Solr, or an HTML
> POST with an input file.
> I'd like to do it entirely in Java.
> Is there a way to use SolrJ to index the documents with Solr's
> ExtractingRequestHandler, or at least to get the extracted XML back
> (with the extract.only option)?
> Many thanks.
> Laurent.
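For the second half of the question (getting the extracted content back instead of indexing it), a similar sketch under the same assumptions: the handler is mapped at /update/extract, and the parameter is spelled extractOnly, which is how the Solr 1.4 documentation names it (the message above calls it extract.only, so check the name against the build in use). ExtractOnlyClient is a hypothetical name, and the code uses java.net.http.HttpClient, which needs Java 11 or later. The response should contain the extracted XHTML as an escaped string along with the document metadata, rather than adding anything to the index.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical helper (not part of Solr or SolrJ): asks the extracting handler
// to run Tika and return the result without indexing. Assumes the handler is
// mapped at /update/extract and reads the extractOnly parameter (Solr 1.4 naming).
class ExtractOnlyClient {

    // No commit parameter: nothing is added to the index in extract-only mode.
    static URI extractOnlyUri(String solrBase) {
        return URI.create(solrBase + "/update/extract?extractOnly=true");
    }

    // POSTs the raw document bytes and returns Solr's response body, which is
    // expected to carry the extracted content plus the document metadata.
    static String extract(String solrBase, byte[] document, String contentType)
            throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(extractOnlyUri(solrBase))
                .header("Content-Type", contentType)
                .POST(HttpRequest.BodyPublishers.ofByteArray(document))
                .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("Solr returned HTTP " + response.statusCode());
        }
        return response.body();
    }
}
```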
