lucene-solr-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: Update existing documents when using ExtractingRequestHandler?
Date Thu, 10 Oct 2013 11:21:50 GMT
1 - This puts the Tika extraction work on the Solr server, though.
2 - This is just a SolrJ program, so it could run anywhere. See:
http://searchhub.org/dev/2012/02/14/indexing-with-solrj/ It would give
you the most flexibility to offload the Tika processing to N other
machines.
3 - This could work, but you'd then be indexing every document twice,
as well as loading the server with the Tika work. And you'd have to
store all the fields, since an atomic update rebuilds the document
from its stored fields.
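A minimal sketch of the second step of option 3, written in Python rather than SolrJ, assuming the schema's uniqueKey is `id` and that every field is stored. The field names and values are hypothetical. It only builds the JSON body for Solr's `/update` handler using the `set` atomic-update modifier; sending the request is left out.

```python
import json


def atomic_update_payload(doc_id, metadata):
    """Build a JSON atomic-update body that sets metadata fields on a
    document previously indexed through ExtractingRequestHandler.

    Each value is wrapped in a {"set": value} modifier, so Solr replaces
    only that field and rebuilds the rest of the document from its
    stored fields.
    """
    doc = {"id": doc_id}  # assumes the uniqueKey field is named "id"
    for field, value in metadata.items():
        doc[field] = {"set": value}
    # The /update handler accepts a JSON array of documents.
    return json.dumps([doc])


# Hypothetical document id and metadata fields.
payload = atomic_update_payload("cms-42", {"title": "Annual report", "author": "Jane Doe"})
print(payload)
```

Note that this is a second indexing pass over a document Solr has already indexed once, which is the double work Erick points out.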

Personally I like <2>...
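Option 2 amounts to running Tika outside Solr and sending one finished document per CMS item. Erick's suggestion is a SolrJ (Java) program; this Python sketch shows only the merged document such a program might send. It assumes the attachment text has already been extracted elsewhere and uses the hypothetical field names `id`, `content` and `title`.

```python
import json


def combined_document(doc_id, metadata, extracted_text):
    """Merge CMS metadata with text extracted from the attachment into
    a single Solr document, so each item is indexed exactly once.

    `extracted_text` is assumed to come from a Tika run on another
    machine (not shown). "content" is a hypothetical field name.
    """
    doc = {"id": doc_id, "content": extracted_text}
    # Metadata fields sit alongside the body text; a metadata key named
    # "id" or "content" would overwrite the values set above.
    doc.update(metadata)
    return doc


batch = [
    combined_document("cms-1", {"title": "Annual report"}, "Sample extracted text"),
    combined_document("cms-2", {"title": "Meeting minutes"}, "More extracted text"),
]
# A single POST of this JSON array to /update would index the whole batch.
body = json.dumps(batch)
```

Because extraction happens before anything reaches Solr, the Tika work can be spread across as many machines as needed, and Solr only sees finished documents.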

FWIW,
Erick


On Wed, Oct 9, 2013 at 11:50 AM, Jeroen Steggink <jeroen@stegg-inc.com> wrote:
> Hi,
>
> In a content management system I have a document and an attachment. The
> document contains the metadata and the attachment the actual data.
> I would like to combine the data from both in one Solr document.
>
> I have thought of several options:
>
> 1. Using ExtractingRequestHandler I would extract the data (extractOnly),
> combine it with the metadata and send it to Solr.
>      But this might be inefficient and increase the network traffic.
> 2. Use a separate Tika installation to extract the data and send it
> to Solr.
>      This would stress an already busy web server.
> 3. First upload the file using ExtractingRequestHandler, then use atomic
> updates to add the other fields.
>
> Or is there another way? First add the metadata and later use the
> ExtractingRequestHandler to add the file contents?
>
> Cheers,
> Jeroen
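For option 1 in the question, the extraction request might be built as below. This is a sketch assuming the handler is registered at `/update/extract`, as in the example solrconfig.xml, and a hypothetical core URL. The attachment itself still has to be POSTed as the request body, and the returned text then merged with the metadata and sent back to `/update`, which is the extra round trip Jeroen is worried about.

```python
from urllib.parse import urlencode


def extract_only_url(base_url):
    """Build an ExtractingRequestHandler URL that asks Solr to run Tika
    and return the extracted content without indexing anything.

    `base_url` (a core URL) is an assumption about the deployment.
    """
    params = {
        "extractOnly": "true",    # return the extraction; do not index
        "extractFormat": "text",  # plain text instead of the default XHTML
        "wt": "json",             # JSON response writer
    }
    return base_url.rstrip("/") + "/update/extract?" + urlencode(params)


# Hypothetical core URL.
url = extract_only_url("http://localhost:8983/solr/collection1")
print(url)
```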
