lucene-solr-user mailing list archives

From Jason Hellman <jhell...@innoventsolutions.com>
Subject Re: Update existing documents when using ExtractingRequestHandler?
Date Thu, 10 Oct 2013 15:09:16 GMT
As an endorsement of Erick's preference, the primary benefit I see to processing through your
own code is better error, exception, and logging handling, which is trivial for you to write.

Consider that your code could reside on any server, either receiving the data through a push or
pulling it from your web server (as suits your needs), and thus offload the effort from your
busy web server.

In the long run, this will be a more flexible, adaptable solution that meets future needs
with minimal effort. Further, it typically doesn't require a "Solr expert" to write, so you
can find plenty of people to help with it as future needs dictate.


On Oct 10, 2013, at 4:21 AM, Erick Erickson <erickerickson@gmail.com> wrote:

> 1 - This puts the extraction work on the Solr server, though.
> 2 - This is just a SolrJ program, could be run anywhere. See:
> http://searchhub.org/dev/2012/02/14/indexing-with-solrj/ It would give
> you the most flexibility to offload the Tika processing to N other
> machines.
> 3 - This could work, but you'd then be indexing every document twice
> as well as loading the server with the Tika work. And you'd have to
> store all the fields.
> 
> Personally I like <2>...
> 
> FWIW,
> Erick
> 
> 
> On Wed, Oct 9, 2013 at 11:50 AM, Jeroen Steggink <jeroen@stegg-inc.com> wrote:
>> Hi,
>> 
>> In a content management system I have a document and an attachment. The
>> document contains the metadata and the attachment contains the actual data.
>> I would like to combine data from both in one Solr document.
>> 
>> I have thought of several options:
>> 
>> 1. Use ExtractingRequestHandler to extract the data (extractOnly)
>> and combine it with the metadata before sending it to Solr.
>>     But this might be inefficient and increase network traffic.
>> 2. Use a separate Tika installation to extract the data and send it
>> to Solr.
>>     This would stress an already busy web server.
>> 3. First upload the file using ExtractingRequestHandler, then use atomic
>> updates to add the other fields.
>> 
>> Or is there another way? Could I first add the metadata and later use the
>> ExtractingRequestHandler to add the file contents?
>> 
>> Cheers,
>> Jeroen
>> 
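For option 3, the second step is a Solr atomic update that sets the metadata fields on the document ExtractingRequestHandler already created. A minimal sketch of building that JSON body with plain JDK code; the `doc1` id and field names are hypothetical. As Erick notes, this only preserves the extracted text if every field is stored, and each document is effectively indexed twice.

```java
import java.util.Map;
import java.util.StringJoiner;

public class AtomicUpdate {
    /** Minimal JSON string quoting: escapes quotes, backslashes and control characters. */
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            if (c == '"' || c == '\\') sb.append('\\').append(c);
            else if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
            else sb.append(c);
        }
        return sb.append('"').toString();
    }

    /** Builds an atomic-update body that "set"s each metadata field on an existing document. */
    static String setFields(String id, Map<String, String> metadata) {
        StringJoiner doc = new StringJoiner(",", "[{", "}]");
        doc.add(quote("id") + ":" + quote(id));
        for (Map.Entry<String, String> e : metadata.entrySet()) {
            doc.add(quote(e.getKey()) + ":{\"set\":" + quote(e.getValue()) + "}");
        }
        return doc.toString();
    }
}
```

With a `LinkedHashMap` holding `title` and `author`, `setFields("doc1", meta)` yields `[{"id":"doc1","title":{"set":"Report"},"author":{"set":"Jeroen"}}]`, which would be POSTed as JSON to the core's `/update` handler after the extraction request.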

