lucene-solr-user mailing list archives

From Erick Erickson <>
Subject Re: ExtractingRequestHandler - extracted files caching?
Date Tue, 01 Jul 2014 04:16:12 GMT
Here's an example of what Alexandre is
talking about:

It mixes database fetching in with the
Tika processing, but that should be pretty easy
to pull out.
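Here is a minimal sketch of the client-side approach Alexandre describes in the quoted message, with the database part left out. Extracted text is cached on disk under a SHA-256 hash of the file bytes. A metadata-only change then reuses the cached text instead of running extraction again. The following are assumptions for illustration, not part of the thread: the `extractor` callable (which in practice would wrap Tika, e.g. a Tika server or tika-app), the cache directory, the class and function names, and the field names `id` and `content`.

```python
import hashlib
from pathlib import Path
from typing import Callable, Dict


class ExtractionCache:
    """Caches extracted text on disk, keyed by the SHA-256 of the file bytes.

    Hypothetical helper: `extractor` is any callable turning raw bytes into
    text; wrapping Tika behind it is an assumption of this sketch.
    """

    def __init__(self, cache_dir: Path, extractor: Callable[[bytes], str]):
        self.cache_dir = Path(cache_dir)
        self.cache_dir.mkdir(parents=True, exist_ok=True)
        self.extractor = extractor

    def text_for(self, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()
        entry = self.cache_dir / f"{key}.txt"
        if entry.exists():
            # Same bytes seen before: reuse the stored text, skip extraction.
            return entry.read_text(encoding="utf-8")
        # New or changed file: extract once and remember the result.
        text = self.extractor(data)
        entry.write_text(text, encoding="utf-8")
        return text


def build_doc(doc_id: str, text: str, metadata: Dict[str, object]) -> Dict[str, object]:
    """Combine cached text with app metadata into one Solr document.

    Field names are hypothetical; use the names from your own schema.
    Metadata keys override `id`/`content` if they collide.
    """
    doc: Dict[str, object] = {"id": doc_id, "content": text}
    doc.update(metadata)
    return doc
```

Each time the metadata changes, the caller rebuilds the document from `text_for(...)` plus the new metadata and posts it as JSON to the core's `/update` handler. Only a change in the file bytes triggers a fresh extraction.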


On Mon, Jun 30, 2014 at 8:21 PM, Alexandre Rafalovitch
<> wrote:
> Under the covers, Tika is used. You can use Tika yourself on the
> client side and cache its output in a database or a text file. Then,
> send that to Solr instead. That puts less load on Solr as well.
> Or you can use atomic updates, but then all the primary (non-copyField)
> fields must be stored="true".
> Regards,
>    Alex.
> On Tue, Jul 1, 2014 at 5:55 AM, Gili Nachum <> wrote:
>> Hello,
>> I plan to use ExtractingRequestHandler to index the text of binary files plus
>> app metadata (like literal.downloadCount and others) into a single document.
>> I expect the app metadata to change much more often than the binary file
>> itself. I would hate to have to extract text from the binary file whenever
>> I need to re-index the doc because of a metadata change.
>> Is there some extraction caching solution for file contents, or some
>> other workaround?
>> Thanks!
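Here is a minimal sketch of the atomic-update route Alexandre mentions. It avoids re-extraction when only metadata changes by sending just the changed fields, using Solr's JSON atomic-update operations `set` and `inc`. It assumes the following:

- The document was already indexed, by ExtractingRequestHandler or otherwise.
- Its primary fields are stored, as Alexandre notes.
- The update log is enabled, which atomic updates rely on.

The field name `downloadCount` echoes Gili's `literal.downloadCount`. The helper names and the core URL are hypothetical.

```python
import json
import urllib.request
from typing import Dict, List, Optional


def atomic_update(
    doc_id: str,
    changes: Dict[str, object],
    increments: Optional[Dict[str, int]] = None,
) -> List[dict]:
    """Build a JSON atomic-update payload touching only the given fields.

    `changes` become {"set": value}; `increments` become {"inc": delta}.
    If a field appears in both, the increment wins.
    """
    doc: dict = {"id": doc_id}
    for field, value in changes.items():
        doc[field] = {"set": value}
    for field, delta in (increments or {}).items():
        doc[field] = {"inc": delta}
    return [doc]


def post_update(solr_update_url: str, payload: List[dict]) -> int:
    """POST the payload as JSON to a Solr /update URL; returns the HTTP status."""
    req = urllib.request.Request(
        solr_update_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

For example, `post_update("http://localhost:8983/solr/mycore/update?commit=true", atomic_update("doc1", {}, {"downloadCount": 1}))` would bump the counter by one. Solr rebuilds the rest of the document from its stored fields, so the binary file is not sent or extracted again.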
