lucene-solr-user mailing list archives

From Gora Mohanty <g...@mimirtech.com>
Subject Re: Adding pdf/word file using JSON/XML
Date Mon, 10 Jun 2013 13:56:06 GMT
On 10 June 2013 18:53, Roland Everaert <reveatwork@gmail.com> wrote:
> Sorry if it was not clear.
>
> What I would like to know is how to construct an XML/JSON request that
> provides the necessary information (presumably the full path on disk) for
> Solr to retrieve and index a PDF/MS Word document.
>
> So, an XML request could look like this:
>
> <add>
> <doc>
> <field name="id">doc10</field>
> <field name="name">BLAH</field>
> <field name="path">/path/to/file.pdf</field>
> </doc>
> </add>
[...]

You cannot directly do this with the ExtractingRequestHandler.
One possibility is to use the DataImportHandler, with
XPathEntityProcessor or FileListEntityProcessor to get the filename,
and then use TikaEntityProcessor to actually process the file.
Please see http://wiki.apache.org/solr/DataImportHandler and
the various sections within it.
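
As a rough, untested sketch of that approach, a DataImportHandler
data-config.xml could look something like the following. The base
directory, the file name pattern, and the Solr field names (id, text,
title) are assumptions for illustration and would need to match your own
layout and schema:

```xml
<dataConfig>
  <!-- Binary data source that TikaEntityProcessor reads file content from -->
  <dataSource type="BinFileDataSource" name="bin"/>
  <document>
    <!-- Outer entity: lists files under baseDir (assumed path) whose names
         match the regex; rootEntity="false" so each file, not the listing,
         becomes a Solr document -->
    <entity name="files" processor="FileListEntityProcessor"
            baseDir="/path/to" fileName=".*\.(pdf|doc|docx)"
            recursive="true" rootEntity="false" dataSource="null">
      <!-- Use the absolute path as the unique key (assumed field "id") -->
      <field column="fileAbsolutePath" name="id"/>
      <!-- Inner entity: extracts text and metadata from each listed file -->
      <entity name="tika" processor="TikaEntityProcessor"
              url="${files.fileAbsolutePath}" format="text" dataSource="bin">
        <field column="text" name="text"/>
        <field column="title" name="title" meta="true"/>
      </entity>
    </entity>
  </document>
</dataConfig>
```

Here FileListEntityProcessor supplies the file names and
TikaEntityProcessor does the actual extraction. If the file names come
from an XML file rather than a directory listing, the outer entity would
use XPathEntityProcessor instead.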

Regards,
Gora
