lucene-solr-user mailing list archives

From "Jack Krupansky" <j...@basetechnology.com>
Subject Re: Adding pdf/word file using JSON/XML
Date Mon, 10 Jun 2013 14:14:00 GMT
Sorry, but you are STILL not being clear!

Are you asking if you can pass Solr parameters as XML fields? No.

Are you asking if the file name and path can be indexed as metadata? To some 
degree:

curl "http://localhost:8983/solr/update/extract?literal.id=doc-1\
&commit=true&uprefix=attr_" -F "HelloWorld.docx=@HelloWorld.docx"

Then the stream has a name that is indexed as metadata:

<arr name="attr_meta">
  <str>stream_source_info</str>
  <str>HelloWorld.docx</str>
  <str>stream_content_type</str>
  <str>application/octet-stream</str>
  <str>stream_size</str>
  <str>10096</str>
  <str>stream_name</str>
  <str>HelloWorld.docx</str>
  <str>Content-Type</str>
  <str>application/vnd.openxmlformats-officedocument.wordprocessingml.document</str>
</arr>

and

<arr name="attr_stream_source_info">
  <str>HelloWorld.docx</str>
</arr>

<arr name="attr_stream_name">
  <str>HelloWorld.docx</str>
</arr>
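
If the goal is simply to attach extra fields (a name, the original path) to
an extracted document, one option is to have the client turn those fields
into literal.* request parameters, the same way literal.id is passed above.
A minimal Python sketch, not something tested against your setup:
build_extract_url is a made-up helper name, and it assumes your schema
defines the name and path fields. The file itself still has to be sent as
the request body.

```python
from urllib.parse import urlencode


def build_extract_url(base_url, fields, commit=True, uprefix="attr_"):
    """Return an /update/extract URL that passes each field as literal.<name>.

    build_extract_url is a hypothetical helper, not part of Solr or any client
    library. It only builds the URL; it does not upload the file.
    """
    # Each extra field becomes a literal.<name>=<value> request parameter.
    params = [("literal." + name, value) for name, value in fields.items()]
    params.append(("commit", "true" if commit else "false"))
    params.append(("uprefix", uprefix))
    # urlencode percent-encodes values, so paths and spaces are safe in the URL.
    return base_url.rstrip("/") + "/update/extract?" + urlencode(params)


# Assumption: "name" and "path" exist as fields in the target schema.
url = build_extract_url(
    "http://localhost:8983/solr",
    {"id": "doc10", "name": "BLAH", "path": "/path/to/file.pdf"},
)
print(url)
```

The resulting URL can be used with the same -F file upload as in the curl
command above.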

Or, what is it that you are really trying to do?

Simply tell us in plain language what problem you are trying to solve.

-- Jack Krupansky

-----Original Message----- 
From: Roland Everaert
Sent: Monday, June 10, 2013 9:23 AM
To: solr-user@lucene.apache.org
Subject: Re: Adding pdf/word file using JSON/XML

Sorry if it was not clear.

What I would like to know is how to construct an XML/JSON request that
provides any necessary information (presumably the full path on disk) for
Solr to retrieve and index a PDF/MS Word document.

So, an XML request could look like this:

<add>
<doc>
<field name="id">doc10</field>
<field name="name">BLAH</field>
<field name="path">/path/to/file.pdf</field>
</doc>
</add>


Regards,


Roland.


On Mon, Jun 10, 2013 at 3:12 PM, Gora Mohanty <gora@mimirtech.com> wrote:

> On 10 June 2013 17:47, Roland Everaert <reveatwork@gmail.com> wrote:
> > Hi,
> >
> > Based on the wiki, below is an example of how I am currently adding a
> > PDF file with an extra field called name:
> > curl "http://localhost:8080/solr/update/extract?literal.id=doc10&literal.name=BLAH&defaultField=text" \
> > --data-binary @/path/to/file.pdf -H "Content-Type: application/pdf"
> >
> > Is it possible to add a file + any extra fields using a JSON or XML
> > request?
>
> It is not entirely clear what you are asking. Do you mean
> can one do the same as your example above for a PDF
> file, but with an XML or JSON file? If so, yes. Please see
> the examples in example/exampledocs/ of a Solr source
> tree, and http://wiki.apache.org/solr/ExtractingRequestHandler
>
> Regards,
> Gora
> 

