lucene-solr-user mailing list archives

From "Jack Krupansky" <j...@basetechnology.com>
Subject Re: Adding pdf/word file using JSON/XML
Date Tue, 11 Jun 2013 14:35:40 GMT
"is it possible to index the file + metadata with a JSON/XML request?"

You still aren't being clear as to what you are really trying to achieve 
here. I mean, just write a shell script that does the curl command, or write 
a Java program or application layer that uses SolrJ to talk to Solr and 
accepts JSON/XML/REST requests.

"It seems that the only way to index a file with some metadata is to build a
request that would look like the following example that uses curl."

Curl is just a fancy way to do an HTTP request. You can do the same HTTP 
request from Java code (or Python or whatever.)
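
As a rough sketch of that point, the same request can be built in Python with only the standard library. The function names and the base URL argument are illustrative assumptions; the endpoint, the literal.* parameters, and defaultField are taken from the curl example quoted further down in this message.

```python
import urllib.parse
import urllib.request


def build_extract_request(solr_url, file_bytes, content_type, literals,
                          default_field="text"):
    """Build (without sending) the POST that the curl command performs.

    solr_url is the Solr base URL, e.g. "http://localhost:8080/solr"
    (an assumption here; adjust to your installation).
    literals maps metadata field names to values and becomes literal.<name>.
    """
    params = {f"literal.{name}": value for name, value in literals.items()}
    params["defaultField"] = default_field
    url = f"{solr_url}/update/extract?{urllib.parse.urlencode(params)}"
    return urllib.request.Request(
        url,
        data=file_bytes,                          # same role as --data-binary
        headers={"Content-Type": content_type},   # same role as -H
        method="POST",
    )


def post_file(solr_url, path, content_type, literals):
    """Read the file from disk and send it to Solr; returns the HTTP status."""
    with open(path, "rb") as handle:
        request = build_extract_request(
            solr_url, handle.read(), content_type, literals)
    with urllib.request.urlopen(request) as response:
        return response.status
```

Calling post_file("http://localhost:8080/solr", "/path/to/file.pdf", "application/pdf", {"id": "doc10", "name": "BLAH"}) would send the same request as the curl command, assuming Solr is reachable at that address.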

"The developer would like to avoid using parameters in the url to pass 
arguments."

Seriously?! What is THAT all about!!  I mean, really, HTTP and URLs and URL 
query parameters are part of the heart of the Internet infrastructure!

If this whole thread is merely that you have an IDIOT who can't cope with 
passing HTTP URL query parameters, all I can say is... Wow!

But use SolrJ and then at least it doesn't LOOK like they are URL Query 
parameters.

Or, maybe this is just a case where the developer WANTS to use SOAP rather 
than a REST style of API.

In any case, please clue us in as to what PROBLEM you are really trying to 
solve. Just use plain English and avoid getting caught up in what the 
solution might be.

The real bottom line is that random application developers should not be 
talking directly to Solr anyway - they should be provided with an 
"application layer" that has a clean, application-oriented REST API and the 
gory details of the Solr API would be hidden inside the application layer.
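
A minimal sketch of what such a layer might do, in Python: accept a JSON document shaped like the XML proposed earlier in the thread (id, name, path) and translate it into the Solr extract request. The function name, the SOLR_EXTRACT_URL constant, and the JSON shape are assumptions made for illustration, not part of any Solr API.

```python
import json
import mimetypes
import urllib.parse

# Assumed location of the extract handler; taken from the curl example
# in this thread, adjust for your installation.
SOLR_EXTRACT_URL = "http://localhost:8080/solr/update/extract"


def plan_index_request(json_body):
    """Translate an application-level JSON request into Solr request parts.

    Expects a JSON object with a "path" key naming a file the application
    layer can read; every other key is passed to Solr as literal.<key>.
    """
    doc = json.loads(json_body)
    path = doc.pop("path")
    content_type = (mimetypes.guess_type(path)[0]
                    or "application/octet-stream")
    params = {f"literal.{key}": value for key, value in doc.items()}
    params["defaultField"] = "text"
    return {
        "url": SOLR_EXTRACT_URL + "?" + urllib.parse.urlencode(params),
        "file_path": path,
        "content_type": content_type,
    }
```

The caller only ever sees the JSON; the query parameters are built inside the layer, and the returned parts can then be posted to Solr with any HTTP client.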

-- Jack Krupansky

-----Original Message----- 
From: Roland Everaert
Sent: Tuesday, June 11, 2013 8:48 AM
To: solr-user@lucene.apache.org
Subject: Re: Adding pdf/word file using JSON/XML

We are working on an application that allows some users to add files (pdf,
ms word, odt, etc), located on their local hard disk, to our internal
system and allows other users to search for them. So we are considering
Solr for the indexing and search functionalities of the system. Along with
the file content, we want to index some metadata related to the file.

It seems obvious that Solr couldn't import the file from the local disk of
the user, so the system will have to import the file into a directory that
Solr can reach and instruct Solr to index the file with the metadata, but
is it possible to index the file + metadata with a JSON/XML request?

It seems that the only way to index a file with some metadata is to build a
request that would look like the following example that uses curl. The
developer would like to avoid using parameters in the url to pass arguments.

curl "http://localhost:8080/solr/update/extract?literal.id=doc10&literal.name=BLAH&defaultField=text" \
  --data-binary @/path/to/file.pdf -H "Content-Type: application/pdf"


Additionally, it seems that when a subsequent request is sent to the indexer
to update the file, any metadata not passed to Solr with that request is
deleted.

Thanks for your help,



Roland.


On Mon, Jun 10, 2013 at 4:14 PM, Jack Krupansky 
<jack@basetechnology.com> wrote:

> Sorry, but you are STILL not being clear!
>
> Are you asking if you can pass Solr parameters as XML fields? No.
>
> Are you asking if the file name and path can be indexed as metadata? To
> some degree:
>
> curl "http://localhost:8983/solr/update/extract?literal.id=doc-1\
> &commit=true&uprefix=attr_" -F "HelloWorld.docx=@HelloWorld.docx"
>
> Then the stream has a name that is indexed as metadata:
>
> <arr name="attr_meta">
>  <str>stream_source_info</str>
>  <str>HelloWorld.docx</str>
>  <str>stream_content_type</str>
>  <str>application/octet-stream</str>
>  <str>stream_size</str>
>  <str>10096</str>
>  <str>stream_name</str>
>  <str>HelloWorld.docx</str>
>  <str>Content-Type</str>
>  <str>application/vnd.openxmlformats-officedocument.wordprocessingml.document</str>
> </arr>
>
> and
>
> <arr name="attr_stream_source_info">
>  <str>HelloWorld.docx</str>
> </arr>
>
> <arr name="attr_stream_name">
>  <str>HelloWorld.docx</str>
> </arr>
>
> Or, what is it that you are really trying to do?
>
> Simply tell us in plain language what problem you are trying to solve.
>
> -- Jack Krupansky
>
> -----Original Message-----
> From: Roland Everaert
> Sent: Monday, June 10, 2013 9:23 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Adding pdf/word file using JSON/XML
>
>
> Sorry if it was not clear.
>
> What I would like is to know how to construct an XML/JSON request that
> provides any necessary information (supposedly the full path on disk) to
> Solr to retrieve and index a pdf/ms word document.
>
> So, an XML request could look like this:
>
> <add>
> <doc>
> <field name="id">doc10</field>
> <field name="name">BLAH</field>
> <field name="path">/path/to/file.pdf</field>
> </doc>
> </add>
>
>
> Regards,
>
>
> Roland.
>
>
> On Mon, Jun 10, 2013 at 3:12 PM, Gora Mohanty <gora@mimirtech.com> wrote:
>
>> On 10 June 2013 17:47, Roland Everaert <reveatwork@gmail.com> wrote:
>> > Hi,
>> >
>> > Based on the wiki, below is an example of how I am currently adding a pdf
>> > file with an extra field called name:
>> > curl "http://localhost:8080/solr/update/extract?literal.id=doc10&literal.name=BLAH&defaultField=text"
>> > --data-binary @/path/to/file.pdf -H "Content-Type: application/pdf"
>> >
>> > Is it possible to add a file + any extra fields using a JSON or XML
>> > request?
>>
>> It is not entirely clear what you are asking. Do you mean
>> can one do the same as your example above for a PDF
>> file, but with an XML or JSON file? If so, yes. Please see
>> the examples in example/exampledocs/ of a Solr source
>> tree, and
>> http://wiki.apache.org/solr/ExtractingRequestHandler
>>
>> Regards,
>> Gora
>>
>>
> 

