lucene-solr-user mailing list archives

From Lance Norskog <goks...@gmail.com>
Subject Re: Solrj ContentStreamUpdateRequest Slow
Date Thu, 19 Aug 2010 05:45:45 GMT
'stream.url' is just a simple parameter. You should be able to just
add it directly.
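
As a rough illustration of what "add it directly" means, here is a minimal
plain-Java sketch that builds the same /update/extract request shown further
down in this thread. It only uses the JDK; the class and method names
(StreamUrlExample, buildExtractUrl) are made up for this example, and the host
and field names are copied from the quoted URL. The same name/value pairs are
what you would set as ordinary request parameters from a SolrJ client.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of SolrJ: it only assembles the request URL.
final class StreamUrlExample {

    // URL-encode one key or value as UTF-8.
    private static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a conforming JVM.
            throw new IllegalStateException(e);
        }
    }

    // Build an /update/extract URL that tells Solr to fetch docUrl itself.
    static String buildExtractUrl(String solrBase, String docUrl,
                                  String contentType, String contentId,
                                  boolean commit) {
        // LinkedHashMap keeps the parameters in insertion order.
        Map<String, String> params = new LinkedHashMap<String, String>();
        params.put("stream.url", docUrl);
        params.put("stream.contentType", contentType);
        params.put("literal.content_id", contentId);
        if (commit) {
            params.put("commit", "true");
        }

        StringBuilder sb = new StringBuilder(solrBase).append("/update/extract");
        char sep = '?';
        for (Map.Entry<String, String> e : params.entrySet()) {
            sb.append(sep).append(enc(e.getKey())).append('=').append(enc(e.getValue()));
            sep = '&';
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(buildExtractUrl(
                "http://localhost:8080/solr",
                "http://remote_server.mydomain.com/test.pdf",
                "application/pdf", "12342", true));
    }
}
```

If you are sending a lot of these, it is probably cheaper to pass commit as
false for each document and issue one commit at the end, since every commit
has a cost on the Solr side.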

On Wed, Aug 18, 2010 at 5:35 AM, Tod <listacctc@gmail.com> wrote:
> On 8/16/2010 6:12 PM, Chris Hostetter wrote:
>>
>> : > I think your problem may be that StreamingUpdateSolrServer buffers up
>> : > commands and sends them in batches in a background thread.  If you
>> : > want to send individual updates in real time (and time them) you
>> : > should just use CommonsHttpSolrServer
>> :
>> : My goal is to batch updates.  My content lives somewhere else so I was
>> : trying to find a way to tell Solr where the document lived so it could
>> : go out and stream it into the index for me.  That's where I thought
>> : StreamingUpdateSolrServer would help.
>>
>> If your content lives on a machine which is not your "client" nor your
>> "server" and you want your client to tell your server to go fetch it
>> directly then the "stream.url" param is what you need -- that is unrelated
>> to whether you use StreamingUpdateSolrServer or not.
>
>
> Do you happen to have a code fragment lying around that demonstrates using
> CommonsHttpSolrServer and "stream.url"?  I've tried it in conjunction with
> ContentStreamUpdateRequest and I keep getting an annoying null pointer
> exception.  In the meantime I will check the examples...
>
>
>
>> Thinking about it some more, I suspect the reason you might be seeing a
>> delay when using StreamingUpdateSolrServer is because of this bug...
>>
>>   https://issues.apache.org/jira/browse/SOLR-1990
>>
>> ...if there are no actual documents in your UpdateRequest (because you are
>> using the stream.url param) then the StreamingUpdateSolrServer blocks until
>> all other requests are done, then delegates to the super class (so it never
>> actually puts your indexing requests in a buffered queue, it just delays and
>> then does them immediately)
>>
>> Not sure of a good way around this off the top of my head, but I'll note
>> it in SOLR-1990 as another problematic use case that needs to be dealt with.
>
> Perhaps I can execute an initial update request using a benign file before
> making the "stream.url" call?
>
> Also, to beat a dead horse, this:
> 'http://localhost:8080/solr/update/extract?stream.url=http://remote_server.mydomain.com/test.pdf&stream.contentType=application/pdf&literal.content_id=12342&commit=true'
>
> ... works fine - I just want to do it a LOT and as efficiently as possible.
>  If I have to I can wrap it in a Perl script and run a cURL or LWP loop but
> I'd prefer to use SolrJ if I can.
>
> Thanks for all your help.
>
>
> - Tod
>



-- 
Lance Norskog
goksron@gmail.com
