lucene-solr-user mailing list archives

From Tod <listac...@gmail.com>
Subject Re: Solrj ContentStreamUpdateRequest Slow
Date Wed, 18 Aug 2010 12:35:49 GMT
On 8/16/2010 6:12 PM, Chris Hostetter wrote:
> : > I think your problem may be that StreamingUpdateSolrServer buffers up
> : > commands and sends them in batches in a background thread.  if you want to
> : > send individual updates in real time (and time them) you should just use
> : > CommonsHttpSolrServer
> : 
> : My goal is to batch updates.  My content lives somewhere else so I was trying
> : to find a way to tell Solr where the document lived so it could go out and
> : stream it into the index for me.  That's where I thought
> : StreamingUpdateSolrServer would help.
> 
> If your content lives on a machine which is not your "client" nor your 
> "server" and you want your client to tell your server to go fetch it 
> directly then the "stream.url" param is what you need -- that is unrelated 
> to whether you use StreamingUpdateSolrServer or not.


Do you happen to have a code fragment lying around that demonstrates 
using CommonsHttpSolrServer and "stream.url"?  I've tried it in 
conjunction with ContentStreamUpdateRequest and I keep getting an 
annoying null pointer exception.  In the meantime I will check the 
examples...



> Thinking about it some more, i suspect the reason you might be seeing a 
> delay when using StreamingUpdateSolrServer is because of this bug...
> 
>    https://issues.apache.org/jira/browse/SOLR-1990
> 
> ...if there are no actual documents in your UpdateRequest (because you are 
> using the stream.url param) then the StreamingUpdateSolrServer blocks 
> until all other requests are done, then delegates to the super class (so 
> it never actually puts your indexing requests in a buffered queue, it just 
> delays and then does them immediately)
> 
> Not sure of a good way around this off the top of my head, but i'll note 
> it in SOLR-1990 as another problematic use case that needs to be dealt with.

Perhaps I can execute an initial update request using a benign file 
before making the "stream.url" call?

Also, to beat a dead horse, this:
'http://localhost:8080/solr/update/extract?stream.url=http://remote_server.mydomain.com/test.pdf&stream.contentType=application/pdf&literal.content_id=12342&commit=true'

... works fine - I just want to do it a LOT and as efficiently as 
possible.  If I have to I can wrap it in a Perl script and run a cURL or 
LWP loop but I'd prefer to use SolrJ if I can.
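
For reference, here is a minimal sketch of how the request above could be
built in a loop from plain Java, without going through SolrJ at all.  It
only constructs the URLs (with the parameter values URL-encoded) and does
not send them.  The class name, method name, and the second document in
the list are hypothetical; the Solr base URL, handler path, and parameter
names are taken from the working request above.  Asking for a commit only
on the last document is an assumption about what "efficient" should mean
here, not something confirmed in this thread.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

public class ExtractUrlBuilder {

    /**
     * Builds an /update/extract URL that tells Solr to fetch the document
     * itself via stream.url. Parameter names mirror the working request in
     * the message; the method itself is a hypothetical helper.
     */
    public static String buildExtractUrl(String solrBase, String remoteDocUrl,
                                         String contentType, String contentId,
                                         boolean commit)
            throws UnsupportedEncodingException {
        // LinkedHashMap keeps the parameters in insertion order.
        Map<String, String> params = new LinkedHashMap<String, String>();
        params.put("stream.url", remoteDocUrl);
        params.put("stream.contentType", contentType);
        params.put("literal.content_id", contentId);
        params.put("commit", String.valueOf(commit));

        StringBuilder sb = new StringBuilder(solrBase).append("/update/extract");
        char sep = '?';
        for (Map.Entry<String, String> e : params.entrySet()) {
            // Encode keys and values so the nested URL does not break the query string.
            sb.append(sep)
              .append(URLEncoder.encode(e.getKey(), "UTF-8"))
              .append('=')
              .append(URLEncoder.encode(e.getValue(), "UTF-8"));
            sep = '&';
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical batch: {remote document URL, content_id}.
        String[][] docs = {
            {"http://remote_server.mydomain.com/test.pdf", "12342"},
            {"http://remote_server.mydomain.com/test2.pdf", "12343"},
        };
        for (int i = 0; i < docs.length; i++) {
            // Assumption: commit once, on the last document, instead of per request.
            boolean lastDoc = (i == docs.length - 1);
            System.out.println(buildExtractUrl("http://localhost:8080/solr",
                    docs[i][0], "application/pdf", docs[i][1], lastDoc));
        }
    }
}
```

Each resulting URL could then be requested with java.net.HttpURLConnection
or any other HTTP client, which is essentially the cURL/LWP loop moved
into Java.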

Thanks for all your help.


- Tod
