lucene-solr-user mailing list archives

From "Paul, Terry" <Terry.P...@disney.com>
Subject Streaming Updates Using HttpSolrServer.add(Iterator) In Solr 4.3
Date Mon, 29 Jul 2013 23:43:41 GMT
Hi all.  We're in the midst of upgrading from Solr 1.4 to 4.3.1, and we've run into memory issues
on the client side during a mass index operation.

We use the approach described on the SolrJ wiki at http://wiki.apache.org/solr/Solrj#Streaming_documents_for_an_update.
In the Solr 1.4 days this worked very smoothly and reliably, consuming a fairly small amount
of memory within our client.  We could send bulk updates in batches of 100,000 or more.
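
For context, here's roughly what our indexing code looks like, following the wiki's Iterator
pattern.  This is just a sketch: storyCursor, Story, and the field names are placeholders for
our real code.

    import java.util.Iterator;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    // inside our indexing method
    HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");

    Iterator<SolrInputDocument> docs = new Iterator<SolrInputDocument>() {
        public boolean hasNext() {
            return storyCursor.hasNext();          // pull stories lazily from our store
        }
        public SolrInputDocument next() {
            Story story = storyCursor.next();      // Story/storyCursor are placeholders
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", story.getId());
            doc.addField("body", story.getBody());
            return doc;
        }
        public void remove() {
            throw new UnsupportedOperationException();
        }
    };

    server.add(docs);   // streamed nicely in 1.4; in 4.3 this seems to buffer everything first
    server.commit();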

With Solr 4.3 it appears that the HttpSolrServer.add(Iterator) method pulls all of the SolrInputDocuments
from the iterator into memory before opening the stream to the server.

We're indexing stories for a news site, and some of the documents are tens of KB.  With Solr
4.3 we have to keep our transaction batch size very small (in the hundreds) to avoid going
OOM.
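
For what it's worth, our current workaround looks roughly like the sketch below.  The batch
size of 500 and the toSolrDoc helper are just illustrative placeholders, not our actual code.

    // flush in small batches so the client never holds more than a few hundred docs
    List<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
    while (storyCursor.hasNext()) {
        batch.add(toSolrDoc(storyCursor.next()));  // toSolrDoc: placeholder mapping helper
        if (batch.size() >= 500) {
            server.add(batch);                     // SolrServer.add(Collection) per batch
            batch.clear();
        }
    }
    if (!batch.isEmpty()) {
        server.add(batch);
    }
    server.commit();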

I've searched the wiki as well as Google in general, and I haven't found any other approaches
for Solr 4.x.  The SolrJ wiki still recommends the Iterator approach for indexing large
amounts of data, but we're hoping someone has another method that's as efficient as the old
1.4/3.6 approach.  I don't really want to send 10 million documents one at a time.

Thanks!
Terry

