lucene-solr-user mailing list archives

From Erik Hatcher <e...@ehatchersolutions.com>
Subject Re: One big XML file vs. many HTTP requests
Date Fri, 12 May 2006 17:33:19 GMT

On May 12, 2006, at 1:02 PM, Michael Levy wrote:
> One nice feature of INQUERY is that you can create one large SGML
> file, containing lots of records, each bracketed with <DOC> and
> </DOC> tags.  Submitting that big SGML document for indexing goes
> very fast.
> I believe that Solr indexes one document at a time; each document  
> requires a separate HTTP POST.

Actually, adding multiple documents per POST is possible.

> How efficient is making a separate HTTP request per-document, when  
> there are millions of documents?  Do people ever use Solr's or  
> Lucene's API directly for indexing large numbers of documents, and  
> if so, what are the considerations pro and con?

Maybe Solr could evolve a facility for doing these types of bulk
operations without HTTP, while still using Solr's engine directly
through an API.  I guess this gets tricky when you have a live Solr
system up and are juggling write locks, though.

But currently going through HTTP is the only way, and it is unlikely
to be much of a bottleneck, especially given that you can post
multiple documents at a time (the wiki has an example, but I can't
get to the web at the moment to post the link).
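
For illustration only (this is not the wiki example mentioned above), here is a minimal Python sketch of batching several documents into one POST. It assumes Solr's XML update format, where one <add> element wraps several <doc> elements. The update URL, the field names, and the helper names build_add_message and post_batch are all hypothetical, so adjust them to your own install and schema.

```python
import urllib.request
import xml.etree.ElementTree as ET


def build_add_message(docs):
    """Build one <add> message holding a <doc> element per input dict."""
    add = ET.Element("add")
    for fields in docs:
        doc = ET.SubElement(add, "doc")
        for name, value in fields.items():
            field = ET.SubElement(doc, "field", name=name)
            field.text = str(value)
    return ET.tostring(add, encoding="utf-8")


def post_batch(update_url, docs):
    """Send the whole batch in a single HTTP POST and return the response body."""
    request = urllib.request.Request(
        update_url,
        data=build_add_message(docs),
        headers={"Content-Type": "text/xml; charset=utf-8"},
    )
    with urllib.request.urlopen(request) as response:
        return response.read()


# Hypothetical records; the field names must match your own schema.
docs = [
    {"id": "1", "title": "First record"},
    {"id": "2", "title": "Second record"},
]
payload = build_add_message(docs)

# Assumed URL for a local install; not called here because it needs a running server.
# post_batch("http://localhost:8983/solr/update", docs)
```

Batching a few hundred or a few thousand documents per request this way cuts the per-request overhead considerably. The documents should only become searchable after a separate <commit/> message is posted to the same URL.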

	Erik

