lucene-solr-user mailing list archives

From "William Pierce" <evalsi...@hotmail.com>
Subject Re: Tips on speeding up indexing needed...
Date Sun, 11 Oct 2009 13:18:19 GMT
Thanks, Lance.  I already commit at the end.  I will take a look at the
DataImportHandler.  Thanks again!

-- Bill

--------------------------------------------------
From: "Lance Norskog" <goksron@gmail.com>
Sent: Saturday, October 10, 2009 7:58 PM
To: <solr-user@lucene.apache.org>
Subject: Re: Tips on speeding up indexing needed...

> A few things off the bat:
> 1) do not commit until the end.
> 2) use the DataImportHandler - it runs inside Solr and reads the
> database. This cuts out the HTTP transfer/XML translation overheads.
> 3) examine your schema. Some of the text analyzers are quite slow.
>
> Solr tips:
> http://wiki.apache.org/solr/SolrPerformanceFactors
>
> Lucene tips:
> http://wiki.apache.org/lucene-java/ImproveIndexingSpeed
>
> And, what you don't want to hear: for jobs like this, Solr/Lucene is
> disk-bound. The Windows NTFS file system is much slower than what is
> available for Linux or the Mac, and these numbers are for those
> machines.
>
> Good luck!
>
> Lance Norskog
>
>
> On Sat, Oct 10, 2009 at 5:57 PM, William Pierce <evalsinca@hotmail.com> wrote:
>> Oh and one more thing...For historical reasons our apps run using msft
>> technologies, so using SolrJ would be next to impossible at the present
>> time....
>>
>> Thanks in advance for your help!
>>
>> -- Bill
>>
>> --------------------------------------------------
>> From: "William Pierce" <evalsinca@hotmail.com>
>> Sent: Saturday, October 10, 2009 5:47 PM
>> To: <solr-user@lucene.apache.org>
>> Subject: Tips on speeding up indexing needed...
>>
>>> Folks:
>>>
>>> I have a corpus of approx 6 M documents each of approx 4K bytes.
>>> Currently, the way indexing is set up I read documents from a database
>>> and issue solr post requests in batches (batches are set up so that the
>>> maxPostSize of tomcat which is set to 2MB is adhered to).  This means
>>> that in each batch we write approx 600 or so documents to SOLR.  What I
>>> am seeing is that I am able to push about 2500 docs per minute or approx
>>> 40 or so per second.
>>>
>>> I saw in Erik's talk on Friday that speeds of 250 docs/sec to 25000
>>> docs/sec have been achieved.  Needless to say I am sure that performance
>>> numbers vary widely and are dependent on the domain, machine
>>> configurations, etc.
>>>
>>> I am running on Windows 2003 server, with 4 GB RAM, dual core xeon.
>>>
>>> Any tips on what I can do to speed this up?
>>>
>>> Thanks,
>>>
>>> Bill
>>>
>>
>
>
>
> -- 
> Lance Norskog
> goksron@gmail.com
> 
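
The batching approach described in the thread (group documents into posts that stay under Tomcat's 2 MB maxPostSize, and commit only once at the end) might be sketched roughly as follows. This is an illustration in Python, not the poster's actual code: the `<add>`, `<doc>`, `<field>` and `<commit/>` elements are Solr's XML update format, but `doc_to_xml`, `batch_add_messages`, `index_all` and the `post` callable are hypothetical names introduced here. `post` stands in for whatever HTTP client sends the body to the Solr update URL.

```python
from xml.sax.saxutils import escape, quoteattr

# Tomcat maxPostSize mentioned in the thread (2 MB).
MAX_POST_BYTES = 2 * 1024 * 1024


def doc_to_xml(doc):
    """Render one document (dict of field name -> value) as a Solr <doc> element."""
    fields = "".join(
        "<field name=%s>%s</field>" % (quoteattr(name), escape(str(value)))
        for name, value in doc.items()
    )
    return "<doc>%s</doc>" % fields


def batch_add_messages(docs, max_bytes=MAX_POST_BYTES):
    """Yield <add> messages whose UTF-8 size never exceeds max_bytes."""
    overhead = len("<add></add>")
    current, size = [], overhead
    for doc in docs:
        xml = doc_to_xml(doc)
        xml_size = len(xml.encode("utf-8"))
        if overhead + xml_size > max_bytes:
            raise ValueError("single document exceeds the post size limit")
        # Flush the current batch before it would grow past the limit.
        if current and size + xml_size > max_bytes:
            yield "<add>%s</add>" % "".join(current)
            current, size = [], overhead
        current.append(xml)
        size += xml_size
    if current:
        yield "<add>%s</add>" % "".join(current)


def index_all(docs, post):
    """Send every batch via post(body), then commit exactly once at the end.

    `post` is a hypothetical callable supplied by the caller; returns the
    number of <add> batches sent.
    """
    sent = 0
    for message in batch_add_messages(docs):
        post(message)
        sent += 1
    post("<commit/>")  # single commit, as suggested in tip 1
    return sent


if __name__ == "__main__":
    bodies = []
    sample = [{"id": i, "text": "document body %d" % i} for i in range(5)]
    batches = index_all(sample, bodies.append)
    print(batches, "batch(es) sent, last body:", bodies[-1])
```

Measuring the encoded byte length, rather than counting a fixed number of documents per post, keeps each request under the limit even when document sizes vary.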
