lucene-solr-user mailing list archives

From Matteo Grolla <matteo.gro...@gmail.com>
Subject Re: Improving indexing performance
Date Tue, 08 Oct 2013 09:29:47 GMT
Thanks Erick,
	I think I have been able to exhaust a resource:
	if I split the data in 2 and upload it with 2 clients, as in benchmark 1.1, it takes 120s; here the bottleneck is my LAN.
	If I use a setting like benchmark 1, the bottleneck is probably the ramBuffer.

	I'm going to buy a Gigabit Ethernet cable so I can run a better test.

	OutOfMemory error: it's the SolrJ client that crashes.
		I'm using Solr 4.2.1 and the corresponding SolrJ client.
		HttpSolrServer works fine;
		ConcurrentUpdateSolrServer gives me problems, and I didn't understand how to size the queueSize parameter optimally.
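For reference, a minimal sketch of how that parameter is sized (SolrJ 4.2.x API; the URL and numbers are illustrative assumptions, not taken from the thread):

	import org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer;

	// queueSize counts queued update *requests*, not documents: with 1k-doc
	// batches of ~1kb docs, each queued request holds roughly 1MB, so a
	// queueSize of 20 pins about 20MB of client heap, while a queueSize of
	// 20000 could try to hold ~20GB if the producer outruns the drain
	// threads; a plausible (assumed, not confirmed) cause of a client-side
	// OutOfMemory error.
	ConcurrentUpdateSolrServer server = new ConcurrentUpdateSolrServer(
	        "http://localhost:8983/solr/collection1",  // illustrative URL
	        20,   // queueSize: capacity of the internal request queue
	        4);   // threadCount: background threads draining the queue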

	
On Oct 7, 2013, at 2:03 PM, Erick Erickson wrote:

> Just skimmed, but the usual reason you can't max out the server
> is that the client can't go fast enough. Very quick experiment:
> comment out the server.add line in your client and run it again;
> does that speed up the client substantially? If not, then the time
> is being spent on the client.
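A minimal sketch of that experiment (SolrJ 4.x; buildBatches() is a hypothetical stand-in for the client's CSV-reading and batching code):

	// needs java.util.List and org.apache.solr.common.SolrInputDocument
	long start = System.currentTimeMillis();
	for (List<SolrInputDocument> batch : buildBatches()) {
	    // server.add(batch);   // <-- the one line to comment out
	}
	System.out.println("client-only time: "
	        + (System.currentTimeMillis() - start) + "ms");

If the client-only time stays close to the full run time, the time is going into reading and building documents, not into Solr.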
> 
> Or split your csv file into, say, 5 parts and run it from 5 different
> PCs in parallel.
> 
> bq: I can't rely on auto commit, otherwise I get an OutOfMemory error
> This shouldn't be happening; I'd get to the bottom of it. Perhaps simply
> allocate more memory to the JVM running Solr.
> 
> bq: committing every 100k docs gives worse performance
> It'll be best to specify openSearcher=false for max indexing throughput,
> BTW. You should be able to commit quite frequently; every 15 seconds seems
> quite reasonable.
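For reference, the matching solrconfig.xml fragment would look like this (a sketch of the standard autoCommit block, using the 15s interval discussed here):

	<autoCommit>
	  <maxTime>15000</maxTime>            <!-- hard commit every 15s -->
	  <openSearcher>false</openSearcher>  <!-- flush segments without opening a new searcher -->
	</autoCommit>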
> 
> Best,
> Erick
> 
> On Sun, Oct 6, 2013 at 12:19 PM, Matteo Grolla <matteo.grolla@gmail.com> wrote:
>> I'd like to have some suggestions on how to improve the indexing performance in the following scenario:
>> I'm uploading 1M docs to Solr.
>> 
>> every doc has
>>        id: sequential number
>>        title:  small string
>>        date: date
>>        body: 1kb of text
>> 
>> Here are my benchmarks (they are all single executions, not averages from multiple executions):
>> 
>> 1)      using the UpdateRequestHandler
>>        and streaming docs from a csv file on the same disk as Solr
>>        auto commit every 15s with openSearcher=false and commit after last document
>> 
>>        total time: 143035ms
>> 
>> 1.1)    using the UpdateRequestHandler
>>        and streaming docs from a csv file on the same disk as Solr
>>        auto commit every 15s with openSearcher=false and commit after last document
>>        <ramBufferSizeMB>500</ramBufferSizeMB>
>>        <maxBufferedDocs>100000</maxBufferedDocs>
>> 
>>        total time: 134493ms
>> 
>> 1.2)    using the UpdateRequestHandler
>>        and streaming docs from a csv file on the same disk as Solr
>>        auto commit every 15s with openSearcher=false and commit after last document
>>        <mergeFactor>30</mergeFactor>
>> 
>>        total time: 143134ms
>> 
>> 2)      using a SolrJ client from another PC on the LAN (100Mbps)
>>        with HttpSolrServer
>>        with javabin format
>>        add documents to the server in batches of 1k docs       ( server.add( <collection> ); see the sketch after the benchmarks )
>>        auto commit every 15s with openSearcher=false and commit after last document
>> 
>>        total time: 139022ms
>> 
>> 3)      using a SolrJ client from another PC on the LAN (100Mbps)
>>        with ConcurrentUpdateSolrServer
>>        with javabin format
>>        add documents to the server in batches of 1k docs       ( server.add( <collection> ) )
>>        server queue size=20k
>>        server threads=4
>>        no auto-commit and commit every 100k docs
>> 
>>        total time: 167301ms
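A minimal sketch of the batched add pattern from benchmarks 2 and 3 (SolrJ 4.2.x; the host URL and field values are illustrative assumptions, only the 1k batch size and the field layout come from the scenario above):

	import java.util.ArrayList;
	import java.util.Date;
	import java.util.List;
	import org.apache.solr.client.solrj.impl.BinaryRequestWriter;
	import org.apache.solr.client.solrj.impl.HttpSolrServer;
	import org.apache.solr.common.SolrInputDocument;

	HttpSolrServer server = new HttpSolrServer("http://solr-host:8983/solr");  // illustrative URL
	server.setRequestWriter(new BinaryRequestWriter());  // send updates in javabin format
	String body = new String(new char[1024]).replace('\0', 'x');  // ~1kb placeholder body
	List<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
	for (int i = 0; i < 1000000; i++) {
	    SolrInputDocument doc = new SolrInputDocument();
	    doc.addField("id", i);              // sequential number
	    doc.addField("title", "doc " + i);  // small string
	    doc.addField("date", new Date());
	    doc.addField("body", body);         // ~1kb of text
	    batch.add(doc);
	    if (batch.size() == 1000) {         // 1k-doc batches, one request each
	        server.add(batch);
	        batch = new ArrayList<SolrInputDocument>();  // fresh list; safe with ConcurrentUpdateSolrServer too
	    }
	}
	if (!batch.isEmpty()) server.add(batch);
	server.commit();  // commit after the last document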
>> 
>> 
>> --On the Solr server--
>> CPU averages    25%
>>        at best 100% for 1 core
>> IO      is still far from being saturated
>>        iostat gives a pattern like this (every 5 s)
>> 
>>        time(s)         %util
>>        100                     45,20
>>        105                     1,68
>>        110                     17,44
>>        115                     76,32
>>        120                     2,64
>>        125                     68
>>        130                     1,28
>> 
>> I thought that using ConcurrentUpdateSolrServer I would be able to max out CPU or IO, but I wasn't.
>> With ConcurrentUpdateSolrServer I can't rely on auto commit, otherwise I get an OutOfMemory error,
>> and I found that committing every 100k docs gives worse performance than auto commit every 15s (benchmark 3 with HttpSolrServer took 193515ms).
>> 
>> I'd really like to understand why I can't max out the resources on the server hosting Solr (the disk above all).
>> And I'd really like to understand what I'm doing wrong with ConcurrentUpdateSolrServer.
>> 
>> thanks
>> 

