lucene-solr-user mailing list archives

From "Ian Connor" <ian.con...@gmail.com>
Subject Re: fastest way to load documents
Date Fri, 01 Aug 2008 21:08:00 GMT
I am on Fedora and just running with Jetty (I guess that means it will
not use all of my RAM by default, and I need to specify the heap size
when I start Java).
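
For reference, the JVM heap ceiling is set on the command line with -Xmx when the servlet container is started. The lines below are a minimal sketch, assuming Solr is launched through Jetty's start.jar from the example directory (the thread does not say how Jetty is started here). HEAP_MB and JAVA_OPTS are illustrative variable names, and the command is only printed so it can be checked before being run.

```shell
#!/bin/sh
# Assumption: Solr runs under Jetty and is started with "java -jar start.jar".
# HEAP_MB is a hypothetical variable; 5000 is the value floated in this thread.
HEAP_MB=${HEAP_MB:-5000}

# -Xmx takes the size with no space before it, e.g. -Xmx5000m.
JAVA_OPTS="-Xmx${HEAP_MB}m"

# Print the command instead of running it.
echo "java $JAVA_OPTS -jar start.jar"
```

On an 8GB machine it is usually worth leaving part of the RAM outside the Java heap, since the operating system uses it to cache index files.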

So, if I have 8GB RAM, are you suggesting that I set -Xmx5000M or
something large and then set merge to:

<mergeFactor>10000</mergeFactor>

Should I also increase any of these?


    <maxBufferedDocs>10000</maxBufferedDocs>
    <maxMergeDocs>2147483647</maxMergeDocs>
    <maxFieldLength>10000</maxFieldLength>
    <writeLockTimeout>1000</writeLockTimeout>
    <commitLockTimeout>10000</commitLockTimeout>

and play with these to optimize?

3000 docs/s is my theoretical maximum: I cannot cat/grep and pass the
docs to curl any faster than that. 100 docs/s seems to be how fast Solr
can index - I just want to know what to tweak to see if this can be
increased.
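
The cat/grep/curl pipeline quoted below can be wrapped in two small shell functions so that batch size and commit frequency are easy to vary while measuring throughput. This is a sketch, not a tested loader. It assumes each input file wraps its documents in <add> and </add> on lines of their own (the thread does not show the file layout), and build_batch, post_batch and SOLR_URL are hypothetical names. It uses --data-binary because curl's -d strips newlines when reading from a file.

```shell
#!/bin/sh
# build_batch OUT FILE...
# Concatenate the input files into OUT as a single <add> payload.
# Assumption: per-file <add> and </add> wrappers sit on their own lines,
# so any line containing either tag is dropped before re-wrapping.
build_batch() {
    out=$1
    shift
    {
        echo '<add>'
        cat "$@" | grep -v -e '<add>' -e '</add>'
        echo '</add>'
    } > "$out"
}

# post_batch FILE
# POST one payload to the update handler. SOLR_URL defaults to the URL
# used in the original message.
post_batch() {
    curl -s -H 'Content-Type: text/xml' --data-binary @"$1" \
        "${SOLR_URL:-http://localhost:8983/solr/update}"
}

# Example use (not executed here):
#   build_batch /tmp/post.xml 1.xml 2.xml
#   post_batch /tmp/post.xml
```

Sending a commit once at the end of the load, rather than after every batch, is one more variable worth testing, since each commit makes Solr flush the index and open a new searcher.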

On Fri, Aug 1, 2008 at 4:37 PM, Otis Gospodnetic
<otis_gospodnetic@yahoo.com> wrote:
> Configure Solr to use as much RAM as you can afford and not merge too
> often via mergeFactor. It's not clear (to me) from your explanation
> when you see 3000 docs/second and when only 100 docs/second.
>
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
>
>
> ----- Original Message ----
>> From: Ian Connor <ian.connor@gmail.com>
>> To: solr-user@lucene.apache.org
>> Sent: Friday, August 1, 2008 3:36:13 PM
>> Subject: fastest way to load documents
>>
>> I have a number of documents in files
>>
>> 1.xml
>> 2.xml
>> ...
>> 17M.xml
>>
>> I have been using cat to join them all together:
>>
>> cat 1.xml 2.xml ... 1000.xml  | grep -v '<\/add>' > /tmp/post.xml
>>
>> and posting them with curl:
>>
>> curl -d @/tmp/post.xml 'http://localhost:8983/solr/update' -H
>> 'Content-Type: text/xml'
>>
>> Is there a faster way to load up these documents into a number of solr
>> shards? I seem to be able to cover 3000/second just catting them
>> together (2500 at a time is the sweet spot for me) - but this slows
>> down to under 100/s once I try to do the post with curl.
>>
>> --
>> Regards,
>>
>> Ian Connor
>
>



-- 
Regards,

Ian Connor
82 Fellsway W #2
Somerville, MA 02145
Direct Line: +1 (978) 6333372
Call Center Phone: +1 (714) 239 3875 (24 hrs)
Mobile Phone: +1 (312) 218 3209
Fax: +1(770) 818 5697
Suisse Phone: +41 (0) 22 548 1664
Skype: ian.connor
