lucene-solr-user mailing list archives

From Sebastian Steinfeld <Sebastian.Steinf...@mgm-tp.com>
Subject Re: Solr indexing slows down
Date Mon, 10 Jun 2013 08:32:15 GMT
Hi Shawn,

thank you for your answer.

I am using Oracle. This is the configuration I am using:
---------
<dataSource 
name="local" 
driver="oracle.jdbc.driver.OracleDriver" 
url="jdbc:oracle:thin:@localhost:1521:XE" 
user="****" 
password="****"
batchSize="20000"
/>
------------

There are 12GB of free memory on the server; I hope this is enough.
I will test the import with 4GB of JVM memory.

Do you know whether the autoCommit configuration in solrconfig.xml still works when using the DIH
with the URL:
/dataimport?command=full-import&clean=true&commit=true

I read that "commit=true" only makes one commit at the end of the import, so autoCommit
won't take effect.

I am using Solr 4.3

Thank you,
Sebastian


-----Original Message-----
From: Shawn Heisey [mailto:solr@elyograg.org] 
Sent: Thursday, 6 June 2013 19:06
To: solr-user@lucene.apache.org
Subject: Re: Solr indexing slows down

On 6/6/2013 4:13 AM, Sebastian Steinfeld wrote:
> The number of documents I want to index is 8 million. The first 1.6 million are indexed
> in 2 minutes, but completing the import takes nearly 2 hours.
> The size of the index on the hard drive is 610MB.
> I started the Solr server with 2GB of memory.
>
> I read that indexing time might depend on the batch size, so I increased
> the batchSize in the dataSource to 10,000, but this didn't make any difference.
> I also tried to disable the autocommit, which is configured in solrconfig.xml. I
> disabled it by commenting it out, but this also didn't make any difference.

If you are importing from MySQL, you actually want the batchSize to be -1.  This streams the
results so they don't take up large blocks of memory.  Other JDBC drivers have different ways
of configuring this mode of operation.  You fully redacted the driver and URL in your config
file, so I don't know what you are using.
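For illustration, a DIH dataSource for MySQL with streaming enabled might look like the sketch below. The dataSource name, connection URL, database name, and credentials are hypothetical placeholders; the only setting under discussion is batchSize="-1". The driver class assumes MySQL Connector/J 5.x.

```xml
<!-- Hypothetical MySQL dataSource for the DataImportHandler.
     batchSize="-1" tells DIH to ask the MySQL driver to stream
     rows instead of buffering the whole result set in memory. -->
<dataSource
  name="mysql"
  type="JdbcDataSource"
  driver="com.mysql.jdbc.Driver"
  url="jdbc:mysql://localhost:3306/exampledb"
  user="****"
  password="****"
  batchSize="-1"
/>
```

As far as I know, for other drivers such as Oracle's, DIH passes a positive batchSize to JDBC as the statement fetch size, so it controls how many rows are fetched per round trip rather than switching on MySQL-style streaming.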

2GB of Java heap for Solr is probably not enough.  It's likely that once your index gets big
enough, Solr is starved for memory and has to perform constant garbage collections to free
up enough for basic operation.  I would bet that you also don't have enough free memory for
the OS to cache the index well:

http://wiki.apache.org/solr/SolrPerformanceProblems

If you are using 4.x with the updateLog turned on, then you want autoCommit enabled with openSearcher
set to false.  This is covered on the wiki page I linked.
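A minimal sketch of the updateHandler section of solrconfig.xml for Solr 4.x with the updateLog enabled and autoCommit set to openSearcher=false. The maxTime value of 15000 ms is an assumption chosen only for illustration, not a value recommended in this thread.

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <!-- Transaction log (the "updateLog" mentioned above) -->
  <updateLog>
    <str name="dir">${solr.ulog.dir:}</str>
  </updateLog>
  <!-- Hard commit at most every 15 seconds (illustrative value).
       openSearcher=false flushes segments to disk and rolls over the
       transaction log without opening a new searcher. -->
  <autoCommit>
    <maxTime>15000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>
</updateHandler>
```

With this setup, newly indexed documents become searchable only when a commit opens a new searcher. One example is the final commit that commit=true triggers at the end of the import.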

Thanks,
Shawn

