lucene-solr-user mailing list archives

From Anshum Gupta <ans...@anshumgupta.net>
Subject Re: Fastest way to import big amount of documents in SolrCloud
Date Thu, 01 May 2014 20:57:51 GMT
Hi Costi,

I'd recommend SolrJ and parallelizing the inserts. It also helps to set
reasonable commit intervals.

Just to get a better perspective:
* Why do you want to do a full index everyday?
* How much data are we talking about?
* What's your SolrCloud setup like?
* Do you already have some benchmarks which you're not happy with?
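To make the chunk-and-parallelize idea concrete, here's a minimal sketch of the pattern: split the day's documents into fixed-size batches and submit each batch from a worker-thread pool, committing once at the end rather than per batch. The actual Solr call (in SolrJ, something like `client.add(batch)` on a `CloudSolrClient`) is abstracted behind a small interface so the skeleton stays self-contained; the batch size and thread count are illustrative, not recommendations.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class ParallelIndexer {

    // Stand-in for the real indexing call, e.g. SolrJ's
    // cloudSolrClient.add(batchOfSolrInputDocuments).
    interface BatchSender {
        void send(List<String> batch) throws Exception;
    }

    static int indexAll(List<String> docs, int batchSize, int threads,
                        BatchSender sender) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        AtomicInteger batchesSent = new AtomicInteger();

        // Split into fixed-size batches; each batch is indexed by the pool.
        for (int i = 0; i < docs.size(); i += batchSize) {
            List<String> batch =
                docs.subList(i, Math.min(i + batchSize, docs.size()));
            pool.submit(() -> {
                try {
                    sender.send(batch);        // real code: solrClient.add(batch)
                    batchesSent.incrementAndGet();
                } catch (Exception e) {
                    e.printStackTrace();       // real code should retry/log per batch
                }
            });
        }

        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS);
        // A single hard commit after all batches (or a reasonable autoCommit
        // interval) is generally much cheaper than committing per batch.
        return batchesSent.get();
    }

    public static void main(String[] args) throws Exception {
        List<String> docs = new ArrayList<>();
        for (int i = 0; i < 25; i++) docs.add("doc-" + i);
        int sent = indexAll(docs, 10, 4, b -> { /* stand-in for client.add */ });
        System.out.println("batches sent: " + sent);
    }
}
```

With SolrCloud, sending through `CloudSolrClient` also lets the client route each document straight to its shard leader, which is part of why parallel SolrJ clients scale well across the cluster.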



On Thu, May 1, 2014 at 1:47 PM, Costi Muraru <costimuraru@gmail.com> wrote:

> Hi guys,
>
> What would you say is the fastest way to import data into SolrCloud?
> Our use case: each day do a single import of a big number of documents.
>
> Should we use SolrJ/DataImportHandler/other? Or perhaps is there a bulk
> import feature in SOLR? I came upon this promising link:
> http://wiki.apache.org/solr/UpdateCSV
> Any idea how UpdateCSV compares performance-wise with
> SolrJ/DataImportHandler?
>
> If SolrJ, should we split the data into chunks and start multiple clients at
> once? That way we could perhaps take advantage of the multiple servers in
> the SolrCloud configuration?
>
> Either way, after the import is finished, should we do an optimize or a
> commit or none (
> http://wiki.solarium-project.org/index.php/V1:Optimize_command)?
>
> Any tips and tricks to perform this process the right way are gladly
> appreciated.
>
> Thanks,
> Costi
>



-- 

Anshum Gupta
http://www.anshumgupta.net
