lucene-solr-user mailing list archives

From Alexander Kanarsky <kanarsky2...@gmail.com>
Subject Re: Fastest way to import big amount of documents in SolrCloud
Date Fri, 02 May 2014 06:12:57 GMT
If you build your index in Hadoop, read this (it is about Cloudera
Search, but as far as I understand it should also work with the Solr
Hadoop contrib since 4.7):
http://www.cloudera.com/content/cloudera-content/cloudera-docs/Search/latest/Cloudera-Search-User-Guide/csug_batch_index_to_solr_servers_using_golive.html
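
For reference, the "go-live" batch-indexing flow described there is driven by
MapReduceIndexerTool, which builds index shards in MapReduce and then merges
them into the live SolrCloud collection. A command sketch only: the jar path,
HDFS paths, ZooKeeper address, and collection name below are placeholders, so
check the linked guide for the exact invocation on your cluster.

```
# Sketch, not a verified invocation: all paths and hosts are placeholders.
hadoop jar search-mr-*-job.jar org.apache.solr.hadoop.MapReduceIndexerTool \
  --morphline-file morphline.conf \
  --output-dir hdfs://namenode:8020/tmp/outdir \
  --zk-host zk1:2181/solr \
  --collection collection1 \
  --go-live \
  hdfs://namenode:8020/indir
```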


On Thu, May 1, 2014 at 1:47 PM, Costi Muraru <costimuraru@gmail.com> wrote:

> Hi guys,
>
> What would you say is the fastest way to import data into SolrCloud?
> Our use case: a single bulk import of a large number of documents each day.
>
> Should we use SolrJ/DataImportHandler/other? Or is there perhaps a bulk
> import feature in Solr? I came upon this promising link:
> http://wiki.apache.org/solr/UpdateCSV
> Any idea how UpdateCSV compares performance-wise with
> SolrJ/DataImportHandler?
>
> If SolrJ, should we split the data into chunks and run multiple clients at
> once? That way we could perhaps take advantage of the multiple servers in
> the SolrCloud configuration?
>
> Either way, after the import is finished, should we run an optimize, a
> commit, or neither (
> http://wiki.solarium-project.org/index.php/V1:Optimize_command)?
>
> Any tips and tricks to perform this process the right way are gladly
> appreciated.
>
> Thanks,
> Costi
>
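
On the chunking question in the quoted message: the split-into-chunks,
parallel-clients idea can be sketched as below (Python, with the actual
request to Solr stubbed out as `send_chunk`, since the endpoint and client
library are assumptions here; with SolrJ you would use something like
ConcurrentUpdateSolrServer, or one client instance per thread, instead).

```python
# Minimal sketch: parallel bulk indexing by splitting documents into chunks.
# send_chunk is a stand-in for the real work (e.g. an HTTP POST of CSV/JSON
# to /update, or a SolrJ client call); replace it with your indexing code.
from concurrent.futures import ThreadPoolExecutor

def chunked(docs, size):
    """Yield successive fixed-size chunks of the document list."""
    for i in range(0, len(docs), size):
        yield docs[i:i + size]

def send_chunk(chunk):
    # Stand-in for posting one batch to Solr; returns the batch size.
    return len(chunk)

def bulk_import(docs, chunk_size=1000, workers=4):
    # One in-flight batch per worker thread.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        sent = sum(pool.map(send_chunk, chunked(docs, chunk_size)))
    # Commit once here, after all batches (e.g. /update?commit=true),
    # rather than committing per chunk.
    return sent

print(bulk_import([{"id": i} for i in range(2500)], chunk_size=1000))  # 2500
```

The design point is that batching and a single commit at the end avoid
per-document request overhead and repeated commit costs during the import.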
