lucene-solr-user mailing list archives

From Nick Vasilyev <nick.vasily...@gmail.com>
Subject Re: How to re-index SOLR data
Date Tue, 09 Aug 2016 16:40:25 GMT
Hi, I work on a Python Solr client library, SolrClient
<http://solrclient.readthedocs.io/en/latest/>, which includes a
reindexing helper module that you can use if you are on Solr 4.9+. I use it
all the time and I think it works pretty well. You can re-index all
documents from one collection into another collection, or dump them to the
filesystem as JSON. It also supports parallel execution and can run
independently on each shard. If your job craps out halfway through, there
is also a way to resume it, provided your existing schema has a good date
field and a unique id.
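For anyone curious how this kind of tool can walk a large collection, here is a minimal sketch of the general cursor-paging loop. It is not the SolrClient code. It assumes Solr's cursorMark paging, where the sort must include the uniqueKey field (assumed here to be `id`), and it uses hypothetical `fetch_page` and `send_batch` callables in place of real HTTP calls to the source and destination collections. The hypothetical `fake_fetcher` stands in for Solr so the loop can run offline.

```python
from typing import Callable, Dict, Iterator, List, Tuple

Doc = Dict[str, object]
# Hypothetical page fetcher: takes a cursorMark and returns
# (docs, nextCursorMark), mirroring a Solr response that uses cursor paging.
FetchPage = Callable[[str], Tuple[List[Doc], str]]


def cursor_params(cursor: str, rows: int = 1000, sort: str = "id asc") -> Dict[str, object]:
    """Query parameters for one page of a /select request.

    The sort must include the uniqueKey field ("id" is an assumption here).
    """
    return {"q": "*:*", "sort": sort, "rows": rows, "cursorMark": cursor, "wt": "json"}


def iter_batches(fetch_page: FetchPage, start: str = "*") -> Iterator[List[Doc]]:
    """Yield batches until the returned nextCursorMark equals the one sent."""
    cursor = start
    while True:
        docs, next_cursor = fetch_page(cursor)
        if docs:
            yield docs
        if next_cursor == cursor:  # Solr signals "no more results" this way
            break
        cursor = next_cursor


def copy_all(fetch_page: FetchPage, send_batch: Callable[[List[Doc]], None]) -> int:
    """Send every batch to the destination and return the number of docs copied."""
    total = 0
    for batch in iter_batches(fetch_page):
        send_batch(batch)
        total += len(batch)
    return total


def fake_fetcher(docs: List[Doc], rows: int) -> FetchPage:
    """Hypothetical in-memory stand-in for Solr: the cursor is an offset string."""
    def fetch(cursor: str) -> Tuple[List[Doc], str]:
        start = 0 if cursor == "*" else int(cursor)
        page = docs[start:start + rows]
        # Like Solr, hand back the same cursor once there is nothing left.
        return page, (str(start + len(page)) if page else cursor)
    return fetch
```

For example, `copy_all(fake_fetcher([{"id": i} for i in range(5)], rows=2), print)` prints three batches (sizes 2, 2 and 1) and returns 5. In a real job, `fetch_page` would send `cursor_params(...)` to the source collection and `send_batch` would post to the destination collection, possibly with several workers in parallel (for example, one per shard).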

You can read the documentation here:
http://solrclient.readthedocs.io/en/latest/Reindexer.html

The code is pretty short and is here:
https://github.com/moonlitesolutions/SolrClient/blob/master/SolrClient/helpers/reindexer.py

Here is a sample:
from SolrClient import SolrClient
from SolrClient.helpers import Reindexer

# Copy every document from source_collection on the source server
# into destination-collection on the destination server.
r = Reindexer(SolrClient('http://source_solr:8983/solr'),
              SolrClient('http://destination_solr:8983/solr'),
              source_coll='source_collection',
              dest_coll='destination-collection')
r.reindex()
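The resume feature mentioned earlier depends on a usable date field. One way to get that behaviour, sketched here as an assumption rather than as what SolrClient actually does, is to split the copy into date windows, restrict each window with a Solr range filter, and skip windows that ended before the last saved checkpoint. The field name `last_modified`, the one-day window size, and all function names are hypothetical; within each window you would still page through documents sorted by the unique id.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterator, Optional, Tuple

Window = Tuple[datetime, datetime]


def solr_date(dt: datetime) -> str:
    """Format an aware datetime as a Solr date literal (UTC with a trailing Z)."""
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def window_fq(date_field: str, lo: datetime, hi: datetime) -> str:
    """Filter query for one window: inclusive lower bound, exclusive upper
    bound, so adjacent windows never match the same document twice."""
    return f"{date_field}:[{solr_date(lo)} TO {solr_date(hi)}}}"


def remaining_windows(start: datetime, end: datetime,
                      checkpoint: Optional[datetime] = None,
                      step: timedelta = timedelta(days=1)) -> Iterator[Window]:
    """Yield [lo, hi) windows covering start..end, skipping any window that
    ended at or before the checkpoint saved by a previous (failed) run."""
    lo = start
    while lo < end:
        hi = min(lo + step, end)
        if checkpoint is None or hi > checkpoint:
            yield lo, hi
        lo = hi
```

After each window is fully copied, the job would persist its `hi` value somewhere durable; on restart, passing that value as `checkpoint` skips the work already done.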

On Tue, Aug 9, 2016 at 9:56 AM, Shawn Heisey <apache@elyograg.org> wrote:

> On 8/9/2016 1:48 AM, bharath.mvkumar wrote:
> > What would be the best way to re-index the data in SolrCloud? We
> > have around 65 million documents and we are planning to change the
> > schema by changing the unique key type from long to string. How long
> > does it take to re-index 65 million documents in Solr, and can you
> > please suggest how to do that?
>
> There is no magic bullet.  And there's no way for anybody but you to
> determine how long it's going to take.  There are people who have
> achieved over 50K inserts per second, and others who have difficulty
> reaching 1000 per second.  Many factors affect indexing speed, including
> the size of your documents, the complexity of your analysis, the
> capabilities of your hardware, and how many threads/processes you are
> using at the same time when you index.
>
> Here's some more detailed info about reindexing, but it's probably not
> what you wanted to hear:
>
> https://wiki.apache.org/solr/HowToReindex
>
> Thanks,
> Shawn
>
>
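To put Shawn's range in concrete terms for the 65 million documents in the question, here is a back-of-the-envelope calculation. It is idealized: it assumes a constant sustained rate and ignores commits, merges, retries and anything else that slows a real job down.

```python
def estimate_hours(num_docs: int, docs_per_second: float) -> float:
    """Idealized wall-clock hours at a constant sustained indexing rate."""
    if docs_per_second <= 0:
        raise ValueError("docs_per_second must be positive")
    return num_docs / docs_per_second / 3600


if __name__ == "__main__":
    # The two rates mentioned in the reply above.
    for rate in (50_000, 1_000):
        hours = estimate_hours(65_000_000, rate)
        print(f"{rate:>6} docs/s -> {hours:.1f} h ({hours * 60:.0f} min)")
```

At 50K docs/s that is roughly 22 minutes; at 1,000 docs/s it is roughly 18 hours, which is why nobody else can give you a single number.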
