lucene-solr-user mailing list archives

From Upayavira ...@odoko.co.uk>
Subject Re: Avoid re indexing
Date Sun, 02 Aug 2015 10:26:55 GMT
You do not want to add a new shard. First, you want your docs evenly
spread. Second, they are spread using hash ranges, so to add more
capacity you spread out those hash ranges using shard splitting.
"Adding" a new shard doesn't really make sense here, unless you go for
implicit routing, where you decide for yourself which shard a doc goes
into, but it seems too late to make that decision in your case.

Upayavira
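The following is a toy model of the hash-range idea. It is not Solr's code. It assumes the compositeId router's layout, where the full signed 32-bit hash space is divided into equal contiguous ranges, one per shard. It also uses a hypothetical stand-in hash (String.hashCode() passed through the MurmurHash3 finalizer) instead of the MurmurHash3 that Solr actually applies to document IDs. The point it shows: splitting a shard only divides that shard's range, so its documents are redistributed between the sub-shards and the other shards are untouched.

```java
import java.util.ArrayList;
import java.util.List;

public class HashRangeSplit {
    /** Inclusive range of 32-bit hash values owned by one shard (toy model). */
    public record Range(int min, int max) {
        public boolean includes(int hash) {
            return hash >= min && hash <= max;
        }
    }

    /** Divide [lo, hi] into n contiguous, non-overlapping ranges whose sizes differ by at most 1. */
    public static List<Range> partition(int lo, int hi, int n) {
        List<Range> out = new ArrayList<>();
        long total = (long) hi - lo + 1;   // long: the full int range holds 2^32 values
        long start = lo;
        for (int i = 0; i < n; i++) {
            long end = lo + total * (i + 1) / n - 1;
            out.add(new Range((int) start, (int) end));
            start = end + 1;
        }
        return out;
    }

    /** Hypothetical stand-in hash: String.hashCode() mixed with the MurmurHash3 32-bit finalizer. */
    public static int hash(String id) {
        int h = id.hashCode();
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    /** Hashes of n made-up document IDs "doc-0" .. "doc-(n-1)". */
    public static int[] hashes(int n) {
        int[] hs = new int[n];
        for (int i = 0; i < n; i++) {
            hs[i] = hash("doc-" + i);
        }
        return hs;
    }

    /** Number of documents whose hash falls in the range, i.e. the docs that shard would hold. */
    public static long count(int[] hashes, Range r) {
        long c = 0;
        for (int h : hashes) {
            if (r.includes(h)) c++;
        }
        return c;
    }

    public static void main(String[] args) {
        int[] docs = hashes(600_000);   // scaled-down stand-in for the 6m docs discussed in the thread
        List<Range> shards = partition(Integer.MIN_VALUE, Integer.MAX_VALUE, 2);
        for (int s = 0; s < shards.size(); s++) {
            System.out.printf("shard%d %s: %d docs%n", s + 1, shards.get(s), count(docs, shards.get(s)));
        }
        // Split shard1's range into two halves: only shard1's documents move.
        Range shard1 = shards.get(0);
        List<Range> subs = partition(shard1.min(), shard1.max(), 2);
        for (int s = 0; s < subs.size(); s++) {
            System.out.printf("  shard1_%d %s: %d docs%n", s, subs.get(s), count(docs, subs.get(s)));
        }
    }
}
```

The exact counts depend on the stand-in hash. Even so, shard1_0 and shard1_1 should each end up with roughly half of shard1's documents, which matches the 3m to 1.5m + 1.5m example further down the thread. On a real cluster, you request the split through the Collections API's SPLITSHARD action, for example action=SPLITSHARD&collection=mycollection&shard=shard1. Here "mycollection" is a hypothetical collection name.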

On Sun, Aug 2, 2015, at 12:40 AM, Nagasharath wrote:
> Yes, shard splitting will only help in managing large clusters and in
> improving query performance. In my case the index has fully grown across
> the collection (the existing shards have no capacity left), so adding a
> new shard would help, and for that I have to re-index.
> 
> 
> > On 01-Aug-2015, at 6:34 pm, Upayavira <uv@odoko.co.uk> wrote:
> > 
> > Erm, that doesn't seem to make sense. Seems like you are talking about
> > *merging* shards.
> > 
> > Say you had two shards, 3m docs each:
> > 
> > shard1: 3m docs
> > shard2: 3m docs
> > 
> > If you split shard1, you would have:
> > 
> > shard1_0: 1.5m docs
> > shard1_1: 1.5m docs
> > shard2: 3m docs
> > 
> > You could, of course, then split shard2. You could also split shard1
> > into three parts instead, if you preferred:
> > 
> > shard1_0: 1m docs
> > shard1_1: 1m docs
> > shard1_2: 1m docs
> > shard2: 3m docs
> > 
> > Upayavira
> > 
> >> On Sun, Aug 2, 2015, at 12:25 AM, Nagasharath wrote:
> >> If my current shard is holding 3 million documents, will each new
> >> sub-shard after splitting also be able to hold 3 million documents?
> >> If that is the case, then after a shard is split into two, the
> >> sub-shards should hold 6 million documents. Am I right?
> >> 
> >>> On 01-Aug-2015, at 5:43 pm, Upayavira <uv@odoko.co.uk> wrote:
> >>> 
> >>> 
> >>> 
> >>>> On Sat, Aug 1, 2015, at 11:29 PM, naga sharathrayapati wrote:
> >>>> I am using SolrJ to index documents.
> >>>> 
> >>>> I agree with you regarding the index update, but I should not see
> >>>> any deleted documents, as it is a fresh index. Can we actually
> >>>> identify which documents are deleted?
> >>> 
> >>> If you post doc 1234 and then post doc 1234 a second time, you will see
> >>> a deletion in your index. If you don't want deletions to show in your
> >>> index, be sure NEVER to update a document, only add new ones with
> >>> absolutely distinct document IDs.
> >>> 
> >>> You cannot see (via Solr) which docs are deleted. You could, I suppose,
> >>> introspect the Lucene index, but that would most definitely be an expert
> >>> task.
> >>> 
> >>>> If there is no option to add shards to an existing collection: I do
> >>>> not like the idea of re-indexing all the data (it takes hours). We
> >>>> started with a good number of shards, but the data has grown rapidly
> >>>> over the past few days. Do you think it is worth logging a ticket?
> >>> 
> >>> You can split a shard. See the collections API:
> >>> 
> >>> https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-api3
> >>> 
> >>> What would you want to log a ticket for? I'm not sure that there's
> >>> anything that would require that.
> >>> 
> >>> Upayavira
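The following toy model shows why re-posting an existing ID shows up as a deletion. Lucene does not overwrite a document in place. An update, which is what IndexWriter.updateDocument does and what Solr does by default when it receives an ID it already has, deletes the old copy and adds a new one. The old copy still counts toward maxDoc until the segment holding it is merged away, so deleted docs = maxDoc - numDocs. The class below is hypothetical and only does the counting. It does not model segments or merges.

```java
import java.util.HashMap;
import java.util.Map;

public class UpdateAsDelete {
    // id -> slot of the live copy; slots no longer referenced here count as "deleted"
    private final Map<String, Integer> liveSlotById = new HashMap<>();
    private int maxDoc = 0;   // every add takes a new slot, like Lucene's maxDoc

    /** Add a document; if the id already exists, the old copy is orphaned (deleted), not overwritten. */
    public void add(String id) {
        liveSlotById.put(id, maxDoc++);   // replaces the mapping, leaving the previous slot dead
    }

    public int numDocs() {
        return liveSlotById.size();
    }

    public int maxDoc() {
        return maxDoc;
    }

    public int deletedDocs() {
        return maxDoc - numDocs();
    }

    public static void main(String[] args) {
        UpdateAsDelete index = new UpdateAsDelete();
        index.add("1234");
        index.add("5678");
        index.add("1234");   // doc 1234 posted a second time
        System.out.println("numDocs=" + index.numDocs()
                + " maxDoc=" + index.maxDoc()
                + " deleted=" + index.deletedDocs());
        // prints: numDocs=2 maxDoc=3 deleted=1
    }
}
```

Even a freshly built index will therefore report deletions if any ID was sent more than once.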
