lucene-solr-user mailing list archives

From Nagasharath <sharathrayap...@gmail.com>
Subject Re: Avoid re indexing
Date Sat, 01 Aug 2015 23:25:02 GMT
If my current shard holds 3 million documents, will each new sub-shard created by the
split also be able to hold 3 million documents?
If so, after splitting one shard into two, the sub-shards together should be able to hold
6 million documents. Am I right?

> On 01-Aug-2015, at 5:43 pm, Upayavira <uv@odoko.co.uk> wrote:
> 
> 
> 
>> On Sat, Aug 1, 2015, at 11:29 PM, naga sharathrayapati wrote:
>> I am using SolrJ to index documents.
>> 
>> I agree with you regarding the index update, but I should not see any
>> deleted documents, as it is a fresh index. Can we actually identify which
>> documents were deleted?
> 
> If you post doc 1234, then you post doc 1234 a second time, you will see
> a deletion in your index. If you don't want deletions to show in your
> index, be sure NEVER to update a document, only add new ones with
> absolutely distinct document IDs.
> 
> You cannot see (via Solr) which docs are deleted. You could, I suppose,
> introspect the Lucene index, but that would most definitely be an expert
> task.
> 
>> If there is no option to add shards to an existing collection, I do not
>> like the idea of reindexing all the data (which takes hours). We started
>> with a good number of shards, but the data has grown rapidly over the
>> past few days. Do you think it is worth logging a ticket?
> 
> You can split a shard. See the collections API:
> 
> https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-api3
> 
> What would you want to log a ticket for? I'm not sure that there's
> anything that would require that.
> 
> Upayavira
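
For reference, the shard split Upayavira mentions is requested through the Collections API SPLITSHARD action. Below is a minimal sketch in plain Java that only builds the request URL and does not send it. The base URL, collection name, and shard name are placeholders, and the class and method names are hypothetical, not part of SolrJ.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of SolrJ: builds a Collections API
// SPLITSHARD request URL. It does not contact Solr.
class SplitShardUrl {

    // baseUrl is assumed to look like "http://localhost:8983/solr" (placeholder).
    static String build(String baseUrl, String collection, String shard) {
        return baseUrl + "/admin/collections?action=SPLITSHARD"
                + "&collection=" + URLEncoder.encode(collection, StandardCharsets.UTF_8)
                + "&shard=" + URLEncoder.encode(shard, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Placeholder collection and shard names.
        System.out.println(build("http://localhost:8983/solr", "mycollection", "shard1"));
    }
}
```

Issuing an HTTP GET against the resulting URL asks Solr to split that shard into two sub-shards. The existing documents are divided between the sub-shards, so no reindexing from the source data is needed.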
