lucene-solr-user mailing list archives

From Brian Whitman <brian.whit...@variogr.am>
Subject Re: What can get past document uniqueness?
Date Fri, 14 Mar 2008 15:31:02 GMT
Doesn't look like it. We do rsync the index, but only as a backup;
these queries are hitting the live index.

Also, the results we get back are not exact duplicates, even though  
the ID is the same. For example, if we update a document (replace an  
existing document) with new information, the index will only sometimes  
store two copies -- one with the old data and one with the new  
content. If I update again (with the same content) the duplicate goes  
away.

I have optimized and committed the index, with no change to the
pre-existing duplicates.

Where within Solr is uniqueness enforced? I'd like to at least put  
some debug checking in there.
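
Until the enforcement point is known, the doubles can at least be
detected from outside Solr. This is a minimal Python sketch, assuming
the XML response format shown in the quoted query below. The function
name duplicate_ids, the sample response, and the id xyz789 are made up
for illustration. It only counts repeated uniqueKey values in a
response; it says nothing about why they got into the index.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def duplicate_ids(response_xml, key_field="id"):
    """Return {id: count} for every key value that appears in more than one <doc>."""
    root = ET.fromstring(response_xml)
    counts = Counter(
        field.text
        for doc in root.iter("doc")
        for field in doc.findall("str")
        if field.get("name") == key_field
    )
    return {doc_id: n for doc_id, n in counts.items() if n > 1}


# Hypothetical response body; in practice this would be the text
# returned by a /select?q=...&fl=id request.
sample = """<response><result name="response" numFound="3" start="0">
<doc><str name="id">abc123</str></doc>
<doc><str name="id">abc123</str></doc>
<doc><str name="id">xyz789</str></doc>
</result></response>"""

print(duplicate_ids(sample))  # {'abc123': 2}
```

Running this over pages of results with fl=id would give a list of the
affected ids to re-check after each update.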



On Mar 13, 2008, at 4:47 PM, Ryan McKinley wrote:

> Check this thread:
> http://www.nabble.com/duplicate-entries-being-returned%2C-possible-caching-issue--td15237016.html
>
> perhaps it is related?
>
>
> Brian Whitman wrote:
>> On a solr instance with
>> <!-- field to use to determine and enforce document uniqueness. -->
>> <uniqueKey>id</uniqueKey>
>> This is happening:
>> http://solr..../select?q=id:abc123&fl=id
>> <doc>
>> <str name="id">abc123</str>
>> </doc>
>> <doc>
>> <str name="id">abc123</str>
>> </doc>
>> Lots of weird stuff is writing to this index: solrj code, python  
>> solr.py, curl, etc. -- many things at the same time. Autocommit is  
>> at 30m.
>> 4500 of the ca. 1.5m docs in this index are doubled like this.
>> What can get past the doc uniqueness constraint? Has anyone seen  
>> this before?
>

