lucene-solr-user mailing list archives

From Paul Rosen <p...@performantsoftware.com>
Subject Re: Partial updates?
Date Fri, 28 Aug 2009 19:15:31 GMT
That sounds very similar to my use case, too. (Mentioned in the recent 
thread "Updating a solr record"). So +1 on allowing updates!

Jason Rutherglen wrote:
> Don,
> 
> I started work on fixing this a while back, and I plan to
> resume soon. Basically, one would be able to update fields
> to a parallel index, without reindexing the entire document.
> There are other use cases I've seen for this such as caching.
> 
> -J
> 
> On Fri, Aug 28, 2009 at 8:49 AM, Don Werve <donw@madwombat.com> wrote:
>> Short version:
>>
>> Is there a way to either do partial updates to documents (update/add one or
>> two fields only), or to search across multiple documents grouped by a
>> (non-unique) key stored in a field?
>>
>> Long version:
>>
>> I've run into an issue with the way I'm indexing documents for a new
>> product, and figure that somebody else has run into the same problem.  In a
>> nutshell, we're building a system that deals with a lot of incoming and
>> outgoing text documents (email, word docs, short comments, etc), grouped
>> together by some common factor (basically, email threads), and want to do
>> full-text search across those threads.
>>
>> We've settled on Solr, of course. :)
>>
>> Right now, I'm adding each new incoming/outgoing message as a new document,
>> and can search just fine, unless I want to look for multiple terms that span
>> documents.  So, "foo" is in the first document, "bar" is in the second, and
>> although they both have a 'thread_id' field identifying them as belonging to
>> the same group, searching for "+foo +bar" doesn't yield results (which is
>> not surprising).
>>
>> Now, I can modify the code to store one document for each group of messages
>> without too much work.  But as I understand it, this means that for every
>> new message coming in, I need to hand an aggregate of all previous messages
>> to the indexer, because Solr will re-create the document (which indexes the
>> entire group of messages) when I do update/add.  Since there can be some
>> fairly large files sitting in there (50-100M in some cases), I'd rather not
>> have to shove that down Solr's pipe every time something changes.
>>
>> So, first question: is what I think I know about update/add correct?
>>
>> Second, if so, is there a way that I can update single-valued fields and
>> append new multivalued fields, without having to re-index the whole
>> document?
>>
>> Third, am I just totally wrong about the way I'm trying to do this, and is
>> there a better way?
>>
>> Thanks-in-advance!
>>
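
To make the problem in the quoted message concrete, here is a minimal sketch in plain Python. It does not call Solr at all; it only simulates the matching logic in memory. The sample messages and the function names (matches_all, search_per_message, threads_matching_all, search_per_thread) are hypothetical and exist only for illustration. It shows why "+foo +bar" finds nothing when each message is its own document, and two ways around that: intersecting thread_id values from one lookup per term on the client side, or matching against one aggregated document per thread.

```python
from collections import defaultdict

# Hypothetical sample data: one entry per message, as in the
# one-document-per-message scheme described above.
messages = [
    {"id": "m1", "thread_id": "t1", "body": "foo appears in the first message"},
    {"id": "m2", "thread_id": "t1", "body": "bar appears in the second message"},
    {"id": "m3", "thread_id": "t2", "body": "foo only, nothing else"},
]


def matches_all(text, terms):
    """Return True if every term occurs as a whitespace-separated token."""
    tokens = set(text.lower().split())
    return all(term in tokens for term in terms)


def search_per_message(terms):
    """Simulate '+foo +bar' against one document per message."""
    return [m["id"] for m in messages if matches_all(m["body"], terms)]


def threads_matching_all(terms):
    """Do one lookup per term, then intersect the thread_id sets client-side."""
    per_term = [
        {m["thread_id"] for m in messages if matches_all(m["body"], [term])}
        for term in terms
    ]
    return set.intersection(*per_term) if per_term else set()


def search_per_thread(terms):
    """Simulate '+foo +bar' against one aggregated document per thread."""
    bodies = defaultdict(list)
    for m in messages:
        bodies[m["thread_id"]].append(m["body"])
    return sorted(
        thread_id
        for thread_id, parts in bodies.items()
        if matches_all(" ".join(parts), terms)
    )


if __name__ == "__main__":
    print(search_per_message(["foo", "bar"]))
    print(threads_matching_all(["foo", "bar"]))
    print(search_per_thread(["foo", "bar"]))
```

The client-side intersection keeps one small document per message, at the cost of one query per term and no combined relevance score. The aggregated variant gives a single query, but it is the one that forces the whole thread to be resent on every new message, which is the cost the original question is trying to avoid.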

