lucene-solr-user mailing list archives

From: Don Werve
Subject: Partial updates?
Date: Fri, 28 Aug 2009 15:49:01 GMT
Short version:

Is there a way to either do partial updates to documents (update/add one or
two fields only), or to search across multiple documents grouped by a
(non-unique) key stored in a field?

Long version:

I've run into an issue with the way I'm indexing documents for a new
product, and figure that somebody else has run into the same problem.  In a
nutshell, we're building a system that deals with a lot of incoming and
outgoing text documents (email, word docs, short comments, etc), grouped
together by some common factor (basically, email threads), and want to do
full-text search across those threads.

We've settled on Solr, of course. :)

Right now, I'm adding each new incoming/outgoing message as a new document,
and can search just fine, unless I want to look for multiple terms that span
documents.  So, "foo" is in the first document, "bar" is in the second, and
although they both have a 'thread_id' field identifying them as belonging to
the same group, searching for "+foo +bar" doesn't yield results (which is
not surprising).
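
To make the behavior above concrete, here is a minimal in-memory sketch in
Python. It is not Solr and does not use any Solr API; it only mimics "every
required term must occur in the same document" versus "every required term
must occur somewhere in the same thread". The sample messages, the 'id' and
'body' field names, and both function names are assumptions made up for the
illustration; only 'thread_id' comes from the setup described here.

```python
from collections import defaultdict

# Hypothetical sample data: m1 and m2 belong to the same thread.
messages = [
    {"id": "m1", "thread_id": "t1", "body": "foo appears here"},
    {"id": "m2", "thread_id": "t1", "body": "bar appears here"},
    {"id": "m3", "thread_id": "t2", "body": "foo only"},
]


def tokens(text):
    """Very naive tokenizer: lowercase and split on whitespace."""
    return set(text.lower().split())


def match_per_message(messages, required_terms):
    """Ids of messages whose own body contains every required term.

    This mirrors a query like "+foo +bar" evaluated per document.
    """
    required = set(required_terms)
    return [m["id"] for m in messages if required <= tokens(m["body"])]


def match_per_thread(messages, required_terms):
    """thread_ids whose messages, taken together, contain every required term."""
    terms_by_thread = defaultdict(set)
    for m in messages:
        terms_by_thread[m["thread_id"]] |= tokens(m["body"])
    required = set(required_terms)
    return sorted(t for t, seen in terms_by_thread.items() if required <= seen)


per_message = match_per_message(messages, ["foo", "bar"])
per_thread = match_per_thread(messages, ["foo", "bar"])
```

With this data, per_message is empty because no single message holds both
terms, while per_thread contains "t1". The second result is the one being
asked for.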

Now, I can modify the code to store one document for each group of messages
without too much work.  But as I understand it, this means that for every
new message coming in, I need to hand an aggregate of all previous messages
to the indexer, because Solr will re-create the document (which indexes the
entire group of messages) when I do update/add.  Since there can be some
fairly large files sitting in there (50-100M in some cases), I'd rather not
have to shove that down Solr's pipe every time something changes.
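
The cost of the one-document-per-thread approach can be sketched as follows.
This Python snippet only builds the XML body of an add request in Solr's XML
update format (<add><doc><field name="...">); it does not talk to a server.
The 'body' field name, the use of 'thread_id' as the document key, and the
function name are assumptions for illustration, not the actual schema.

```python
import xml.etree.ElementTree as ET


def build_thread_add_xml(thread_id, bodies):
    """Build an <add> message holding ONE document for a whole thread.

    Every message body seen so far has to be included, because the add
    replaces the previously indexed document for this thread.
    """
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")

    key = ET.SubElement(doc, "field", name="thread_id")
    key.text = thread_id

    # One (hypothetical) multivalued 'body' field entry per message.
    for body in bodies:
        field = ET.SubElement(doc, "field", name="body")
        field.text = body

    return ET.tostring(add, encoding="unicode")


# Simulate a thread growing by one message at a time.
thread_bodies = []
payload_sizes = []
for new_body in ["first message", "second message", "third message"]:
    thread_bodies.append(new_body)
    payload = build_thread_add_xml("t1", thread_bodies)
    payload_sizes.append(len(payload))
```

Each entry in payload_sizes is larger than the one before it, since every
update carries the entire thread again. With 50-100M attachments in a thread,
that is the part I would like to avoid.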

So, first question: is what I think I know about update/add correct?

Second, if so, is there a way that I can update single-valued fields and
append new multivalued fields, without having to re-index the whole
document?

Third, am I just totally wrong about the way I'm trying to do this, and is
there a better way?

