lucene-solr-user mailing list archives

From "Jack Krupansky" <j...@basetechnology.com>
Subject Re: Partial update vs full update performance
Date Wed, 12 Jun 2013 16:14:41 GMT
Yes, you need to have all the fields stored to do a partial update.
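A minimal sketch of why every field has to be stored, assuming a toy in-memory model (the `atomic_update` function and the dict standing in for the index are hypothetical, not Solr's actual code). Conceptually, Solr reads the stored values, applies the change, and re-adds the whole document, so a field that was indexed but not stored has nothing to be read back from.

```python
# Toy model of a Solr atomic ("partial") update. Hypothetical simplification:
# the "index" is a dict of id -> stored field values. Indexed-but-unstored
# fields never appear here, which is why they cannot survive a partial update.
from typing import Any, Dict

StoredDoc = Dict[str, Any]


def atomic_update(index: Dict[str, StoredDoc], doc_id: str,
                  changes: Dict[str, Dict[str, Any]]) -> StoredDoc:
    """Apply {"field": {"set": value}} changes the way Solr does conceptually."""
    old = index[doc_id]              # 1. read back the stored values only
    new = dict(old)                  # 2. copy them into a fresh document
    for field, op in changes.items():
        new[field] = op["set"]       # 3. apply each "set" operation
    del index[doc_id]                # 4. delete the old document...
    index[doc_id] = new              # 5. ...and add the rebuilt one
    return new


# "title" is stored; "body" was indexed but NOT stored, so its value was
# never kept and the rebuilt document silently loses it.
full_doc = {"id": "1", "title": "old title", "body": "long unstored text"}
stored_only = {k: v for k, v in full_doc.items() if k != "body"}
index = {"1": stored_only}

updated = atomic_update(index, "1", {"title": {"set": "new title"}})
print(updated)  # {'id': '1', 'title': 'new title'}
```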

Generally, not storing field values causes all sorts of headaches that far 
outweigh the modest benefit in memory savings.

Generally, make everything stored, unless you have a specific and VERY 
COMPELLING need not to. Back in the early days of Lucene and Solr, memory use 
was a much more compelling concern. Now, not so much. And even if memory is an 
issue, the downside of not storing all values seems much more likely to 
outweigh the benefits.

Sure, there are some apps where you may not want to store much if anything 
besides the key (I recall one presentation at Lucene Revolution in San 
Diego, and DataStax Enterprise does this because all the data is stored in 
Cassandra already), but generally apps would be better off biting the bullet 
and throwing memory at the problem.

And DocValues are an alternative if heap space is a critical issue.

2. Large field values are a potential issue simply because every partial 
update has to retrieve all of those bytes and then re-store them, even when 
only one small field changes.
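To put rough numbers on this, here is a hypothetical cost model (the `bytes_rewritten` helper and the field sizes are invented for illustration). Under the read-then-re-add behavior described in this thread, the bytes read back and re-stored depend on the document's total stored size, not on the size of the field being changed.

```python
# Hypothetical cost model for one partial update: Solr reads every stored
# field and the whole document is re-added, so the work scales with the
# total stored size of the document, not with the size of the change.
from typing import Dict


def bytes_rewritten(stored_field_sizes: Dict[str, int]) -> int:
    """Bytes read back and re-stored: the sum of all stored field sizes,
    regardless of which single field is being changed."""
    return sum(stored_field_sizes.values())


small_doc = {"id": 10, "status": 8, "title": 80}
large_doc = {"id": 10, "status": 8, "title": 80, "content": 5_000_000}

# Changing only "status" costs about 100 bytes for the small document
# but about 5 MB for the document that carries a large "content" field.
print(bytes_rewritten(small_doc))  # 98
print(bytes_rewritten(large_doc))  # 5000098
```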

-- Jack Krupansky

-----Original Message----- 
From: adfel70
Sent: Wednesday, June 12, 2013 11:50 AM
To: solr-user@lucene.apache.org
Subject: Re: Partial update vs full update performance

1. To support partial updates, I must have all the fields stored (most of
which I don't need stored).
Wouldn't query performance suffer if I store all these fields?

2. Can you elaborate on the large fields issue?
Why does it matter if the fields are large in the context of partial
updates?
One way or another, Lucene will index the field.


Jack Krupansky-2 wrote
> Correct.
>
> Generally, I think most apps will benefit from partial update, especially if
> they have a lot of fields. Otherwise, they will have two round-trip requests
> rather than one. Solr reads the existing document values more efficiently,
> under the hood, with no need to format them for the response and then parse
> the incoming (redundant) values.
>
> OTOH, if the client has all the data anyway (maybe because it wants to
> display the data before update), it may be easier to do a full update.
>
> You could do an actual performance test, but I would suggest that
> (generally) partial update will be more efficient than a full update.
>
> And Lucene can do add and delete rather quickly, so that should not be a
> concern for modest to medium size documents, but clearly would be an issue
> for large and very large documents (hundreds of fields or large field
> values.)
>
> -- Jack Krupansky
>
> -----Original Message----- 
> From: adfel70
> Sent: Wednesday, June 12, 2013 10:40 AM
> To: solr-user@.apache
> Subject: Partial update vs full update performance
>
> Hi,
> As I understand it, even if I use partial update, Lucene can't really update
> documents. Solr will use the stored fields in order to pass the values to
> Lucene, and delete and add operations will still be performed.
>
> If this is the case, is there a performance difference between partial
> update and full update?
>
> My documents have dozens of fields, most of which are not stored.
> I sometimes need to go through a portion of the documents and modify a
> single field.
> Right now I delete the portion I want to update and then add those
> documents back with the updated field.
> This of course takes a lot of time (I'm talking about tens of millions of
> documents).
>
> Should I move to using partial update? Will it improve the indexing time at
> all? Will it improve the indexing time to such an extent that I would be
> better off storing the fields I don't need stored, just for the partial
> update feature?
>
> thanks
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Partial-update-vs-full-update-performance-tp4069948.html
> Sent from the Solr - User mailing list archive at Nabble.com.
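As an illustration of the round-trip point and of the bulk single-field change described above, the sketch below builds both kinds of request body. Assumptions: the field names (`category`, `title`, `body`) and the helper functions are hypothetical; the `{"field": {"set": value}}` form is Solr's JSON atomic-update syntax.

```python
# Compare request bodies for a full update and a partial (atomic) update.
# Field names and helper functions are hypothetical, for illustration only.
import json
from typing import Any, Dict, List


def full_update_body(docs: List[Dict[str, Any]], field: str, value: Any) -> str:
    """Full update: the client must already hold (or first fetch) every field
    and re-sends each complete document with the one field changed."""
    return json.dumps([{**doc, field: value} for doc in docs])


def partial_update_body(ids: List[str], field: str, value: Any) -> str:
    """Partial update: only the unique key and the changed field are sent;
    Solr fills in the rest from the stored fields on the server side."""
    return json.dumps([{"id": doc_id, field: {"set": value}} for doc_id in ids])


docs = [
    {"id": str(i), "category": "old", "title": f"Title {i}", "body": "x" * 1000}
    for i in range(3)
]

full = full_update_body(docs, "category", "new")
partial = partial_update_body([d["id"] for d in docs], "category", "new")

print(partial)
# [{"id": "0", "category": {"set": "new"}}, {"id": "1", "category": {"set": "new"}}, {"id": "2", "category": {"set": "new"}}]
print(len(full) > len(partial))  # True
```

The partial body would typically be POSTed to the core's update handler with `Content-Type: application/json`; the exact URL depends on the installation. A smaller request body does not remove the server-side work, though: Solr still reads the stored fields and performs a delete and an add for each document.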





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Partial-update-vs-full-update-performance-tp4069948p4069973.html
Sent from the Solr - User mailing list archive at Nabble.com. 

