lucene-solr-user mailing list archives

From "David Smiley (@MITRE.org)" <DSMI...@mitre.org>
Subject Re: DocValues vs stored fields?
Date Mon, 01 Apr 2013 18:12:53 GMT
Otis,

DocValues are quite insufficient for true field updates.  DocValues is a
per-document value storage (hence the name); it's not inverted/indexed.
If you needed to search based on these values (e.g. find all docs that have
this value or between these values) then that's not going to work.  The most
promising field update work going on right now is
https://issues.apache.org/jira/browse/LUCENE-4258 "Incremental Field Updates
through Stacked Segments".  In my opinion, that's the most exciting thing
happening in Lucene right now, but it appears to have stalled a little.
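
To make the per-document vs. inverted distinction concrete, here is a toy
sketch in plain Python. It is not Lucene code; the "price" field and all
function names are made up for illustration. It only shows why a per-document
column is cheap to read by doc ID but needs a full scan to find docs by value.

```python
from collections import defaultdict

# Toy data: position in the list is the doc ID, the entry is the value
# of a hypothetical "price" field.
prices = [30, 10, 20, 10]

# DocValues-style column: one slot per document, addressed by doc ID.
doc_values = list(prices)

# Inverted index: value (term) -> list of doc IDs that contain it.
inverted = defaultdict(list)
for doc_id, value in enumerate(prices):
    inverted[value].append(doc_id)


def value_of(doc_id):
    """Cheap with a column: a direct lookup by doc ID."""
    return doc_values[doc_id]


def docs_with_value_via_column(value):
    """Without an inverted index, every document has to be visited."""
    return [d for d, v in enumerate(doc_values) if v == value]


def docs_with_value_via_index(value):
    """With an inverted index, the matching doc IDs are looked up directly."""
    return inverted.get(value, [])
```

Both lookups return the same doc IDs; the difference is that the column
version touches every document while the index version does not.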

I do think a DocValues-based hack could make a better replacement for Solr's
ExternalFileField, which is for use in function queries.

Another questioner asked essentially why a field that has DocValues won't
have its value shown when the field is marked stored="false" since the value
is stored per-document after all.  True, the disparity here is a bit
confusing.  DocValues are not intended as a replacement for stored fields in
places where you are using stored fields now.  It's basically to improve the
performance and memory use of function queries, sorting, and faceting.  It's
the new FieldCache under a different name, but hasn't strictly replaced the
FC (yet).  It's not enabled by default because it creates new data on disk
and Solr doesn't know that you want to use it.
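
As a rough illustration of why a per-document column suits sorting and
faceting, here is a toy sketch in plain Python. It is not how Solr implements
either feature; the "category" and "price" columns, the doc IDs, and the
function names are all assumptions made for the example.

```python
from collections import Counter

# Hypothetical per-document columns, indexed by doc ID.
category = ["book", "dvd", "book", "cd", "dvd", "book"]
price = [12, 30, 7, 15, 22, 9]


def sort_hits(hits, column):
    """Order matching doc IDs by their value in the given column."""
    return sorted(hits, key=lambda doc_id: column[doc_id])


def facet_counts(hits, column):
    """Count how many matching docs fall under each column value."""
    return Counter(column[doc_id] for doc_id in hits)


# Doc IDs matched by some query (doc 3 did not match).
hits = [0, 1, 2, 4, 5]
by_price = sort_hits(hits, price)
by_category = facet_counts(hits, category)
```

Both operations only ever ask "what is the value for this doc ID?", which is
exactly the access pattern a per-document column serves well.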

As of Solr 4.2, DocValues is also multi-valued -- awesome!

All this said, I do think there's room for a proposed Solr DocTransformer to
expose the DocValues value as if it were a stored field in your search
results.  Actually... I wish that if you explicitly asked for a field that
isn't stored, Solr would just use docValues automatically.  That'd be
cool!
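
The fallback I'm wishing for could look roughly like this. It is a toy sketch
in plain Python, not Solr's DocTransformer API; render_doc, the dict layouts,
and the field names are all hypothetical.

```python
def render_doc(doc_id, requested_fields, stored, doc_values):
    """Build the result document for one hit.

    stored:     field name -> {doc_id: value}, hypothetical stored fields
    doc_values: field name -> {doc_id: value}, hypothetical docValues columns
    """
    result = {}
    for field in requested_fields:
        if field in stored and doc_id in stored[field]:
            result[field] = stored[field][doc_id]
        elif field in doc_values and doc_id in doc_values[field]:
            # Not stored: fall back to the per-document docValues entry.
            result[field] = doc_values[field][doc_id]
    return result


# "title" is stored; "popularity" only has docValues.
stored = {"title": {0: "first doc", 1: "second doc"}}
doc_values = {"popularity": {0: 7, 1: 3}}

doc = render_doc(0, ["title", "popularity"], stored, doc_values)
```

Stored values win when both exist, and a field with neither is simply left
out of the result.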

~ David


Otis Gospodnetic-5 wrote
> Hi,
> 
> The current field update mechanism is not really a field update
> mechanism.  It just looks like that from the outside.  DocValues
> should make true field updates implementable.
> 
> Otis
> --
> Solr & ElasticSearch Support
> http://sematext.com/
> 
> 
> 
> 
> 
> On Fri, Mar 29, 2013 at 3:30 PM, Marcin Rzewucki <mrzewucki@> wrote:
>> Hi,
>> Atomic updates (single field updates) do not depend on DocValues. They were
>> implemented in Solr 4.0 and work fine (but all fields have to be
>> retrievable). DocValues are supposed to be more efficient than FieldCache.
>> Why are they not enabled by default? Maybe because they are not for all
>> fields and because of their limitations (a field has to be single-valued,
>> and be required or have a default value).
>> Regards.
>>
>>
>>
>> On 29 March 2013 17:20, Timothy Potter <thelabdude@> wrote:
>>
>>> Hi Jack,
>>>
>>> I've just started to dig into this as well, so sharing what I know but
>>> still some holes in my knowledge too.
>>>
>>> DocValues == Column Stride Fields (best resource I know of so far is
>>> Simon's preso from Lucene Rev 2011:
>>> http://www.slideshare.net/LucidImagination/column-stride-fields-aka-docvalues
>>> ).
>>> It's pretty dense but some nuggets I've gleaned from this are:
>>>
>>> 1) DocValues are more efficient in terms of memory usage and I/O
>>> performance for building an alternative to FieldCache (slide 27 is very
>>> impressive)
>>> 2) DocValues has a more efficient way to store primitive types, such as
>>> packed ints
>>> 3) Faster random access to stored values
>>>
>>> In terms of switch-over, you have to re-index to change your fields to use
>>> DocValues on disk, which is why they are not enabled by default.
>>>
>>> Lastly, another goal of DocValues is to allow updates to a single field w/o
>>> re-indexing the entire doc. That's not implemented yet but I think still
>>> planned.
>>>
>>> Cheers,
>>>  Tim
>>>
>>>
>>>
>>> On Fri, Mar 29, 2013 at 9:31 AM, Jack Krupansky <jack@> wrote:
>>>
>>> > I’m still a little fuzzy on DocValues (maybe because I’m still grappling
>>> > with how it does or doesn’t still relate to “Column Stride Fields”), so
>>> > can anybody clue me in as to how useful DocValues is/are?
>>> >
>>> > Are DocValues simply an alternative to “stored fields”?
>>> >
>>> > If so, and if DocValues are so great, why aren’t we just switching Solr
>>> > over to DocValues under the hood for all fields?
>>> >
>>> > And if there are “issues” with DocValues that would make such a complete
>>> > switchover less than absolutely desired, what are those issues?
>>> >
>>> > In short, when should a user use DocValues over stored fields, and vice
>>> > versa?
>>> >
>>> > As things stand, all we’ve done is make Solr more confusing than it was
>>> > before, without improving its OOBE. OOBE should be job one in Solr.
>>> >
>>> > Thanks.
>>> >
>>> > P.S., And if I actually want to do Column Stride Fields, is there a way
>>> > to do that?
>>> >
>>> > -- Jack Krupansky
>>>





-----
 Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book
--
View this message in context: http://lucene.472066.n3.nabble.com/DocValues-vs-stored-fields-tp4052406p4052966.html
Sent from the Solr - User mailing list archive at Nabble.com.
