lucene-solr-user mailing list archives

From Chris Hostetter <hossman_luc...@fucit.org>
Subject Re: DocValues and field requirements
Date Fri, 22 Mar 2013 19:39:24 GMT

: Thank you for your response. Yes, that's strange. By enabling DocValues the
: information about missing fields is lost, which changes the way of sorting
: as well. Adding default value to the fields can change a logic of
: application dramatically (I can't set default value to 0 for all
: Trie*Fields fields, because it could impact the results displayed to the
: end user, which is not good). It's a pity that using DocValues is so
: limited.

I'm not really up on docvalues, but i asked rmuir about this a bit on IRC.

the crux of the issue is that there are two different docvalue impls, one 
that uses a fixed amount of space per doc (ie: exactly one value per doc) 
and one that allows an ordered set of values per doc (ie: multivalued).

the multivalued docvals impl was wired into solr for multivalued fields, 
and the single valued docvals impl was wired in for the single valued case 
-- but since the single valued docvals impl *has* to have a value 
for every doc, the schema error you encountered was added if you try to 
use it on a field that isn't required or doesn't have a default value -- 
to force you to be explicit about which "default" you want, instead of the 
low level lucene "0" default coming into play w/o you knowing about it. 
(as Shawn mentioned)
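
to make the "has to have a value for every doc" point concrete, here is a toy model in python. it is only an illustration of the idea described above, not Lucene's actual data structures; the function name and the "price" field are made up for the example.

```python
# Toy model only: NOT Lucene's real implementation, just the idea of a
# fixed-size column with exactly one slot per document.

def build_single_valued_column(docs, field, num_docs):
    """Return one value per doc; docs lacking the field keep the 0 fill."""
    column = [0] * num_docs  # every doc gets a slot, pre-filled with 0
    for doc_id, doc in enumerate(docs):
        if field in doc:
            column[doc_id] = doc[field]
    return column

# "price" is a hypothetical field name used only for this example.
docs = [{"price": 5}, {}, {"price": 0}]
column = build_single_valued_column(docs, "price", len(docs))

# Doc 1 (no value) and doc 2 (a real 0) now look identical in the column,
# which is why the "missing" information is lost.
print(column)
```

in this model the doc with no value and the doc with an explicit 0 end up with the same stored value, so nothing downstream (sorting included) can tell them apart.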

the multivalued docvals impl could conceivably be used instead for these 
types of single valued fields (ie: to support 0 or 1 values) but there is 
no sorting support for multivalued docvals, so it would cause other 
problems.

One possible workaround for people who want to take advantage of "sort 
missing first/last" type sorting on a docvals type field would be to manage 
the "missing" information yourself in a distinct field which you also 
leverage in any filtering or sorting on the docvals field.

ie, have a docvalues field "myfield" which is single valued, with some 
configured default value, and then have a "myfield_exists" boolean field 
which is single valued and required.  when indexing docs, set 
"myfield_exists" to true or false depending on whether "myfield" has a 
value (this would be fairly trivial in an update processor) and then 
instead of sorting just on "myfield desc" you would sort on 
"myfield_exists (asc|desc), myfield desc" (where you pick the asc or desc 
depending on whether you want docs w/o values first or last).  you would 
likewise need to filter on myfield_exists:true anytime you did queries 
against the myfield field.
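
a minimal client-side sketch of the workaround described above, in python. the helper names are hypothetical and nothing here calls a Solr API: it only prepares the document and builds the sort/filter strings you would pass as the usual "sort" and "fq" request params. it also assumes a Solr boolean field sorts false before true in ascending order, so "desc" on the flag puts docs with a real value first; verify that against your version.

```python
# Hypothetical helpers for illustration; in Solr itself the flag would more
# naturally be set by an update processor, as suggested above.

def with_exists_flag(doc, field, default):
    """Return a copy of doc with '<field>_exists' set and the default filled in."""
    out = dict(doc)
    has_value = out.get(field) is not None
    out[field + "_exists"] = has_value
    if not has_value:
        out[field] = default  # explicit default chosen by you, not an implicit 0
    return out

def sort_param(field, direction="desc", missing_last=True):
    """Build 'myfield_exists (asc|desc), myfield <direction>'.

    Assumes false sorts before true in ascending order, so 'desc' on the
    flag lists docs that have a value first (missing ones last).
    """
    flag_dir = "desc" if missing_last else "asc"
    return f"{field}_exists {flag_dir}, {field} {direction}"

def exists_filter(field):
    """Filter query restricting results to docs that really have a value."""
    return f"{field}_exists:true"

doc = with_exists_flag({"id": "1"}, "myfield", 0)
print(doc)
print(sort_param("myfield"))
print(exists_filter("myfield"))
```

the flag field is what carries the "missing" information, so the configured default on "myfield" can be any value without affecting where those docs land in the sort.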


(perhaps someone could work on a patch to inject a synthetic field like this 
automatically for fields that are docValues="true" multiValued="false" 
required="false" w/o a default value?)


-Hoss
