lucene-java-user mailing list archives

From Adrien Grand <jpou...@gmail.com>
Subject Re: Lucene DocValuesField, SortedDocValuesField usage for filtering and sorting
Date Tue, 16 Dec 2014 14:14:39 GMT
Hi Piotr,

On Mon, Dec 15, 2014 at 9:43 PM, Piotr Idzikowski
<piotridzikowski@gmail.com> wrote:
> Hello.
> I am going to switch to the newest (4.10.2) version of Lucene and I'd like to
> make some optimizations in my index and code. I would like to use
> DocValuesField to get values but also for filtering and sorting. So here I
> have some questions: If I'd like to use range filter
> (FieldCacheRangeFilter) I need to store a value in XxxDocValuesField, but
> if I want to use terms filter (FieldCacheTermsFilter) I need to store a
> value in SortedDocValuesField. So it looks like if I want to use range and
> terms filters I need to have two different fields. Am I right? Am I using
> it correctly?

FieldCacheRangeFilter and FieldCacheTermsFilter only work well when
you have lots of terms and most documents match your filter. Otherwise
you should consider using the regular numeric range filter and terms
filter. Although they might be a bit slower in the dense case, they
will be significantly faster when few terms/documents match.

Both FieldCacheRangeFilter and FieldCacheTermsFilter would work on the
same SortedDocValues field. What makes you think you need two fields?
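
To illustrate, here is a minimal sketch against the 4.10 API in which a
single SortedDocValuesField is the target of both filters. The field name
"category", the values and the class name are made up for the example,
and it only builds the document and the filters, it does not run a search.

```java
import org.apache.lucene.document.Document;
import org.apache.lucene.document.SortedDocValuesField;
import org.apache.lucene.search.FieldCacheRangeFilter;
import org.apache.lucene.search.FieldCacheTermsFilter;
import org.apache.lucene.search.Filter;
import org.apache.lucene.util.BytesRef;

public class SharedDocValuesFilters {

    // Hypothetical field name, used by the document and by both filters.
    static final String FIELD = "category";

    // One sorted doc values field per document is all that gets indexed.
    static Document makeDoc(String category) {
        Document doc = new Document();
        doc.add(new SortedDocValuesField(FIELD, new BytesRef(category)));
        return doc;
    }

    // Range filter over the string values: "a" (inclusive) to "m" (exclusive).
    static Filter rangeFilter() {
        return FieldCacheRangeFilter.newStringRange(FIELD, "a", "m", true, false);
    }

    // Terms filter over the very same field.
    static Filter termsFilter() {
        return new FieldCacheTermsFilter(FIELD, "books", "music");
    }
}
```

Either filter can then be passed to IndexSearcher.search together with a
query.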

> Another thing is Sort. I can choose between SortedNumericSortField and
> SortField. The first one requires SortedNumericDocValuesField, the other
> NumericDocValuesField. Is there any (big) difference in performance? Should
> I use SortedNumericSortField (adding another field to the index)?

SortedNumericSortField is just a helper class to sort on a
multi-valued field that stores numeric doc values (in order to know
whether the min or max value should be considered for sorting).
SortField already handles both numeric and sorted doc values
correctly, so you can use either one. If you have the choice to store your
data either in a numeric doc values field or a sorted doc values
field, then the numeric field might be a bit better performance-wise
(but it only works with single-valued numerics).
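
The min/max choice for multi-valued fields can be pictured with a small
plain-Java sketch. This is not Lucene code: the Selector enum, the class
and the sample data are hypothetical and only model the idea that each
document is reduced to one representative value before comparing.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MultiValuedSortSketch {

    // Hypothetical stand-in for the "min or max" choice described above.
    enum Selector { MIN, MAX }

    // Reduce the values of one document to the single value used for sorting.
    // Assumes every document has at least one value.
    static long select(long[] values, Selector selector) {
        long picked = values[0];
        for (long v : values) {
            picked = (selector == Selector.MIN) ? Math.min(picked, v) : Math.max(picked, v);
        }
        return picked;
    }

    // Return document ids in ascending order of their selected value.
    static List<String> sortIds(Map<String, long[]> docs, Selector selector) {
        List<String> ids = new ArrayList<>(docs.keySet());
        ids.sort(Comparator.comparingLong((String id) -> select(docs.get(id), selector)));
        return ids;
    }

    public static void main(String[] args) {
        Map<String, long[]> docs = new LinkedHashMap<>();
        docs.put("a", new long[] {5, 40});
        docs.put("b", new long[] {10, 20});
        System.out.println(sortIds(docs, Selector.MIN)); // [a, b]
        System.out.println(sortIds(docs, Selector.MAX)); // [b, a]
    }
}
```

The same two documents come out in opposite order depending on the
selector, which is why a multi-valued sort needs to be told which one to
use while a single-valued sort does not.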

> And the last one. Am I right that all corresponding DocValuesFields will be
> removed from the index when a doc is removed? I saw an IndexWriter method for
> updating a doc value but no delete method for doc values.

Yes, doc values will be removed too. The reason why there is an update
method on IndexWriter is that Lucene supports updating doc values
fields in place (for example updateNumericDocValue), without
reindexing the document completely as the updateDocument method does.
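
As a rough sketch of the three operations side by side, assuming each
document carries a unique "id" StringField and a "price"
NumericDocValuesField (both names, and the helper class, are made up for
the example):

```java
import java.io.IOException;

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.NumericDocValuesField;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;

public class DocValueUpdates {

    // Term that identifies one document through its (hypothetical) "id" field.
    static Term idTerm(String id) {
        return new Term("id", id);
    }

    static Document makeDoc(String id, long price) {
        Document doc = new Document();
        doc.add(new StringField("id", id, Field.Store.YES));
        doc.add(new NumericDocValuesField("price", price));
        return doc;
    }

    // Changes only the doc value; the rest of the document is left alone.
    static void updatePriceInPlace(IndexWriter writer, String id, long newPrice)
            throws IOException {
        writer.updateNumericDocValue(idTerm(id), "price", newPrice);
    }

    // Deletes the old document and indexes a complete replacement.
    static void reindex(IndexWriter writer, String id, long newPrice)
            throws IOException {
        writer.updateDocument(idTerm(id), makeDoc(id, newPrice));
    }

    // Deleting the document is enough; there is no separate doc values delete.
    static void delete(IndexWriter writer, String id) throws IOException {
        writer.deleteDocuments(idTerm(id));
    }
}
```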

-- 
Adrien


