lucene-java-user mailing list archives

From Arun Kumar K <arunk...@gmail.com>
Subject Re: Lucene 4.2 DocValues
Date Wed, 29 May 2013 03:05:14 GMT
Adrien,

Thanks for taking the time to explain things so clearly. I understand it correctly
now.

Thanks,
Arun


On 29-May-2013, at 2:13 AM, Adrien Grand <jpountz@gmail.com> wrote:

> On Tue, May 28, 2013 at 8:55 PM, Arun Kumar K <arunk786@gmail.com> wrote:
>> Thanks for clarifying things.
>> I have some questions regarding sorting:
>>> 
>>> While you can do that, I don't recommend it. For example, if you have
>>> 5 fields, loading all fields from stored fields requires at most 1
>>> disk seek while loading all fields from doc values requires at least 5
>>> disk seeks for disk-based doc values.
>> 
>> 
>> 1> I am assuming the 5 fields mentioned are sortable fields on which sorting is done.
>> In my understanding, loading stored fields takes 1 disk seek to find the file pointer
>> and 1 disk seek to read all those fields.
> 
> This was correct until Lucene 4.0, but since 4.1, Lucene stores the
> doc ID -> file pointer mapping in memory, ensuring at most 1 disk
> seek.
> 
>> Since a separate file is maintained for each doc values field, we get 5 disk
>> seeks + 1 disk seek for the file pointer.
> 
> There is no general rule since this depends on the doc values type and
> the codec implementation, but you got the idea.
> 
>> If we have only one sortable field, which would be better? I guess there is no difference.
> 
> Just to make things clear, before Lucene had doc values, sorting was
> performed based on the inverted index (which was uninverted and stored
> in memory using FieldCache), not stored fields. Stored fields are bad
> for sorting because they are usually large and don't play nice with
> the file system cache.
> 
> Doc values are very similar to FieldCache except that the hard work is
> done at indexing time instead of searching time. This is a good
> trade-off because it allows for faster loading of indexes and for
> off-loading data to disk. It is never a bad idea to use doc values
> for sorting.
> 
>> Also, I vaguely remember that there is some performance loss when sorting on
>> strings in Lucene 4.0.
>> Would the decision then change for a String field, or depend on the field type?
> 
> I don't see why String sorting would be slower. However, it is true
> that String sorting requires a lot of memory. If your field is a
> number, you should definitely use a numeric field cache.
> 
>> 2> Also, in my understanding, if we need to use parser-based queries with doc values,
>> we need a stored field with the same name and value as the document's doc values field.
>> Even term queries won't work. Am I right here?
> 
> QueryParser is completely unaware of your schema. If you want
> QueryParser to use doc-values-based queries, you can override
> QueryParser.newRangeQuery and/or QueryParser.newFieldQuery to return a
> new ConstantScoreQuery that wraps a FieldCacheRangeFilter.
> 
> --
> Adrien
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
> 
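
As an illustration of the FieldCache comparison in the quoted reply, the
following is a minimal sketch in plain Java, not Lucene code. It assumes a toy
inverted index for a single-valued numeric field, represented as a sorted map
from value to doc IDs; the class and method names (UninvertSketch, uninvert,
sortDocs) are hypothetical. The uninvert step stands for the work FieldCache
does at search time; with doc values, an equivalent per-document column is
written at indexing time, so only the sort step remains at search time.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

public class UninvertSketch {

    /**
     * Turns a toy inverted index (value -> doc IDs) into a per-document
     * column (doc ID -> value). Assumes each document has exactly one value.
     */
    static long[] uninvert(SortedMap<Long, int[]> postings, int maxDoc) {
        long[] column = new long[maxDoc];
        for (Map.Entry<Long, int[]> entry : postings.entrySet()) {
            for (int doc : entry.getValue()) {
                column[doc] = entry.getKey();
            }
        }
        return column;
    }

    /** Returns doc IDs ordered by their value in the column (stable sort). */
    static Integer[] sortDocs(long[] column) {
        Integer[] docs = new Integer[column.length];
        for (int i = 0; i < docs.length; i++) {
            docs[i] = i;
        }
        Arrays.sort(docs, Comparator.comparingLong(d -> column[d]));
        return docs;
    }

    public static void main(String[] args) {
        // Hypothetical numeric field with four documents.
        SortedMap<Long, int[]> postings = new TreeMap<>();
        postings.put(10L, new int[] {2});
        postings.put(25L, new int[] {0, 3});
        postings.put(40L, new int[] {1});

        long[] column = uninvert(postings, 4);
        System.out.println(Arrays.toString(column));           // [25, 40, 10, 25]
        System.out.println(Arrays.toString(sortDocs(column))); // [2, 0, 3, 1]
    }
}
```

The sort only ever reads one small value per document from the column, which
is why a column layout suits sorting better than large stored-field records.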


