lucene-java-user mailing list archives

From Adrien Grand <jpou...@gmail.com>
Subject Re: How can Docvalues so efficient
Date Mon, 30 May 2016 11:01:53 GMT
When executing queries, Lucene has an abstraction called Scorer, which is
responsible for returning matching documents in doc id order. Since doc
values are stored on disk in doc id order, reads are sequential. There is
an adversarial case when few documents match, since you might need to jump
over large numbers of doc ids to reach the next matching one, but queries
that match few documents should be very fast anyway.
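To make the doc-id-order argument concrete, here is a minimal, self-contained Java sketch. It does not use Lucene: `ForwardOnlyColumnReader`, `sumMatches` and `skippedFor` are hypothetical names, a `long[]` stands in for the on-disk doc values column, and an `int[]` of increasing doc ids stands in for what a Scorer produces. Because matches arrive in increasing doc id order, every read lands after the previous one, so the column is consumed in a single forward pass.

```java
public class SequentialDocValuesDemo {

    /** Hypothetical forward-only reader over a column stored in doc id order. */
    static final class ForwardOnlyColumnReader {
        private final long[] column;   // stands in for on-disk doc values, indexed by doc id
        private int lastDoc = -1;      // last doc id read; reads may only move forward
        private long docsSkipped = 0;  // doc ids jumped over between consecutive reads

        ForwardOnlyColumnReader(long[] column) {
            this.column = column;
        }

        long valueFor(int doc) {
            if (doc <= lastDoc) {
                throw new IllegalArgumentException("doc ids must be strictly increasing, got " + doc);
            }
            docsSkipped += doc - lastDoc - 1;
            lastDoc = doc;
            return column[doc];
        }

        long docsSkipped() {
            return docsSkipped;
        }
    }

    /** Sums the values of matching docs, which arrive in increasing doc id order. */
    static long sumMatches(long[] column, int[] matchingDocs) {
        ForwardOnlyColumnReader reader = new ForwardOnlyColumnReader(column);
        long sum = 0;
        for (int doc : matchingDocs) {
            sum += reader.valueFor(doc);
        }
        return sum;
    }

    /** Counts how many doc ids a single forward pass jumps over for the given matches. */
    static long skippedFor(long[] column, int[] matchingDocs) {
        ForwardOnlyColumnReader reader = new ForwardOnlyColumnReader(column);
        for (int doc : matchingDocs) {
            reader.valueFor(doc);
        }
        return reader.docsSkipped();
    }

    public static void main(String[] args) {
        long[] prices = {5, 3, 8, 1, 9, 2, 7, 4};
        int[] dense = {0, 1, 2, 3, 4, 5, 6, 7};
        int[] sparse = {1, 6};
        System.out.println(sumMatches(prices, dense));   // 39
        System.out.println(sumMatches(prices, sparse));  // 10
        System.out.println(skippedFor(prices, sparse));  // 5
    }
}
```

With the dense match list the reader never skips anything. With the sparse list `{1, 6}` it jumps over doc 0 and then docs 2 to 5, which corresponds to the adversarial case: more distance between reads, but also very few reads in total.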

On Mon, 30 May 2016 at 12:52, Ting Yao <ting.echo.yao@gmail.com> wrote:

> Thank you very much for answering me.
> But could you explain how Lucene reads the doc values files sequentially?
>
> 2016-05-30 18:15 GMT+08:00 Adrien Grand <jpountz@gmail.com>:
>
> > Doc values do indeed need to be read from disk. However, the fact that
> > Lucene reads the doc values files sequentially (disks perform better at
> > sequential access than random access) and that the filesystem cache helps
> > keep hot regions of the doc values files in memory usually helps keep
> > performance close to what we would get if the data were stored in memory.
> >
> > On Mon, 30 May 2016 at 12:01, Ting Yao <ting.echo.yao@gmail.com> wrote:
> >
> > > Hi all,
> > >        I have been reading the Lucene source code recently, and we also
> > > use Elasticsearch as our search engine. As far as I know, Elasticsearch
> > > performance is pretty good, and Elasticsearch is based on Lucene. So I am
> > > wondering how it can search so fast when the field data (uninverted
> > > index) is stored on disk.
> > >     DocValues make accessing field values fast. From my perspective, it
> > > is of course fast when only a few values of a field are read. But when a
> > > few whole fields need to be accessed, I don't think it is fast anymore,
> > > because accessing a field means reading all of its doc values through
> > > MMap, so the system needs to read from disk to load the data.
> > >     So could anyone help me understand how DocValues work?
> > >
> > > Echo Yao
> > >
> >
>
>
>
> --
> Echo Yao
>
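The first reply quoted above credits sequential reads and the filesystem cache, and the original question mentions MMap. The Java sketch below uses only the JDK's `FileChannel.map` to write a hypothetical fixed-width column (one big-endian `long` per doc id, far simpler than Lucene's real doc values encoding) to a temporary file, map it read-only, and read values for increasing doc ids. The names `writeColumn` and `sumMapped` are illustrative only. Mapping a file does not load it eagerly: the operating system faults pages in on access and keeps recently used ones in its page cache, so forward-moving reads over hot regions tend to be cheap. Lucene's `MMapDirectory` relies on memory mapping in a similar spirit.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MappedColumnDemo {

    /** Writes one fixed-width long per doc id, in doc id order (hypothetical, simplified layout). */
    static Path writeColumn(long[] valuesByDoc) throws IOException {
        Path file = Files.createTempFile("column", ".dat");
        file.toFile().deleteOnExit(); // avoid deleting while a mapping may still be live
        ByteBuffer buf = ByteBuffer.allocate(valuesByDoc.length * Long.BYTES);
        for (long v : valuesByDoc) {
            buf.putLong(v);
        }
        buf.flip();
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.WRITE)) {
            while (buf.hasRemaining()) {
                ch.write(buf);
            }
        }
        return file;
    }

    /** Maps the column read-only and sums the values of the given (increasing) doc ids. */
    static long sumMapped(Path file, int[] increasingDocs) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer map = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            long sum = 0;
            for (int doc : increasingDocs) {
                // Offsets only grow, so pages are touched front to back; the OS
                // page cache (and any readahead) can serve most of these reads.
                sum += map.getLong(doc * Long.BYTES);
            }
            return sum;
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = writeColumn(new long[] {5, 3, 8, 1, 9, 2, 7, 4});
        System.out.println(sumMapped(file, new int[] {0, 2, 4, 6})); // 29
    }
}
```

Running `main` prints 29, since docs 0, 2, 4 and 6 hold 5, 8, 9 and 7.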
