lucene-java-user mailing list archives

From Martynas L <martynas....@gmail.com>
Subject Re: Slower document retrieval in 8.7.0 comparing to 7.5.0
Date Fri, 22 Jan 2021 15:22:04 GMT
I just experimented with my reading sample. My goal is not to report exact
numbers, but document retrieval via IndexSearcher.doc(int) is clearly much
slower.
All our performance tests showed degradation after upgrading to 8.7.0; even
without measuring, we can "see/feel" that operations involving document
retrieval have become slower.
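
For reference, the kind of measurement discussed in this thread (random
document retrieval with a warm-up phase) could be sketched roughly as below.
This is only a hand-rolled sketch, not a substitute for JMH. The names
RetrievalBench, DocLoader, measureNanos and percentile are hypothetical and
not part of Lucene; DocLoader is a stand-in for a call such as
docId -> searcher.doc(docId), and the dummy loader in main is only there so
that the sketch runs without an index.

```java
import java.util.Arrays;
import java.util.Random;

/** Hypothetical timing harness; none of these names come from Lucene. */
final class RetrievalBench {

    /** Stand-in for a stored-fields lookup such as IndexSearcher.doc(int). */
    interface DocLoader {
        Object load(int docId) throws Exception;
    }

    /** Results are written here so the JIT cannot discard the loads. */
    static volatile Object sink;

    /** Returns per-call latencies in nanoseconds, sorted ascending. */
    static long[] measureNanos(DocLoader loader, int maxDoc, int warmup,
                               int measured, long seed) throws Exception {
        Random random = new Random(seed);
        // Warm-up: same random access pattern, timings are discarded.
        for (int i = 0; i < warmup; i++) {
            sink = loader.load(random.nextInt(maxDoc));
        }
        long[] samples = new long[measured];
        for (int i = 0; i < measured; i++) {
            int docId = random.nextInt(maxDoc);
            long start = System.nanoTime();
            sink = loader.load(docId);
            samples[i] = System.nanoTime() - start;
        }
        Arrays.sort(samples);
        return samples;
    }

    /** Nearest-rank percentile over an ascending-sorted array. */
    static long percentile(long[] sorted, double p) {
        int index = (int) Math.ceil(p / 100.0 * sorted.length) - 1;
        return sorted[Math.max(0, Math.min(sorted.length - 1, index))];
    }

    public static void main(String[] args) throws Exception {
        // Dummy loader (assumption): replace with a real index lookup.
        DocLoader dummy = docId -> "doc-" + docId;
        long[] samples = measureNanos(dummy, 1_000_000, 10_000, 100_000, 42L);
        System.out.println("median ns: " + percentile(samples, 50));
        System.out.println("p99 ns:    " + percentile(samples, 99));
    }
}
```

Running the same harness against both versions with the same seed and index
source would at least compare the same sequence of document IDs; reporting a
median and a high percentile is less sensitive to a few slow outliers than a
single total duration.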



On Fri, Jan 22, 2021 at 4:48 PM Rob Audenaerde <rob.audenaerde@gmail.com>
wrote:

> Hi Martynas
>
> How did you measure that?
>
> I ask because writing a good benchmark is not an easy task, since there
> are so many factors (class loading times, JIT effects, etc.). You should use
> Java Microbenchmark Harness or similar, and set up a random document
> retrieval task with warm-up, etc.
>
> (I'm not aware of any big slowdowns, but as you see them, the best way is
> to build a robust benchmark and then start comparing)
>
> -Rob
>
>
> On Fri, Jan 22, 2021 at 3:43 PM Martynas L <martynas.sub@gmail.com> wrote:
>
> > Even when retrieving a single document, 8.7.0 is more than 2x slower.
> >
> > On Fri, Jan 22, 2021 at 2:28 PM Diego Ceccarelli (BLOOMBERG/ LONDON) <
> > dceccarelli4@bloomberg.net> wrote:
> >
> > > >  I think it will be similar ratio retrieving any number of documents.
> > >
> > > I'm not sure this is true; if you retrieve a huge number of documents,
> > > you might cause trouble for the GC.
> > >
> > > From: java-user@lucene.apache.org At: 01/22/21 12:11:19 To:
> > > java-user@lucene.apache.org
> > > Subject: Re: Slower document retrieval in 8.7.0 comparing to 7.5.0
> > >
> > > The emphasis should not be on the number of retrieved documents, but on
> > > the duration ratio: 8.7.0 is 3 times slower. I think the ratio will be
> > > similar when retrieving any number of documents.
> > >
> > > On Fri, Jan 22, 2021 at 1:39 PM Rob Audenaerde <rob.audenaerde@gmail.com>
> > > wrote:
> > >
> > > > Hi Martynas,
> > > >
> > > > In your sample code you are retrieving all (1 million!) documents from
> > > > the index; that surely is not a good match for Lucene :)
> > > >
> > > > Is that a good reflection of your use-case?
> > > >
> > > > On Fri, Jan 22, 2021 at 9:52 AM Martynas L <martynas.sub@gmail.com>
> > > > wrote:
> > > >
> > > > > Please see the sample at
> > > > > https://drive.google.com/drive/folders/1ufVZXzkugBAFnuy8HLAY6mbPWzjknrfE
> > > > >
> > > > > IndexGenerator - creates a dummy index.
> > > > > IndexReader - retrieves documents; the duration with version 7.5.0 is
> > > > > ~2s, versus ~6s with 8.7.0.
> > > > >
> > > > > Regards,
> > > > > Martynas
> > > > >
> > > > >
> > > > > On Thu, Jan 21, 2021 at 8:21 PM Rob Audenaerde <rob.audenaerde@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > There is no attachment in the previous email that I can see. Maybe
> > > > > > you can post it online?
> > > > > >
> > > > > > On Thu, Jan 21, 2021 at 4:54 PM Martynas L <martynas.sub@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > Hello,
> > > > > > >
> > > > > > > Are there any comments on this issue?
> > > > > > > If there is no workaround, we will be forced to roll back to
> > > > > > > version 7.5.0.
> > > > > > >
> > > > > > > Best regards,
> > > > > > > Martynas
> > > > > > >
> > > > > > > On Tue, Jan 12, 2021 at 12:27 PM Martynas L <martynas.sub@gmail.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Hi,
> > > > > > > >
> > > > > > > > Please see attached sample.
> > > > > > > > IndexGenerator - creates a dummy index.
> > > > > > > > IndexReader - retrieves documents; the duration with version
> > > > > > > > 7.5.0 is ~2s, versus ~6s with 8.7.0.
> > > > > > > >
> > > > > > > > Regards,
> > > > > > > > Martynas
> > > > > > > >
> > > > > > > > On Tue, Dec 22, 2020 at 3:23 PM Vincenzo D'Amore <v.damore@gmail.com>
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > >> I think it would be useful to have an example of a document and,
> > > > > > > >> if possible, an example of a query that takes too long.
> > > > > > > >>
> > > > > > > >> On Mon, Dec 21, 2020 at 1:47 PM Martynas L <martynas.sub@gmail.com>
> > > > > > > >> wrote:
> > > > > > > >>
> > > > > > > >> > Hello,
> > > > > > > >> >
> > > > > > > >> > I am sorry for the delay.
> > > > > > > >> >
> > > > > > > >> > Not sure what you mean by "workload". We have performance tests,
> > > > > > > >> > which started failing after upgrading to 8.7.0.
> > > > > > > >> > So I just tried to query the index (built from the same source)
> > > > > > > >> > to get all documents and compare the performance with 7.5.0.
> > > > > > > >> >
> > > > > > > >> > Document "size" is the sum of all stored string lengths (3402519
> > > > > > > >> > documents):
> > > > > > > >> >
> > > > > > > >> > doc size 903 - 88s vs 22s
> > > > > > > >> >
> > > > > > > >> > doc size 36 (only one field loaded, used searcher.doc(docID,
> > > > > > > >> > Collections.singleton("fieldName"))) - 78s vs 16s
> > > > > > > >> >
> > > > > > > >> > doc size 439 (some fields made not stored) - 46s vs 14.5s
> > > > > > > >> >
> > > > > > > >> > Best regards,
> > > > > > > >> > Martynas
> > > > > > > >> >
> > > > > > > >> > On Fri, Dec 4, 2020 at 12:06 AM Adrien Grand <jpountz@gmail.com>
> > > > > > > >> > wrote:
> > > > > > > >> >
> > > > > > > >> > > Hello Martynas,
> > > > > > > >> > >
> > > > > > > >> > > There have indeed been changes related to stored fields in 8.7.
> > > > > > > >> > > What does your workload look like and how large are your
> > > > > > > >> > > documents on average?
> > > > > > > >> > >
> > > > > > > >> > > On Thu, Dec 3, 2020 at 3:04 PM Martynas L <martynas.sub@gmail.com>
> > > > > > > >> > > wrote:
> > > > > > > >> > >
> > > > > > > >> > > > Hi,
> > > > > > > >> > > > We've migrated from 7.5.0 to 8.7.0 and found that index
> > > > > > > >> > > > "searching" is significantly (4-5 times) slower in the latest
> > > > > > > >> > > > version.
> > > > > > > >> > > > It seems that
> > > > > > > >> > > > org.apache.lucene.search.IndexSearcher#doc(int)
> > > > > > > >> > > > is slower.
> > > > > > > >> > > >
> > > > > > > >> > > > Is it possible to have similar performance with 8.7.0?
> > > > > > > >> > > >
> > > > > > > >> > > > Best regards,
> > > > > > > >> > > > Martynas
> > > > > > > >> > > >
> > > > > > > >> > >
> > > > > > > >> > >
> > > > > > > >> > > --
> > > > > > > >> > > Adrien
> > > > > > > >> > >
> > > > > > > >> >
> > > > > > > >>
> > > > > > > >>
> > > > > > > >> --
> > > > > > > >> Vincenzo D'Amore
> > > > > > > >>
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> > >
> > >
> >
>
