lucene-dev mailing list archives

From Michael Sokolov <msoko...@gmail.com>
Subject Re: [jira] [Commented] (LUCENE-8635) Lazy loading Lucene FST offheap using mmap
Date Thu, 17 Jan 2019 02:40:44 GMT
I used the wikimedium2m data set for the second set of tests (the first test
was on a tiny index - 10k docs) -- at least I think I did! I am kind of new
to the benchmarking game. I ran the benchmarks with python
src/python/localrun.py -source wikimedium2m, and I can see that the index
dir is 861M.


On Wed, Jan 16, 2019 at 7:18 PM Michael McCandless (JIRA) <jira@apache.org>
wrote:

>
>     [
> https://issues.apache.org/jira/browse/LUCENE-8635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16744538#comment-16744538
> ]
>
> Michael McCandless commented on LUCENE-8635:
> --------------------------------------------
>
> Thanks [~sokolov] – those numbers look quite a bit better!  Though, your
> QPSs are kinda high overall – how many Wikipedia docs were in your index?
>
> I do wonder if we simply reversed the FST's byte[] when we create it, what
> impact that'd have on lookup performance.  Hmm even if we did that, we'd
> still have to {{readBytes}} one byte at a time since {{RandomAccessInput}}
> does not have a {{readBytes}} method?  But ... maybe {{IndexInput}} would
> give good performance in that case?  We should probably pursue that
> separately though...
>
> > Lazy loading Lucene FST offheap using mmap
> > ------------------------------------------
> >
> >                 Key: LUCENE-8635
> >                 URL: https://issues.apache.org/jira/browse/LUCENE-8635
> >             Project: Lucene - Core
> >          Issue Type: New Feature
> >          Components: core/FSTs
> >         Environment: I used below setup for es_rally tests:
> > single node i3.xlarge running ES 6.5
> > es_rally was running on another i3.xlarge instance
> >            Reporter: Ankit Jain
> >            Priority: Major
> >         Attachments: offheap.patch, ra.patch, rally_benchmark.xlsx
> >
> >
> > Currently, FST loads all the terms into heap memory during index open.
> > This causes frequent JVM OOM issues if the term size gets big. A better way
> > of doing this will be to lazily load FST using mmap. That ensures only the
> > required terms get loaded into memory.
> >
> > Lucene can expose API for providing list of fields to load terms
> > offheap. I'm planning to take following approach for this:
> >  # Add a boolean property fstOffHeap in FieldInfo
> >  # Pass list of offheap fields to lucene during index open (ALL can be
> >    special keyword for loading ALL fields offheap)
> >  # Initialize the fstOffHeap property during lucene index open
> >  # FieldReader invokes default FST constructor or OffHeap constructor
> >    based on fstOffHeap field
> >
> > I created a patch (that loads all fields offheap), did some benchmarks
> > using es_rally and results look good.
>
>
>
> --
> This message was sent by Atlassian JIRA
> (v7.6.3#76005)
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: dev-help@lucene.apache.org
>
>
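
For readers following the quoted question about reversing the FST's byte[]:
the sketch below is a toy illustration in plain Python, not Lucene code. It
assumes the reader walks backward through the stored bytes (my reading of the
comment), and the function names are made up for the example. It shows that
reversing the buffer once at build time turns a backward byte-at-a-time read
into a single forward slice that returns the same bytes.

```python
def read_reversed_bytewise(buf, pos, length):
    """Read `length` bytes starting at `pos` and walking backward,
    one byte per step (hypothetical stand-in for a byte-at-a-time reader)."""
    out = bytearray(length)
    for i in range(length):
        out[i] = buf[pos - i]
    return bytes(out)


def reverse_once(buf):
    """Reverse the whole buffer once, as would happen at creation time."""
    return bytes(buf[::-1])


def read_forward_bulk(reversed_buf, pos, length):
    """Same logical read against the pre-reversed buffer, as one slice.

    `pos` is still expressed in the original (unreversed) coordinates.
    """
    start = len(reversed_buf) - 1 - pos
    return bytes(reversed_buf[start:start + length])


if __name__ == "__main__":
    original = bytes(range(10))
    flipped = reverse_once(original)
    print(read_reversed_bytewise(original, 7, 3))
    print(read_forward_bulk(flipped, 7, 3))
```

Whether a bulk forward read is actually faster inside Lucene depends on the
input implementation, which this sketch does not model.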

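The issue description's idea of lazily loading through mmap, rather than
copying everything onto the heap at open time, can be sketched in a few lines.
This is a minimal illustration using Python's standard mmap module, not the
patch attached to the issue; the helper names and the demo file are
assumptions for the example.

```python
import mmap
import os
import tempfile


def read_slice_eager(path, offset, length):
    """Load the whole file into process memory, then slice it."""
    with open(path, "rb") as f:
        data = f.read()  # the entire file is copied into memory
    return data[offset:offset + length]


def read_slice_mmap(path, offset, length):
    """Map the file read-only and copy out only the requested range."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            # The OS pages in only the parts of the file that are touched.
            return mapped[offset:offset + length]


def make_demo_file(size=1 << 20):
    """Create a throwaway file with a recognisable byte pattern."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(bytes(i % 251 for i in range(size)))
    return path


if __name__ == "__main__":
    demo = make_demo_file()
    try:
        print(read_slice_mmap(demo, 1000, 16) == read_slice_eager(demo, 1000, 16))
    finally:
        os.remove(demo)
```

Both helpers return the same bytes; the difference is how much of the file
has to be resident in the process to serve the request.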