lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Michael McCandless (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-8635) Lazy loading Lucene FST offheap using mmap
Date Tue, 15 Jan 2019 15:18:00 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-8635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16743137#comment-16743137
] 

Michael McCandless commented on LUCENE-8635:
--------------------------------------------

Thanks for testing [~sokolov] – the results make sense: the most terms dictionary intensive
queries are impacted the most, with {{PKLookup}} being heavily impacted since that's just
purely exercising the terms dictionary with no postings visited.  Fuzzy queries, and then
queries matching few hits (conjunctions with low/medium freq terms) also spend relatively
more time in the terms dictionary ...

So net/net it looks like we should not make this the default, but expose it somehow as an
option for those use cases that don't want to dedicate heap memory to storing FSTs?

> Lazy loading Lucene FST offheap using mmap
> ------------------------------------------
>
>                 Key: LUCENE-8635
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8635
>             Project: Lucene - Core
>          Issue Type: New Feature
>          Components: core/FSTs
>         Environment: I used below setup for es_rally tests:
> single node i3.xlarge running ES 6.5
> es_rally was running on another i3.xlarge instance
>            Reporter: Ankit Jain
>            Priority: Major
>         Attachments: offheap.patch, rally_benchmark.xlsx
>
>
> Currently, FST loads all the terms into heap memory during index open. This causes frequent
JVM OOM issues if the term size gets big. A better way of doing this will be to lazily load
FST using mmap. That ensures only the required terms get loaded into memory.
>  
> Lucene can expose API for providing list of fields to load terms offheap. I'm planning
to take following approach for this:
>  # Add a boolean property fstOffHeap in FieldInfo
>  # Pass list of offheap fields to lucene during index open (ALL can be special keyword
for loading ALL fields offheap)
>  # Initialize the fstOffHeap property during lucene index open
>  # FieldReader invokes default FST constructor or OffHeap constructor based on fstOffHeap
field
>  
> I created a patch (that loads all fields offheap), did some benchmarks using es_rally
and results look good.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Mime
View raw message