From "Mike Sokolov (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (LUCENE-8635) Lazy loading Lucene FST offheap using mmap
Date Wed, 16 Jan 2019 18:55:00 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-8635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16744344#comment-16744344 ]

Mike Sokolov edited comment on LUCENE-8635 at 1/16/19 6:54 PM:
---------------------------------------------------------------

Following a suggestion from [~mikemccand], I tried a slightly different version of this that
uses randomAccessSlice to avoid some calls to seek(), and it gives better performance in the
benchmarks. I also spent some time trying to understand the FST's backwards-seeking behavior.
Based on my crude understanding, and on another comment from Mike, it seems that with some work
it would be possible to make the FST more naturally forward-seeking, but it's not obvious that
this would in general give more local, cache-friendly access patterns. It might; it probably
needs some experimentation to know for sure. Here are the benchmark numbers from the
random-access patch:
{noformat}
                    Task  QPS before      StdDev   QPS after      StdDev                Pct diff
                PKLookup      133.62      (2.2%)      123.74      (1.5%)   -7.4% ( -10% -   -3%)
              AndHighLow     3411.49      (3.2%)     3268.04      (3.1%)   -4.2% ( -10% -    2%)
BrowseDayOfYearTaxoFacets    10067.18      (4.3%)     9828.65      (3.5%)   -2.4% (  -9% -    5%)
                 LowTerm     3567.48      (1.2%)     3489.27      (1.7%)   -2.2% (  -5% -    0%)
                  Fuzzy1      147.67      (3.1%)      144.65      (2.4%)   -2.0% (  -7% -    3%)
   BrowseMonthTaxoFacets    10102.27      (4.2%)     9901.49      (4.1%)   -2.0% (  -9% -    6%)
                  Fuzzy2       62.00      (2.8%)       60.87      (2.4%)   -1.8% (  -6% -    3%)
                 MedTerm     2694.87      (2.0%)     2647.08      (2.1%)   -1.8% (  -5% -    2%)
              AndHighMed     1171.52      (2.7%)     1154.25      (2.8%)   -1.5% (  -6% -    4%)
                HighTerm     2061.53      (2.3%)     2032.84      (2.5%)   -1.4% (  -6% -    3%)
         MedSloppyPhrase      266.60      (3.4%)      263.01      (4.2%)   -1.3% (  -8% -    6%)
              OrHighHigh      278.90      (4.0%)      275.35      (4.7%)   -1.3% (  -9% -    7%)
        HighSloppyPhrase      107.68      (5.5%)      106.34      (5.6%)   -1.2% ( -11% -   10%)
                 Respell      118.26      (2.1%)      116.95      (2.2%)   -1.1% (  -5% -    3%)
             AndHighHigh      472.93      (4.4%)      467.78      (3.3%)   -1.1% (  -8% -    6%)
               OrHighMed      755.21      (2.9%)      748.34      (3.3%)   -0.9% (  -6% -    5%)
             MedSpanNear      308.31      (3.3%)      305.59      (3.8%)   -0.9% (  -7% -    6%)
                Wildcard      869.37      (3.5%)      862.74      (1.9%)   -0.8% (  -5% -    4%)
       HighTermMonthSort      871.33      (7.1%)      865.80      (6.1%)   -0.6% ( -12% -   13%)
               MedPhrase      449.39      (3.0%)      446.55      (2.4%)   -0.6% (  -5% -    4%)
             LowSpanNear      391.10      (3.3%)      388.77      (3.8%)   -0.6% (  -7% -    6%)
         LowSloppyPhrase      406.57      (3.8%)      404.23      (3.6%)   -0.6% (  -7% -    7%)
              HighPhrase      239.84      (3.7%)      238.78      (3.3%)   -0.4% (  -7% -    6%)
                 Prefix3     1230.56      (5.0%)     1225.52      (2.9%)   -0.4% (  -7% -    7%)
            HighSpanNear      107.34      (5.2%)      107.20      (5.3%)   -0.1% ( -10% -   10%)
               LowPhrase      438.52      (3.4%)      438.14      (2.5%)   -0.1% (  -5% -    5%)
    BrowseDateTaxoFacets       11.14      (4.0%)       11.16      (7.0%)    0.2% ( -10% -   11%)
   HighTermDayOfYearSort      606.85      (6.7%)      608.65      (5.4%)    0.3% ( -11% -   13%)
                  IntNRQ      987.08     (12.5%)      990.96     (13.5%)    0.4% ( -22% -   30%)
               OrHighLow      553.72      (3.2%)      558.09      (3.5%)    0.8% (  -5% -    7%)
BrowseDayOfYearSSDVFacets       38.23      (3.9%)       38.66      (4.1%)    1.1% (  -6% -    9%)
   BrowseMonthSSDVFacets       42.05      (3.5%)       42.57      (3.7%)    1.2% (  -5% -    8%)

{noformat}
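
To illustrate the difference between seek-then-read and random-access reads discussed in the comment, here is a minimal sketch using plain java.nio on a memory-mapped file. It is not the code from ra.patch and does not use Lucene classes; the class name MmapReadSketch and its methods are hypothetical. In Lucene the analogous calls would be IndexInput.seek() followed by readByte(), versus RandomAccessInput.readByte(long) on the object returned by IndexInput.randomAccessSlice().

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration only; not part of Lucene or of the attached patches.
final class MmapReadSketch {

    /** Cursor style: move the buffer position (the "seek"), then do a relative read. */
    static byte seekThenRead(ByteBuffer buf, int pos) {
        buf.position(pos);
        return buf.get(); // advances the cursor by one
    }

    /** Random-access style: read at an absolute offset without touching the cursor. */
    static byte readAt(ByteBuffer buf, int pos) {
        return buf.get(pos);
    }

    /** Maps the whole file read-only; the OS faults pages in only when they are touched. */
    static MappedByteBuffer map(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            // The mapping stays valid after the channel is closed.
            return ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("fst-sketch", ".bin");
        tmp.toFile().deleteOnExit();
        Files.write(tmp, new byte[] {10, 20, 30, 40, 50});

        MappedByteBuffer buf = map(tmp);
        // Walk from the end towards the start, mimicking a backwards access pattern.
        for (int pos = buf.limit() - 1; pos >= 0; pos--) {
            byte viaSeek = seekThenRead(buf, pos);
            byte viaAbsolute = readAt(buf, pos);
            System.out.println(pos + ": " + viaSeek + " " + viaAbsolute);
        }
    }
}
```

Both methods return the same bytes; the absolute read simply skips the cursor update on every access, which is the kind of saving the random-access variant is after. Whether that matters in practice is what the benchmark above is measuring.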


> Lazy loading Lucene FST offheap using mmap
> ------------------------------------------
>
>                 Key: LUCENE-8635
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8635
>             Project: Lucene - Core
>          Issue Type: New Feature
>          Components: core/FSTs
>         Environment: I used below setup for es_rally tests:
> single node i3.xlarge running ES 6.5
> es_rally was running on another i3.xlarge instance
>            Reporter: Ankit Jain
>            Priority: Major
>         Attachments: offheap.patch, ra.patch, rally_benchmark.xlsx
>
>
> Currently, the FST loads all the terms into heap memory during index open. This causes frequent
> JVM OOM issues if the term size gets big. A better approach would be to lazily load the FST
> using mmap, which ensures that only the required terms get loaded into memory.
>  
> Lucene can expose an API for providing the list of fields whose terms should be loaded offheap.
> I'm planning to take the following approach for this:
>  # Add a boolean property fstOffHeap in FieldInfo
>  # Pass the list of offheap fields to Lucene during index open (ALL can be a special keyword
>    for loading all fields offheap)
>  # Initialize the fstOffHeap property during Lucene index open
>  # FieldReader invokes the default FST constructor or the OffHeap constructor based on the
>    fstOffHeap field
>  
> I created a patch (that loads all fields offheap), ran some benchmarks using es_rally, and
> the results look good.
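
The per-field selection proposed in the issue description could be sketched roughly as follows. This is only an illustration of the decision logic under stated assumptions: OffHeapFieldSelector, its methods, and the returned labels are hypothetical names, not Lucene API, and the real change would live in FieldInfo and FieldReader as the description says.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration of the proposal; none of these names exist in Lucene.
final class OffHeapFieldSelector {

    /** Special keyword from the proposal meaning "load every field's FST offheap". */
    static final String ALL = "ALL";

    private final Set<String> offHeapFields;

    /** Takes the list of offheap fields that would be passed in at index open. */
    OffHeapFieldSelector(Set<String> offHeapFields) {
        this.offHeapFields = new HashSet<>(offHeapFields);
    }

    /** Value that the proposed fstOffHeap property would take for this field. */
    boolean fstOffHeap(String fieldName) {
        return offHeapFields.contains(ALL) || offHeapFields.contains(fieldName);
    }

    /** Stand-in for the constructor choice a reader would make per field. */
    String loadMode(String fieldName) {
        return fstOffHeap(fieldName) ? "offheap (mmap, lazy)" : "onheap (default)";
    }

    public static void main(String[] args) {
        Set<String> configured = new HashSet<>();
        configured.add("id");
        OffHeapFieldSelector selector = new OffHeapFieldSelector(configured);
        for (String field : new String[] {"id", "body"}) {
            System.out.println(field + " -> " + selector.loadMode(field));
        }
    }
}
```

Resolving the flag once at index open, rather than on every lookup, matches step 3 of the proposed approach.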


