lucene-dev mailing list archives

From "Mike Sokolov (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-8635) Lazy loading Lucene FST offheap using mmap
Date Tue, 15 Jan 2019 13:42:00 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-8635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16743067#comment-16743067 ]

Mike Sokolov commented on LUCENE-8635:
--------------------------------------

This looked interesting to me too, so I ran the benchmarks with the change, but sadly
the results were not great. That is surprising given the Rally test results, which looked
positive, I think? I'm not really sure how to interpret Rally output since I'm not familiar
with that tool. Does it test query performance? Maybe there is a use case for this that is
different from what the benchmarks test. Here is what I saw after a benchmark
run. This run is maybe a little unusual since I have some mods to the benchmark (running with
an 8-thread executor service, indexSort enabled, topN=500) because of some other tests I was
running. I can re-run with more "normal" settings, but this already looks kind of suspect.
{noformat}
                    Task  QPS before      StdDev   QPS after      StdDev                Pct diff
                PKLookup      163.94      (2.3%)      123.50      (2.0%)  -24.7% ( -28% -  -20%)
              AndHighLow     5096.79      (1.2%)     4860.87      (1.5%)   -4.6% (  -7% -   -2%)
                  Fuzzy1      711.37      (2.3%)      681.03      (2.4%)   -4.3% (  -8% -    0%)
                  Fuzzy2      203.67      (2.6%)      196.77      (2.6%)   -3.4% (  -8% -    1%)
              AndHighMed     3460.06      (2.7%)     3346.84      (3.2%)   -3.3% (  -8% -    2%)
               LowPhrase     3448.68      (2.8%)     3345.41      (2.7%)   -3.0% (  -8% -    2%)
         LowSloppyPhrase     3278.72      (2.9%)     3184.03      (2.8%)   -2.9% (  -8% -    2%)
             LowSpanNear     3123.68      (2.9%)     3040.74      (2.6%)   -2.7% (  -7% -    2%)
                 Respell      716.61      (1.7%)      699.22      (1.8%)   -2.4% (  -5% -    1%)
               MedPhrase     2970.83      (3.2%)     2899.18      (3.0%)   -2.4% (  -8% -    3%)
             AndHighHigh     2626.26      (3.7%)     2563.37      (4.0%)   -2.4% (  -9% -    5%)
         MedSloppyPhrase     2642.66      (3.6%)     2582.02      (3.3%)   -2.3% (  -8% -    4%)
             MedSpanNear     2598.01      (3.5%)     2541.03      (3.2%)   -2.2% (  -8% -    4%)
    BrowseDateTaxoFacets     3467.39      (2.7%)     3399.62      (3.3%)   -2.0% (  -7% -    4%)
                 LowTerm     3896.13      (4.7%)     3824.62      (4.4%)   -1.8% ( -10% -    7%)
            HighSpanNear     1511.97      (4.7%)     1484.42      (4.6%)   -1.8% ( -10% -    7%)
               OrHighMed     1406.84      (5.7%)     1382.52      (5.8%)   -1.7% ( -12% -   10%)
               OrHighLow     1484.58      (6.1%)     1460.06      (6.0%)   -1.7% ( -12% -   11%)
              HighPhrase     1740.06      (4.5%)     1712.12      (4.4%)   -1.6% ( -10% -    7%)
        HighSloppyPhrase     1547.60      (4.7%)     1523.48      (4.6%)   -1.6% ( -10% -    8%)
   BrowseMonthTaxoFacets     9031.31      (2.1%)     8897.26      (2.6%)   -1.5% (  -6% -    3%)
              OrHighHigh     1111.59      (6.3%)     1095.29      (6.5%)   -1.5% ( -13% -   12%)
   HighTermDayOfYearSort     2197.07      (5.9%)     2166.89      (3.9%)   -1.4% ( -10% -    8%)
                 MedTerm     2621.21      (5.3%)     2586.41      (5.0%)   -1.3% ( -11% -    9%)
BrowseDayOfYearTaxoFacets     9011.41      (1.6%)     8907.44      (1.5%)   -1.2% (  -4% -    1%)
       HighTermMonthSort     2449.33      (5.5%)     2421.11      (4.4%)   -1.2% ( -10% -    9%)
                HighTerm     1629.92      (6.5%)     1612.72      (6.4%)   -1.1% ( -13% -   12%)
                  IntNRQ      980.43      (9.1%)      973.72      (8.9%)   -0.7% ( -17% -   19%)
                Wildcard     1779.82      (5.7%)     1771.12      (5.5%)   -0.5% ( -11% -   11%)
                 Prefix3     1790.47      (5.9%)     1781.85      (5.8%)   -0.5% ( -11% -   11%)
BrowseDayOfYearSSDVFacets     2038.63      (3.0%)     2032.32      (2.1%)   -0.3% (  -5% -    4%)
   BrowseMonthSSDVFacets     2295.02      (2.5%)     2303.01      (1.9%)    0.3% (  -4% -    4%)
{noformat}
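
For anyone reading the table: the "Pct diff" column is the relative change in mean QPS between the two runs. The sketch below recomputes it for two rows as a sanity check on the numbers. The class and method names are made up for illustration and are not part of the benchmark tooling, and it does not try to reproduce the range shown in parentheses.

```java
import java.util.Locale;

public class PctDiff {
    /** Relative change in QPS, in percent; a negative value means the patched run was slower. */
    public static double pctDiff(double qpsBefore, double qpsAfter) {
        return 100.0 * (qpsAfter - qpsBefore) / qpsBefore;
    }

    public static void main(String[] args) {
        // QPS values copied from the PKLookup and AndHighLow rows of the table.
        System.out.println(String.format(Locale.ROOT, "PKLookup   %.1f%%", pctDiff(163.94, 123.50)));
        System.out.println(String.format(Locale.ROOT, "AndHighLow %.1f%%", pctDiff(5096.79, 4860.87)));
    }
}
```

With the values above this gives about -24.7% for PKLookup and -4.6% for AndHighLow, matching the table.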

> Lazy loading Lucene FST offheap using mmap
> ------------------------------------------
>
>                 Key: LUCENE-8635
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8635
>             Project: Lucene - Core
>          Issue Type: New Feature
>          Components: core/FSTs
>         Environment: I used below setup for es_rally tests:
> single node i3.xlarge running ES 6.5
> es_rally was running on another i3.xlarge instance
>            Reporter: Ankit Jain
>            Priority: Major
>         Attachments: offheap.patch, rally_benchmark.xlsx
>
>
> Currently, FST loads all the terms into heap memory during index open. This causes frequent
> JVM OOM issues if the term size gets big. A better way of doing this will be to lazily load
> FST using mmap. That ensures only the required terms get loaded into memory.
>
> Lucene can expose API for providing list of fields to load terms offheap. I'm planning
> to take following approach for this:
>  # Add a boolean property fstOffHeap in FieldInfo
>  # Pass list of offheap fields to lucene during index open (ALL can be special keyword
>    for loading ALL fields offheap)
>  # Initialize the fstOffHeap property during lucene index open
>  # FieldReader invokes default FST constructor or OffHeap constructor based on fstOffHeap
>    field
>
> I created a patch (that loads all fields offheap), did some benchmarks using es_rally
> and results look good.
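
To make the mmap idea in the quoted description concrete, here is a minimal sketch using only the JDK's FileChannel.map. It is not the attached patch and does not use Lucene's FST or Directory APIs; the class name, the method names, and the temp file standing in for FST bytes are all hypothetical. The point is only that mapping a file does not copy it onto the Java heap: the OS pages bytes in when they are first touched.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MmapSketch {
    /** Maps the whole file read-only; its contents live outside the Java heap. */
    public static MappedByteBuffer mapReadOnly(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            // A mapping stays valid after the channel that created it is closed.
            return ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
        }
    }

    /** Writes a 1 MiB placeholder file, maps it, and reads back a single byte. */
    public static byte demo() {
        try {
            Path file = Files.createTempFile("fst-sketch", ".bin");
            file.toFile().deleteOnExit();
            byte[] data = new byte[1 << 20]; // stand-in for serialized FST bytes
            data[123_456] = 42;
            Files.write(file, data);

            MappedByteBuffer buf = mapReadOnly(file);
            // Absolute read: only the page holding this offset has to be resident.
            return buf.get(123_456);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The trade-off is that each access to a mapped region can turn into a page fault when the page is not resident, which is one possible reason lookup-heavy tasks behave differently from the rest.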



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


