lucene-dev mailing list archives

From "Robert Muir (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (LUCENE-8673) Use radix partitioning when merging dimensional points
Date Thu, 31 Jan 2019 21:02:00 GMT

     [ https://issues.apache.org/jira/browse/LUCENE-8673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Robert Muir updated LUCENE-8673:
--------------------------------
    Description: 
Following the advice of [~jpountz] in LUCENE-8623, I have investigated using radix selection when
merging segments instead of sorting the data at the beginning. The results are pretty promising when
running Lucene geo benchmarks:

 
||Approach||Index time (sec): Dev||Index time (sec): Base||Index time: Diff||Force merge time (sec): Dev||Force merge time (sec): Base||Force merge time: Diff||Index size (GB): Dev||Index size (GB): Base||Index size: Diff||Reader heap (MB): Dev||Reader heap (MB): Base||Reader heap: Diff||
|points|241.5s|235.0s| 3%|157.2s|157.9s|-0%|0.55|0.55| 0%|1.57|1.57| 0%|
|shapes|416.1s|650.1s|-36%|306.1s|603.2s|-49%|1.29|1.29| 0%|1.61|1.61| 0%|
|geo3d|261.0s|360.1s|-28%|170.2s|279.9s|-39%|0.75|0.75| 0%|1.58|1.58| 0%|
 
edited: table formatting to be a jira table
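
The Diff columns above are consistent with the relative change of Dev against Base, rounded to a whole percent. A minimal sketch of that arithmetic follows, assuming this is how the benchmark report derives the column; the class and method names are hypothetical and not part of the benchmark tooling.

```java
public class DiffColumnSketch {

    /**
     * Relative change of dev against base, in percent, rounded to the
     * nearest integer. Negative values mean dev is smaller (faster).
     * Assumption: this mirrors how the Diff columns were computed.
     */
    static long diffPercent(double dev, double base) {
        return Math.round((dev - base) / base * 100.0);
    }

    public static void main(String[] args) {
        // Index time for shapes: 416.1s (Dev) against 650.1s (Base).
        System.out.println(diffPercent(416.1, 650.1)); // prints -36
        // Force merge time for geo3d: 170.2s (Dev) against 279.9s (Base).
        System.out.println(diffPercent(170.2, 279.9)); // prints -39
    }
}
```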
 

In 2D the index throughput is more or less equal, but for higher dimensions the impact is quite
big: for shapes and geo3d, index time drops by 28-36% and force merge time by 39-49%. In all cases
the merging process requires much less disk space. I am attaching plots showing the different
behaviour and I am opening a pull request.
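
To illustrate the idea of radix selection as opposed to a full sort, here is a minimal sketch. It is not the code from the pull request: the class name, method names, and the in-memory byte[][] layout are assumptions made for illustration only. It places the k-th smallest fixed-length value at index k by histogramming one byte at a time and refining only the bucket that contains k.

```java
public class RadixSelectSketch {

    /**
     * Rearranges points[from, to) so that points[k] holds the value it would
     * have if the range were sorted in unsigned byte order, with no larger
     * value before it and no smaller value after it.
     * Hypothetical helper; every point must have bytesPerValue bytes.
     */
    static void select(byte[][] points, int from, int to, int k, int bytesPerValue) {
        for (int depth = 0; depth < bytesPerValue && to - from > 1; depth++) {
            // 1. Histogram of the byte at the current depth.
            int[] histogram = new int[256];
            for (int i = from; i < to; i++) {
                histogram[points[i][depth] & 0xFF]++;
            }
            // 2. Find the bucket (byte value) whose slice contains position k.
            int bucket = 0;
            int bucketStart = from;
            while (bucketStart + histogram[bucket] <= k) {
                bucketStart += histogram[bucket];
                bucket++;
            }
            // 3. Three-way partition on that byte: smaller | equal | larger.
            int lt = from, i = from, gt = to - 1;
            while (i <= gt) {
                int b = points[i][depth] & 0xFF;
                if (b < bucket) {
                    swap(points, lt++, i++);
                } else if (b > bucket) {
                    swap(points, i, gt--);
                } else {
                    i++;
                }
            }
            // 4. Only the slice holding k needs refining on the next byte.
            from = lt;
            to = gt + 1;
        }
    }

    static void swap(byte[][] a, int i, int j) {
        byte[] tmp = a[i];
        a[i] = a[j];
        a[j] = tmp;
    }

    /** Big-endian encoding, so unsigned byte order matches order of non-negative ints. */
    static byte[] toBytes(int v) {
        return new byte[] {(byte) (v >>> 24), (byte) (v >>> 16), (byte) (v >>> 8), (byte) v};
    }

    static int toInt(byte[] b) {
        return ((b[0] & 0xFF) << 24) | ((b[1] & 0xFF) << 16) | ((b[2] & 0xFF) << 8) | (b[3] & 0xFF);
    }

    public static void main(String[] args) {
        int[] values = {42, 7, 1000, 256, 3, 99, 65536};
        byte[][] points = new byte[values.length][];
        for (int i = 0; i < values.length; i++) {
            points[i] = toBytes(values[i]);
        }
        int k = values.length / 2;
        select(points, 0, points.length, k, Integer.BYTES);
        System.out.println(toInt(points[k])); // prints 99, the median
    }
}
```

Each pass touches only the slice that still contains k, so the work shrinks with every byte, whereas sorting up front orders every value even though a partition around one position is all that is needed.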

 

 

 

  was:
Following the advice of [~jpountz] in LUCENE-8623, I have investigated using radix selection when
merging segments instead of sorting the data at the beginning. The results are pretty promising when
running Lucene geo benchmarks:

 
{code:java}
||Approach||Index time (sec)||Force merge time (sec)||Index size (GB)||Reader heap (MB)||
          ||Dev||Base||Diff ||Dev  ||Base  ||diff   ||Dev||Base||Diff||Dev||Base||Diff
||
|points|241.5s|235.0s| 3%|157.2s|157.9s|-0%|0.55|0.55| 0%|1.57|1.57| 0%|
|shapes|416.1s|650.1s|-36%|306.1s|603.2s|-49%|1.29|1.29| 0%|1.61|1.61| 0%|
|geo3d|261.0s|360.1s|-28%|170.2s|279.9s|-39%|0.75|0.75| 0%|1.58|1.58| 0%|{code}
 

 

In 2D the index throughput is more or less equal, but for higher dimensions the impact is quite
big: for shapes and geo3d, index time drops by 28-36% and force merge time by 39-49%. In all cases
the merging process requires much less disk space. I am attaching plots showing the different
behaviour and I am opening a pull request.

 

 

 


> Use radix partitioning when merging dimensional points
> ------------------------------------------------------
>
>                 Key: LUCENE-8673
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8673
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Ignacio Vera
>            Priority: Major
>         Attachments: Geo3D.png, LatLonPoint.png, LatLonShape.png
>
>          Time Spent: 1.5h
>  Remaining Estimate: 0h



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

