lucene-dev mailing list archives

From "Michael McCandless (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-7122) BytesRefArray can be more efficient for fixed width values
Date Mon, 21 Mar 2016 08:52:25 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-7122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15203913#comment-15203913 ]

Michael McCandless commented on LUCENE-7122:
--------------------------------------------

bq. if it's really a gain worth the short

Hmm, to clarify: I'm not trying to optimize away that on-disk short (2 bytes) here.
We could do that later.

I'm trying to optimize away the in-RAM int (4 bytes) that {{OfflineSorter}} (because of {{BytesRefArray}})
now uses per value when sorting each in-heap partition.

I do think this is an important/worthwhile optimization:

E.g. if you are indexing 1D {{IntPoint}}s, which I suspect is a common case, today we need
12 bytes per value; with this patch we need 8 bytes per value. That means {{OfflineSorter}}
can sort more values in heap before it must spill to disk and can create larger initial segments,
so it can index more values before requiring 2nd level merges, etc.

The gains are still sizable for the 2D cases, e.g. a {{LatLonPoint}} would only need 12 bytes
per value vs the 16 bytes today.
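
To make the arithmetic concrete, here is a rough sketch of how many values fit in a fixed
heap budget at each per-value cost. The class name {{HeapBudget}}, the method
{{valuesThatFit}} and the 16 MB budget are made up for illustration; only the 12/8 and 16/12
byte figures come from the numbers above. The ratios work out to 12/8 = 1.5x as many values
in the 1D case and 16/12, about 1.33x, in the 2D case.

```java
// Illustrative only: HeapBudget and the 16 MB budget are assumptions,
// not anything taken from OfflineSorter itself.
final class HeapBudget {

    /** Returns how many values of a fixed per-value cost fit in heapBytes. */
    static long valuesThatFit(long heapBytes, int bytesPerValue) {
        if (bytesPerValue <= 0) {
            throw new IllegalArgumentException("bytesPerValue must be positive");
        }
        return heapBytes / bytesPerValue;
    }

    public static void main(String[] args) {
        long budget = 16L * 1024 * 1024; // hypothetical in-heap sort budget

        // 1D IntPoint: 12 bytes per value today, 8 with the patch.
        System.out.println("1D today:   " + valuesThatFit(budget, 12));
        System.out.println("1D patched: " + valuesThatFit(budget, 8));

        // 2D LatLonPoint: 16 bytes per value today, 12 with the patch.
        System.out.println("2D today:   " + valuesThatFit(budget, 16));
        System.out.println("2D patched: " + valuesThatFit(budget, 12));
    }
}
```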


> BytesRefArray can be more efficient for fixed width values
> ----------------------------------------------------------
>
>                 Key: LUCENE-7122
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7122
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: master, 6.1
>
>         Attachments: LUCENE-7122.patch
>
>
> Today {{BytesRefArray}} uses one int ({{int[]}}, overallocated) per
> value to hold the length, but for dimensional points these values are
> always the same length. Storing that length once instead of per value
> can save another 4 bytes of heap per indexed dimensional point,
> which is a big improvement (more points can fit in heap at once) for
> 1D and 2D lat/lon points.
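
As a rough illustration of the idea in the description (and not the attached patch or
Lucene's actual {{BytesRefArray}}), a fixed-width store can keep all values back to back in
one {{byte[]}} and compute each value's position as index * width, so no per-value length int
is needed. The class {{FixedWidthBytes}} and its methods are hypothetical names used only
for this sketch.

```java
import java.util.Arrays;

// Hypothetical sketch: every value has the same width, so the start of
// value i is i * width and no per-value length array is kept.
final class FixedWidthBytes {
    private final int width; // bytes per value, stored once
    private byte[] data;     // all values, concatenated
    private int count;       // number of values appended so far

    FixedWidthBytes(int width) {
        if (width <= 0) {
            throw new IllegalArgumentException("width must be positive");
        }
        this.width = width;
        this.data = new byte[width * 16]; // room for 16 values initially
    }

    /** Appends one value and returns its index. */
    int append(byte[] value) {
        if (value.length != width) {
            throw new IllegalArgumentException("expected " + width + " bytes");
        }
        int start = count * width;
        if (start + width > data.length) {
            data = Arrays.copyOf(data, data.length * 2); // grow by doubling
        }
        System.arraycopy(value, 0, data, start, width);
        return count++;
    }

    /** Returns a copy of the value at the given index. */
    byte[] get(int index) {
        if (index < 0 || index >= count) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        int start = index * width;
        return Arrays.copyOfRange(data, start, start + width);
    }

    int size() {
        return count;
    }
}
```

The only per-value heap cost in this sketch is the value's own bytes (plus whatever slack the
doubling leaves at the end of the array); the width lives in a single field.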




