lucene-dev mailing list archives

From "Uwe Schindler (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (LUCENE-5722) Speed up MMapDirectory.seek()
Date Sun, 01 Jun 2014 20:27:01 GMT

     [ https://issues.apache.org/jira/browse/LUCENE-5722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler updated LUCENE-5722:
----------------------------------

    Attachment: LUCENE-5722.patch

Here is my patch. I also added a new test for slices of slices, so we can be sure that offsets
are correctly passed through (this was the hardest part). The workaround is a protected
method added to the base class, which is implemented only by the offset-aware implementation.

While debugging this and looking into the TODO Robert added, I found out what is wrong:
the cause is the bug LUCENE-5658, which is now understood.

The main problem was in calculating the limit of the last buffer in the slice: this
blows up in the special case where the slice ends exactly at a chunkSize boundary. So this
commit would also fix LUCENE-5658.
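
To illustrate why an end position that falls exactly on a chunk boundary is a special case,
here is a minimal sketch of the shift/mask arithmetic. The class and method names are
hypothetical and this is not the code from the patch; it only assumes that chunks have a
power-of-two size given as chunkSizePower.

```java
// Hypothetical illustration only; not taken from the Lucene sources.
final class ChunkBoundarySketch {

    /** Index of the chunk that contains the absolute position pos. */
    static int chunkIndex(long pos, int chunkSizePower) {
        return (int) (pos >>> chunkSizePower);
    }

    /** Offset of the absolute position pos inside its chunk. */
    static int offsetInChunk(long pos, int chunkSizePower) {
        long mask = (1L << chunkSizePower) - 1L;
        return (int) (pos & mask);
    }
}
```

With 16-byte chunks (chunkSizePower = 4), a slice ending at position 31 maps to chunk 1 with
offset 15, while a slice ending at position 32 maps to chunk 2 with offset 0. A limit derived
from that offset therefore needs separate care in the boundary case.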

Nevertheless, I will commit the fix for LUCENE-5658 separately.

> Speed up MMapDirectory.seek()
> -----------------------------
>
>                 Key: LUCENE-5722
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5722
>             Project: Lucene - Core
>          Issue Type: Bug
>            Reporter: Robert Muir
>         Attachments: LUCENE-5722.patch, LUCENE-5722.patch
>
>
> For traditional Lucene access, which is mostly sequential with an occasional advance(), I
> think this method gets drowned out in the noise.
> But for access patterns like docvalues, it's important. Unfortunately, seek() is complex
> today because of mapping multiple buffers.
> However, the very common case is that only one map is used for a given clone or slice.
> When there is the possibility to use only a single mapped buffer, we should instead take
> advantage of ByteBuffer.slice(), which will adjust the internal mmap address and remove the
> offset calculation. Furthermore, we don't need the shift/mask or even the negative check, as
> they are then all handled by the ByteBuffer API: seek is a one-liner (with try/catch, of
> course, to convert exceptions).
> This makes docvalues access 20% faster; I haven't tested conjunctions or anything like
> that.
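
The single-buffer idea from the description above can be sketched as follows. This is a
minimal illustration under stated assumptions, not the code from the patch: the class name
SingleBufferInput and its constructor are hypothetical, and a heap ByteBuffer stands in for
a mapped one. ByteBuffer.slice() makes position 0 of the new buffer correspond to the slice
start, so seek() needs no offset, shift, or mask, and the buffer's own bounds check covers
negative and too-large positions.

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.BufferUnderflowException;
import java.nio.ByteBuffer;

// Hypothetical illustration only; not taken from the Lucene sources.
final class SingleBufferInput {
    private final ByteBuffer buffer;

    /** Views length bytes of source, starting at offset, as positions 0..length. */
    SingleBufferInput(ByteBuffer source, int offset, int length) {
        ByteBuffer dup = source.duplicate(); // independent position and limit
        dup.limit(offset + length);
        dup.position(offset);
        this.buffer = dup.slice();           // slice start becomes position 0
    }

    /** Seek is a single position() call; exceptions are converted to EOFException. */
    void seek(long pos) throws IOException {
        try {
            buffer.position(Math.toIntExact(pos));
        } catch (IllegalArgumentException | ArithmeticException e) {
            throw new EOFException("seek out of bounds: " + pos);
        }
    }

    /** Reads one byte at the current position and advances. */
    byte readByte() throws IOException {
        try {
            return buffer.get();
        } catch (BufferUnderflowException e) {
            throw new EOFException("read past EOF");
        }
    }
}
```

For a 16-byte source holding the values 0 to 15, a SingleBufferInput over offset 4 and
length 8 returns 7 from readByte() after seek(3), and seek(9) raises EOFException.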




