lucene-dev mailing list archives

From "Uwe Schindler (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-5722) Speed up MMapDirectory.seek()
Date Sun, 01 Jun 2014 16:13:01 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-5722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14015012#comment-14015012 ]

Uwe Schindler commented on LUCENE-5722:
---------------------------------------

I looked at the code for a long time, and also at Robert's patch.

I found that the subclassing issue can be solved quite easily: we don't need to make
ByteBufferIndexInput abstract. The solution would be to pass an "unmapper" instance to the
constructor that does the unmapping, so freeBuffers does not need to be abstract. In that
case we can use ByteBufferIndexInput as a concrete class.
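
A rough sketch of the unmapper idea, not the actual patch: the interface name BufferUnmapper,
the class SketchBufferInput and the null-unmapper convention for clones are all assumptions
made for illustration. The point is only that the input stays concrete and delegates the
unmapping to whatever was passed to the constructor.

```java
import java.nio.ByteBuffer;

// Hypothetical callback; name and signature are assumptions, not Lucene API.
interface BufferUnmapper {
    void unmap(ByteBuffer buffer);
}

// Simplified stand-in for ByteBufferIndexInput: concrete, no abstract freeBuffers().
class SketchBufferInput {
    private ByteBuffer[] buffers;
    // Assumed convention: clones/slices get null here, so only the original unmaps.
    private final BufferUnmapper unmapper;

    SketchBufferInput(ByteBuffer[] buffers, BufferUnmapper unmapper) {
        this.buffers = buffers;
        this.unmapper = unmapper;
    }

    // Releases the buffers at most once; later calls are no-ops.
    void close() {
        if (buffers == null) {
            return;
        }
        ByteBuffer[] toFree = buffers;
        buffers = null;
        if (unmapper != null) {
            for (ByteBuffer b : toFree) {
                unmapper.unmap(b);
            }
        }
    }
}
```

A directory implementation would pass its own unmapping logic; a heap-backed test could
pass a no-op or a recording callback.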

The second issue in the multi-mmap seek is the offset. In ByteBufferIndexInput the offset
is only used in seek and when creating slices/clones. The idea now is to remove the offset
from the base class completely. The base class is then usable for the case where offset=0
and multiple buffers are used. All the checks at the beginning of seek() are then useless
there, because they only apply to the case offset!=0. In all other cases we already catch
the out-of-bounds cases by AIOOBE and similar.

The special cases would then be:
- SingleByteBufferIndexInput extends ByteBufferIndexInput: we can remove the assert, because
offset no longer exists in the base class. We always use ByteBuffer.slice here.
- The other special case is offset!=0 for multi-mmap: in that case we have a second concrete
subclass that just overrides seek() to do the offset checks at the beginning and, once
everything is adjusted, calls super.seek().
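
The split between the offset-free base class and the offset subclass could look roughly like
this. It is a sketch under assumptions: the class names, the constructor parameters and the
use of IllegalArgumentException (instead of a checked I/O exception) are made up for the
example, and reading across buffer boundaries is left out.

```java
import java.nio.ByteBuffer;

// Base case: offset == 0, several buffers of 2^chunkSizePower bytes each.
class MultiBufferInput {
    protected final ByteBuffer[] buffers;
    protected final int chunkSizePower;
    protected final long chunkSizeMask;
    protected ByteBuffer curBuf;
    protected int curBufIndex;

    MultiBufferInput(ByteBuffer[] buffers, int chunkSizePower) {
        this.buffers = buffers;
        this.chunkSizePower = chunkSizePower;
        this.chunkSizeMask = (1L << chunkSizePower) - 1L;
        this.curBufIndex = 0;
        this.curBuf = buffers[0];
    }

    // No explicit range checks: a bad position surfaces as AIOOBE (buffer index)
    // or IllegalArgumentException (position beyond the buffer limit).
    void seek(long pos) {
        int bi = (int) (pos >> chunkSizePower);
        try {
            ByteBuffer b = buffers[bi];
            b.position((int) (pos & chunkSizeMask));
            curBufIndex = bi;
            curBuf = b;
        } catch (ArrayIndexOutOfBoundsException | IllegalArgumentException e) {
            throw new IllegalArgumentException("seek out of bounds: " + pos, e);
        }
    }

    // Simplified: does not switch to the next buffer at a chunk boundary.
    byte readByte() {
        return curBuf.get();
    }
}

// Special case: offset != 0. Only seek() is overridden to check and adjust.
class OffsetMultiBufferInput extends MultiBufferInput {
    private final int offset;
    private final long length;

    OffsetMultiBufferInput(ByteBuffer[] buffers, int chunkSizePower, int offset, long length) {
        super(buffers, chunkSizePower);
        this.offset = offset;
        this.length = length;
        seek(0L);
    }

    @Override
    void seek(long pos) {
        if (pos < 0L || pos > length) {
            throw new IllegalArgumentException("seek out of bounds: " + pos);
        }
        super.seek(pos + offset);
    }
}
```

With two 4-byte buffers holding the bytes 0..7, an OffsetMultiBufferInput with offset 3 and
length 4 maps its position 2 to the absolute position 5 in the second buffer.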

Cloning/slicing can then be done much more easily; we just include the offset there.

Furthermore, I made a small improvement to ByteBufferIndexInput.seek() for the case where
the seek stays inside the same buffer. With the optimizations above the whole thing is then
mostly a simple position() call on the byte buffer with a few calculations.
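
One way the same-buffer fast path might look, as an illustration only: the class name and
the bufferSwitches counter are assumptions added to make the effect observable, and the
real patch may structure this differently.

```java
import java.nio.ByteBuffer;

// Sketch of a seek() that avoids the buffer lookup when the target position
// lies in the buffer that is already current.
class SameBufferSeekSketch {
    private final ByteBuffer[] buffers;
    private final int chunkSizePower;
    private final long chunkSizeMask;
    private ByteBuffer curBuf;
    private int curBufIndex;
    int bufferSwitches = 0; // instrumentation for this example only

    SameBufferSeekSketch(ByteBuffer[] buffers, int chunkSizePower) {
        this.buffers = buffers;
        this.chunkSizePower = chunkSizePower;
        this.chunkSizeMask = (1L << chunkSizePower) - 1L;
        this.curBufIndex = 0;
        this.curBuf = buffers[0];
    }

    void seek(long pos) {
        int bi = (int) (pos >> chunkSizePower);
        int inBuf = (int) (pos & chunkSizeMask);
        try {
            if (bi == curBufIndex) {
                // Fast path: just a position() call on the current buffer.
                curBuf.position(inBuf);
            } else {
                ByteBuffer b = buffers[bi];
                b.position(inBuf);
                curBufIndex = bi;
                curBuf = b;
                bufferSwitches++;
            }
        } catch (ArrayIndexOutOfBoundsException | IllegalArgumentException e) {
            throw new IllegalArgumentException("seek out of bounds: " + pos, e);
        }
    }

    long getFilePointer() {
        return ((long) curBufIndex << chunkSizePower) + curBuf.position();
    }
}
```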

I will sort all this out and provide a patch!

> Speed up MMapDirectory.seek()
> -----------------------------
>
>                 Key: LUCENE-5722
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5722
>             Project: Lucene - Core
>          Issue Type: Bug
>            Reporter: Robert Muir
>         Attachments: LUCENE-5722.patch
>
>
> For traditional lucene access which is mostly sequential, occasional advance(), I think
> this method gets drowned out in noise.
> But for access like docvalues, it's important. Unfortunately seek() is complex today because
> of mapping multiple buffers.
> However, the very common case is that only one map is used for a given clone or slice.
> When there is the possibility to use only a single mapped buffer, we should instead take
> advantage of ByteBuffer.slice(), which will adjust the internal mmap address and remove the
> offset calculation. Furthermore, we don't need the shift/mask or even the negative check, as
> they are then all handled by the ByteBuffer API: seek is a one-liner (with try/catch of
> course to convert exceptions).
> This makes docvalues access 20% faster; I haven't tested conjunctions or anything like
> that.
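
For illustration, the single-buffer case described in the issue could be sketched as below.
The class name SingleBufferInput and the unchecked exception type are assumptions for the
example; the point is that ByteBuffer.slice() absorbs the offset, so seek() needs no
shift/mask and no explicit negative check.

```java
import java.nio.ByteBuffer;

// Sketch of an input backed by exactly one buffer.
class SingleBufferInput {
    private final ByteBuffer buf;

    SingleBufferInput(ByteBuffer buf) {
        this.buf = buf;
    }

    // The one-liner: position() itself rejects negative or too-large values.
    // The (int) cast is a simplification and would wrap for positions >= 2^31.
    void seek(long pos) {
        try {
            buf.position((int) pos);
        } catch (IllegalArgumentException e) {
            throw new IllegalArgumentException("seek out of bounds: " + pos, e);
        }
    }

    byte readByte() {
        return buf.get();
    }

    // The slice starts at position 0 of its own view; no offset field is needed.
    SingleBufferInput slice(int offset, int length) {
        ByteBuffer dup = buf.duplicate();
        dup.clear();
        dup.limit(offset + length);
        dup.position(offset);
        return new SingleBufferInput(dup.slice());
    }
}
```

On a buffer holding the bytes 0..9, slice(3, 4) yields a view whose position 0 reads 3 and
whose valid seek range is 0..4.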



--
This message was sent by Atlassian JIRA
(v6.2#6252)


