lucene-dev mailing list archives

From "Michael McCandless (JIRA)" <>
Subject [jira] [Commented] (LUCENE-7583) Can we improve OutputStreamIndexOutput's byte buffering when writing each BKD leaf block?
Date Tue, 06 Dec 2016 15:34:58 GMT


Michael McCandless commented on LUCENE-7583:

bq. Are we sure that we do not open the IndexOutput in one thread and hand it over to another

Yeah, the {{IndexOutput}} is opened in {{Lucene60PointsWriter}}, and then that same thread
goes and writes all points via {{writeField}}.  At IW flush time it's an indexing thread,
and at merge time it's a merge thread, but it should only ever be a single thread touching
that {{IndexOutput}}.  The benchmark I'm running only ever uses a single thread anyway ...

bq. we should also make all references to the IndexOutput private, so it cannot escape the
current thread (to help hotspot). This means: no non-private fields holding the reference
to the stream.

I'll try to do this; there's at least one place where it's protected, but that's way high
up in the stack ({{Lucene60PointsWriter}}).

bq. If we are really required to fork the buffered stream, we may use:
(but without the DataOutput interface impl).

I'll test that too.

Thanks [~thetaphi].

> Can we improve OutputStreamIndexOutput's byte buffering when writing each BKD leaf block?
> -----------------------------------------------------------------------------------------
>                 Key: LUCENE-7583
>                 URL:
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>             Fix For: master (7.0), 6.4
>         Attachments: LUCENE-7583-hardcode-writeVInt.patch, LUCENE-7583.patch
> When BKD writes its leaf blocks, it's essentially a lot of tiny writes (vint, int, short,
> etc.), and I've seen deep thread stacks through our IndexOutput impl ({{OutputStreamIndexOutput}})
> when pulling hot threads while BKD is writing.
> So I tried a small change, to have BKDWriter do its own buffering, by first writing each
> leaf block into a {{RAMOutputStream}}, and then dumping that (in 1 KB byte[] chunks) to the
> actual IndexOutput.
> This gives a non-trivial reduction (~6%) in the total BKD writing + merging time
> on the 20M NYC taxis nightly benchmark (two runs each):
> Trunk, sparse:
>   - total: 64.691 sec
>   - total: 64.702 sec
> Patch, sparse:
>   - total: 60.820 sec
>   - total: 60.965 sec
> Trunk, dense:
>   - total: 62.730 sec
>   - total: 62.383 sec
> Patch, dense:
>   - total: 58.805 sec
>   - total: 58.742 sec
> The results seem to be consistent and reproducible.  I'm using Java 1.8.0_101 on a fast
> SSD on Ubuntu 16.04.
> It's sort of weird and annoying that this helps so much, because {{OutputStreamIndexOutput}}
> already uses Java's {{BufferedOutputStream}} (default 8 KB buffer) to buffer writes.
> [~thetaphi] suggested maybe hotspot is failing to inline/optimize the {{writeByte}} call, or
> the call stack just has too many layers.
> We could commit this patch (it's trivial) but it'd be nice to understand and fix why
> buffering writes is somehow costly, so any other Lucene codec components that write lots of
> little things can be improved too.
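
For illustration, the buffering idea in the issue description can be sketched with plain JDK classes. This is a minimal sketch and not the attached patch: {{LeafBlockBuffer}} and its methods are hypothetical names, a growable byte[] stands in for {{RAMOutputStream}}, and a plain {{OutputStream}} stands in for the real IndexOutput. Each tiny write lands in the in-memory array, and the whole block is then copied out in 1 KB chunks.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.util.Arrays;

/** Hypothetical helper (not Lucene code): buffers one leaf block, then copies it out in chunks. */
final class LeafBlockBuffer {
    private static final int CHUNK_SIZE = 1024; // 1 KB chunks, as in the description

    private byte[] scratch = new byte[4096]; // assumed starting size, grows on demand
    private int length;

    /** Appends a single byte to the in-memory block, doubling the array when full. */
    void writeByte(byte b) {
        if (length == scratch.length) {
            scratch = Arrays.copyOf(scratch, scratch.length * 2);
        }
        scratch[length++] = b;
    }

    /** Variable-length int: 7 payload bits per byte, high bit set while more bytes follow. */
    void writeVInt(int i) {
        while ((i & ~0x7F) != 0) {
            writeByte((byte) ((i & 0x7F) | 0x80));
            i >>>= 7;
        }
        writeByte((byte) i);
    }

    /** Fixed-width int, most significant byte first. */
    void writeInt(int i) {
        writeByte((byte) (i >>> 24));
        writeByte((byte) (i >>> 16));
        writeByte((byte) (i >>> 8));
        writeByte((byte) i);
    }

    /** Number of bytes currently buffered. */
    int size() {
        return length;
    }

    /** Copies the buffered block to the destination in CHUNK_SIZE pieces, then resets. */
    void flushTo(OutputStream out) throws IOException {
        for (int off = 0; off < length; off += CHUNK_SIZE) {
            out.write(scratch, off, Math.min(CHUNK_SIZE, length - off));
        }
        length = 0;
    }
}
```

In this sketch the destination stream sees a handful of bulk {{write(byte[], int, int)}} calls per leaf block instead of one call per vint/int/short, which is the effect the description attributes to the patch.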
