lucene-dev mailing list archives

From "Michael Busch (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-2324) Per thread DocumentsWriters that write their own private segments
Date Tue, 18 Jan 2011 16:20:45 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-2324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12983246#action_12983246 ]

Michael Busch commented on LUCENE-2324:
---------------------------------------

{quote}
I ran a quick perf test here: I built the 10M Wikipedia index,
Standard codec, using 6 threads. Trunk took 541.6 sec; RT took 518.2
sec (only a bit faster), but the test wasn't really fair because it
flushed @ docCount=12870.
{quote}

Thanks for running the tests!
Hmm, that's a bit disappointing - we were hoping for more of a speedup.
Flushing by docCount is currently per DWPT, so every initial segment
in your test had 12870 docs. I guess there's a lot of merging happening.

Maybe you could rerun with a higher docCount?
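
As a rough illustration of why per-DWPT flushing by docCount may lead to
many small initial segments, here is a back-of-the-envelope sketch. It is
not Lucene code: the class and method names are made up, and it assumes
the documents are dealt evenly (round-robin) to the indexing threads,
which a real run will only approximate.

```java
public class DwptFlushEstimate {

    // Hypothetical helper: number of docs thread t receives when totalDocs
    // are dealt round-robin to `threads` threads (an assumption, not how
    // Lucene necessarily assigns documents to DWPTs).
    static long docsForThread(long totalDocs, int threads, int t) {
        long base = totalDocs / threads;
        return base + (t < totalDocs % threads ? 1 : 0);
    }

    // Counts the full segments flushed if each thread flushes its own
    // private buffer every time it has accumulated maxBufferedDocs docs.
    // Partially filled buffers left over at the end are not counted.
    static long fullSegments(long totalDocs, int threads, int maxBufferedDocs) {
        long segments = 0;
        for (int t = 0; t < threads; t++) {
            segments += docsForThread(totalDocs, threads, t) / maxBufferedDocs;
        }
        return segments;
    }

    public static void main(String[] args) {
        // Numbers taken from the test above: 10M docs, 6 threads, docCount=12870.
        System.out.println(fullSegments(10_000_000L, 6, 12870));
    }
}
```

Under these assumptions the run produces several hundred equally sized
initial segments, all of which have to be merged later; a higher docCount
reduces that number proportionally.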

bq. But I can't test flush by RAM - that's not working yet on RT right?

True.  I'm going to add that soonish.  There's one thread-safety bug 
related to deletes that needs to be fixed too.

{quote}
Then I ran a single-threaded test. Trunk took 1097.1 sec and RT took
1040.5 sec - a bit faster! Presumably in the noise (we don't expect
a speedup?), but excellent that it's not slower...
{quote}

Yeah, I didn't expect much of a speedup - cool! :)  Maybe it's because
some code is gone, like the WaitQueue; I'm not sure how much overhead
that added in the single-threaded case.

{quote}
I think we lost infoStream output on the details of flushing? I can't
see when which DWPTs are flushing...
{quote}

Oh yeah, good point, I'll add some infoStream messages to DWPT!

> Per thread DocumentsWriters that write their own private segments
> -----------------------------------------------------------------
>
>                 Key: LUCENE-2324
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2324
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Michael Busch
>            Assignee: Michael Busch
>            Priority: Minor
>             Fix For: Realtime Branch
>
>         Attachments: LUCENE-2324-SMALL.patch, LUCENE-2324-SMALL.patch, LUCENE-2324-SMALL.patch,
>                      LUCENE-2324-SMALL.patch, LUCENE-2324-SMALL.patch, LUCENE-2324.patch, LUCENE-2324.patch, LUCENE-2324.patch,
>                      lucene-2324.patch, lucene-2324.patch, LUCENE-2324.patch, test.out, test.out, test.out, test.out
>
>
> See LUCENE-2293 for motivation and more details.
> I'm copying here Mike's summary he posted on 2293:
> Change the approach for how we buffer in RAM to a more isolated
> approach, whereby IW has N fully independent RAM segments
> in-process and when a doc needs to be indexed it's added to one of
> them. Each segment would also write its own doc stores and
> "normal" segment merging (not the inefficient merge we now do on
> flush) would merge them. This should be a good simplification in
> the chain (eg maybe we can remove the *PerThread classes). The
> segments can flush independently, letting us make much better
> concurrent use of IO & CPU.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

