lucene-dev mailing list archives

From "Yonik Seeley (JIRA)" <>
Subject [jira] Commented: (LUCENE-1316) Avoidable synchronization bottleneck in MatchAlldocsQuery$MatchAllScorer
Date Thu, 03 Jul 2008 13:36:45 GMT


Yonik Seeley commented on LUCENE-1316:

{quote}but ... thinking about how SegmentTermDocs are constructed for a moment,
isn't the (unsynchronized) usage of deletedDocs in SegmentTermDocs
prone to the same concern you had about removing (or reducing) the
synchronization in SegmentReader.isDeleted ... "deletes aren't instantly
visible across threads" ... are they?{quote}

No, deletes aren't instantly visible across threads (when one thread has started a query and
the other thread does a delete).  AFAIK it's always been that way, so I think it's probably
acceptable.  On a practical level, seeking on a TermDocs crosses synchronization points, so
deletes won't go unrecognized by other threads forever either.

But there is a thread-safety issue I've been mulling over since I wrote this patch.  Reading
the BitVector itself is fine... unsynchronized reads look benign on the BitVector class
since writes just change a byte at a time.  "count" could be off, but that's OK too... it
will simply be a stale value since updates to it are atomic.
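The byte-at-a-time argument can be illustrated with a minimal, self-contained sketch. TinyBitVector below is a hypothetical stand-in, not Lucene's actual BitVector class; it only mirrors the relevant layout (a byte[] indexed by doc >> 3), so a concurrent reader sees either the old byte or the new one, never a torn value:

```java
// TinyBitVector: hypothetical sketch of a byte-backed bit set, illustrating
// why unsynchronized reads are benign -- set() rewrites one byte in place.
class TinyBitVector {
    private final byte[] bits;
    private int count;   // other threads may read a stale count; it is still a valid past value

    TinyBitVector(int n) {
        bits = new byte[(n >> 3) + 1];
    }

    void set(int bit) {
        bits[bit >> 3] |= 1 << (bit & 7);   // single-byte write, no torn state possible
        count++;
    }

    boolean get(int bit) {
        // A concurrent reader sees either the old byte value or the new one.
        return (bits[bit >> 3] & (1 << (bit & 7))) != 0;
    }

    int count() {
        return count;
    }
}

public class BitVectorSketch {
    public static void main(String[] args) {
        TinyBitVector v = new TinyBitVector(64);
        v.set(10);
        v.set(42);
        System.out.println(v.get(10)); // true
        System.out.println(v.get(11)); // false
        System.out.println(v.count()); // 2
    }
}
```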

The issue is the possibility of grabbing a partially constructed BitVector object to begin
with.  Notice how I synchronize to avoid this in AllTermDocs:
  protected AllTermDocs(SegmentReader parent) {
    synchronized (parent) {
      this.deletedDocs = parent.deletedDocs;
      this.maxDoc = parent.maxDoc();
    }
  }

That should probably be done in SegmentTermDocs too.  Without it, a null pointer exception
or an array out of bounds exception could result when checking the BitVector.
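The underlying concern is general Java memory-model behavior (safe publication), not anything Lucene-specific. A self-contained sketch of the pattern, where Parent and ChildDocs are hypothetical stand-ins for SegmentReader and a TermDocs implementation:

```java
import java.util.BitSet;

// Parent: hypothetical stand-in for SegmentReader. The deletedDocs field may be
// (re)assigned by another thread while holding the parent's lock.
class Parent {
    BitSet deletedDocs;
    int maxDoc = 100;

    synchronized void setDeletedDocs(BitSet b) {
        this.deletedDocs = b;
    }
}

// ChildDocs: hypothetical stand-in for a TermDocs implementation. Copying the
// shared field while synchronized on the parent establishes a happens-before
// edge, so the child can never observe a partially constructed BitSet.
class ChildDocs {
    private final BitSet deletedDocs;
    private final int maxDoc;

    ChildDocs(Parent parent) {
        synchronized (parent) {
            this.deletedDocs = parent.deletedDocs;
            this.maxDoc = parent.maxDoc;
        }
    }

    boolean isDeleted(int doc) {
        return deletedDocs != null && deletedDocs.get(doc);
    }

    int maxDoc() {
        return maxDoc;
    }
}

public class SafePublicationSketch {
    public static void main(String[] args) {
        Parent p = new Parent();
        BitSet deletes = new BitSet();
        deletes.set(3);
        p.setDeletedDocs(deletes);

        ChildDocs td = new ChildDocs(p);
        System.out.println(td.isDeleted(3)); // true
        System.out.println(td.isDeleted(4)); // false
    }
}
```

After the one-time synchronized copy in the constructor, all subsequent reads of the local reference are lock-free, which is the point of the patch.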

> Avoidable synchronization bottleneck in MatchAlldocsQuery$MatchAllScorer
> ------------------------------------------------------------------------
>                 Key: LUCENE-1316
>                 URL:
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Query/Scoring
>    Affects Versions: 2.3
>         Environment: All
>            Reporter: Todd Feak
>            Priority: Minor
>         Attachments: LUCENE_1316.patch, LUCENE_1316.patch, LUCENE_1316.patch,
>   Original Estimate: 1h
>  Remaining Estimate: 1h
> The isDeleted() method on IndexReader has been mentioned a number of times as a potential
synchronization bottleneck. However, the reason this  bottleneck occurs is actually at a higher
level that wasn't focused on (at least in the threads I read).
> In every case I saw where a stack trace was provided to show the lock/block, higher in
the stack you see the method. In Solr particularly, this scorer is used for "NOT" queries.
We saw incredibly poor performance (an order of magnitude) in our load tests for NOT queries,
due to this bottleneck. The problem is that every single document is run through this
isDeleted() method, which is synchronized. Having an optimized index exacerbates this issue,
as there is only a single SegmentReader to synchronize on, causing a major thread pileup
waiting for the lock.
> By simply having the MatchAllScorer first check whether there have been any deletions in
the reader, much of this synchronization can be avoided, especially in a read-only production
environment where slaves do all the high-load searching.
> I modified line 67 in the MatchAllDocsQuery
>   if (!reader.isDeleted(id)) {
> TO:
>   if (!reader.hasDeletions() || !reader.isDeleted(id)) {
> In our micro load test for NOT queries only, this was a major performance improvement.
We also got the same query results. I don't believe this will improve the situation for
indexes that have deletions.
> Please consider making this adjustment for a future bug fix release.
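The effect of the proposed guard can be sketched with a self-contained toy. FakeReader and countMatches below are hypothetical and only mimic the relevant IndexReader methods (hasDeletions(), isDeleted(), maxDoc()); they are not Lucene code:

```java
// FakeReader: hypothetical stand-in mimicking the relevant IndexReader methods.
class FakeReader {
    private final boolean[] deleted;
    private final boolean hasDeletions;

    FakeReader(boolean[] deleted) {
        this.deleted = deleted;
        boolean any = false;
        for (boolean d : deleted) any |= d;
        this.hasDeletions = any;
    }

    boolean hasDeletions() {             // cheap, unsynchronized check
        return hasDeletions;
    }

    synchronized boolean isDeleted(int id) {  // the contended, synchronized call
        return deleted[id];
    }

    int maxDoc() {
        return deleted.length;
    }
}

public class MatchAllSketch {
    // Iterate over all docs the way MatchAllScorer does, with the proposed
    // short-circuit: when the reader has no deletions, the synchronized
    // isDeleted() call is skipped entirely.
    static int countMatches(FakeReader reader) {
        int count = 0;
        for (int id = 0; id < reader.maxDoc(); id++) {
            if (!reader.hasDeletions() || !reader.isDeleted(id)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        FakeReader noDeletes = new FakeReader(new boolean[5]);
        FakeReader oneDelete = new FakeReader(new boolean[]{false, true, false, false, false});
        System.out.println(countMatches(noDeletes)); // 5
        System.out.println(countMatches(oneDelete)); // 4
    }
}
```

With no deletions, the loop never takes the lock, which is why the win shows up most clearly on optimized, read-only indexes.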

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.

