lucene-dev mailing list archives

From "Timothy Potter (JIRA)" <>
Subject [jira] [Updated] (SOLR-7332) Seed version buckets with max version from index
Date Fri, 03 Apr 2015 15:42:53 GMT


Timothy Potter updated SOLR-7332:
    Attachment: SOLR-7332.patch

Ignore the previous patch - it had a deadlock condition in it when doing core reloads :-(

I think this updated patch is close to commit for trunk. I've added a distributed test that
uses multiple threads to send docs, reload the collection, and commit data - beast passes
20 of 20. However, there are 2 areas that need review:

1) How I'm calling {{UpdateLog.onFirstSearcher}} in SolrCore. I was getting a multiple on-deck
searcher warning during a core reload because the getSearcher method gets called twice during
a reload and if the max version lookup took a little time, then the warning would occur. So
I'm calling this as part of the main thread vs. in the background executor. This will of course
block the reload until it finishes, but I think given the importance of getting the version
buckets seeded correctly, that's OK. Let me know if there's a better way.

2) Originally, I was synchronizing the seedBucketVersionHighestFromIndex method in UpdateLog,
but that led to deadlock when doing reloads because updates continue to flow in while the reload
occurs (and DistributedUpdateProcessor versionAdd gets the lock on versionBuckets and calls
synchronized methods on UpdateLog). So I've switched to using the versionInfo.blockUpdates
while looking up the max version from the index, see: {{UpdateLog.onFirstSearcher}}. My thinking
here is that we actually want to block updates briefly after a reload when getting the max
from the index so that we don't end up setting the version too low.
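To illustrate the idea in (2), here is a minimal sketch of blocking updates with a read-write lock while the max version is looked up. It is not the code from the patch: the class {{VersionSeedingSketch}} and its methods are hypothetical names, and the assumption is that updates share a read lock while the seeding step takes the write lock, so no synchronized method on a second object is involved.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.LongSupplier;

// Hypothetical stand-in for the locking described above; not Solr's actual classes.
class VersionSeedingSketch {
    // Fair lock so a waiting seeder is not starved by a steady stream of updates.
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock(true);
    private final AtomicLong highest = new AtomicLong(0);

    // Update path: many updates may hold the read lock at the same time.
    long applyUpdate(long version) {
        lock.readLock().lock();
        try {
            return highest.accumulateAndGet(version, Math::max);
        } finally {
            lock.readLock().unlock();
        }
    }

    // Seeding path: the write lock keeps updates out while the max is looked up,
    // so no update can slip in between the lookup and the assignment.
    void seedFromIndex(LongSupplier maxVersionFromIndex) {
        lock.writeLock().lock();
        try {
            long max = maxVersionFromIndex.getAsLong();
            // Never lower a value that an earlier update already raised.
            highest.accumulateAndGet(max, Math::max);
        } finally {
            lock.writeLock().unlock();
        }
    }

    long highest() {
        return highest.get();
    }
}
```

In this sketch, updates only wait for as long as the lookup passed to {{seedFromIndex}} takes, which matches the "block updates briefly" trade-off described above.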

Also, minor, but I removed the SortedNumericDocValues stuff from the {{VersionInfo#seedBucketVersionHighestFromIndex}}
method from the previous patch since Solr doesn't have support for that yet and it was a misunderstanding
on my part of how that type of field works. So now the lookup of the max either uses terms if
the version field is indexed or a range query if it is not indexed.

> Seed version buckets with max version from index
> ------------------------------------------------
>                 Key: SOLR-7332
>                 URL:
>             Project: Solr
>          Issue Type: Sub-task
>          Components: SolrCloud
>            Reporter: Timothy Potter
>            Assignee: Timothy Potter
>         Attachments: SOLR-7332.patch, SOLR-7332.patch, SOLR-7332.patch
> See full discussion with Yonik and me in SOLR-6816.
> The TL;DR of that discussion is that we should initialize highest for each version bucket
to the MAX value of the {{_version_}} field in the index as early as possible, such as after
the first soft- or hard- commit. This will ensure that bulk adds where the docs don't exist
avoid an unnecessary lookup for a non-existent document in the index.
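As a rough illustration of why seeding helps, the sketch below models version buckets in plain Java. It is a simplified assumption of the check, not Solr's implementation: {{VersionBucketSketch}}, its map-backed "index", and the lookup counter are all hypothetical. A bucket whose highest value is 0 is treated as unseeded, so every add has to look up the document's last version; once seeded, an add with a version above the bucket's highest skips that lookup.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of version buckets; names and structure are illustrative only.
class VersionBucketSketch {
    private final long[] highest;                                // highest version seen per bucket
    private final Map<String, Long> index = new HashMap<>();     // stand-in for the index
    private int lookups = 0;                                     // per-document version lookups

    VersionBucketSketch(int numBuckets) {
        highest = new long[numBuckets];
    }

    // Seed every bucket with the max version found in the index.
    synchronized void seed(long maxVersionInIndex) {
        for (int i = 0; i < highest.length; i++) {
            highest[i] = Math.max(highest[i], maxVersionInIndex);
        }
    }

    private int bucketFor(String id) {
        return Math.floorMod(id.hashCode(), highest.length);
    }

    // Returns true if the add was applied, false if it was dropped as a repeat or reorder.
    synchronized boolean add(String id, long version) {
        int b = bucketFor(id);
        if (highest[b] != 0 && highest[b] < version) {
            // Newer than anything seen in this bucket: no reordering possible, skip the lookup.
            highest[b] = version;
        } else {
            // Unseeded bucket or an older version: check this document's last version.
            lookups++;
            Long last = index.get(id);
            if (last != null && last >= version) {
                return false;
            }
        }
        index.put(id, version);
        return true;
    }

    synchronized int lookups() {
        return lookups;
    }
}
```

In this model, two adds of new documents against an unseeded instance cost two lookups, while the same adds after calling {{seed}} with a lower max cost none; a later add that carries an older version still falls back to the lookup and is dropped.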
