lucene-dev mailing list archives

From "Megan Carey (Jira)" <>
Subject [jira] [Commented] (SOLR-13101) Shared storage support in SolrCloud
Date Mon, 09 Sep 2019 22:36:00 GMT


Megan Carey commented on SOLR-13101:

h2. SolrCloud + Blobstore
h3. Overview

This repo introduces a new framework which allows SolrCloud to integrate with an external
(typically cloud-based) blobstore. Instead of maintaining a copy of the index on each Solr
host, replicating updates to peers, and using a transaction log to maintain consistent ordered
updates, Solr hosts will push and pull cores to/from this external store.

TL;DR: For now, SolrCloud can be configured to use blobstore at a collection level. Collections
backed by blobstore use a new SHARED replica type. When a Solr node receives an update request for a shared shard, it indexes locally and then pushes the change through to the shared blobstore.
Zookeeper manages index versioning and provides a source of truth in the case of concurrent
writes. Solr nodes in a cluster will no longer use peer-to-peer replication, and instead will
pull updates directly from the shared blobstore.

Please note that this project is a work in progress, and is by no means production-ready.
This code is being published early to get feedback, which we will incorporate into future work.

In order to modularize these changes and maintain existing functionality, most of the blobstore-related code is isolated to the _solr/core/src/java/org/apache/solr/store/blob_ directory. However, there are some key integration touchpoints in _HttpSolrCall#init_, _DistributedZkUpdateProcessor_, and _CoreContainer#load_. These classes all have special handling for blobstore-based shards.
h3. Pulling from Blobstore

Core pulls are, for the most part, asynchronous. When a replica is queried, it enqueues a
pull from blobstore but doesn’t wait for the pull to complete before it executes the query,
unless the replica is missing a copy of that core altogether. If your operation requires that
local cores be in sync with blobstore, use the method _BlobStoreUtils#syncLocalCoreWithSharedStore_.

A more in-depth walkthrough of the pull code (a simplified sketch of the overall pattern follows the list):
 * _BlobCoreSyncer_: manages threads that sync between the local and blob store, so that if a pull is in progress, we do not create duplicate work.
 * Calls into _CorePullTracker_: creates a _PullCoreInfo_ object containing data about the core to be pulled and adds it to a deduplicated list.
 * This queue of pull objects is polled by the _CorePullerFeeder_, which uses threads from its dedicated thread pool to execute _CorePullTask_s.
 * _CorePullTask_: checks if a pull is already underway for this core; if not, executes a pull from blob store, resolving differences between blob’s version of the core and the local version, and storing the updated core.
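The class names above come from the patch; the snippet below is only a rough, illustrative sketch of the enqueue/dedup/poll pattern they implement, under the assumption that a simple "set plus queue" captures the deduplication behavior described above.

{code:java}
import java.util.Set;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;

// Illustrative only: stands in for CorePullTracker + CorePullerFeeder +
// CorePullTask; the real classes track much more state.
public class PullQueueSketch {
  private final BlockingQueue<String> pendingPulls = new LinkedBlockingQueue<>();
  private final Set<String> enqueuedCores = ConcurrentHashMap.newKeySet();
  private final ExecutorService pullPool = Executors.newFixedThreadPool(4);

  /** Query path: record that this core needs a pull, collapsing duplicates. */
  public void enqueuePull(String coreName) {
    if (enqueuedCores.add(coreName)) { // already queued or in flight -> no-op
      pendingPulls.add(coreName);
    }
  }

  /** Feeder loop: drain the queue onto the dedicated pull thread pool. */
  public void runFeeder() throws InterruptedException {
    while (!Thread.currentThread().isInterrupted()) {
      final String coreName = pendingPulls.take();
      pullPool.submit(() -> {
        try {
          // Resolve blob vs. local metadata and fetch missing segments here.
        } finally {
          enqueuedCores.remove(coreName); // permit future pulls of this core
        }
      });
    }
  }
}
{code}

The key point is that a core becomes re-enqueueable only after its in-flight pull finishes, which is what keeps concurrent queries from creating duplicate work.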

h3. Pushing to Blobstore

This happens synchronously. On every local commit, we push to blobstore and only ack that
the update was successful when it is committed both locally and in the shared store.

A more in-depth walkthrough of the push code (a condensed sketch follows the list):
 * _DistributedZkUpdateProcessor_: once a commit is complete for a _SHARED_ replica (_onFinish_), we _writeToShareStore_.
 * This calls into _CoreUpdateTracker_, which creates a _PushPullData_ object containing data about the collection, core, and most recently pulled version of the core on this replica.
 * _CorePusher_: resolves the differences between blob’s version of the core and the local version, and pushes the updated version to blob store.
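In condensed form, the synchronous path looks roughly like the following. _CoreUpdateTracker_, _CorePusher_, and _PushPullData_ are real names from the patch, but the interfaces and method signatures below are stand-ins, not the actual APIs.

{code:java}
import java.io.IOException;

// Hypothetical stand-ins for the patch's CoreUpdateTracker / CorePusher /
// PushPullData; the real signatures differ.
interface UpdateTracker {
  PushData buildPushData(String collection, String coreName);
}
interface BlobPusher {
  void pushCoreToBlob(PushData data) throws IOException;
}
class PushData { /* collection, core, last-pulled version, ... */ }

class PushPathSketch {
  private final UpdateTracker tracker;
  private final BlobPusher pusher;

  PushPathSketch(UpdateTracker tracker, BlobPusher pusher) {
    this.tracker = tracker;
    this.pusher = pusher;
  }

  /** Runs once the local commit completes (the onFinish hook). */
  void onLocalCommitFinished(String collection, String coreName) throws IOException {
    // Gather collection, core, and the version last pulled on this replica.
    PushData data = tracker.buildPushData(collection, coreName);

    // Diff against blob's copy and upload the changed segments. Any failure
    // propagates, so the client is only acked after the update is committed
    // both locally and in the shared store.
    pusher.pushCoreToBlob(data);
  }
}
{code}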

h3. Resolving Local and Blobstore

The _SharedStoreResolutionUtil_ handles resolving diffs between the Solr node’s local copy
of a core and the copy in blobstore. It does so by pulling the metadata for the core from
blobstore (_BlobCoreMetadata_), comparing against the local metadata (_ServerSideMetadata_),
and creating a list of segments to push or pull.
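At its core this is a set diff over segment files. The toy version below assumes each side can be reduced to a map from file name to checksum; the real _BlobCoreMetadata_/_ServerSideMetadata_ comparison carries more detail (sizes, core generation, and so on).

{code:java}
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Illustrative only: a minimal stand-in for SharedStoreResolutionUtil.
class ResolutionSketch {
  final Set<String> filesToPush = new HashSet<>(); // new or changed locally
  final Set<String> filesToPull = new HashSet<>(); // present only in blobstore

  static ResolutionSketch resolve(Map<String, Long> localChecksums,
                                  Map<String, Long> blobChecksums) {
    ResolutionSketch diff = new ResolutionSketch();
    for (Map.Entry<String, Long> e : localChecksums.entrySet()) {
      Long blobChecksum = blobChecksums.get(e.getKey());
      if (blobChecksum == null || !blobChecksum.equals(e.getValue())) {
        diff.filesToPush.add(e.getKey()); // missing or stale in blobstore
      }
    }
    for (String fileName : blobChecksums.keySet()) {
      if (!localChecksums.containsKey(fileName)) {
        diff.filesToPull.add(fileName); // missing locally
      }
    }
    return diff;
  }
}
{code}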
h3. Version Management

Only the leader node can push updates to blobstore. Because a new leader can be elected at
any time, there is still a possibility for race conditions on writes to blobstore. In order
to maintain a consistent global view of the latest version of a core, we keep version data
in Zookeeper.

Zookeeper stores this version data as a random string called _metadataSuffix_. When a SolrCloud
node makes an update request, it first pushes the files to blobstore and then makes a conditional
update to the metadataSuffix variable. If Zookeeper rejects the conditional update, the update
request fails, and the failure is propagated back to the client.

This communication with Zookeeper is coordinated in the _SharedShardMetadataController_, which belongs to the _Overseer_ (i.e. the leader replica).
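The conditional update described above maps naturally onto ZooKeeper's versioned _setData_. The sketch below is illustrative (the znode path, payload, and method name are assumptions), but failing with _BadVersionException_ on a stale version is standard ZooKeeper behavior:

{code:java}
import java.nio.charset.StandardCharsets;
import java.util.UUID;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooKeeper;

class MetadataSuffixSketch {
  /**
   * Try to record a new metadataSuffix after the files have been pushed to
   * blobstore. Returns false if another writer updated the znode first, in
   * which case the update request should fail back to the client.
   */
  static boolean tryAdvanceSuffix(ZooKeeper zk, String znodePath, int expectedVersion)
      throws KeeperException, InterruptedException {
    String newSuffix = UUID.randomUUID().toString();
    try {
      // setData is conditional on the znode version: it succeeds only if no
      // other writer has modified the node since this writer last read it.
      zk.setData(znodePath, newSuffix.getBytes(StandardCharsets.UTF_8), expectedVersion);
      return true;
    } catch (KeeperException.BadVersionException e) {
      return false; // lost the race; propagate the failure to the client
    }
  }
}
{code}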
h3. Try it yourself

If you want to try this out locally, you can start up SolrCloud with the given blobstore code.
The code will default to using the local blobstore client, with "/tmp/BlobStoreLocal" as the
blobstore directory (see _LocalStorageClient_). You can create a shared collection through
the Solr admin UI by setting “shared store based” to true.
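The same thing can presumably be done programmatically through the Collections API. In the SolrJ sketch below, the "shared" parameter is a guess at what backs the admin UI checkbox; check the branch for the actual parameter name.

{code:java}
import java.io.IOException;
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrRequest;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.request.GenericSolrRequest;
import org.apache.solr.common.params.ModifiableSolrParams;

public class CreateSharedCollection {
  public static void main(String[] args) throws SolrServerException, IOException {
    ModifiableSolrParams params = new ModifiableSolrParams();
    params.set("action", "CREATE");
    params.set("name", "sharedCollection");
    params.set("numShards", 1);
    params.set("collection.configName", "_default");
    // Hypothetical flag corresponding to the "shared store based" checkbox.
    params.set("shared", "true");

    try (SolrClient client = new HttpSolrClient.Builder("http://localhost:8983/solr").build()) {
      client.request(new GenericSolrRequest(SolrRequest.METHOD.GET, "/admin/collections", params));
    }
  }
}
{code}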

Note: if you want to try testing with the _S3StorageClient_, you need to store a valid S3
bucket name and credentials as environment variables (see _S3StorageClient#AmazonS3Configs_).

> Shared storage support in SolrCloud
> -----------------------------------
>                 Key: SOLR-13101
>                 URL:
>             Project: Solr
>          Issue Type: New Feature
>          Components: SolrCloud
>            Reporter: Yonik Seeley
>            Priority: Major
>          Time Spent: 10m
>  Remaining Estimate: 0h
> Solr should have first-class support for shared storage (blob/object stores like S3,
google cloud storage, etc. and shared filesystems like HDFS, NFS, etc).
> The key component will likely be a new replica type for shared storage.  It would have
many of the benefits of the current "pull" replicas (not indexing on all replicas, all replicas identical with no replicas getting out-of-sync, etc), but would have additional benefits:
>  - Any shard could become leader (the blob store always has the index)
>  - Better elasticity scaling down
>    - durability not linked to number of replicas... a single replica could be common for
write workloads
>    - could drop to 0 replicas for a shard when not needed (blob store always has index)
>  - Allow for higher performance write workloads by skipping the transaction log
>    - don't pay for what you don't need
>    - a commit will be necessary to flush to stable storage (blob store)
>  - A lot of the complexity and failure modes go away
> An additional component is a Directory implementation that will work well with blob stores.
 We probably want one that treats local disk as a cache since the latency to remote storage
is so large.  I think there are still some "locking" issues to be solved here (ensuring that
more than one writer to the same index won't corrupt it).  This should probably be pulled
out into a different JIRA issue.
