lucene-dev mailing list archives

From Olivér Szabó (JIRA) <>
Subject [jira] [Commented] (SOLR-13101) Shared storage support in SolrCloud
Date Tue, 08 Jan 2019 16:25:00 GMT


Olivér Szabó commented on SOLR-13101:

[], FYI, I had a POC with these, see: (custom Solr build based on
the Solr tarball)
(It used the HDFS client and worked only on real environments, but I included localstack and a GCS emulator
as containers. The s3a setup can in principle work against localstack, but that one is broken.)
Some notes:
- I replaced the Hadoop jars with custom hwx ones (the 2.7.x builds contain some classes
that are not in the ones from the Apache Maven repo)
- s3n looked good; s3a seems to be broken, and fixing it would require some changes in aws-sdk (it needs
to use a shared connection pool, which can be set in the HTTP client)
- wasb/wasbs looked good
- adlsV2 had some SSL-related issues (although it did not use SSL), some cipher problems;
I used Solr with JDK 10 in Docker, maybe that caused some issues
- the GCS connector uses Guava 27, while Solr is using something like 14, which results in a ClassDefNotFound exception
while loading the GCS fs implementation; maybe that can be solved by updating to a newer
Guava or by shading the gcs-connector jar with its dependencies
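
For the Guava clash in the last note, one quick check is to ask the JVM which jar a given class actually resolves to. The following is only a diagnostic sketch, not part of the POC: the class name `ClassOrigin` and the helper `locate` are made up for illustration, and it uses nothing beyond the JDK.

```java
import java.security.CodeSource;

/** Diagnostic sketch (hypothetical helper): report where a class was loaded from. */
class ClassOrigin {

    /**
     * Returns the location (usually a jar URL) the class was loaded from,
     * "(bootstrap/JDK)" for core JDK classes, or "(not found)" if it cannot be loaded.
     */
    static String locate(String className) {
        try {
            // initialize=false: we only want to resolve the class, not run its static init
            Class<?> c = Class.forName(className, false, ClassOrigin.class.getClassLoader());
            CodeSource src = c.getProtectionDomain().getCodeSource();
            if (src == null || src.getLocation() == null) {
                return "(bootstrap/JDK)";
            }
            return src.getLocation().toString();
        } catch (ClassNotFoundException | LinkageError e) {
            return "(not found)";
        }
    }

    public static void main(String[] args) {
        // Preconditions is a real Guava class; the jar it resolves to shows
        // which Guava version wins on the classpath.
        String name = args.length > 0 ? args[0] : "com.google.common.base.Preconditions";
        System.out.println(name + " -> " + locate(name));
    }
}
```

Run with the same classpath as Solr, this would show whether the older or the newer Guava jar is picked up first.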

What I have mostly seen: I could create shards and then add documents as well. Interestingly,
a simple delete query only deleted about 40% of the documents (then the request failed).
Also, after stopping the Solr containers, the write.lock files need to be deleted from cloud storage;
it would be nice to have an option to delete those on startup (not sure whether Solr already
has this or not).
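
A startup cleanup like the one wished for above could look roughly like this. It is a sketch under assumptions: it walks a local directory with `java.nio.file` as a stand-in for the cloud storage, `StaleLockCleaner` and `removeStaleLocks` are hypothetical names, and it is only safe when no other writer can still be holding the lock.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Sketch (hypothetical helper): remove leftover write.lock files under an index root. */
class StaleLockCleaner {
    static final String LOCK_NAME = "write.lock";

    /** Deletes every regular file named write.lock below root and returns the paths it removed. */
    static List<Path> removeStaleLocks(Path root) throws IOException {
        List<Path> locks;
        // Collect first, then delete, so the directory walk is closed before we modify the tree.
        try (Stream<Path> walk = Files.walk(root)) {
            locks = walk.filter(Files::isRegularFile)
                        .filter(p -> p.getFileName().toString().equals(LOCK_NAME))
                        .collect(Collectors.toList());
        }
        for (Path lock : locks) {
            Files.deleteIfExists(lock);
        }
        return locks;
    }
}
```

Against a real blob store the same idea would go through that store's client instead of `java.nio.file`, and it should be opt-in, since deleting a lock that a live node still holds can corrupt the index.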

> Shared storage support in SolrCloud
> -----------------------------------
>                 Key: SOLR-13101
>                 URL:
>             Project: Solr
>          Issue Type: New Feature
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: SolrCloud
>            Reporter: Yonik Seeley
>            Priority: Major
> Solr should have first-class support for shared storage (blob/object stores like S3,
> Google Cloud Storage, etc. and shared filesystems like HDFS, NFS, etc).
> The key component will likely be a new replica type for shared storage.  It would have
> many of the benefits of the current "pull" replicas (not indexing on all replicas, all shards
> identical with no shards getting out-of-sync, etc), but would have additional benefits:
>  - Any shard could become leader (the blob store always has the index)
>  - Better elasticity scaling down
>    - durability not linked to number of replicas; a single replica could be common for
>      write workloads
>    - could drop to 0 replicas for a shard when not needed (blob store always has index)
>  - Allow for higher performance write workloads by skipping the transaction log
>    - don't pay for what you don't need
>    - a commit will be necessary to flush to stable storage (blob store)
>  - A lot of the complexity and failure modes go away
> An additional component is a Directory implementation that will work well with blob stores.
> We probably want one that treats local disk as a cache since the latency to remote storage
> is so large.  I think there are still some "locking" issues to be solved here (ensuring that
> more than one writer to the same index won't corrupt it).  This should probably be pulled
> out into a different JIRA issue.
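
The "local disk as a cache" idea from the description can be sketched as a read-through cache. This is an illustration only, not a Lucene `Directory` implementation: `BlobStore` and `LocalDiskCache` are hypothetical names, the remote store is reduced to a single `fetch` call, and the locking question raised in the description is not addressed.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical minimal view of a blob store; not a Solr or Lucene API. */
interface BlobStore {
    byte[] fetch(String name) throws IOException;
}

/** Read-through cache: serve files from local disk, go to the blob store only on a miss. */
class LocalDiskCache {
    private final Path cacheDir;
    private final BlobStore remote;

    LocalDiskCache(Path cacheDir, BlobStore remote) {
        this.cacheDir = cacheDir;
        this.remote = remote;
    }

    /** Returns the file contents, fetching and caching them locally the first time. */
    byte[] read(String name) throws IOException {
        Path local = cacheDir.resolve(name);
        if (Files.exists(local)) {
            return Files.readAllBytes(local); // cache hit: no remote round trip
        }
        byte[] data = remote.fetch(name);     // cache miss: pay the remote latency once
        // Write to a temp file and move it into place so readers never see a partial file.
        Path tmp = Files.createTempFile(cacheDir, name, ".tmp");
        Files.write(tmp, data);
        Files.move(tmp, local, StandardCopyOption.ATOMIC_MOVE);
        return data;
    }
}
```

The sketch never invalidates entries, which relies on the assumption that index files are written once and not modified afterwards; eviction and concurrent writers would need separate handling.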

This message was sent by Atlassian JIRA
