lucene-java-user mailing list archives

From Michael McCandless <>
Subject Re: Lucene Index Cloud Replication
Date Tue, 09 Jul 2019 12:50:16 GMT
+1 to share code for doing 1) and 3), both of which are tricky!

Safely moving / copying bytes around is a notoriously difficult problem ...
but Lucene's "end to end checksums" and per-segment-file-GUID make this
easier.

I think Lucene's replicator module is a good place for this?
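
To make the checksum point concrete, here is a minimal sketch of a verified copy. It is not Lucene code: Lucene records a CRC32 in each index file's footer and checks it with CodecUtil.checksumEntireFile, whereas this sketch uses only the JDK and recomputes the checksum over the whole source file. The class and method names (CopyVerifier, crc32, copyVerified) are made up for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.zip.CRC32;

public class CopyVerifier {

    /** Streams the file once and returns its CRC32 (hypothetical helper). */
    public static long crc32(Path file) throws IOException {
        CRC32 crc = new CRC32();
        byte[] buf = new byte[8192];
        try (InputStream in = Files.newInputStream(file)) {
            int n;
            while ((n = in.read(buf)) != -1) {
                crc.update(buf, 0, n);
            }
        }
        return crc.getValue();
    }

    /**
     * Copies src to dst, then re-reads dst and compares checksums.
     * On mismatch the copy is deleted so a corrupt file is never left behind.
     */
    public static void copyVerified(Path src, Path dst) throws IOException {
        long expected = crc32(src);
        Files.copy(src, dst, StandardCopyOption.REPLACE_EXISTING);
        long actual = crc32(dst);
        if (actual != expected) {
            Files.deleteIfExists(dst);
            throw new IOException("checksum mismatch copying " + src + " to " + dst);
        }
    }

    public static void main(String[] args) throws IOException {
        Path src = Files.createTempFile("segment", ".bin");
        Path dst = Files.createTempFile("segment-copy", ".bin");
        Files.write(src, new byte[] {1, 2, 3, 4});
        copyVerified(src, dst);
        System.out.println("copy verified, crc32=" + Long.toHexString(crc32(dst)));
    }
}
```

With real index files the expected checksum would come from the file footer rather than from re-reading the source, which is what lets a downloader validate a file without access to the original.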

Mike McCandless

On Wed, Jul 3, 2019 at 4:15 PM Michael Froh <> wrote:

> Hi there,
> I was talking with Varun at Berlin Buzzwords a couple of weeks ago about
> storing and retrieving Lucene indexes in S3, and realized that "uploading a
> Lucene directory to the cloud and downloading it on other machines" is a
> pretty common problem and one that's surprisingly easy to do poorly. In my
> current job, I'm on my third team that needed to do this.
> In my experience, there are three main pieces that need to be implemented:
> 1. Uploading/downloading individual files (i.e. the blob store), which can
> be eventually consistent if you write once.
> 2. Describing the metadata for a specific commit point (basically what the
> Replicator module does with the "Revision" class). In particular, we want a
> downloader to reliably be able to know if they already have specific files
> (and don't need to download them again).
> 3. Sharing metadata with some degree of consistency, so that multiple
> writers don't clobber each other's metadata, and so readers can discover
> the metadata for the latest commit/revision and trust that they'll
> (eventually) be able to download the relevant files.
> I'd like to share what I've got for 1 and 3, based on S3 and DynamoDB, but
> I'd like to do it with interfaces that lend themselves to other
> implementations for blob and metadata storage.
> Is it worth opening a Jira issue for this? Is this something that would
> benefit the Lucene community?
> Thanks,
> Michael Froh
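
The three pieces described in the quoted message could be sketched roughly as follows. This is not the code offered in the thread, and none of these names come from Lucene's replicator module: BlobStore, MetadataStore, Revision, publish, and missingFiles are all hypothetical, and the in-memory classes only stand in for S3 and DynamoDB. The sketch assumes that blob keys are written once and that metadata updates are conditional on the previously seen version.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;

public class ReplicationSketch {

    /** Piece 1: write-once blob storage. */
    public interface BlobStore {
        void put(String key, byte[] data);
        Optional<byte[]> get(String key);
    }

    /** Piece 2: a commit point, as a version plus file name -> blob key. */
    public static final class Revision {
        public final long version;
        public final Map<String, String> files;

        public Revision(long version, Map<String, String> files) {
            this.version = version;
            this.files = Collections.unmodifiableMap(new HashMap<>(files));
        }
    }

    /** Piece 3: shared metadata with a conditional (compare-and-set) update. */
    public interface MetadataStore {
        Optional<Revision> latest(String indexName);

        /** Succeeds only if the stored version equals expectedVersion (0 = none yet). */
        boolean publish(String indexName, long expectedVersion, Revision next);
    }

    public static final class InMemoryBlobStore implements BlobStore {
        private final Map<String, byte[]> blobs = new ConcurrentHashMap<>();

        public void put(String key, byte[] data) {
            // Refuse overwrites: write-once keys are what make eventual consistency safe.
            if (blobs.putIfAbsent(key, data.clone()) != null) {
                throw new IllegalStateException("blob already exists: " + key);
            }
        }

        public Optional<byte[]> get(String key) {
            return Optional.ofNullable(blobs.get(key));
        }
    }

    public static final class InMemoryMetadataStore implements MetadataStore {
        private final Map<String, Revision> revisions = new HashMap<>();

        public synchronized Optional<Revision> latest(String indexName) {
            return Optional.ofNullable(revisions.get(indexName));
        }

        public synchronized boolean publish(String indexName, long expectedVersion, Revision next) {
            Revision current = revisions.get(indexName);
            long currentVersion = (current == null) ? 0 : current.version;
            if (currentVersion != expectedVersion) {
                return false; // another writer published first; caller must re-read and retry
            }
            revisions.put(indexName, next);
            return true;
        }
    }

    /** Files in the revision whose blob key the downloader does not already hold. */
    public static Set<String> missingFiles(Revision rev, Set<String> localBlobKeys) {
        Set<String> missing = new TreeSet<>();
        for (Map.Entry<String, String> e : rev.files.entrySet()) {
            if (!localBlobKeys.contains(e.getValue())) {
                missing.add(e.getKey());
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        MetadataStore meta = new InMemoryMetadataStore();
        Map<String, String> files = new HashMap<>();
        files.put("_0.cfs", "_0.cfs-id1");         // blob key format is an assumption
        files.put("segments_1", "segments_1-id2");
        System.out.println("first writer:  " + meta.publish("idx", 0, new Revision(1, files)));
        System.out.println("stale writer:  " + meta.publish("idx", 0, new Revision(1, files)));
        Set<String> local = Collections.singleton("_0.cfs-id1");
        System.out.println("still needed:  " + missingFiles(meta.latest("idx").get(), local));
    }
}
```

Keying blobs by something unique per file, rather than by file name alone, is what would let a downloader skip files it already has without comparing contents; the conditional publish is what keeps two writers from clobbering each other's metadata.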
