lucene-dev mailing list archives

From "Lanny Ripple (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SOLR-7820) IndexFetcher should calculate ahead of time how much space is needed for full snapshot based recovery and cleanly abort instead of trying and running out of space on a node
Date Wed, 01 Jun 2016 22:20:59 GMT

    [ https://issues.apache.org/jira/browse/SOLR-7820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15311244#comment-15311244 ]

Lanny Ripple commented on SOLR-7820:
------------------------------------

Experiencing this right now, since as a startup pinching pennies isn't optional.  We're about
70% allocated on disk with 60 or so shards spread over a dozen or two collections.  If a couple
of replicas throw a hissy fit, it's not a big deal for Solr to recover.  But if a node goes down,
or, as in one case, the AWS instance starts being flaky, then we fill the disk and get to spend
a lot of time babysitting the recovery.

If having Solr sequence recovery to avoid blowing out disk isn't a good idea, then please at least
expose tooling to make it easier for a human to do the same thing.  Even a way to start Solr without
it immediately trying to sync would be a win.  When Solr goes all-in on recovery, the Collections
API times out on DELETEREPLICA.
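
As a rough illustration of the pre-check the issue summary asks for (calculate how much space a
full snapshot would need and abort cleanly if it won't fit), something like the following could
run before the fetch starts.  This is only a sketch, not Solr's actual IndexFetcher code; the
index path, snapshot size, and safety factor are made-up inputs - in practice the size would come
from the leader's file list.

    import java.io.IOException;
    import java.nio.file.FileStore;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;

    public class RecoverySpaceCheck {

        // Returns true only if the filesystem holding indexDir has enough usable
        // space for the incoming snapshot plus a safety margin.
        static boolean hasRoomForSnapshot(Path indexDir, long snapshotBytes, double safetyFactor)
                throws IOException {
            FileStore store = Files.getFileStore(indexDir);
            return store.getUsableSpace() > (long) (snapshotBytes * safetyFactor);
        }

        public static void main(String[] args) throws IOException {
            // Hypothetical inputs: pass the core's data dir as args[0]; the 50 GB
            // snapshot size stands in for what the leader would actually report.
            Path indexDir = Paths.get(args.length > 0 ? args[0] : ".");
            long snapshotBytes = 50L * 1024 * 1024 * 1024;
            if (!hasRoomForSnapshot(indexDir, snapshotBytes, 1.1)) {
                System.err.println("Not enough free disk for a full recovery; aborting cleanly.");
            }
        }
    }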

> IndexFetcher should calculate ahead of time how much space is needed for full snapshot
based recovery and cleanly abort instead of trying and running out of space on a node
> ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-7820
>                 URL: https://issues.apache.org/jira/browse/SOLR-7820
>             Project: Solr
>          Issue Type: Improvement
>          Components: replication (java)
>            Reporter: Timothy Potter
>
> When a replica is trying to recover and its IndexFetcher decides it needs to pull the
full index from a peer (isFullCopyNeeded == true), the existing index directory should
be deleted before the full copy is started, to free up disk for pulling a fresh index; otherwise
the server will potentially need 2x the disk space (old + incoming new). Currently, the IndexFetcher
removes the old index directory only after the new one is downloaded; however, once the fetcher
decides a full copy is needed, what is the value of the existing index? It's clearly out-of-date
and should not serve queries. Since we're deleting data preemptively, maybe this should be an
advanced configuration property, only to be used by those who are disk-space constrained
(which I'm seeing more and more as people deploy high-end SSDs - they typically don't
have 2x the disk capacity required by an index).
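
For the change described above - drop the stale index as soon as isFullCopyNeeded is true, but
only when an opt-in property says it's OK - a minimal sketch might look like the following.  The
deleteBeforeFetch flag is hypothetical; it stands in for the proposed advanced configuration
property and is not an existing Solr setting.

    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.util.Comparator;
    import java.util.stream.Stream;

    public class EarlyIndexCleanup {

        // If a full copy is needed and the opt-in flag is set, remove the stale
        // index directory before the download starts, so the node never needs
        // old + new index space at the same time.
        static void maybeDeleteStaleIndex(Path indexDir, boolean isFullCopyNeeded,
                                          boolean deleteBeforeFetch) throws IOException {
            if (!isFullCopyNeeded || !deleteBeforeFetch || !Files.exists(indexDir)) {
                return; // default behaviour: keep the old index until the new one is in place
            }
            // Delete children before parents (reverse depth order).
            try (Stream<Path> paths = Files.walk(indexDir)) {
                paths.sorted(Comparator.reverseOrder())
                     .forEach(p -> p.toFile().delete());
            }
        }
    }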



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

