lucene-dev mailing list archives

From "Erick Erickson (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SOLR-12259) Robustly upgrade indexes
Date Thu, 01 Nov 2018 17:56:00 GMT

    [ https://issues.apache.org/jira/browse/SOLR-12259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16671938#comment-16671938
] 

Erick Erickson commented on SOLR-12259:
---------------------------------------

[~cpoerschke] So right here on my list it says "create a test" for the work here. WDYT of
co-opting UninvertDocValuesMergePolicyTest? IOW, how lazy can I be?

I took a quick look at it and I think all I'd need to do is replace the optimize step at line
114 with a call to my new spiffy rewriteWithPolicy update parameter.

Hmmm, I suppose that really I could just randomize the optimize and rewriteWithPolicy approaches.

I want to emphasize to everyone that overloading update is a PoC bit only so we can poke
holes in the general approach; where this stuff really lives is TBD.
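
For anyone following along, here is a rough sketch of the "randomize the two approaches"
idea. It is Python rather than the Java test itself, purely for illustration. The
rewriteWithPolicy parameter is the PoC name from this comment; its value ("true") and the
helper names are assumptions, not anything committed.

```python
import random
from urllib.parse import urlencode


def pick_rewrite_params(rng):
    """Choose one of the two rewrite approaches at random.

    'optimize' is the existing update parameter; 'rewriteWithPolicy' is the
    PoC parameter named in this thread (its value here is assumed).
    """
    if rng.choice([True, False]):
        return {"optimize": "true"}
    return {"rewriteWithPolicy": "true"}


def update_request(base_url, rng):
    """Build the update URL for whichever approach was picked."""
    return f"{base_url}/update?{urlencode(pick_rewrite_params(rng))}"


if __name__ == "__main__":
    # Hypothetical base URL; a real test would hit its own test core.
    print(update_request("http://localhost:8983/solr/collection1", random.Random()))
```

Whichever branch is taken, the test's assertions after the rewrite step would stay the same,
which is the point of randomizing.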

> Robustly upgrade indexes
> ------------------------
>
>                 Key: SOLR-12259
>                 URL: https://issues.apache.org/jira/browse/SOLR-12259
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Erick Erickson
>            Assignee: Erick Erickson
>            Priority: Major
>
> The general problem statement is that the current upgrade path is trappy and cumbersome.
It would be a great help "in the field" to make the upgrade process less painful.
> Additionally one of the most common things users want to do is enable docValues, but
currently they often have to re-index.
> Issues:
> 1> if I upgrade from 5x to 6x and then 7x, there's no guarantee that when I go to 7x
all the segments have been rewritten in 6x format. Say I have a segment at max size that has
no deletions. It'll never be rewritten until it has deleted docs, and currently perhaps not
until about 50% of its docs are deleted.
> 2> IndexUpgraderTool explicitly does a forcemerge to 1 segment, which is bad.
> 3> in a large distributed system, running IndexUpgraderTool on all the nodes is cumbersome
even if <2> is acceptable.
> 4> Users who realize specifying docValues on a field would be A Good Thing have to
re-index. We have UninvertDocValuesMergePolicyFactory. Wouldn't it be nice to be able to have
this done all at once without forceMerging to one segment?
> Proposal:
> Somehow avoid the above. Currently LUCENE-7976 is a start in that direction. It will
make TMP respect max segment size so we can avoid forceMerges that result in one segment. What
it does _not_ do is rewrite segments with zero (or a small percentage) deleted documents.
> So it doesn't seem like a huge stretch to be able to specify to TMP the option to rewrite
segments that have no deleted documents. Perhaps a new parameter to optimize?
> This would likely require another change to TMP or whatever.
> So upgrading to a new solr would look like
> 1> install the new Solr
> 2> execute "http://node:port/solr/collection_or_core/update?optimize=true&upgradeAllSegments=true"
> What's not clear to me is whether we'd require UninvertDocValuesMergePolicyFactory to
be specified and wrap TMP or not.
> Anyway, let's discuss. I'll create yet another LUCENE JIRA for TMP to rewrite all segments
that I'll link.
> I'll also link several other JIRAs in here; they're coalescing.
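
Step 2 of the proposed upgrade in the description above could be scripted roughly as
follows. This is only a sketch: upgradeAllSegments is the parameter proposed in the issue,
not an existing Solr option, and the host, port, and collection name used below are
placeholders.

```python
from urllib.parse import urlencode


def upgrade_url(node, port, collection_or_core):
    """Build the update URL from the proposal.

    'upgradeAllSegments' is the proposed (not yet existing) parameter.
    """
    params = {"optimize": "true", "upgradeAllSegments": "true"}
    return f"http://{node}:{port}/solr/{collection_or_core}/update?{urlencode(params)}"


if __name__ == "__main__":
    # Placeholder node, port, and collection name.
    print(upgrade_url("localhost", 8983, "collection1"))
```

In a large cluster the same helper could be called once per collection rather than running
a tool on every node, which is the pain point in issue 3> above.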



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


