lucene-dev mailing list archives

From "David Smiley (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (SOLR-11653) create next time collection based on a fixed time gap
Date Wed, 03 Jan 2018 04:50:00 GMT

     [ https://issues.apache.org/jira/browse/SOLR-11653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Smiley updated SOLR-11653:
--------------------------------
    Attachment: SOLR-11653.patch

New patch...
* renamed "head" terminology to "mostRecent"
* URP:
** No longer retries up to 5 times when no progress is made. This is probably best, and it
simplifies understanding the loop. As long as the alias is getting updated after
createCollectionAfter is called (so it appears we're making progress), we retry.
** refactored an updateParsedCollectionAliases method out of findTargetCollectionGivenTimestamp
so we can call it separately.
** createCollectionAfter now forces an update to the aliases to ensure we'll see changes (if
there were any). Otherwise the ZK watcher may be slow to update, and we might conclude that
no alias state progress has occurred when it is imminent.
** added a locking mechanism in the URP to avoid needless concurrent messages to the Overseer
from the same JVM to add another collection
* added parallel updates to the test
* improved the OverseerTaskProcessor flow I referred to before
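The URP loop described in the notes above (retry only while the alias keeps changing, force a fresh alias read after each create, and serialize create requests within one JVM) can be sketched roughly as follows. This is a minimal, hypothetical sketch and not the patch's code: the class `RetryWhileProgressing`, the method `ensureTargetExists`, and its `needsNew` / `readAlias` / `createNext` hooks are illustrative names, and a plain static `ReentrantLock` stands in for whatever locking mechanism the patch actually uses.

```java
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.BooleanSupplier;
import java.util.function.Supplier;

final class RetryWhileProgressing {

    // Hypothetical: one lock per JVM (a static field), so threads in the same
    // JVM do not send concurrent, redundant "create next collection" requests.
    private static final ReentrantLock CREATE_LOCK = new ReentrantLock();

    /**
     * Requests new collections until the document fits, retrying only while
     * each request visibly changes the alias.
     *
     * @param needsNew   true while the document's timestamp is past the newest collection
     * @param readAlias  forces a fresh read of the alias; must return a snapshot, not a live view
     * @param createNext requests creation of the next collection (hypothetical hook)
     * @return number of creation requests this call issued
     */
    static int ensureTargetExists(BooleanSupplier needsNew,
                                  Supplier<List<String>> readAlias,
                                  Runnable createNext) {
        int requested = 0;
        while (true) {
            CREATE_LOCK.lock();
            try {
                // Checked under the lock: another thread may have just created it.
                if (!needsNew.getAsBoolean()) {
                    return requested;
                }
                List<String> before = readAlias.get();
                createNext.run();
                requested++;
                // Forced re-read instead of waiting for a possibly slow watcher.
                List<String> after = readAlias.get();
                if (after.equals(before)) {
                    // No progress: stop immediately rather than retrying a fixed number of times.
                    throw new IllegalStateException("alias unchanged after create request; giving up");
                }
            } finally {
                CREATE_LOCK.unlock();
            }
        }
    }
}
```

Holding the lock across the whole check/create/re-read step is the simplest way to keep the progress check meaningful in this sketch; it only coordinates threads within one JVM, so requests from other nodes can still race and must be tolerated elsewhere.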

I beasted the test a bit and it has survived. Earlier I was stumped by consistently getting
a failure "collection already exists: myalias_2017-10-24" whenever tests.iters was > 1.
This is the first collection the test creates that is not ultimately deleted. Yet I thought
it wasn't necessary to clean up unused collections in tests, so maybe there is a test
infrastructure bug here? I found another similar but simpler test, ConfigSetsAPITest, and
checked whether it fails the same way; it does not. I haven't dug into why, but it's at least
easy to deal with: just clean up when done.

I wonder what would happen if the collection is only partially created for whatever reason
(e.g. an overloaded system). Would it ultimately recover? Probably not; you'd have to
manually delete (e.g. via an HTTP API call) the collection that was never added to
the alias.
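The manual cleanup mentioned above could be partly scripted: find collections that follow the alias' naming pattern but are not listed in the alias, then issue a Collections API `DELETE` for each. This is a hypothetical sketch: the class and method names are illustrative, the "alias name + `_` prefix" heuristic is an assumption modeled on the test's `myalias_2017-10-24` collection, and `http://localhost:8983/solr` is only an example base URL. It builds the request URIs but does not send them.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

final class OrphanCollections {

    /**
     * Collections whose names look like partitions of the alias (assumed
     * prefix "alias_") but that the alias does not reference.
     */
    static Set<String> findOrphans(String alias, Set<String> allCollections, List<String> aliasCollections) {
        String prefix = alias + "_";
        Set<String> orphans = new TreeSet<>(); // sorted for stable output
        for (String name : allCollections) {
            if (name.startsWith(prefix) && !aliasCollections.contains(name)) {
                orphans.add(name);
            }
        }
        return orphans;
    }

    /** Collections API request that deletes one collection. */
    static URI deleteUri(String solrBaseUrl, String collection) {
        return URI.create(solrBaseUrl + "/admin/collections?action=DELETE&name="
                + URLEncoder.encode(collection, StandardCharsets.UTF_8));
    }
}
```

A collection that is being created at this very moment also looks like an orphan to this check, so a real cleanup would need to skip recent creations or confirm with an operator before deleting anything.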

> create next time collection based on a fixed time gap
> -----------------------------------------------------
>
>                 Key: SOLR-11653
>                 URL: https://issues.apache.org/jira/browse/SOLR-11653
>             Project: Solr
>          Issue Type: Sub-task
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: SolrCloud
>            Reporter: David Smiley
>            Assignee: David Smiley
>         Attachments: SOLR-11653.patch, SOLR-11653.patch, SOLR-11653.patch
>
>
> For time series collections (as part of a collection Alias with certain metadata), we
> want to automatically add new collections. This issue is about creating the next
> collection based on a configurable fixed time gap. We will add this collection synchronously
> once a document flowing through the URP chain exceeds the gap, as opposed to asynchronously
> in advance. This issue will also define some Alias metadata. Most of the implementation
> will be in TimePartitionedUpdateProcessor or perhaps a helper to this URP.
> note: other issues will implement pre-emptive creation and capping collections by size.
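The fixed-gap rule in the issue description, together with picking a target collection for a timestamp (the patch notes mention findTargetCollectionGivenTimestamp), comes down to a little date arithmetic. Below is a minimal sketch under stated assumptions, not Solr's implementation: each collection is assumed to cover the half-open range [start, start + gap), names are assumed to be the alias plus `_` plus the UTC start date (modeled on `myalias_2017-10-24`), and `FixedGapPartitions` with its methods is hypothetical.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

final class FixedGapPartitions {

    // Assumed naming scheme: alias + "_" + UTC start date.
    private static final DateTimeFormatter DAY =
            DateTimeFormatter.ofPattern("yyyy-MM-dd").withZone(ZoneOffset.UTC);

    static String collectionName(String alias, Instant start) {
        return alias + "_" + DAY.format(start);
    }

    /** Newest collection start that is <= docTime; starts must be sorted newest first. */
    static Optional<Instant> targetStart(List<Instant> startsNewestFirst, Instant docTime) {
        for (Instant start : startsNewestFirst) {
            if (!start.isAfter(docTime)) {
                return Optional.of(start);
            }
        }
        return Optional.empty(); // older than every collection: no target
    }

    /** Starts of the collections to add, oldest first, so docTime lands in the last one. */
    static List<Instant> startsToCreate(Instant mostRecentStart, Duration gap, Instant docTime) {
        if (gap.isZero() || gap.isNegative()) {
            throw new IllegalArgumentException("gap must be positive");
        }
        List<Instant> starts = new ArrayList<>();
        Instant next = mostRecentStart.plus(gap);
        while (!docTime.isBefore(next)) { // docTime >= next: past the newest collection's range
            starts.add(next);
            next = next.plus(gap);
        }
        return starts;
    }
}
```

For example, with `myalias_2017-10-24` as the newest collection, a one-day gap, and a document stamped 2017-10-26T05:00Z, `startsToCreate` returns two starts, which `collectionName` turns into `myalias_2017-10-25` and `myalias_2017-10-26`. A document stamped 2017-10-24T12:00Z needs no new collection.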



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

