lucene-dev mailing list archives

From "David Smiley (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SOLR-11299) Time partitioned collections (umbrella issue)
Date Sat, 14 Oct 2017 20:30:00 GMT

    [ https://issues.apache.org/jira/browse/SOLR-11299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16204809#comment-16204809 ]

David Smiley commented on SOLR-11299:
-------------------------------------

bq. the cost of DateFormat.parse() of some number of partition names for every doc

Ah, yes.  So it seems we could grab the Aliases instance, and if it's the very same instance
we saw last time, then the previous parsing is still valid.  In other words, cache the parsed
partition names and re-parse only when the Aliases instance changes.
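
A rough sketch of that caching idea follows.  Everything in it is hypothetical:
AliasSnapshot is only a stand-in for an immutable Aliases-like object that gets replaced
wholesale on change, PartitionTimeCache is an invented name, and the partition naming
scheme ("<prefix>_<yyyy-MM-dd>") is an assumption made for illustration.  It also uses
java.time rather than DateFormat, purely to keep the example short.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical stand-in for an immutable alias snapshot; NOT Solr's Aliases class. */
final class AliasSnapshot {
  private final List<String> partitionNames;

  AliasSnapshot(List<String> partitionNames) {
    // Defensive copy so the snapshot cannot change after construction.
    this.partitionNames = Collections.unmodifiableList(new ArrayList<>(partitionNames));
  }

  List<String> partitionNames() {
    return partitionNames;
  }
}

/** Caches parsed partition start times, keyed on the identity of the snapshot. */
final class PartitionTimeCache {
  private AliasSnapshot lastSnapshot; // snapshot the cached list was built from
  private List<Instant> lastParsed;   // parsed start times for that snapshot
  private int parseCount;             // number of full parses, for illustration only

  synchronized List<Instant> startTimes(AliasSnapshot current) {
    // Reference comparison on purpose: assuming snapshots are immutable, the
    // same instance implies the same partition names, so no re-parse is needed.
    if (current != lastSnapshot) {
      lastParsed = parseAll(current.partitionNames());
      lastSnapshot = current;
      parseCount++;
    }
    return lastParsed;
  }

  synchronized int parseCount() {
    return parseCount;
  }

  private static List<Instant> parseAll(List<String> names) {
    List<Instant> result = new ArrayList<>(names.size());
    for (String name : names) {
      // Assumed naming scheme: "<prefix>_<yyyy-MM-dd>", start of day in UTC.
      String datePart = name.substring(name.lastIndexOf('_') + 1);
      result.add(LocalDate.parse(datePart).atStartOfDay(ZoneOffset.UTC).toInstant());
    }
    return Collections.unmodifiableList(result);
  }
}
```

With something like this, the per-document cost is one reference comparison whenever the
alias snapshot is unchanged; the date parsing runs again only after a new snapshot shows up.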

> Time partitioned collections (umbrella issue)
> ---------------------------------------------
>
>                 Key: SOLR-11299
>                 URL: https://issues.apache.org/jira/browse/SOLR-11299
>             Project: Solr
>          Issue Type: New Feature
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: SolrCloud
>            Reporter: David Smiley
>            Assignee: David Smiley
>
> Solr ought to have the ability to manage large-scale time-series data (think logs or
> sensor data / IoT) itself without a lot of manual/external work.  The most naive and painless
> approach today is to create a collection with a high numShards and hash routing, but this
> isn't as good as partitioning the underlying indexes by time, for these reasons:
> * Easy to scale up/down horizontally as data/requirements change.  (No need to over-provision,
>   use shard splitting, or re-index with a different config)
> * Faster queries:
>     ** can search fewer shards, reducing overall load
>     ** realtime search is more tractable (since most shards are stable -- good caches)
>     ** "recent" shards (that might be queried more) can be allocated to faster hardware
>     ** aged out data is simply removed, not marked as deleted.  Deleted docs still have
>        search overhead.
> * Outages of a shard result in a degraded but sometimes still useful system (compare
>   to a random subset missing)
> Ideally you could set this up once and then simply work with a collection (potentially
> actually an alias) in a normal way (search or update), letting Solr handle the addition of
> new partitions, removal of old ones, and appropriate routing of requests depending on their
> nature.
> This issue is an umbrella issue for the particular tasks that will make it all happen
> -- either subtasks or issue linking.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


