lucene-dev mailing list archives

From Jan Høydahl (JIRA) <j...@apache.org>
Subject [jira] [Commented] (SOLR-9562) Minimize queried collections for time series alias
Date Thu, 06 Oct 2016 16:00:23 GMT

    [ https://issues.apache.org/jira/browse/SOLR-9562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15552329#comment-15552329 ]

Jan Høydahl commented on SOLR-9562:
-----------------------------------

bq.  it'd be wonderful if Solr supported this directly!
Yes. If {{dateRange}} were a property of each shard, stored in ZK, then the {{shards}} param
could be chosen intelligently at query time.
Also, perhaps Solr could at some point support replication on demand, i.e. be able to change
{{replicationFactor}} on a per-shard basis depending on which shards get the most traffic. Many
time-series apps have a "last 30 days" search mode that hits only a few shards, and those
shards would need more replicas than the ones holding older data. Solr could even unload the
cores of shards that have not been used for a certain time and load them again on demand. As
for deleting content older than N days, we already have TTL for that, but perhaps Solr would
need to detect empty shards and delete them?

> Minimize queried collections for time series alias
> --------------------------------------------------
>
>                 Key: SOLR-9562
>                 URL: https://issues.apache.org/jira/browse/SOLR-9562
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Eungsop Yoo
>            Priority: Minor
>         Attachments: SOLR-9562-v2.patch, SOLR-9562.patch
>
>
> For indexing time series data (such as large log data), we can create a new collection
> regularly (hourly, daily, etc.) behind a write alias and create a read alias that covers all
> of those collections. But every collection in the read alias is queried even if we search
> over a very narrow time window. In that case, the matching docs may be stored in only a small
> portion of the collections, so we don't need to query all of them.
> I suggest this patch, which lets a read alias minimize the queried collections. Three
> parameters are added to the CREATEALIAS action.
> || Key || Type || Required || Default || Description ||
> | timeField | string | No | | The name of the time field for the time series data. It should be a date type. |
> | dateTimeFormat | string | No | | The timestamp format used when collections are created. Every collection name should have a suffix (starting with "_") in this format. Ex. dateTimeFormat: yyyyMMdd, collectionName: col_20160927. See [DateTimeFormatter|https://docs.oracle.com/javase/8/docs/api/java/time/format/DateTimeFormatter.html]. |
> | timeZone | string | No | | The time zone for the dateTimeFormat parameter. Ex. GMT+9. See [DateTimeFormatter|https://docs.oracle.com/javase/8/docs/api/java/time/format/DateTimeFormatter.html]. |
> Then, when we query with a filter query such as "timeField:\[fromTime TO toTime\]", only
> the collections that contain docs for the given time range will be queried.
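
To illustrate the selection idea described in the issue, here is a minimal Java sketch. It is not taken from the attached patches. The class name {{TimeSeriesAliasSketch}} and the method {{selectCollections}} are hypothetical, and the sketch assumes daily collections whose names end in "_" followed by a date in {{dateTimeFormat}} (for example col_20160927). Hourly collections would need a date-time type instead of {{LocalDate}}.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Solr or of the SOLR-9562 patches.
public class TimeSeriesAliasSketch {

    /**
     * Returns the collections whose day overlaps the query range [from, to].
     * Assumption: each collection covers exactly one day, identified by the
     * suffix after the last '_' in its name.
     */
    public static List<String> selectCollections(List<String> collections,
                                                 String dateTimeFormat,
                                                 String timeZone,
                                                 Instant from,
                                                 Instant to) {
        DateTimeFormatter formatter = DateTimeFormatter.ofPattern(dateTimeFormat);
        ZoneId zone = ZoneId.of(timeZone);
        List<String> selected = new ArrayList<>();
        for (String name : collections) {
            // Parse the date suffix, e.g. "20160927" from "col_20160927".
            String suffix = name.substring(name.lastIndexOf('_') + 1);
            LocalDate day = LocalDate.parse(suffix, formatter);
            // The collection covers [start, end) in the alias time zone.
            Instant start = day.atStartOfDay(zone).toInstant();
            Instant end = day.plusDays(1).atStartOfDay(zone).toInstant();
            // Keep the collection only if its day intersects [from, to].
            if (start.compareTo(to) <= 0 && end.compareTo(from) > 0) {
                selected.add(name);
            }
        }
        return selected;
    }
}
```

With dateTimeFormat yyyyMMdd and timeZone GMT+9, a one-hour query window inside 2016-09-27 (local time) would select only col_20160927 out of col_20160926, col_20160927 and col_20160928.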



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


