drill-user mailing list archives

From Ted Dunning <ted.dunn...@gmail.com>
Subject Re: OpenTSDB plugin development for Drill
Date Wed, 01 Feb 2017 19:46:34 GMT
Moving user@drill to bcc.

The key questions here have to do with the following:

1) what is the best way to write rules that will help Drill force the query
into a recognizable form?

2) is there a good way to pass parameters to the data source? Things like
resampling parameters and the resampling aggregator have no good counterpart
in SQL, so putting them into the table specification seems reasonable. But
how does that work? Are there examples?

3) assuming that the query can be massaged via optimizer rules or the users
can be trained, what is the best way to pick off parts of the query for
inclusion?  For instance, all data points in TSDB are tagged with keys and
values. That would make it seem like this query:

    select avg(t.value), max(t.value)
    from table(`tsdb/memory`, sample='1m', aggregate='avg') t
    where t.tags.cluster = 'c1'
    group by t.tags.host;

would be nice. The variables cluster and host would refer to tags while
value would refer to the value itself. How would these be accessed inside
the data source for push-down into TSDB? Are there examples?

Also, how would the parameters of the table function be accessed? How does
Drill know that the TSDB data source is to be used?
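
To make the target of the push-down concrete, here is a rough sketch of the
request body the query above might end up as. It is not Drill code. The
function name, the metric name "memory" (taken from `tsdb/memory`), and the
start/end values (borrowed from the sample call quoted below) are assumptions
for illustration. The "downsample" field and its "<interval>-<aggregator>"
format come from my understanding of the OpenTSDB HTTP API, not from this
thread. Whether TSDB's per-timestamp aggregation matches SQL avg/max semantics
is a separate question.

```python
import json


def build_tsdb_query(metric, start, end, sample, sample_agg,
                     aggregates, tag_equals, group_by_tags):
    """Build an OpenTSDB /api/query POST body from pieces of the SQL query.

    All parameter names here are hypothetical; they only mirror the parts of
    the SQL statement that would need to be picked off for push-down.
    """
    # WHERE t.tags.<k> = '<v>' becomes a literal_or filter that does not group.
    filters = [
        {"type": "literal_or", "tagk": k, "filter": v, "groupBy": False}
        for k, v in tag_equals.items()
    ]
    # GROUP BY t.tags.<k> becomes a wildcard filter with groupBy enabled.
    filters += [
        {"type": "wildcard", "tagk": k, "filter": "*", "groupBy": True}
        for k in group_by_tags
    ]
    # One sub-query per SQL aggregate, since each sub-query has one aggregator.
    queries = [
        {
            "aggregator": agg,
            "metric": metric,
            "downsample": f"{sample}-{sample_agg}",  # assumed format, e.g. 1m-avg
            "filters": list(filters),
        }
        for agg in aggregates
    ]
    return {"start": start, "end": end, "queries": queries}


body = build_tsdb_query(
    metric="memory", start=1356998400, end=1356998460,
    sample="1m", sample_agg="avg",
    aggregates=["avg", "max"],
    tag_equals={"cluster": "c1"},
    group_by_tags=["host"],
)
print(json.dumps(body, indent=4))
```

The open question is still how the plugin gets hold of these pieces (table
function parameters, filter, grouping keys) from Drill in the first place.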







On Wed, Feb 1, 2017 at 2:01 AM, Dmitriy Gavrilovich <
dhavrilovich@cybervisiontech.com> wrote:

> Hi everyone.
>
> TL;DR
>
> I have started to develop an OpenTSDB Plugin for Drill available here:
> https://github.com/mapr-demos/drill/tree/openTSDB-plugin/contrib/storage-opentsdb
>
> This is a work in progress, and I have some ideas and questions; see below.
>
> DETAILS
>
>
> I am developing a storage plugin for the OpenTSDB time series database, and
> I have run into some problems because the API that Drill expects is
> completely different from the one that TSDB exposes.
>
> OpenTSDB does not have a Java client or a JDBC driver, only a REST API.
> Here is a sample JSON call to TSDB:
> {
>     "start": 1356998400,
>     "end": 1356998460,
>     "queries": [
>         {
>             "aggregator": "sum",
>             "metric": "sys.cpu.0",
>             "rate": "true",
>             "tags": {
>                 "host": "*",
>                 "dc": "lga"
>             }
>         },
>         {
>             "aggregator": "sum",
>             "tsuids": [
>                 "000001000002000042",
>                 "000001000002000043"
>             ]
>         }
>     ]
> }
>
> Sample query with filters:
> {
>     "start": 1356998400,
>     "end": 1356998460,
>     "queries": [
>         {
>             "aggregator": "sum",
>             "metric": "sys.cpu.0",
>             "rate": "true",
>             "filters": [
>                 {
>                    "type":"wildcard",
>                    "tagk":"host",
>                    "filter":"*",
>                    "groupBy":true
>                 },
>                 {
>                    "type":"literal_or",
>                    "tagk":"dc",
>                    "filter":"lga|lga1|lga2",
>                    "groupBy":false
>                 }
>             ]
>         },
>         {
>             "aggregator": "sum",
>             "tsuids": [
>                 "000001000002000042",
>                 "000001000002000043"
>             ]
>         }
>     ]
> }
>
> Sample response:
> [
>     {
>         "metric": "tsd.hbase.puts",
>         "tags": {},
>         "aggregatedTags": [
>             "host"
>         ],
>         "annotations": [
>             {
>                 "tsuid": "00001C0000FB0000FB",
>                 "description": "Testing Annotations",
>                 "notes": "These would be details about the event, the description is just a summary",
>                 "custom": {
>                     "owner": "jdoe",
>                     "dept": "ops"
>                 },
>                 "endTime": 0,
>                 "startTime": 1365966062
>             }
>         ],
>         "globalAnnotations": [
>             {
>                 "description": "Notice",
>                 "notes": "DAL was down during this period",
>                 "custom": null,
>                 "endTime": 1365966164,
>                 "startTime": 1365966064
>             }
>         ],
>         "tsuids": [
>             "0023E3000002000008000006000001"
>         ],
>         "dps": {
>             "1365966001": 25595461080,
>             "1365966061": 25595542522,
>             "1365966062": 25595543979,
> ...
>             "1365973801": 25717417859
>         }
>     }
> ]
>
> So the main problem is converting values from SQL syntax to OpenTSDB
> values and pushing them to the API. Also, we do not have fixed columns. We
> have a map in our tag column, and each tag can be a search filter. This
> causes problems when we try to search using a WHERE clause.
>
> A query string like `where host = * and dc = lga` should be transformed
> like this:
> "tags": {
>                 "host": "*",
>                 "dc": "lga"
>             }
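
As a rough illustration of that transformation (not Drill code), the sketch
below splits already-parsed predicates into a TSDB "tags" map and a residual
list that the engine would still have to evaluate itself. The tuple layout,
the "tags." column prefix, and the function name are assumptions made for the
example. In a real plugin the predicates would come from Drill's filter
expression rather than from tuples.

```python
def split_pushdown(predicates):
    """Split (column, operator, value) predicates into pushable tags and the rest.

    Only equality tests on tag columns are pushed into the TSDB "tags" map.
    Everything else is returned so the query engine can apply it afterwards.
    """
    tags = {}
    residual = []
    for column, op, value in predicates:
        is_tag = column.startswith("tags.")
        key = column[len("tags."):] if is_tag else None
        # A second equality test on the same tag cannot be expressed in a
        # simple key/value map, so it stays in the residual list.
        if is_tag and op == "=" and key not in tags:
            tags[key] = value
        else:
            residual.append((column, op, value))
    return tags, residual


tags, residual = split_pushdown([
    ("tags.host", "=", "*"),
    ("tags.dc", "=", "lga"),
    ("value", ">", 10),  # hypothetical predicate that is not a tag filter
])
print(tags)
print(residual)
```

Anything that lands in the residual list would have to stay in the Drill plan
as an ordinary filter.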
>
> I already have a working prototype available here:
> https://github.com/mapr-demos/drill/tree/openTSDB-plugin/contrib/storage-opentsdb
>
> It supports the following SQL statement:
>
> SELECT * FROM <table_name:aggregation_function>;
>
>
> Now I would like to go further and implement more time-series-related
> features, for example:
>
> 1- select avg|sum|min|max(speedmetric.value)
> 2- from openTSDB(metric=sensor.speed, downsample='1m', interpolate='avg')
> speedmetric
> 3- where speedmetric.tags.id in (001, 002)
> 4- and speedmetric.timestamp >='value' and speedmetric.timestamp <= 'value'
> 5- group by speedmetric.tags.hostname
>
>
> Where:
>
> 1 - the aggregation function, which should be pushed down to the OpenTSDB
> REST call
> -> How can I override the aggregation function for my plugin?
>
> 2 - I am currently working on converting the string from this clause into
> a map for use in the TSDB query
> 3 - the tags that we are searching for
> 4 - the time period for the search. In fact, it is two timestamp values,
> “from” and “to”. These values are required
> 5 - I don’t know exactly how to transform this for the TSDB API.
>
>
> We currently use this syntax to specify the aggregation function in a
> SELECT query:
>
> SELECT * FROM <table_name:aggregation_function>;
>
> It is transformed into an API request such as
> `/api/query?start=5y-ago&m=sum:warp.speed`, sent as a GET request. More
> complicated requests should use POST requests.
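
For reference, the mapping described here can be sketched in a few lines. This
assumes the table name has the form `<metric>:<aggregator>` (for example
`warp.speed:sum`) and that the start time defaults to `5y-ago` as in the
request shown. The function name is hypothetical and this is not the
prototype's actual code.

```python
from urllib.parse import urlencode


def table_to_query_path(table, start="5y-ago"):
    """Turn '<metric>:<aggregator>' into an OpenTSDB GET query path."""
    # Split on the last colon so the aggregator is always the final component.
    metric, sep, aggregator = table.rpartition(":")
    if not sep or not metric or not aggregator:
        raise ValueError(f"expected '<metric>:<aggregator>', got {table!r}")
    # Keep ':' unescaped because the m parameter uses it as a separator.
    params = urlencode({"start": start, "m": f"{aggregator}:{metric}"}, safe=":")
    return "/api/query?" + params


print(table_to_query_path("warp.speed:sum"))
```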
>
> Many thanks, Dmitriy Gavrilovich
> dhavrilovich@cybervisiontech.com
