drill-user mailing list archives

From Ted Dunning <ted.dunn...@gmail.com>
Subject Re: Time for a fun Query Question
Date Tue, 04 Dec 2018 21:02:39 GMT
I would parse the timestamp into seconds since the epoch, then use
floor(ts/600) as the key to group on 10-minute boundaries.

This works because:

- nearly all timezone offsets are a multiple of 10 minutes from UTC (the
rare :45 offsets, such as Nepal's UTC+5:45, are the exceptions)

- all leap seconds are hidden in the seconds since epoch conversions

- the epoch (1970-01-01 00:00:00 UTC) falls on an even 10-minute boundary

These all apply for any time interval that is a multiple of 10 minutes as
well (30 minutes, hourly, daily, weekly). More advanced calendar
manipulation is required for things like monthly grouping with timezone and
DST awareness. String hacking might work if you can take on some important
assumptions. Putting the calendar manipulations into a UDF wouldn't be hard.
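As a concrete illustration of the bucketing arithmetic, here is a minimal Python sketch of the floor(ts/600) idea (the function name and timestamp format are my own, not anything the view provides):

```python
from datetime import datetime, timezone

def bucket_start(ts_str, width_s=600):
    """Floor a timestamp to the start of its bucket (default 10 minutes)."""
    ts = datetime.strptime(ts_str, "%Y-%m-%d %H:%M:%S").replace(tzinfo=timezone.utc)
    epoch = int(ts.timestamp())            # seconds since 1970-01-01 00:00:00 UTC
    start = (epoch // width_s) * width_s   # floor(ts / 600) * 600
    return datetime.fromtimestamp(start, tz=timezone.utc).strftime("%Y-%m-%d %H:%M:%S")

# 10:43:43 lands in the 10:40:00 bucket; every reading in that 10-minute
# window maps to the same key, so it can serve as a GROUP BY key.
print(bucket_start("2018-12-02 10:43:43"))  # -> 2018-12-02 10:40:00
```

The same key works for any width that divides evenly into the day (600, 1800, 3600, 86400, ...), which is why the UNION-per-boundary approach below becomes unnecessary.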


On Wed, Dec 5, 2018 at 1:40 AM John Omernik <john@omernik.com> wrote:

> So one approach I have is this:
>
>
>
> select ds, ts_win, opt_string, opt_id, max(eday) as max_eday from (
>   select a.ds, a.ts_win, a.opt_string, b.opt_id, b.eday,
>          b.ts_win as rpt_ts_win, b.ts
>   FROM (
>     select distinct ds, opt_string, opt_id,
>            concat(substr(ts, 1, 4), '0:00') as ts_win
>     from dfs.prod.view_optdata
>     where ds = '2018-12-02'
>     order by ds, ts_win, opt_string
>   ) a JOIN (
>     select opt_id, ds, opt_string,
>            concat(substr(ts, 1, 4), '0:00') as ts_win, ts, eday
>     from dfs.prod.view_optdata
>     where ds = '2018-12-02'
>   ) b ON a.ds = b.ds and a.opt_string = b.opt_string
>   WHERE b.ts <= a.ts_win
> ) c
> group by ds, ts_win, opt_string, opt_id
> order by ds, ts_win ASC, opt_string ASC, opt_id ASC
>
> This breaks up my day into 10-minute intervals (based on string slicing)
> and then shows me what has been reported up to each boundary... I think :)
>
> This is limited in that it only works for time intervals that can be
> substringed out of a time field in string format. Still open to other
> ideas :)
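The self-join above effectively asks, for each window boundary, for each device's largest reading at or before that boundary. The same logic in miniature (a hedged Python sketch with invented toy data, not Drill syntax):

```python
# Each row: (opt_id, ts, eday). For every window boundary, keep each
# device's max eday among readings at or before the boundary -- the
# role played by "WHERE b.ts <= a.ts_win ... max(eday)" in the query.
rows = [
    ("2.1.1", "10:43:43", 12.5),
    ("2.1.1", "10:55:02", 20.5),
    ("2.1.2", "10:32:43", 5.5),
]

def as_of(rows, boundary):
    # "HH:MM:SS" strings compare correctly lexicographically within one day.
    best = {}
    for opt_id, ts, eday in rows:
        if ts <= boundary:
            best[opt_id] = max(best.get(opt_id, float("-inf")), eday)
    return best

print(as_of(rows, "10:50:00"))  # 2.1.1's 10:55:02 reading is excluded
```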
>
>
> On Tue, Dec 4, 2018 at 10:05 AM John Omernik <john@omernik.com> wrote:
>
> > Time for a fun question: How to be clever with queries!
> >
> >
> > I have a table that takes readings from an IoT type device
> >
> >
> > opt_id         ds           ts          eday   opt_string
> > 2.1.1       2018-12-01   10:43:43       12.5       1
> > 2.1.2       2018-12-01   10:32:43       5.5        1
> > 2.1.3       2018-12-01   10:55:02       20.5       1
> > 2.2.1       2018-12-01   10:43:43       12.2       2
> > 2.2.2       2018-12-01   10:12:56       43.7       2
> > 2.2.3       2018-12-01   10:50:23       12.2       2
> > 2.1.1       2018-12-01   13:43:43       45.5       1
> > 2.1.2       2018-12-01   13:32:43       20.5       1
> > 2.1.3       2018-12-01   13:55:02       45.5       1
> > 2.2.1       2018-12-01   13:43:43       30.2       2
> > 2.2.2       2018-12-01   13:12:56       81.7       2
> > 2.2.3       2018-12-01   13:50:23       50.2       2
> >
> >
> > Pretty simple. Essentially, each "device" (opt_id) reports its
> > cumulative value for the day (eday) at various times throughout the day.
> >
> > Each device belongs to a group, recorded in its own column (opt_string).
> >
> >
> > So if I run a query like this:
> >
> > select opt_id, ds, max(opt_string) as opt_string, max(ts) as max_ts,
> > max(Eday) as max_Eday
> > from `mydata`
> > where ds = '2018-12-01'
> > group by ds, opt_id order by opt_id ASC
> >
> > I get:
> >
> > opt_id  ds          opt_string  max_ts    max_Eday
> > 2.1.1   2018-12-02  1           15:56:33  181.000
> > 2.1.3   2018-12-02  1           15:47:02  162.500
> > 2.1.4   2018-12-02  1           15:54:19  122.250
> > 2.1.5   2018-12-02  1           15:40:15  132.000
> > 2.1.6   2018-12-02  1           15:45:04  183.250
> > 2.1.7   2018-12-02  1           15:41:15  161.000
> > 2.1.8   2018-12-02  1           15:42:10  166.250
> > 2.1.9   2018-12-02  1           15:39:23  126.250
> > 2.2.1   2018-12-02  2           16:02:54  163.250
> > 2.2.2   2018-12-02  2           15:57:39  178.750
> > 2.2.3   2018-12-02  2           15:46:53  148.250
> > 2.2.4   2018-12-02  2           15:40:27  163.500
> > 2.2.5   2018-12-02  2           15:50:49  147.500
> > 2.2.6   2018-12-02  2           15:40:24  156.000
> > 2.2.7   2018-12-02  2           15:43:00  104.750
> > 2.2.8   2018-12-02  2           15:41:55  170.750
> > 2.2.9   2018-12-02  2           15:41:04  127.250
> >
> > Which shows me the last time each device reported that day and its
> > cumulative value at that point. Great.
> >
> >
> > So what if I wanted to show the values through the day? Is there a way,
> > from a query perspective, perhaps using windowing or something, to show
> > results like the raw table above, but instead of the random report
> > times, group by opt_id and get the max_ts as I am doing, but do that
> > for each, say, 5-minute, 30-minute, or 60-minute block in a day?
> >
> > The best example to explain what I am looking for uses three 1-hour
> > blocks: I could use a UNION like this:
> >
> > select * from (
> > select opt_id, ds,  max(opt_string) as opt_string, '10:00:00' as max_ts,
> > max(Eday) as max_Eday
> > from dfs.prod.view_optdata where ds = '2018-12-02' and ts <= '10:00:00'
> > group by ds, opt_id
> > order by opt_id ASC
> > )
> > UNION
> > (select opt_id, ds,  max(opt_string) as opt_string, '11:00:00' as max_ts,
> > max(Eday) as max_Eday
> > from dfs.prod.view_optdata where ds = '2018-12-02' and ts <= '11:00:00'
> > group by ds, opt_id
> > order by opt_id ASC)
> > UNION
> > (select opt_id, ds,  max(opt_string) as opt_string, '12:00:00' as max_ts,
> > max(Eday) as max_Eday
> > from dfs.prod.view_optdata where ds = '2018-12-02' and ts <= '12:00:00'
> > group by ds, opt_id
> > order by opt_id ASC)
> >
> >
> > Where I would get something like:
> >
> > index opt_id ds opt_string max_ts max_Eday
> > 0 2.1.1 12/2/18 1 10:00:00 18
> > 1 2.1.3 12/2/18 1 10:00:00 15.75
> > 2 2.1.4 12/2/18 1 10:00:00 12.5
> > 3 2.1.5 12/2/18 1 10:00:00 12.75
> > 4 2.1.6 12/2/18 1 10:00:00 18.75
> > 5 2.1.7 12/2/18 1 10:00:00 7.25
> > 6 2.1.8 12/2/18 1 10:00:00 16.25
> > 7 2.1.9 12/2/18 1 10:00:00 9.75
> > 8 2.2.1 12/2/18 2 10:00:00 18
> > 9 2.2.2 12/2/18 2 10:00:00 15.25
> > 10 2.2.3 12/2/18 2 10:00:00 2.75
> > 11 2.2.4 12/2/18 2 10:00:00 6.5
> > 12 2.2.5 12/2/18 2 10:00:00 16.25
> > 13 2.2.6 12/2/18 2 10:00:00 11.25
> > 14 2.2.7 12/2/18 2 10:00:00 13.5
> > 15 2.2.8 12/2/18 2 10:00:00 13.75
> > 16 2.2.9 12/2/18 2 10:00:00 14.5
> > 17 2.1.1 12/2/18 1 11:00:00 41
> > 18 2.1.3 12/2/18 1 11:00:00 32
> > 19 2.1.4 12/2/18 1 11:00:00 25
> > 20 2.1.5 12/2/18 1 11:00:00 26.75
> > 21 2.1.6 12/2/18 1 11:00:00 42.25
> > 22 2.1.7 12/2/18 1 11:00:00 27
> > 23 2.1.8 12/2/18 1 11:00:00 35.75
> > 24 2.1.9 12/2/18 1 11:00:00 22
> > 25 2.2.1 12/2/18 2 11:00:00 34.5
> > 26 2.2.2 12/2/18 2 11:00:00 42.5
> > 27 2.2.3 12/2/18 2 11:00:00 19
> > 28 2.2.4 12/2/18 2 11:00:00 29.25
> > 29 2.2.5 12/2/18 2 11:00:00 32.25
> > 30 2.2.6 12/2/18 2 11:00:00 30.25
> > 31 2.2.7 12/2/18 2 11:00:00 26.75
> > 32 2.2.8 12/2/18 2 11:00:00 33.5
> > 33 2.2.9 12/2/18 2 11:00:00 28.5
> > 34 2.1.1 12/2/18 1 12:00:00 81.5
> > 35 2.1.3 12/2/18 1 12:00:00 62.75
> > 36 2.1.4 12/2/18 1 12:00:00 48.5
> > 37 2.1.5 12/2/18 1 12:00:00 48.25
> > 38 2.1.6 12/2/18 1 12:00:00 80.5
> > 39 2.1.7 12/2/18 1 12:00:00 62
> > 40 2.1.8 12/2/18 1 12:00:00 66.25
> > 41 2.1.9 12/2/18 1 12:00:00 43.75
> > 42 2.2.1 12/2/18 2 12:00:00 62.75
> > 43 2.2.2 12/2/18 2 12:00:00 83.5
> > 44 2.2.3 12/2/18 2 12:00:00 49
> > 45 2.2.4 12/2/18 2 12:00:00 66.25
> > 46 2.2.5 12/2/18 2 12:00:00 56.5
> > 47 2.2.6 12/2/18 2 12:00:00 68
> > 48 2.2.7 12/2/18 2 12:00:00 43
> > 49 2.2.8 12/2/18 2 12:00:00 72.5
> > 50 2.2.9 12/2/18 2 12:00:00 49.25
> >
> >
> > That's great, but if I wanted, say, every hour in the day, that UNION
> > would get ugly. And what about 30-minute blocks? 5-minute blocks?
> >
> > Is there a way to get these kinds of results without the UNION and
> > without being explicit about the times? Any advice, including "look at
> > X", would be helpful. Or if you need a diversion, dig on in!
> >
> > John
> >
> >
>
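To avoid writing one UNION branch per boundary, the boundaries themselves can be generated and the "max reading so far" computed per boundary, which is the effect Ted's floor-of-epoch key achieves inside a single GROUP BY. A rough Python sketch of the generate-the-boundaries idea (toy data; all names are assumptions):

```python
# Generate boundaries instead of writing one UNION branch per hour:
# for each generated boundary, emit each device's max reading so far.
rows = [
    ("2.1.1", "09:15:00", 5.0),
    ("2.1.1", "10:30:00", 18.0),
    ("2.2.1", "11:45:00", 34.5),
]

boundaries = [f"{h:02d}:00:00" for h in range(10, 13)]  # 10:00, 11:00, 12:00

result = []
for b in boundaries:
    best = {}
    for opt_id, ts, eday in rows:
        if ts <= b:  # "HH:MM:SS" strings compare correctly within one day
            best[opt_id] = max(best.get(opt_id, float("-inf")), eday)
    for opt_id in sorted(best):
        result.append((opt_id, b, best[opt_id]))

for r in result:
    print(r)
```

Note that a device only appears in a boundary's output once it has reported at least once, matching the UNION query's behavior.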
