drill-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jason Altekruse <altekruseja...@gmail.com>
Subject Re: Using Apache Drill - Sample Data and Sample Queries - Examples
Date Thu, 14 Aug 2014 20:02:51 GMT
Interestingly it appears to be re-directing to a URL that looks like it has
a video ID on it. I haven't used vimeo, but is it possible it was uploaded
but not yet made public?

-Jason


On Thu, Aug 14, 2014 at 12:57 PM, Timothy Chen <tnachen@gmail.com> wrote:

> Hi Jim,
>
> The vimeo link gave me 404 though.
>
> Tim
>
> On Thu, Aug 14, 2014 at 12:31 PM, Jim Scott <jscott@maprtech.com> wrote:
> > I recently gave a presentation on how to use Apache Drill with some
> > demonstrations.
> >
> > The video for the presentation is available here:
> > http://vimeo.com/chug/using-apache-drill
> > The slides are available here:
> >
> http://www.slideshare.net/ChicagoHUG/using-apache-drill-chug-august-2014-jim-scott
> >
> > *The data used in this presentation can be found and downloaded (1.3GB
> CSV
> > file and a 100KB CSV file) *
> > *here:*
> >
> https://data.cityofchicago.org/Public-Safety/Crimes-2001-to-present/ijzp-q8t2
> > https://data.cityofchicago.org/Public-Safety/Sex-Offenders/vc9r-bqvy
> >
> > Once you start your drill session you can connect to it via:
> > http://localhost:8047/
> >
> >
> > *Queries (pay attention to back ticks) *
> > To see a full stack trace when errors occur:
> > alter session set `exec.errors.verbose`=true;
> >
> >
> > *Looking through the information schema:*SELECT * from
> > INFORMATION_SCHEMA.SCHEMATA;
> > SELECT * from INFORMATION_SCHEMA.`TABLES`;
> > SELECT * from INFORMATION_SCHEMA.`COLUMNS`;
> > SELECT COLUMN_NAME, DATA_TYPE FROM INFORMATION_SCHEMA.COLUMNS WHERE
> > TABLE_SCHEMA = 'sys' and TABLE_NAME = 'options';
> >
> >
> > *How many of each type of incident:*select count(*) as incidents,
> > columns[5] as category from
> > dfs.`/home/jscott/Downloads/chicago/Crimes_-_2001_to_present.csv` group
> by
> > columns[5] order by incidents desc limit 10;
> >
> >
> > *How many of each type of incident and subtype:*select count(*) as
> > incidents, columns[5] as type, columns[6] as subtype from
> > dfs.`/home/jscott/Downloads/chicago/Crimes_-_2001_to_present.csv` group
> by
> > columns[6], columns[5] order by incidents desc limit 10;
> >
> >
> > *CREATE A TABLE in PARQUET format:*
> >
> > *Please note that I created a workspace "db" at my path of /opt/drill/db
> > and I gave it a default storage format of parquet. *
> >
> > create table dfs.db.`crime_data` as select columns[0] as REFID,
> columns[1]
> > as CaseNumber, columns[2] as DateTime, columns[3] as Block, columns[4] as
> > IUCR, columns[5] as PrimaryType, columns[6] as Description, columns[7] as
> > LocationDescription, columns[8] as Arrest, columns[9] as Domestic,
> > columns[10] as Beat, columns[11] as District, columns[12] as Ward,
> > columns[13] as CommunityArea, columns[14] as FBICode, columns[15] as
> > XCoordinate, columns[16] as YCoordinate, columns[17] as YEAR_DATE,
> > columns[18] as UpdatedOn, columns[19] as Latitude, columns[20] as
> > Longitude, columns[21] as Location from
> > dfs.`/home/jscott/Downloads/chicago/Crimes_-_2001_to_present.csv`;
> > create table dfs.db.`sex_offenders` as select columns[0] as
> LAST,columns[1]
> > as FIRST,columns[2] as BLOCK,columns[3] as GENDER,columns[4] as
> > RACE,columns[5] as BIRTHDATE,columns[6] as AGE,columns[7] as
> > HEIGHT,columns[8] as WEIGHT,columns[9] as VICTIM_MINOR from
> > dfs.`/home/jscott/Downloads/chicago/Sex_Offenders.csv`;
> >
> >
> > *Run the previous query on the newly created table:*select count(*) as
> > incidents, primarytype, description from dfs.db.`crime_data` group by
> > description, primarytype order by incidents desc limit 10;
> >
> >
> > *How many crimes occurred on a block with a registered sex
> offender?*SELECT
> > count(*) as incidents FROM dfs.db.`crime_data` crimes JOIN
> > dfs.db.`sex_offenders` offenders ON crimes.block = offenders.block ORDER
> BY
> > incidents DESC;
> >
> >
> > *What types of incidents occurred?*SELECT count(*) as incidents,
> > crimes.primarytype as type, crimes.description as subtype FROM
> > dfs.db.`crime_data` crimes JOIN dfs.db.`sex_offenders` offenders ON
> > crimes.block = offenders.block GROUP BY crimes.description,
> > crimes.primarytype ORDER BY incidents DESC LIMIT 20;
> >
> >
> > *What all is in the BATTERY category?*SELECT count(*) as incidents,
> > crimes.description as subtype FROM dfs.db.`crime_data` crimes JOIN
> > dfs.db.`sex_offenders` offenders ON crimes.block = offenders.block WHERE
> > crimes.primarytype = 'BATTERY' GROUP BY crimes.description ORDER BY
> > incidents DESC;
> >
> >
> > *How many total BATTERY incidents?*SELECT count(*) as incidents FROM
> > dfs.db.`crime_data` crimes JOIN dfs.db.`sex_offenders` offenders ON
> > crimes.block = offenders.block WHERE crimes.primarytype = 'BATTERY';
> >
> >
> > *How many "BATTERY" incidents occurred on those blocks by year (has it
> > gotten better or worse over time)?*SELECT count(*) as incidents,
> > extract(year from to_date(crimes.datetime, 'MM/DD/YYYY hh:mm:ss a')) as
> > `year` FROM dfs.db.`crime_data` crimes JOIN dfs.db.`sex_offenders`
> > offenders ON crimes.block = offenders.block WHERE crimes.primarytype =
> > 'BATTERY' and crimes.datetime not like '%Date%' GROUP BY extract(year
> from
> > to_date(crimes.datetime, 'MM/DD/YYYY hh:mm:ss a')) ORDER BY `year` ASC;
> >
> >
> > Do you have queries against this data set that you have written that you
> > would like to share with others? Reply to the thread...
> >
> >
> > --
> > *Jim Scott*
> > Director, Enterprise Strategy & Architecture
> >
> >  <http://www.mapr.com/>
> > [image: MapR Technologies] <http://www.mapr.com>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message