Added: drill/site/trunk/content/drill/docs/querying-a-file-system/index.html URL: http://svn.apache.org/viewvc/drill/site/trunk/content/drill/docs/querying-a-file-system/index.html?rev=1651949&view=auto ============================================================================== --- drill/site/trunk/content/drill/docs/querying-a-file-system/index.html (added) +++ drill/site/trunk/content/drill/docs/querying-a-file-system/index.html Thu Jan 15 05:11:44 2015 @@ -0,0 +1,139 @@ + + + + + + + + +Querying a File System - Apache Drill + + + + + + + + + + + + + + + + + + +
+ + + + + +
+

Querying a File System

+ +
+ +

Files and directories are like standard SQL tables to Drill. You can specify a +file system "database" as a prefix in queries when you refer to objects across +databases. In Drill, a file system database consists of a storage plugin name +followed by an optional workspace name, for example . or hdfs.logs.

+ +

The following example shows a query on a file system database in a Hadoop +distributed file system:

+ +

SELECT * FROM hdfs.logs.`AppServerLogs/20104/Jan/01/part0001.txt`;

+ +

The default dfs storage plugin instance registered with Drill has a +default workspace. If you query data in the default workspace, you do not +need to include the workspace in the query. Refer to +Workspaces for +more information.

+ +

Drill supports the following file types:

+ + + +

The extensions for these file types must match the configuration settings for +your registered storage plugins. For example, PSV files may be defined with a +.tbl extension, while CSV files are defined with a .csv extension.

+ +

Click on any of the following links for more information about querying +various file types:

+ + +
+ + + + + + + + Added: drill/site/trunk/content/drill/docs/querying-complex-data/index.html URL: http://svn.apache.org/viewvc/drill/site/trunk/content/drill/docs/querying-complex-data/index.html?rev=1651949&view=auto ============================================================================== --- drill/site/trunk/content/drill/docs/querying-complex-data/index.html (added) +++ drill/site/trunk/content/drill/docs/querying-complex-data/index.html Thu Jan 15 05:11:44 2015 @@ -0,0 +1,155 @@ + + + + + + + + +Querying Complex Data - Apache Drill + + + + + + + + + + + + + + + + + + +
+ + + + + +
+

Querying Complex Data

+ +
+ +

Apache Drill queries do not require prior knowledge of the actual data you are +trying to access, regardless of its source system or its schema and data +types. The sweet spot for Apache Drill is a SQL query workload against +"complex data": data made up of various types of records and fields, rather +than data in a recognizable relational form (discrete rows and columns). Drill +is capable of discovering the form of the data when you submit the query. +Nested data formats such as JSON (JavaScript Object Notation) files and +Parquet files are not only accessible: Drill provides special operators and +functions that you can use to _drill down _into these files and ask +interesting analytic questions.

+ +

These operators and functions include:

+ + + +

The SQL query developer needs to know the data well enough to write queries +that identify values of interest in the target file. For example, the writer +needs to know what a record consists of, and its data types, in order to +reliably request the right "columns" in the select list. Although these data +values do not manifest themselves as columns in the source file, Drill will +return them in the result set as if they had the predictable form of columns +in a table. Drill also optimizes queries by treating the data as "columnar" +rather than reading and analyzing complete records. (Drill uses similar +parallel execution and optimization capabilities to commercial columnar MPP +databases.)

+ +

Given a basic knowledge of the input file, the developer needs to know how to +use the SQL extensions that Drill provides and how to use them to "reach into" +the nested data. The following examples show how to write both simple queries +against JSON files and interesting queries that unpack the nested data. The +examples show how to use the Drill extensions in the context of standard SQL +SELECT statements. For the most part, the extensions use standard JavaScript +notation for referencing data elements in a hierarchy.

+ +

Before You Begin

+ +

The examples in this section operate on JSON data files. In order to write +your own queries, you need to be aware of the basic data types in these files:

+ + + +

Check that you have the following configuration setting for JSON files in the +Drill Web UI (dfs storage plugin configuration):

+
"json" : {
+  "type" : "json"
+}
+
+

Click on any of the following links to see examples of complex queries:

+ + +
+ + + + + + + + Added: drill/site/trunk/content/drill/docs/querying-directories/index.html URL: http://svn.apache.org/viewvc/drill/site/trunk/content/drill/docs/querying-directories/index.html?rev=1651949&view=auto ============================================================================== --- drill/site/trunk/content/drill/docs/querying-directories/index.html (added) +++ drill/site/trunk/content/drill/docs/querying-directories/index.html Thu Jan 15 05:11:44 2015 @@ -0,0 +1,173 @@ + + + + + + + + +Querying Directories - Apache Drill + + + + + + + + + + + + + + + + + + +
+ + + + + +
+

Querying Directories

+ +
+ +

You can store multiple files in a directory and query them as if they were a +single entity. You do not have to explicitly join the files. The files must be +compatible, in the sense that they must have comparable data types and columns +in the same order. This type of query is not limited to text files; you can +also query directories of JSON files, for example.

+ +

For example, assume that a testdata directory contains two files with the +same structure: plays.csv and moreplays.csv. The first file contains 7 +records and the second file contains 3 records. The following query returns +the "union" of the two files, ordered by the first column:

+
0: jdbc:drill:zk=local> select columns[0] as `Year`, columns[1] as Play 
+from dfs.`/Users/brumsby/drill/testdata` order by 1;
+
++------------+------------+
+|    Year    |    Play    |
++------------+------------+
+| 1594       | Comedy of Errors |
+| 1595       | Romeo and Juliet |
+| 1596       | The Merchant of Venice |
+| 1599       | As You Like It |
+| 1599       | Hamlet     |
+| 1601       | Twelfth Night |
+| 1606       | Macbeth    |
+| 1606       | King Lear  |
+| 1609       | The Winter's Tale |
+| 1610       | The Tempest |
++------------+------------+
+10 rows selected (0.296 seconds)
+
+

You can drill down further and automatically query subdirectories as well. For +example, assume that you have a logs directory that contains a subdirectory +for each year and subdirectories for each month (1 through 12). The month +directories contain JSON files.

+
[root@ip-172-16-1-200 logs]# pwd
+/mapr/drilldemo/labs/clicks/logs
+[root@ip-172-16-1-200 logs]# ls
+2012  2013  2014
+[root@ip-172-16-1-200 logs]# cd 2013
+[root@ip-172-16-1-200 2013]# ls
+1  10  11  12  2  3  4  5  6  7  8  9
+
+

You can query all of these files, or a subset, by referencing the file system +once in a Drill query. For example, the following query counts the number of +records in all of the files inside the 2013 directory:

+
0: jdbc:drill:> select count(*) from MFS.`/mapr/drilldemo/labs/clicks/logs/2013` ;
++------------+
+|   EXPR$0   |
++------------+
+| 24000      |
++------------+
+1 row selected (2.607 seconds)
+
+

You can also use variables dir0, dir1, and so on, to refer to +subdirectories in your workspace path. For example, assume that bob.logdata +is a workspace that points to the logs directory, which contains multiple +subdirectories: 2012, 2013, and 2014. The following query constrains +files inside the subdirectory named 2013. The variable dir0 refers to the +first level down from logs, dir1 to the next level, and so on.

+
0: jdbc:drill:> use bob.logdata;
++------------+------------+
+|     ok     |  summary   |
++------------+------------+
+| true       | Default schema changed to 'bob.logdata' |
++------------+------------+
+1 row selected (0.305 seconds)
+
+0: jdbc:drill:> select * from logs where dir0='2013' limit 10;
++------------+------------+------------+------------+------------+------------+------------+------------+------------+-------------+
+|    dir0    |    dir1    |  trans_id  |    date    |    time    |  cust_id   |   device   |   state    |  camp_id   |  keywords   |
++------------+------------+------------+------------+------------+------------+------------+------------+------------+-------------+
+| 2013       | 2          | 12115      | 02/23/2013 | 19:48:24   | 3          | IOS5       | az         | 5          | who's       |
+| 2013       | 2          | 12127      | 02/26/2013 | 19:42:03   | 11459      | IOS5       | wa         | 10         | for         |
+| 2013       | 2          | 12138      | 02/09/2013 | 05:49:01   | 1          | IOS6       | ca         | 7          | minutes     |
+| 2013       | 2          | 12139      | 02/23/2013 | 06:58:20   | 1          | AOS4.4     | ms         | 7          | i           |
+| 2013       | 2          | 12145      | 02/10/2013 | 10:14:56   | 10         | IOS5       | mi         | 6          | wrong       |
+| 2013       | 2          | 12157      | 02/15/2013 | 02:49:22   | 102        | IOS5       | ny         | 5          | want        |
+| 2013       | 2          | 12176      | 02/19/2013 | 08:39:02   | 28         | IOS5       | or         | 0          | and         |
+| 2013       | 2          | 12194      | 02/24/2013 | 08:26:17   | 125445     | IOS5       | ar         | 0          | say         |
+| 2013       | 2          | 12236      | 02/05/2013 | 01:40:05   | 10         | IOS5       | nj         | 2          | sir         |
+| 2013       | 2          | 12249      | 02/03/2013 | 04:45:47   | 21725      | IOS5       | nj         | 5          | no          |
++------------+------------+------------+------------+------------+------------+------------+------------+------------+-------------+
+10 rows selected (0.583 seconds)
+
+ + + + + + + + Added: drill/site/trunk/content/drill/docs/querying-hbase/index.html URL: http://svn.apache.org/viewvc/drill/site/trunk/content/drill/docs/querying-hbase/index.html?rev=1651949&view=auto ============================================================================== --- drill/site/trunk/content/drill/docs/querying-hbase/index.html (added) +++ drill/site/trunk/content/drill/docs/querying-hbase/index.html Thu Jan 15 05:11:44 2015 @@ -0,0 +1,257 @@ + + + + + + + + +Querying HBase - Apache Drill + + + + + + + + + + + + + + + + + + +
+ + + + + +
+

Querying HBase

+ +
+ +

This is a simple exercise that provides steps for creating a “students” table +and a “clicks” table in HBase that you can query with Drill.

+ +

To create the HBase tables and query them with Drill, complete the following +steps:

+ +
    +
  1. Issue the following command to start the HBase shell:

    +
    hbase shell
    +
  2. +
  3. Issue the following commands to create a ‘students’ table and a ‘clicks’ table with column families in HBase:

    + +

    echo "create 'students','account','address'" | hbase shell

    +
    ``echo "create 'clicks','clickinfo','iteminfo'" | hbase shell``
    +
  4. +
  5. Issue the following command with the provided data to create a testdata.txt file:

    + +

    cat > testdata.txt

    + +

    Sample Data

    +
    put 'students','student1','account:name','Alice'
    +put 'students','student1','address:street','123 Ballmer Av'
    +put 'students','student1','address:zipcode','12345'
    +put 'students','student1','address:state','CA'
    +put 'students','student2','account:name','Bob'
    +put 'students','student2','address:street','1 Infinite Loop'
    +put 'students','student2','address:zipcode','12345'
    +put 'students','student2','address:state','CA'
    +put 'students','student3','account:name','Frank'
    +put 'students','student3','address:street','435 Walker Ct'
    +put 'students','student3','address:zipcode','12345'
    +put 'students','student3','address:state','CA'
    +put 'students','student4','account:name','Mary'
    +put 'students','student4','address:street','56 Southern Pkwy'
    +put 'students','student4','address:zipcode','12345'
    +put 'students','student4','address:state','CA'
    +put 'clicks','click1','clickinfo:studentid','student1'
    +put 'clicks','click1','clickinfo:url','http://www.google.com'
    +put 'clicks','click1','clickinfo:time','2014-01-01 12:01:01.0001'
    +put 'clicks','click1','iteminfo:itemtype','image'
    +put 'clicks','click1','iteminfo:quantity','1'
    +put 'clicks','click2','clickinfo:studentid','student1'
    +put 'clicks','click2','clickinfo:url','http://www.amazon.com'
    +put 'clicks','click2','clickinfo:time','2014-01-01 01:01:01.0001'
    +put 'clicks','click2','iteminfo:itemtype','image'
    +put 'clicks','click2','iteminfo:quantity','1'
    +put 'clicks','click3','clickinfo:studentid','student2'
    +put 'clicks','click3','clickinfo:url','http://www.google.com'
    +put 'clicks','click3','clickinfo:time','2014-01-01 01:02:01.0001'
    +put 'clicks','click3','iteminfo:itemtype','text'
    +put 'clicks','click3','iteminfo:quantity','2'
    +put 'clicks','click4','clickinfo:studentid','student2'
    +put 'clicks','click4','clickinfo:url','http://www.ask.com'
    +put 'clicks','click4','clickinfo:time','2013-02-01 12:01:01.0001'
    +put 'clicks','click4','iteminfo:itemtype','text'
    +put 'clicks','click4','iteminfo:quantity','5'
    +put 'clicks','click5','clickinfo:studentid','student2'
    +put 'clicks','click5','clickinfo:url','http://www.reuters.com'
    +put 'clicks','click5','clickinfo:time','2013-02-01 12:01:01.0001'
    +put 'clicks','click5','iteminfo:itemtype','text'
    +put 'clicks','click5','iteminfo:quantity','100'
    +put 'clicks','click6','clickinfo:studentid','student3'
    +put 'clicks','click6','clickinfo:url','http://www.google.com'
    +put 'clicks','click6','clickinfo:time','2013-02-01 12:01:01.0001'
    +put 'clicks','click6','iteminfo:itemtype','image'
    +put 'clicks','click6','iteminfo:quantity','1'
    +put 'clicks','click7','clickinfo:studentid','student3'
    +put 'clicks','click7','clickinfo:url','http://www.ask.com'
    +put 'clicks','click7','clickinfo:time','2013-02-01 12:45:01.0001'
    +put 'clicks','click7','iteminfo:itemtype','image'
    +put 'clicks','click7','iteminfo:quantity','10'
    +put 'clicks','click8','clickinfo:studentid','student4'
    +put 'clicks','click8','clickinfo:url','http://www.amazon.com'
    +put 'clicks','click8','clickinfo:time','2013-02-01 22:01:01.0001'
    +put 'clicks','click8','iteminfo:itemtype','image'
    +put 'clicks','click8','iteminfo:quantity','1'
    +put 'clicks','click9','clickinfo:studentid','student4'
    +put 'clicks','click9','clickinfo:url','http://www.amazon.com'
    +put 'clicks','click9','clickinfo:time','2013-02-01 22:01:01.0001'
    +put 'clicks','click9','iteminfo:itemtype','image'
    +put 'clicks','click9','iteminfo:quantity','10'
    +
  6. +
  7. Issue the following command to verify that the data is in the testdata.txt file:

    + +

    cat testdata.txt | hbase shell

  8. +
  9. Issue exit to leave the hbase shell.

  10. +
  11. Start Drill. Refer to Starting/Stopping Drill for instructions.

  12. +
  13. Use Drill to issue the following SQL queries on the “students” and “clicks” tables: +a. Issue the following query to see the data in the “students” table:

    +
    ``SELECT * FROM hbase.`students`;``
    +
    +The query returns binary results:
    +
    +`Query finished, fetching results ...`
    +
    +`+----------+----------+----------+-----------+----------+----------+----------+-----------+`
    +
    +`|id    | name        | state       | street      | zipcode |`
    +
    +`+----------+----------+----------+-----------+----------+-----------+----------+-----------`
    +
    +`| [B@1ee37126 | [B@661985a1 | [B@15944165 | [B@385158f4 | [B@3e08d131 |`
    +
    +`| [B@64a7180e | [B@161c72c2 | [B@25b229e5 | [B@53dc8cb8 |[B@1d11c878 |`
    +
    +`| [B@349aaf0b | [B@175a1628 | [B@1b64a812 | [B@6d5643ca |[B@147db06f |`
    +
    +`| [B@3a7cbada | [B@52cf5c35 | [B@2baec60c | [B@5f4c543b |[B@2ec515d6 |`
    +
    +

    Since Drill does not require metadata, you must use the SQL CAST function in +some queries to get readable query results.

    + +

    b. Issue the following query, that includes the CAST function, to see the data in the “students” table:

    + +

    SELECT CAST(students.clickinfo.studentid as VarChar(20)), +CAST(students.account.name as VarChar(20)), CAST (students.address.state as +VarChar(20)), CAST (students.address.street as VarChar(20)), CAST +(students.address.zipcode as VarChar(20)), FROM hbase.students;

    + +

    Note: Use the following format when you query a column in an HBase table:
    + tablename.columnfamilyname.columnname
    + For more information about column families, refer to 5.6. Column +Family.

    + +

    The query returns the data:

    + +

    Query finished, fetching results ...

    + +

    +----------+-------+-------+------------------+---------+

    + +

    | studentid | name | state | street | zipcode |

    + +

    +----------+-------+-------+------------------+---------+

    + +

    | student1 | Alice | CA | 123 Ballmer Av | 12345 |

    + +

    | student2 | Bob | CA | 1 Infinite Loop | 12345 |

    + +

    | student3 | Frank | CA | 435 Walker Ct | 12345 |

    + +

    | student4 | Mary | CA | 56 Southern Pkwy | 12345 |

    + +

    +----------+-------+-------+------------------+---------+

  14. +
+ +

c. Issue the following query on the “clicks” table to find out which students clicked on google.com:

+
   ``SELECT CAST(clicks.clickinfo.studentid as VarChar(200)), CAST(clicks.clickinfo.url as VarChar(200)) FROM hbase.`clicks` WHERE URL LIKE '%google%';``
+
+   The query returns the data:
+
+
+   `Query finished, fetching results ...`
+
+   `+---------+-----------+-------------------------------+-----------------------+----------+----------+`
+
+   `| clickid | studentid | time                          | url                   | itemtype | quantity |`
+
+   `+---------+-----------+-------------------------------+-----------------------+----------+----------+`
+
+   `| click1  | student1  | 2014-01-01 12:01:01.000100000 | http://www.google.com | image    | 1        |`
+
+   `| click3  | student2  | 2014-01-01 01:02:01.000100000 | http://www.google.com | text     | 2        |`
+
+   `| click6  | student3  | 2013-02-01 12:01:01.000100000 | http://www.google.com | image    | 1        |`
+
+   `+---------+-----------+-------------------------------+-----------------------+----------+----------+`
+
+ + + + + + + + Added: drill/site/trunk/content/drill/docs/querying-hive/index.html URL: http://svn.apache.org/viewvc/drill/site/trunk/content/drill/docs/querying-hive/index.html?rev=1651949&view=auto ============================================================================== --- drill/site/trunk/content/drill/docs/querying-hive/index.html (added) +++ drill/site/trunk/content/drill/docs/querying-hive/index.html Thu Jan 15 05:11:44 2015 @@ -0,0 +1,151 @@ + + + + + + + + +Querying Hive - Apache Drill + + + + + + + + + + + + + + + + + + +
+ + + + + +
+

Querying Hive

+ +
+ +

This is a simple exercise that provides steps for creating a Hive table and +inserting data that you can query using Drill. Before you perform the steps, +download the customers.csv file.

+ +

To create a Hive table and query it with Drill, complete the following steps:

+ +
    +
  1. Issue the following command to start the Hive shell:

    +
    hive
    +
  2. +
  3. Issue the following command from the Hive shell to import the customers.csv file and create a table:

    +
    hive> create table customers(FirstName string,
    +LastName string,Company string,Address string,
    +City string,County string,State string,Zip string,
    +Phone string,Fax string,Email string,Web string)
    +row format delimited fields terminated by ',' stored as textfile;
    +
  4. +
  5. Issue the following command to load the customer data into the customers table:

    + +

    Hive> load data local inpath '/<directory path>/customers.csv' overwrite into table customers;

  6. +
  7. Issue quit or exit to leave the Hive shell.

  8. +
  9. Start Drill. Refer to Starting/Stopping Drill for instructions.

  10. +
  11. Issue the following query to Drill to get the first and last names of the first ten customers in the Hive table:

    + +

    0: jdbc:drill:schema=hiveremote> SELECT firstname,lastname FROM hiveremote.customerslimit 10;

    + +

    The query returns the following results:

    + +

    +------------+------------+

    + +

    | firstname | lastname |

    + +

    +------------+------------+

    + +

    | Essie | Vaill |

    + +

    | Cruz | Roudabush |

    + +

    | Billie | Tinnes |

    + +

    | Zackary | Mockus |

    + +

    | Rosemarie | Fifield |

    + +

    | Bernard | Laboy |

    + +

    | Sue | Haakinson |

    + +

    | Valerie | Pou |

    + +

    | Lashawn | Hasty |

    + +

    | Marianne | Earman |

    + +

    +------------+------------+

    + +

    10 rows selected (1.5 seconds)

    + +

    0: jdbc:drill:schema=hiveremote>

  12. +
+
+ + + + + + + + Added: drill/site/trunk/content/drill/docs/querying-json-files/index.html URL: http://svn.apache.org/viewvc/drill/site/trunk/content/drill/docs/querying-json-files/index.html?rev=1651949&view=auto ============================================================================== --- drill/site/trunk/content/drill/docs/querying-json-files/index.html (added) +++ drill/site/trunk/content/drill/docs/querying-json-files/index.html Thu Jan 15 05:11:44 2015 @@ -0,0 +1,127 @@ + + + + + + + + +Querying JSON Files - Apache Drill + + + + + + + + + + + + + + + + + + +
+ + + + + +
+

Querying JSON Files

+ +
+ +

Your Drill installation includes a sample JSON file located in Drill's +classpath. The sample JSON file, employee.json, contains fictitious employee +data. Use SQL syntax to query the sample JSON file.

+ +

To view the data in the employee.json file, submit the following SQL query +to Drill:

+ +

0: jdbc:drill:zk=local> SELECT * FROM cp.`employee.json`;

+ +

The query returns the following results:

+ +

Example of partial output

+
+-------------+------------+------------+------------+-------------+-----------+
+| employee_id | full_name  | first_name | last_name  | position_id | position_ |
++-------------+------------+------------+------------+-------------+-----------+
+| 1101        | Steve Eurich | Steve      | Eurich     | 16          | Store T |
+| 1102        | Mary Pierson | Mary       | Pierson    | 16          | Store T |
+| 1103        | Leo Jones  | Leo        | Jones      | 16          | Store Tem |
+| 1104        | Nancy Beatty | Nancy      | Beatty     | 16          | Store T |
+| 1105        | Clara McNight | Clara      | McNight    | 16          | Store  |
+| 1106        | Marcella Isaacs | Marcella   | Isaacs     | 17          | Stor |
+| 1107        | Charlotte Yonce | Charlotte  | Yonce      | 17          | Stor |
+| 1108        | Benjamin Foster | Benjamin   | Foster     | 17          | Stor |
+| 1109        | John Reed  | John       | Reed       | 17          | Store Per |
+| 1110        | Lynn Kwiatkowski | Lynn       | Kwiatkowski | 17          | St |
+| 1111        | Donald Vann | Donald     | Vann       | 17          | Store Pe |
+| 1112        | William Smith | William    | Smith      | 17          | Store  |
+| 1113        | Amy Hensley | Amy        | Hensley    | 17          | Store Pe |
+| 1114        | Judy Owens | Judy       | Owens      | 17          | Store Per |
+| 1115        | Frederick Castillo | Frederick  | Castillo   | 17          | S |
+| 1116        | Phil Munoz | Phil       | Munoz      | 17          | Store Per |
+| 1117        | Lori Lightfoot | Lori       | Lightfoot  | 17          | Store |
+...
++-------------+------------+------------+------------+-------------+-----------+
+1,155 rows selected (0.762 seconds)
+0: jdbc:drill:zk=local>
+
+ + + + + + + + Added: drill/site/trunk/content/drill/docs/querying-parquet-files/index.html URL: http://svn.apache.org/viewvc/drill/site/trunk/content/drill/docs/querying-parquet-files/index.html?rev=1651949&view=auto ============================================================================== --- drill/site/trunk/content/drill/docs/querying-parquet-files/index.html (added) +++ drill/site/trunk/content/drill/docs/querying-parquet-files/index.html Thu Jan 15 05:11:44 2015 @@ -0,0 +1,187 @@ + + + + + + + + +Querying Parquet Files - Apache Drill + + + + + + + + + + + + + + + + + + +
+ + + + + +
+

Querying Parquet Files

+ +
+ +

Your Drill installation includes a sample-date directory with Parquet files +that you can query. Use SQL syntax to query the region.parquet and +nation.parquet files in the sample-data directory.

+ +

Note: Your Drill installation location may differ from the examples used here. The examples assume that Drill was installed in embedded mode on your machine following the Apache Drill in 10 Minutes tutorial. If you installed Drill in distributed mode, or your sample-data directory differs from the location used in the examples, make sure to change the sample-data directory to the correct location before you run the queries.

+ +

Region File

+ +

If you followed the Apache Drill in 10 Minutes instructions to install Drill +in embedded mode, the path to the parquet file varies between operating +systems.

+ +

To view the data in the region.parquet file, issue the query appropriate for +your operating system:

+ + + +

The query returns the following results:

+
+------------+------------+
+|   EXPR$0   |   EXPR$1   |
++------------+------------+
+| AFRICA     | lar deposits. blithely final packages cajole. regular waters ar |
+| AMERICA    | hs use ironic, even requests. s |
+| ASIA       | ges. thinly even pinto beans ca |
+| EUROPE     | ly final courts cajole furiously final excuse |
+| MIDDLE EAST | uickly special accounts cajole carefully blithely close reques |
++------------+------------+
+5 rows selected (0.165 seconds)
+0: jdbc:drill:zk=local>
+
+

Nation File

+ +

If you followed the Apache Drill in 10 Minutes instructions to install Drill +in embedded mode, the path to the parquet file varies between operating +systems.

+ +

To view the data in the nation.parquet file, issue the query appropriate for +your operating system:

+ + + +

The query returns the following results:

+
+------------+------------+------------+------------+
+|   EXPR$0   |   EXPR$1   |   EXPR$2   |   EXPR$3   |
++------------+------------+------------+------------+
+| 0          | 0          | ALGERIA    |  haggle. carefully final deposits det |
+| 1          | 1          | ARGENTINA  | al foxes promise slyly according to t |
+| 2          | 1          | BRAZIL     | y alongside of the pending deposits.  |
+| 3          | 1          | CANADA     | eas hang ironic, silent packages. sly |
+| 4          | 4          | EGYPT      | y above the carefully unusual theodol |
+| 5          | 0          | ETHIOPIA   | ven packages wake quickly. regu |
+| 6          | 3          | FRANCE     | refully final requests. regular, iron |
+| 7          | 3          | GERMANY    | l platelets. regular accounts x-ray:  |
+| 8          | 2          | INDIA      | ss excuses cajole slyly across the pa |
+| 9          | 2          | INDONESIA  |  slyly express asymptotes. regular de |
+| 10         | 4          | IRAN       | efully alongside of the slyly final d |
+| 11         | 4          | IRAQ       | nic deposits boost atop the quickly f |
+| 12         | 2          | JAPAN      | ously. final, express gifts cajole a |
+| 13         | 4          | JORDAN     | ic deposits are blithely about the ca |
+| 14         | 0          | KENYA      |  pending excuses haggle furiously dep |
+| 15         | 0          | MOROCCO    | rns. blithely bold courts among the c |
+| 16         | 0          | MOZAMBIQUE | s. ironic, unusual asymptotes wake bl |
+| 17         | 1          | PERU       | platelets. blithely pending dependenc |
+| 18         | 2          | CHINA      | c dependencies. furiously express not |
+| 19         | 3          | ROMANIA    | ular asymptotes are about the furious |
+| 20         | 4          | SAUDI ARABIA | ts. silent requests haggle. closely |
+| 21         | 2          | VIETNAM    | hely enticingly express accounts. eve |
+| 22         | 3          | RUSSIA     |  requests against the platelets use n |
+| 23         | 3          | UNITED KINGDOM | eans boost carefully special requ |
+| 24         | 1          | UNITED STATES | y final packages. slow foxes cajol |
++------------+------------+------------+------------+
+25 rows selected (2.401 seconds)
+0: jdbc:drill:zk=local>
+
+ + + + + + + + Added: drill/site/trunk/content/drill/docs/querying-plain-text-files/index.html URL: http://svn.apache.org/viewvc/drill/site/trunk/content/drill/docs/querying-plain-text-files/index.html?rev=1651949&view=auto ============================================================================== --- drill/site/trunk/content/drill/docs/querying-plain-text-files/index.html (added) +++ drill/site/trunk/content/drill/docs/querying-plain-text-files/index.html Thu Jan 15 05:11:44 2015 @@ -0,0 +1,202 @@ + + + + + + + + +Querying Plain Text Files - Apache Drill + + + + + + + + + + + + + + + + + + +
+ + + + + +
+

Querying Plain Text Files

+ +
+ +

You can use Drill to access both structured file types and plain text files +(flat files). This section shows a few simple examples that work on flat +files:

+ + + +

The examples here show CSV files, but queries against TSV and PSV files return +equivalent results. However, make sure that your registered storage plugins +recognize the appropriate file types and extensions. For example, the +following configuration expects PSV files (files with a pipe delimiter) to +have a tbl extension, not a psv extension. Drill returns a "file not +found" error if references to files in queries do not match these conditions.

+
"formats": {
+    "psv": {
+      "type": "text",
+      "extensions": [
+        "tbl"
+      ],
+      "delimiter": "|"
+    }
+
+

SELECT * FROM a CSV File

+ +

The first query selects rows from a .csv text file. The file contains seven +records:

+
$ more plays.csv
+
+1599,As You Like It
+1601,Twelfth Night
+1594,Comedy of Errors
+1595,Romeo and Juliet
+1596,The Merchant of Venice
+1610,The Tempest
+1599,Hamlet
+
+

Drill recognizes each row as an array of values and returns one column for +each row.

+ +

0: jdbc:drill:zk=local> select * from dfs./Users/brumsby/drill/plays.csv;

+
+------------+
+|  columns   |
++------------+
+| ["1599","As You Like It"] |
+| ["1601","Twelfth Night"] |
+| ["1594","Comedy of Errors"] |
+| ["1595","Romeo and Juliet"] |
+| ["1596","The Merchant of Venice"] |
+| ["1610","The Tempest"] |
+| ["1599","Hamlet"] |
++------------+
+7 rows selected (0.089 seconds)
+
+

Columns[n] Syntax

+ +

You can use the COLUMNS[n] syntax in the SELECT list to return these CSV +rows in a more readable, column by column, format. (This syntax uses a zero- +based index, so the first column is column 0.)

+ +

0: jdbc:drill:zk=local> select columns[0], columns[1] +from dfs./Users/brumsby/drill/plays.csv;

+
+------------+------------+
+|   EXPR$0   |   EXPR$1   |
++------------+------------+
+| 1599       | As You Like It |
+| 1601       | Twelfth Night |
+| 1594       | Comedy of Errors |
+| 1595       | Romeo and Juliet |
+| 1596       | The Merchant of Venice |
+| 1610       | The Tempest |
+| 1599       | Hamlet     |
++------------+------------+
+7 rows selected (0.137 seconds)
+
+

You can use aliases to return meaningful column names. Note that YEAR is a +reserved word, so the Year alias must be enclosed by back ticks.

+
0: jdbc:drill:zk=local> select columns[0] as `Year`, columns[1] as Play 
+from dfs.`/Users/brumsby/drill/plays.csv`;
+
++------------+------------+
+|    Year    |    Play    |
++------------+------------+
+| 1599       | As You Like It |
+| 1601       | Twelfth Night |
+| 1594       | Comedy of Errors |
+| 1595       | Romeo and Juliet |
+| 1596       | The Merchant of Venice |
+| 1610       | The Tempest |
+| 1599       | Hamlet     |
++------------+------------+
+7 rows selected (0.113 seconds)
+
+

You cannot refer to the aliases in subsequent clauses of the query. Use the +original columns[n] syntax, as shown in the WHERE clause for the following +example:

+ +

0: jdbc:drill:zk=local> select columns[0] as Year, columns[1] as Play +from dfs./Users/brumsby/drill/plays.csv where columns[0]>1599;

+
+------------+------------+
+|    Year    |    Play    |
++------------+------------+
+| 1601       | Twelfth Night |
+| 1610       | The Tempest |
++------------+------------+
+2 rows selected (0.201 seconds)
+
+

Note that the restriction with the use of aliases applies to queries against +all data sources.

+
+ + + + + + + + Added: drill/site/trunk/content/drill/docs/querying-system-tables/index.html URL: http://svn.apache.org/viewvc/drill/site/trunk/content/drill/docs/querying-system-tables/index.html?rev=1651949&view=auto ============================================================================== --- drill/site/trunk/content/drill/docs/querying-system-tables/index.html (added) +++ drill/site/trunk/content/drill/docs/querying-system-tables/index.html Thu Jan 15 05:11:44 2015 @@ -0,0 +1,247 @@ + + + + + + + + +Querying System Tables - Apache Drill + + + + + + + + + + + + + + + + + + +
+ + + + + +
+

Querying System Tables

+ +
+ +

Drill has a sys database that contains system tables. You can query the system +tables for information about Drill, including Drill ports, the Drill version +running on the system, and available Drill options. View the databases in +Drill to identify the sys database, and then use the sys database to view +system tables that you can query.

+ +

View Drill Databases

+ +

Issue the SHOW DATABASES command to view Drill databases.

+
0: jdbc:drill:zk=10.10.100.113:5181> show databases;
++-------------+
+| SCHEMA_NAME |
++-------------+
+| M7          |
+| hive.default|
+| dfs.default |
+| dfs.root    |
+| dfs.views   |
+| dfs.tmp     |
+| dfs.tpcds   |
+| sys         |
+| cp.default  |
+| hbase       |
+| INFORMATION_SCHEMA |
++-------------+
+11 rows selected (0.162 seconds)
+
+

Drill returns sys in the database results.

+ +

Use the Sys Database

+ +

Issue the USE command to select the sys database for subsequent SQL +requests.

+
0: jdbc:drill:zk=10.10.100.113:5181> use sys;
++------------+--------------------------------+
+|   ok     |  summary                         |
++------------+--------------------------------+
+| true     | Default schema changed to 'sys'  |
++------------+--------------------------------+
+1 row selected (0.101 seconds)
+
+

View Tables

+ +

Issue the SHOW TABLES command to view the tables in the sys database.

+
0: jdbc:drill:zk=10.10.100.113:5181> show tables;
++--------------+------------+
+| TABLE_SCHEMA | TABLE_NAME |
++--------------+------------+
+| sys          | drillbits  |
+| sys          | version    |
+| sys          | options    |
++--------------+------------+
+3 rows selected (0.934 seconds)
+0: jdbc:drill:zk=10.10.100.113:5181>
+
+

Query System Tables

+ +

Query the drillbits, version, and options tables in the sys database.

+ +
Query the drillbits table.
+
0: jdbc:drill:zk=10.10.100.113:5181> select * from drillbits;
++------------------+------------+--------------+------------+---------+
+|   host            | user_port | control_port | data_port  |  current|
++-------------------+------------+--------------+------------+--------+
+| qa-node115.qa.lab | 31010     | 31011        | 31012      | true    |
+| qa-node114.qa.lab | 31010     | 31011        | 31012      | false   |
+| qa-node116.qa.lab | 31010     | 31011        | 31012      | false   |
++------------+------------+--------------+------------+---------------+
+3 rows selected (0.146 seconds)
+
+ + +
Query the version table.
+
0: jdbc:drill:zk=10.10.100.113:5181> select * from version;
++------------+----------------+-------------+-------------+------------+
+| commit_id  | commit_message | commit_time | build_email | build_time |
++------------+----------------+-------------+-------------+------------+
+| 108d29fce3d8465d619d45db5f6f433ca3d97619 | DRILL-1635: Additional fix for validation exceptions. | 14.11.2014 @ 02:32:47 UTC | Unknown    | 14.11.2014 @ 03:56:07 UTC |
++------------+----------------+-------------+-------------+------------+
+1 row selected (0.144 seconds)
+
+ + +
Query the options table.
+ +

Drill provides system, session, and boot options that you can query.

+ +

The following example shows a query on the system options:

+
0: jdbc:drill:zk=10.10.100.113:5181> select * from options where type='SYSTEM' limit 10;
++------------+------------+------------+------------+------------+------------+------------+
+|    name   |   kind    |   type    |  num_val   | string_val |  bool_val  | float_val  |
++------------+------------+------------+------------+------------+------------+------------+
+| exec.max_hash_table_size | LONG       | SYSTEM    | 1073741824 | null     | null      | null      |
+| planner.memory.max_query_memory_per_node | LONG       | SYSTEM    | 2048       | null     | null      | null      |
+| planner.join.row_count_estimate_factor | DOUBLE   | SYSTEM    | null      | null      | null      | 1.0       |
+| planner.affinity_factor | DOUBLE  | SYSTEM    | null      | null      | null       | 1.2      |
+| exec.errors.verbose | BOOLEAN | SYSTEM    | null      | null      | false      | null     |
+| planner.disable_exchanges | BOOLEAN   | SYSTEM    | null      | null      | false      | null     |
+| exec.java_compiler_debug | BOOLEAN    | SYSTEM    | null      | null      | true      | null      |
+| exec.min_hash_table_size | LONG       | SYSTEM    | 65536     | null      | null      | null       |
+| exec.java_compiler_janino_maxsize | LONG       | SYSTEM   | 262144    | null      | null      | null      |
+| planner.enable_mergejoin | BOOLEAN    | SYSTEM    | null      | null      | true      | null       |
++------------+------------+------------+------------+------------+------------+------------+
+10 rows selected (0.334 seconds)  
+
+ + +

For information about how to configure Drill system and session options, see +Planning and Execution Options.

+ +

For information about how to configure Drill start-up options, see Start-Up +Options.

+
+ + + + + + + + Added: drill/site/trunk/content/drill/docs/querying-the-information-schema/index.html URL: http://svn.apache.org/viewvc/drill/site/trunk/content/drill/docs/querying-the-information-schema/index.html?rev=1651949&view=auto ============================================================================== --- drill/site/trunk/content/drill/docs/querying-the-information-schema/index.html (added) +++ drill/site/trunk/content/drill/docs/querying-the-information-schema/index.html Thu Jan 15 05:11:44 2015 @@ -0,0 +1,193 @@ + + + + + + + + +Querying the INFORMATION SCHEMA - Apache Drill + + + + + + + + + + + + + + + + + + +
+ + + + + +
+

Querying the INFORMATION SCHEMA

+ +
+ +

When you are using Drill to connect to multiple data sources, you need a +simple mechanism to discover what each data source contains. The information +schema is an ANSI standard set of metadata tables that you can query to return +information about all of your Drill data sources (or schemas). Data sources +may be databases or file systems; they are all known as "schemas" in this +context. You can query the following INFORMATION_SCHEMA tables:

+ + + +

SCHEMATA

+ +

The SCHEMATA table contains the CATALOG_NAME and SCHEMA_NAME columns. To allow +maximum flexibility inside BI tools, the only catalog that Drill supports is +DRILL.

+
0: jdbc:drill:zk=local> select CATALOG_NAME, SCHEMA_NAME as all_my_data_sources from INFORMATION_SCHEMA.SCHEMATA order by SCHEMA_NAME;
++--------------+---------------------+
+| CATALOG_NAME | all_my_data_sources |
++--------------+---------------------+
+| DRILL        | INFORMATION_SCHEMA  |
+| DRILL        | cp.default          |
+| DRILL        | dfs.default         |
+| DRILL        | dfs.root            |
+| DRILL        | dfs.tmp             |
+| DRILL        | HiveTest.SalesDB    |
+| DRILL        | maprfs.logs         |
+| DRILL        | sys                 |
++--------------+---------------------+
+
+

The INFORMATION_SCHEMA name and associated keywords are case-sensitive. You +can also return a list of schemas by running the SHOW DATABASES command:

+
0: jdbc:drill:zk=local> show databases;
++-------------+
+| SCHEMA_NAME |
++-------------+
+| dfs.default |
+| dfs.root    |
+| dfs.tmp     |
+...
+
+

CATALOGS

+ +

The CATALOGS table returns only one row, with the hardcoded DRILL catalog name +and description.

+ +

TABLES

+ +

The TABLES table returns the table name and type for each table or view in +your databases. (Type means TABLE or VIEW.) Note that Drill does not return +files available for querying in file-based data sources. Instead, use SHOW +FILES to explore these data sources.

+ +

COLUMNS

+ +

The COLUMNS table returns the column name and other metadata (such as the data +type) for each column in each table or view.

+ +

VIEWS

+ +

The VIEWS table returns the name and definition for each view in your +databases. Note that file schemas are the canonical repository for views in +Drill. Depending on how you create a view, the may only be displayed in Drill +after it has been used.

+ +

Useful Queries

+ +

Run an INFORMATION_SCHEMA.`TABLES`query to view all of the tables and views +within a database. TABLES is a reserved word in Drill and requires back ticks +(`).

+ +

For example, the following query identifies all of the tables and views that +Drill can access:

+
SELECT TABLE_SCHEMA, TABLE_NAME, TABLE_TYPE
+FROM INFORMATION_SCHEMA.`TABLES`
+ORDER BY TABLE_NAME DESC;
+----------------------------------------------------------------
+TABLE_SCHEMA             TABLE_NAME            TABLE_TYPE
+----------------------------------------------------------------
+HiveTest.CustomersDB     Customers             TABLE
+HiveTest.SalesDB         Orders                TABLE
+HiveTest.SalesDB         OrderLines            TABLE
+HiveTest.SalesDB         USOrders              VIEW
+dfs.default              CustomerSocialProfile VIEW
+----------------------------------------------------------------
+
+

Note: Currently, Drill only supports querying Drill views; Hive views are not yet supported.

+ +

You can run a similar query to identify columns in tables and the data types +of those columns:

+
SELECT COLUMN_NAME, DATA_TYPE 
+FROM INFORMATION_SCHEMA.COLUMNS 
+WHERE TABLE_NAME = 'Orders' AND TABLE_SCHEMA = 'HiveTest.SalesDB' AND COLUMN_NAME LIKE '%Total';
++-------------+------------+
+| COLUMN_NAME | DATA_TYPE  |
++-------------+------------+
+| OrderTotal  | Decimal    |
++-------------+------------+
+
+ + + + + + + +