drill-user mailing list archives

From "Sivasubramaniam, Latha" <Latha.Sivasubraman...@Aspect.com>
Subject RE: Unable to query data from hdfs
Date Wed, 08 Apr 2015 21:06:11 GMT
Ramana,

Please find the attached dservices.tar file.

Thanks for your help.

-Latha

From: Sivasubramaniam, Latha
Sent: Wednesday, April 08, 2015 1:33 PM
To: 'user@drill.apache.org'
Subject: RE: Unable to query data from hdfs

Thanks for all the responses.

Once I renamed the files within the directories to have a .csv extension, the queries worked.
So it looks like the csv format requires a file extension. It would be nice if Drill did not
accept "null" in the extensions setting.
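
A minimal sketch of that rename step, assuming the files live in HDFS and simply lack an extension. The helper name add_csv_ext and the example paths are made up for illustration; the helper only prints the hdfs dfs -mv commands so they can be reviewed before anything is moved.

```shell
# Hypothetical helper (not part of Drill or Hadoop): read HDFS file paths
# on stdin and print an "hdfs dfs -mv" command for each file whose name
# has no extension, so that the csv format definition can match it.
add_csv_ext() {
  while IFS= read -r path; do
    name=${path##*/}            # strip the directory part
    case "$name" in
      "")  ;;                   # blank line or trailing slash; skip
      *.*) ;;                   # already has an extension; leave it alone
      *)   printf 'hdfs dfs -mv %s %s.csv\n' "$path" "$path" ;;
    esac
  done
}

# Example paths are assumptions; substitute the real file locations.
printf '%s\n' /data/DIM_Agents /data/test.csv | add_csv_ext
```

Piping the printed commands to sh would perform the renames; printing them first keeps the step easy to check.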

In the next step of my proof of concept, I am trying to access Parquet files. I have Parquet
files (tables) that were created using Impala, and I am assuming that I should be able to
access those files via Drill as well.

My Parquet tables are under /user/hive/warehouse, as listed below:


[root@rtr-poc-imp1 sample-data]# hdfs dfs -ls /user/hive/warehouse
Found 19 items
drwxrwxrwt   - impala hive          0 2015-03-31 16:00 /user/hive/warehouse/dim_agent_status_parq
drwxrwxrwt   - impala hive          0 2015-03-31 16:00 /user/hive/warehouse/dim_agent_status_reasons_parq
drwxrwxrwt   - impala hive          0 2015-03-27 12:27 /user/hive/warehouse/dim_agents_parquet
drwxrwxrwt   - impala hive          0 2015-03-31 16:00 /user/hive/warehouse/dim_call_action_reasons_parq
drwxrwxrwt   - impala hive          0 2015-03-31 14:09 /user/hive/warehouse/dim_call_actions_parq
drwxrwxrwt   - impala hive          0 2015-03-31 13:54 /user/hive/warehouse/dim_call_types_parq
drwxrwxrwt   - impala hive          0 2015-03-31 15:59 /user/hive/warehouse/dim_dispositions_parq
drwxrwxrwt   - impala hive          0 2015-03-31 15:20 /user/hive/warehouse/dim_resource_groups_parq
drwxrwxrwt   - impala hive          0 2015-03-31 13:33 /user/hive/warehouse/dim_services_parq
drwxrwxrwt   - impala hive          0 2015-03-31 14:00 /user/hive/warehouse/dim_sites_parq
drwxrwxrwt   - impala hive          0 2015-03-31 15:25 /user/hive/warehouse/dim_workgroups_parq
drwxrwxrwx   - root   hive          0 2015-04-08 14:36 /user/hive/warehouse/dservices
drwxrwxrwt   - impala hive          0 2015-03-27 11:48 /user/hive/warehouse/edwpoc.db
drwxrwxrwt   - impala hive          0 2015-03-31 12:47 /user/hive/warehouse/fact_agent_activity_detail_12m_partparq
drwxrwxrwt   - impala hive          0 2015-03-30 13:03 /user/hive/warehouse/fact_contact_detail_12m_partparq
drwxrwxrwt   - impala hive          0 2015-03-27 13:36 /user/hive/warehouse/fact_contact_detail_partparq
-rw-r--r--   3 root   hive        455 2015-04-08 14:55 /user/hive/warehouse/region.parq
drwxrwxrwt   - impala hive          0 2015-03-25 22:29 /user/hive/warehouse/sample_07
drwxrwxrwt   - impala hive          0 2015-03-25 22:29 /user/hive/warehouse/sample_08

Example listing from one of the directories:

hdfs dfs -ls /user/hive/warehouse/dim_services_parq
Found 3 items
-rw-r--r--   3 impala hive      55121 2015-03-31 13:33 /user/hive/warehouse/dim_services_parq/4645c4221dafa337-250888c6ac1de29b_1376355963_data.0.parq
-rw-r--r--   3 impala hive      71075 2015-03-31 13:33 /user/hive/warehouse/dim_services_parq/4645c4221dafa337-250888c6ac1de29c_2123191845_data.0.parq
drwxrwxrwt   - impala hive          0 2015-03-31 13:33 /user/hive/warehouse/dim_services_parq/_impala_insert_staging
[root@rtr-poc-imp1 sample-data]#

There is nothing under the Impala staging directory; it is primarily used when an insert
operation is performed.

I copied the dim_services_parq directory to dservices; below is the listing of the dservices directory.

[root@rtr-poc-imp1 sample-data]#  hdfs dfs -ls /user/hive/warehouse/dservices
Found 2 items
-rwxrwxrwx   3 root hive      55121 2015-04-08 14:12 /user/hive/warehouse/dservices/service0.parquet
-rwxrwxrwx   3 root hive      71075 2015-04-08 14:12 /user/hive/warehouse/dservices/service1.parquet

Now when I query it, I get the error below:

select * from hdfs.drillpoc.`/dservices`;
Query failed: RemoteRpcException: Failure while running fragment., java.lang.UnsupportedOperationException
[ cfca83ec-986a-43c0-a967-5aee102401dd on rtr-poc-imp2.labs.aspect.com:31010 ]
[ cfca83ec-986a-43c0-a967-5aee102401dd on rtr-poc-imp2.labs.aspect.com:31010 ]

I also copied the Drill sample Parquet file region.parquet to the same location, and that works
fine, as shown below.

select * from hdfs.drillpoc.`region.parq`;
+-------------+------------+------------+
| R_REGIONKEY |   R_NAME   | R_COMMENT  |
+-------------+------------+------------+
| 0           | AFRICA     | lar deposits. blithe |
| 1           | AMERICA    | hs use ironic, even  |
| 2           | ASIA       | ges. thinly even pin |
| 3           | EUROPE     | ly final courts cajo |
| 4           | MIDDLE EAST | uickly special accou |
+-------------+------------+------------+
5 rows selected (0.122 seconds)

From what I have read so far, an Impala-created Parquet file should be like any other Parquet
file, so there should not be a problem. If this does not work, I will need to convert all my
text-format tables to Parquet format and access them with Drill. Is there any utility to do that?
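
For reference, a sketch of one option that may apply here: Drill's CREATE TABLE AS statement writes its result in the format named by the store.format session option, so a text table could be rewritten as Parquet from within Drill itself. The helper below only prints the statements. The helper name, the hdfs.tmp target workspace, and the source path are placeholders, and the target workspace would need to be writable.

```shell
# Hypothetical helper: print Drill statements that would rewrite a text
# table as Parquet using CREATE TABLE AS. Nothing is executed here.
#   $1 = source path as queried in Drill, $2 = new table name
# hdfs.tmp is an assumed writable workspace; hdfs.root is the workspace
# used in the queries elsewhere in this thread.
ctas_to_parquet() {
  src=$1
  dst=$2
  printf 'ALTER SESSION SET `store.format` = '\''parquet'\'';\n'
  printf 'CREATE TABLE hdfs.tmp.`%s` AS SELECT * FROM hdfs.root.`%s`;\n' "$dst" "$src"
}

# Example arguments are assumptions, not names taken from a real cluster.
ctas_to_parquet /DIM_Agents.csv dim_agents_parquet
```

In practice the SELECT would probably name and cast individual fields (for example columns[0]) rather than use *, since text files are exposed as a single columns array.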

Thanks for all the help.
Latha

From: Sivasubramaniam, Latha
Sent: Wednesday, April 08, 2015 8:00 AM
To: 'user@drill.apache.org'
Subject: RE: Unable to query data from hdfs

Hi,

Thanks for your responses. Even though I had run "use hdfs", the query worked only when I fully
qualified the file name. However, I am still not able to access files without a .csv extension.

I changed

"csv": {
  "type": "text",
  "extensions": [
    "csv"
  ],
  "delimiter": ","
}

to

"csv": {
  "type": "text",
  "extensions": null,
  "delimiter": ","
}

I then tried to access the HDFS file 'DIM_Agents' and got the same error. With null extensions,
I could not access 'test.csv' either. Once I reverted to the original csv format description, I
could access test.csv again, but I cannot access the other files with either format description.

Below is what I tried. Is '_' (underscore) a problem in a file name? All my HDFS files
are in text format.

0: jdbc:drill:zk=rtr-poc-imp1:2181> select * from hdfs.root.`/test.csv`;
+------------+------------+
|  columns   |    dir0    |
+------------+------------+
| ["1","Latha"] | root       |
| ["2","Roshan"] | root       |
+------------+------------+
2 rows selected (0.276 seconds)
0: jdbc:drill:zk=rtr-poc-imp1:2181> select * from hdfs.root.`/DIM_Agents`;
Query failed: SqlValidatorException: Table 'hdfs.root./DIM_Agents' not found

Error: exception while executing query: Failure while executing query. (state=,code=0)
0: jdbc:drill:zk=rtr-poc-imp1:2181> select * from hdfs.root.`/DIM_Agents`;
Query failed: SqlValidatorException: Table 'hdfs.root./DIM_Agents' not found

Error: exception while executing query: Failure while executing query. (state=,code=0)
0: jdbc:drill:zk=rtr-poc-imp1:2181> select * from hdfs.root.`/test.csv`;
Query failed: SqlValidatorException: Table 'hdfs.root./test.csv' not found

Error: exception while executing query: Failure while executing query. (state=,code=0)
0: jdbc:drill:zk=rtr-poc-imp1:2181> select * from hdfs.root.`/DIM_Agents`;
Query failed: SqlValidatorException: Table 'hdfs.root./DIM_Agents' not found

Error: exception while executing query: Failure while executing query. (state=,code=0)
0: jdbc:drill:zk=rtr-poc-imp1:2181> select * from hdfs.root.`/test.csv`;
Query failed: SqlValidatorException: Table 'hdfs.root./test.csv' not found

Error: exception while executing query: Failure while executing query. (state=,code=0)
0: jdbc:drill:zk=rtr-poc-imp1:2181> select * from hdfs.root.`/test.csv`;
+------------+------------+
|  columns   |    dir0    |
+------------+------------+
| ["1","Latha"] | root       |
| ["2","Roshan"] | root       |
+------------+------------+
2 rows selected (0.112 seconds)

Appreciate your help.

Thanks,
Latha