Ramana,

Please find the attached dservices.tar file.

Thanks for your help.

-Latha

 

From: Sivasubramaniam, Latha
Sent: Wednesday, April 08, 2015 1:33 PM
To: 'user@drill.apache.org'
Subject: RE: Unable to query data from hdfs

 

Thanks for all the responses.

 

Once I renamed the files within the directories to have a .csv extension, it worked. So it looks like an extension is required for the csv format. It would be nice if Drill did not accept "null" for the extensions setting.
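
The rename step can be scripted. The following is a minimal Python sketch, not a tested tool: it only builds the hdfs dfs -mv commands for files that have no extension, so the commands can be reviewed before anything is run. The function names and the sample paths are hypothetical.

```python
import posixpath


def plan_csv_renames(paths, extension="csv"):
    """Return (src, dst) pairs for files whose name has no extension."""
    renames = []
    for path in paths:
        name = posixpath.basename(path)
        # splitext returns an empty suffix when the name contains no dot.
        if posixpath.splitext(name)[1] == "":
            renames.append((path, f"{path}.{extension}"))
    return renames


def to_hdfs_commands(renames):
    """Render each pair as an `hdfs dfs -mv` argument list (nothing is executed)."""
    return [["hdfs", "dfs", "-mv", src, dst] for src, dst in renames]


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    sample = ["/user/root/DIM_Agents", "/user/root/test.csv"]
    for cmd in to_hdfs_commands(plan_csv_renames(sample)):
        print(" ".join(cmd))
```

Files that already carry an extension, such as test.csv, are left out of the plan.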

 

Now, as the next step of my proof of concept, I am trying to access Parquet files. I have Parquet files (tables) that were created using Impala, and I am assuming that I should be able to access those files via Drill as well.

 

My Parquet tables are placed under /user/hive/warehouse, as listed below:

 

 

[root@rtr-poc-imp1 sample-data]# hdfs dfs -ls /user/hive/warehouse
Found 19 items
drwxrwxrwt   - impala hive          0 2015-03-31 16:00 /user/hive/warehouse/dim_agent_status_parq
drwxrwxrwt   - impala hive          0 2015-03-31 16:00 /user/hive/warehouse/dim_agent_status_reasons_parq
drwxrwxrwt   - impala hive          0 2015-03-27 12:27 /user/hive/warehouse/dim_agents_parquet
drwxrwxrwt   - impala hive          0 2015-03-31 16:00 /user/hive/warehouse/dim_call_action_reasons_parq
drwxrwxrwt   - impala hive          0 2015-03-31 14:09 /user/hive/warehouse/dim_call_actions_parq
drwxrwxrwt   - impala hive          0 2015-03-31 13:54 /user/hive/warehouse/dim_call_types_parq
drwxrwxrwt   - impala hive          0 2015-03-31 15:59 /user/hive/warehouse/dim_dispositions_parq
drwxrwxrwt   - impala hive          0 2015-03-31 15:20 /user/hive/warehouse/dim_resource_groups_parq
drwxrwxrwt   - impala hive          0 2015-03-31 13:33 /user/hive/warehouse/dim_services_parq
drwxrwxrwt   - impala hive          0 2015-03-31 14:00 /user/hive/warehouse/dim_sites_parq
drwxrwxrwt   - impala hive          0 2015-03-31 15:25 /user/hive/warehouse/dim_workgroups_parq
drwxrwxrwx   - root   hive          0 2015-04-08 14:36 /user/hive/warehouse/dservices
drwxrwxrwt   - impala hive          0 2015-03-27 11:48 /user/hive/warehouse/edwpoc.db
drwxrwxrwt   - impala hive          0 2015-03-31 12:47 /user/hive/warehouse/fact_agent_activity_detail_12m_partparq
drwxrwxrwt   - impala hive          0 2015-03-30 13:03 /user/hive/warehouse/fact_contact_detail_12m_partparq
drwxrwxrwt   - impala hive          0 2015-03-27 13:36 /user/hive/warehouse/fact_contact_detail_partparq
-rw-r--r--   3 root   hive        455 2015-04-08 14:55 /user/hive/warehouse/region.parq
drwxrwxrwt   - impala hive          0 2015-03-25 22:29 /user/hive/warehouse/sample_07
drwxrwxrwt   - impala hive          0 2015-03-25 22:29 /user/hive/warehouse/sample_08

 

Example listing from one of the directories:

 

hdfs dfs -ls /user/hive/warehouse/dim_services_parq
Found 3 items
-rw-r--r--   3 impala hive      55121 2015-03-31 13:33 /user/hive/warehouse/dim_services_parq/4645c4221dafa337-250888c6ac1de29b_1376355963_data.0.parq
-rw-r--r--   3 impala hive      71075 2015-03-31 13:33 /user/hive/warehouse/dim_services_parq/4645c4221dafa337-250888c6ac1de29c_2123191845_data.0.parq
drwxrwxrwt   - impala hive          0 2015-03-31 13:33 /user/hive/warehouse/dim_services_parq/_impala_insert_staging
[root@rtr-poc-imp1 sample-data]#

 

There is nothing under the Impala staging directory; it is primarily used when an insert operation is performed.

 

I copied the dim_services_parq directory to dservices; below is the listing of the dservices directory.

 

[root@rtr-poc-imp1 sample-data]#  hdfs dfs -ls /user/hive/warehouse/dservices
Found 2 items
-rwxrwxrwx   3 root hive      55121 2015-04-08 14:12 /user/hive/warehouse/dservices/service0.parquet
-rwxrwxrwx   3 root hive      71075 2015-04-08 14:12 /user/hive/warehouse/dservices/service1.parquet

 

Now when I try to query it, I get the error below:

 

select * from hdfs.drillpoc.`/dservices`;
Query failed: RemoteRpcException: Failure while running fragment., java.lang.UnsupportedOperationException [ cfca83ec-986a-43c0-a967-5aee102401dd on rtr-poc-imp2.labs.aspect.com:31010 ]
[ cfca83ec-986a-43c0-a967-5aee102401dd on rtr-poc-imp2.labs.aspect.com:31010 ]
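
One way to rule out a damaged copy is to check the Parquet framing of the files directly. A Parquet file starts and ends with the 4-byte magic PAR1, and the 4 bytes before the trailing magic hold the footer length as a little-endian integer. The following is a minimal Python sketch, assuming a file has been fetched to local disk (for example with hdfs dfs -get) and opened in binary mode; the function name is hypothetical, and passing this check does not explain the UnsupportedOperationException, it only confirms the file is structurally a Parquet file.

```python
import io
import os
import struct

PARQUET_MAGIC = b"PAR1"


def parquet_footer_length(stream):
    """Validate the PAR1 framing of a seekable binary stream; return the footer length."""
    stream.seek(0, os.SEEK_END)
    size = stream.tell()
    # Smallest possible layout: leading magic + footer length + trailing magic.
    if size < 12:
        raise ValueError("too small to be a Parquet file")
    stream.seek(0)
    head = stream.read(4)
    stream.seek(-8, os.SEEK_END)
    tail = stream.read(8)
    if head != PARQUET_MAGIC or tail[4:] != PARQUET_MAGIC:
        raise ValueError("missing PAR1 magic bytes")
    (footer_len,) = struct.unpack("<I", tail[:4])
    if footer_len > size - 12:
        raise ValueError("footer length exceeds file size")
    return footer_len


if __name__ == "__main__":
    # Synthetic bytes with valid framing, not a real Parquet file.
    fake = PARQUET_MAGIC + b"\x00" * 10 + struct.pack("<I", 10) + PARQUET_MAGIC
    print(parquet_footer_length(io.BytesIO(fake)))
```

For a real file, pass open(path, "rb") instead of the BytesIO object.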

 

I also copied the Drill sample Parquet file region.parquet to the same location (stored there as region.parq), and that works fine, as shown below:

 

select * from hdfs.drillpoc.`region.parq`;
+-------------+-------------+----------------------+
| R_REGIONKEY | R_NAME      | R_COMMENT            |
+-------------+-------------+----------------------+
| 0           | AFRICA      | lar deposits. blithe |
| 1           | AMERICA     | hs use ironic, even  |
| 2           | ASIA        | ges. thinly even pin |
| 3           | EUROPE      | ly final courts cajo |
| 4           | MIDDLE EAST | uickly special accou |
+-------------+-------------+----------------------+
5 rows selected (0.122 seconds)

 

From what I have read so far, an Impala-created Parquet file should be like any other Parquet file, so there should not be a problem. If this does not work, I will need to convert all my text-format tables to Parquet format and access them with Drill. Is there a utility to do that?
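
On the conversion question: Drill can write Parquet itself through CREATE TABLE AS (CTAS) when the session option store.format is set to parquet. The following is a minimal Python sketch that only builds the two statements as strings so they can be pasted into sqlline. It assumes the target workspace is configured as writable; the helper name, the target table name, and the column aliases are hypothetical.

```python
def parquet_ctas_statements(source_table, target_table, columns="*"):
    """Build the Drill statements that copy a table into Parquet format via CTAS."""
    return [
        # Make CTAS write Parquet files for the current session.
        "ALTER SESSION SET `store.format` = 'parquet'",
        # The target workspace must allow writes for this statement to succeed.
        f"CREATE TABLE {target_table} AS SELECT {columns} FROM {source_table}",
    ]


if __name__ == "__main__":
    statements = parquet_ctas_statements(
        source_table="hdfs.root.`/test.csv`",
        target_table="hdfs.drillpoc.`test_parquet`",  # hypothetical target name
        # Text files expose a single `columns` array, so each field is picked by index.
        columns="CAST(columns[0] AS INT) AS id, columns[1] AS name",
    )
    for statement in statements:
        print(statement + ";")
```

Selecting and casting the fields explicitly gives the Parquet output named, typed columns instead of a single array column.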

 

Thanks for all the help.

Latha

From: Sivasubramaniam, Latha
Sent: Wednesday, April 08, 2015 8:00 AM
To: 'user@drill.apache.org'
Subject: RE: Unable to query data from hdfs

 

Hi,

 

Thanks for your responses. Even though I had run "use hdfs", the query only worked when I fully qualified the file name. I am still not able to access files without a .csv extension.

 

I changed

"csv": {
      "type": "text",
      "extensions": [
        "csv"
      ],
      "delimiter": ","

to

"csv": {
      "type": "text",
      "extensions":  null,
      "delimiter": ","

 

I then tried to access the HDFS file 'DIM_Agents' and got the same error. With null extensions, I cannot access 'test.csv' either. Once I reverted to the original csv format description, I could access test.csv again, but I cannot access the other files with either format description.
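
A check for this kind of setting can be approximated outside Drill. The following is a minimal Python sketch, assuming the storage plugin configuration has been loaded into a dict with a top-level "formats" map whose entries look like the csv fragment above. The function name and the sample configuration are hypothetical, and the sketch says nothing about how Drill itself matches files to formats.

```python
def text_formats_without_extensions(plugin_config):
    """Return names of text formats whose "extensions" is missing, null, or empty."""
    flagged = []
    for name, fmt in plugin_config.get("formats", {}).items():
        # Only text formats are checked; other format types may not list extensions.
        if fmt.get("type") == "text" and not fmt.get("extensions"):
            flagged.append(name)
    return flagged


if __name__ == "__main__":
    # Hypothetical configuration mirroring the modified csv entry.
    config = {
        "formats": {
            "csv": {"type": "text", "extensions": None, "delimiter": ","},
            "tsv": {"type": "text", "extensions": ["tsv"], "delimiter": "\t"},
            "parquet": {"type": "parquet"},
        }
    }
    print(text_formats_without_extensions(config))
```

Running it before saving a configuration change would flag the csv entry with null extensions.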

 

Below is what I tried. Is '_' (underscore) a problem in the file name? All my HDFS files are in text format.

 

0: jdbc:drill:zk=rtr-poc-imp1:2181> select * from hdfs.root.`/test.csv`;
+----------------+------+
| columns        | dir0 |
+----------------+------+
| ["1","Latha"]  | root |
| ["2","Roshan"] | root |
+----------------+------+
2 rows selected (0.276 seconds)
0: jdbc:drill:zk=rtr-poc-imp1:2181> select * from hdfs.root.`/DIM_Agents`;
Query failed: SqlValidatorException: Table 'hdfs.root./DIM_Agents' not found
Error: exception while executing query: Failure while executing query. (state=,code=0)
0: jdbc:drill:zk=rtr-poc-imp1:2181> select * from hdfs.root.`/DIM_Agents`;
Query failed: SqlValidatorException: Table 'hdfs.root./DIM_Agents' not found
Error: exception while executing query: Failure while executing query. (state=,code=0)
0: jdbc:drill:zk=rtr-poc-imp1:2181> select * from hdfs.root.`/test.csv`;
Query failed: SqlValidatorException: Table 'hdfs.root./test.csv' not found
Error: exception while executing query: Failure while executing query. (state=,code=0)
0: jdbc:drill:zk=rtr-poc-imp1:2181> select * from hdfs.root.`/DIM_Agents`;
Query failed: SqlValidatorException: Table 'hdfs.root./DIM_Agents' not found
Error: exception while executing query: Failure while executing query. (state=,code=0)
0: jdbc:drill:zk=rtr-poc-imp1:2181> select * from hdfs.root.`/test.csv`;
Query failed: SqlValidatorException: Table 'hdfs.root./test.csv' not found
Error: exception while executing query: Failure while executing query. (state=,code=0)
0: jdbc:drill:zk=rtr-poc-imp1:2181> select * from hdfs.root.`/test.csv`;
+----------------+------+
| columns        | dir0 |
+----------------+------+
| ["1","Latha"]  | root |
| ["2","Roshan"] | root |
+----------------+------+
2 rows selected (0.112 seconds)

 

Appreciate your help.

 

Thanks,

Latha
