drill-user mailing list archives

From Ramana Inukonda <rinuko...@maprtech.com>
Subject Re: Unable to query data from hdfs
Date Wed, 08 Apr 2015 20:36:54 GMT
Hey Latha,

Would it be possible for you to share a sample file generated by Impala
that you cannot read? I am testing these out, and it would be great to see
what is not working.
If the data is sensitive, sharing the Impala metadata would be
enormously helpful as well.

Regards
Ramana


On Wed, Apr 8, 2015 at 1:32 PM, Sivasubramaniam, Latha <
Latha.Sivasubramaniam@aspect.com> wrote:

> Thanks for all the responses.
>
> Once I renamed the files within the directories to have the .csv
> extension, it worked. So it looks like, for the csv format, having an
> extension is a must. It would be nice if it did not allow "null" in the
> extensions description.
>
> Now, in the next step of my proof of concept, I am trying to access Parquet
> files. I have Parquet files (tables) created using Impala, and I am
> assuming that I should be able to access those files via Drill as well.
>
> My Parquet tables are placed under /user/hive/warehouse, as listed below:
>
>
> [root@rtr-poc-imp1 sample-data]# hdfs dfs -ls /user/hive/warehouse
> Found 19 items
> drwxrwxrwt   - impala hive          0 2015-03-31 16:00
> /user/hive/warehouse/dim_agent_status_parq
> drwxrwxrwt   - impala hive          0 2015-03-31 16:00
> /user/hive/warehouse/dim_agent_status_reasons_parq
> drwxrwxrwt   - impala hive          0 2015-03-27 12:27
> /user/hive/warehouse/dim_agents_parquet
> drwxrwxrwt   - impala hive          0 2015-03-31 16:00
> /user/hive/warehouse/dim_call_action_reasons_parq
> drwxrwxrwt   - impala hive          0 2015-03-31 14:09
> /user/hive/warehouse/dim_call_actions_parq
> drwxrwxrwt   - impala hive          0 2015-03-31 13:54
> /user/hive/warehouse/dim_call_types_parq
> drwxrwxrwt   - impala hive          0 2015-03-31 15:59
> /user/hive/warehouse/dim_dispositions_parq
> drwxrwxrwt   - impala hive          0 2015-03-31 15:20
> /user/hive/warehouse/dim_resource_groups_parq
> drwxrwxrwt   - impala hive          0 2015-03-31 13:33
> /user/hive/warehouse/dim_services_parq
> drwxrwxrwt   - impala hive          0 2015-03-31 14:00
> /user/hive/warehouse/dim_sites_parq
> drwxrwxrwt   - impala hive          0 2015-03-31 15:25
> /user/hive/warehouse/dim_workgroups_parq
> drwxrwxrwx   - root   hive          0 2015-04-08 14:36
> /user/hive/warehouse/dservices
> drwxrwxrwt   - impala hive          0 2015-03-27 11:48
> /user/hive/warehouse/edwpoc.db
> drwxrwxrwt   - impala hive          0 2015-03-31 12:47
> /user/hive/warehouse/fact_agent_activity_detail_12m_partparq
> drwxrwxrwt   - impala hive          0 2015-03-30 13:03
> /user/hive/warehouse/fact_contact_detail_12m_partparq
> drwxrwxrwt   - impala hive          0 2015-03-27 13:36
> /user/hive/warehouse/fact_contact_detail_partparq
> -rw-r--r--   3 root   hive        455 2015-04-08 14:55
> /user/hive/warehouse/region.parq
> drwxrwxrwt   - impala hive          0 2015-03-25 22:29
> /user/hive/warehouse/sample_07
> drwxrwxrwt   - impala hive          0 2015-03-25 22:29
> /user/hive/warehouse/sample_08
>
> Example listing from one of the directories:
>
> hdfs dfs -ls /user/hive/warehouse/dim_services_parq
> Found 3 items
> -rw-r--r--   3 impala hive      55121 2015-03-31 13:33
> /user/hive/warehouse/dim_services_parq/4645c4221dafa337-250888c6ac1de29b_1376355963_data.0.parq
> -rw-r--r--   3 impala hive      71075 2015-03-31 13:33
> /user/hive/warehouse/dim_services_parq/4645c4221dafa337-250888c6ac1de29c_2123191845_data.0.parq
> drwxrwxrwt   - impala hive          0 2015-03-31 13:33
> /user/hive/warehouse/dim_services_parq/_impala_insert_staging
> [root@rtr-poc-imp1 sample-data]#
>
> There is nothing under the Impala staging directory; it is primarily used
> when an insert operation is performed.
>
> I copied the dim_services_parq directory to dservices, and below is the
> listing of the dservices directory.
>
> [root@rtr-poc-imp1 sample-data]#  hdfs dfs -ls
> /user/hive/warehouse/dservices
> Found 2 items
> -rwxrwxrwx   3 root hive      55121 2015-04-08 14:12
> /user/hive/warehouse/dservices/service0.parquet
> -rwxrwxrwx   3 root hive      71075 2015-04-08 14:12
> /user/hive/warehouse/dservices/service1.parquet
>
> Now when I try to query it, I get the error below:
>
> select * from hdfs.drillpoc.`/dservices`;
> Query failed: RemoteRpcException: Failure while running fragment.,
> java.lang.UnsupportedOperationException [
> cfca83ec-986a-43c0-a967-5aee102401dd on rtr-poc-imp2.labs.aspect.com:31010
> ]
> [ cfca83ec-986a-43c0-a967-5aee102401dd on
> rtr-poc-imp2.labs.aspect.com:31010 ]
>
> I also copied the Drill sample Parquet file region.parquet to the same
> location, and that works fine, as shown below.
>
> select * from hdfs.drillpoc.`region.parq`;
> +-------------+------------+------------+
> | R_REGIONKEY |   R_NAME   | R_COMMENT  |
> +-------------+------------+------------+
> | 0           | AFRICA     | lar deposits. blithe |
> | 1           | AMERICA    | hs use ironic, even  |
> | 2           | ASIA       | ges. thinly even pin |
> | 3           | EUROPE     | ly final courts cajo |
> | 4           | MIDDLE EAST | uickly special accou |
> +-------------+------------+------------+
> 5 rows selected (0.122 seconds)
>
> From what I have read so far, an Impala-created Parquet file should be like
> any other Parquet file, so there should not be a problem. If this does not
> work, I need to convert all my tables from text format to Parquet format
> and access them with Drill. Is there any utility to do that?
>
> Thanks for all the help.
> Latha
>
> From: Sivasubramaniam, Latha
> Sent: Wednesday, April 08, 2015 8:00 AM
> To: 'user@drill.apache.org'
> Subject: RE: Unable to query data from hdfs
>
> Hi,
>
> Thanks for your responses. Even though I had run "use hdfs", it worked
> only when I fully qualified the file name. But I am not able to access
> files without the .csv extension.
>
> I modified
>
> "csv": {
>       "type": "text",
>       "extensions": [
>         "csv"
>       ],
>       "delimiter": ","
>
> To
>
> "csv": {
>       "type": "text",
>       "extensions":  null,
>       "delimiter": ","
>
> And tried to access the HDFS file 'DIM_Agents', and I got the same error.
> With null extensions, I cannot access 'test.csv' either; once I reverted
> the csv format description, I could access test.csv again, but I cannot
> access the other files with either format description.
>
> Below is what I tried. Is '_' (underscore) a problem in the file name?
> All my HDFS files are in text format.
>
> 0: jdbc:drill:zk=rtr-poc-imp1:2181> select * from hdfs.root.`/test.csv`;
> +------------+------------+
> |  columns   |    dir0    |
> +------------+------------+
> | ["1","Latha"] | root       |
> | ["2","Roshan"] | root       |
> +------------+------------+
> 2 rows selected (0.276 seconds)
> 0: jdbc:drill:zk=rtr-poc-imp1:2181> select * from hdfs.root.`/DIM_Agents`;
> Query failed: SqlValidatorException: Table 'hdfs.root./DIM_Agents' not
> found
>
> Error: exception while executing query: Failure while executing query.
> (state=,code=0)
> 0: jdbc:drill:zk=rtr-poc-imp1:2181> select * from hdfs.root.`/DIM_Agents`;
> Query failed: SqlValidatorException: Table 'hdfs.root./DIM_Agents' not
> found
>
> Error: exception while executing query: Failure while executing query.
> (state=,code=0)
> 0: jdbc:drill:zk=rtr-poc-imp1:2181> select * from hdfs.root.`/test.csv`;
> Query failed: SqlValidatorException: Table 'hdfs.root./test.csv' not found
>
> Error: exception while executing query: Failure while executing query.
> (state=,code=0)
> 0: jdbc:drill:zk=rtr-poc-imp1:2181> select * from hdfs.root.`/DIM_Agents`;
> Query failed: SqlValidatorException: Table 'hdfs.root./DIM_Agents' not
> found
>
> Error: exception while executing query: Failure while executing query.
> (state=,code=0)
> 0: jdbc:drill:zk=rtr-poc-imp1:2181> select * from hdfs.root.`/test.csv`;
> Query failed: SqlValidatorException: Table 'hdfs.root./test.csv' not found
>
> Error: exception while executing query: Failure while executing query.
> (state=,code=0)
> 0: jdbc:drill:zk=rtr-poc-imp1:2181> select * from hdfs.root.`/test.csv`;
> +------------+------------+
> |  columns   |    dir0    |
> +------------+------------+
> | ["1","Latha"] | root       |
> | ["2","Roshan"] | root       |
> +------------+------------+
> 2 rows selected (0.112 seconds)
>
> Appreciate your help.
>
> Thanks,
> Latha
>
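
For anyone following this thread who cannot share the data itself: one low-risk check is to confirm that the Impala-written files still carry the standard Parquet framing, that is, the 4-byte magic "PAR1" at both the start and the end of the file, with a 4-byte little-endian footer length just before the trailing magic. The sketch below is only an illustration under stated assumptions: it expects a file that has already been copied to the local disk (for example with hdfs dfs -get), and the function name inspect_parquet_framing is hypothetical. It does not decode any rows, and it will not by itself explain the UnsupportedOperationException above; it only helps rule out a truncated or non-Parquet file.

```python
import struct

PARQUET_MAGIC = b"PAR1"  # 4-byte marker at the start and end of a Parquet file


def inspect_parquet_framing(path):
    """Return basic framing facts about a local file without decoding rows.

    Hypothetical helper for illustration; 'path' is assumed to be a local copy.
    """
    with open(path, "rb") as f:
        f.seek(0, 2)  # move to the end to learn the file size
        size = f.tell()
        # Smallest plausible layout: header magic + footer length + trailer magic
        if size < 12:
            return {"size": size, "header_ok": False,
                    "trailer_ok": False, "footer_len": None}
        f.seek(0)
        header = f.read(4)
        f.seek(size - 8)
        tail = f.read(8)  # 4-byte footer length followed by trailer magic
    footer_len = struct.unpack("<I", tail[:4])[0]
    return {
        "size": size,
        "header_ok": header == PARQUET_MAGIC,
        "trailer_ok": tail[4:] == PARQUET_MAGIC,
        "footer_len": footer_len,
    }
```

Running this on local copies of service0.parquet and of the working region.parq, and posting the two result dictionaries, would show whether both files are framed the same way without exposing any table contents.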
