drill-user mailing list archives

From Andries Engelbrecht <aengelbre...@maprtech.com>
Subject Re: CSV header issue
Date Thu, 02 Apr 2015 15:18:12 GMT
Or you can use a simple predicate to filter out the header.

Something like 

Select ….. from …. where columns[0] <> '"date1"';

In case it doesn't display correctly: the literal is a single quote ('), then a double
quote ("), then date1, followed by a double quote and then a single quote.
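
Applied to the query from the original question, this might look something like the
sketch below. The table path and columns[6] are taken from that question; the alias
total_rcvdbyte is only illustrative, and whether the literal needs the embedded double
quotes depends on whether the text reader keeps them in the field value, which is an
assumption here.

```sql
-- Sketch only: sum the seventh field while skipping the header row.
-- HDFS.`/test.csv` and columns[6] come from the original question.
-- total_rcvdbyte is a hypothetical alias chosen for illustration.
-- Assumption: the header's first field is stored as "date1" including
-- the double quotes; if the reader strips them, compare against 'date1'.
SELECT SUM(CAST(columns[6] AS INT)) AS total_rcvdbyte
FROM HDFS.`/test.csv`
WHERE columns[0] <> '"date1"';
```

The predicate only removes rows whose first field matches the header text, so any other
non-numeric value in columns[6] would still make the cast fail.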

—Andries


On Apr 1, 2015, at 11:58 PM, Junjun Olympia <romeo.olympia@gmail.com> wrote:

> While waiting for DRILL-951
> <https://issues.apache.org/jira/browse/DRILL-951>, maybe you can use
> something like this:
> 
> select sum(cast(trim(columns[6]) as int)) from HDFS.`/test.csv` where
> trim(columns[6]) similar to '^(\+|-)?[0-9]+(\.[0-9]+)?';
> 
> Cheers,
> 
> Junjun
> 
> 
> On Thu, Apr 2, 2015 at 2:43 PM, Mahesh Sankaran <sankarmahesh37@gmail.com>
> wrote:
> 
>> We are waiting for Apache Drill 1.0. Thanks for the information.
>> 
>> On Thu, Apr 2, 2015 at 12:04 PM, Aman Sinha <asinha@maprtech.com> wrote:
>> 
>>> The exact release date depends on a variety of factors - I will let folks
>>> who manage the release timeline chime in.
>>> 
>>> On Wed, Apr 1, 2015 at 11:19 PM, Mahesh Sankaran <sankarmahesh37@gmail.com>
>>> wrote:
>>> 
>>>> Thank you, Aman. May I know the release date of Apache Drill 1.0?
>>>> 
>>>> On Thu, Apr 2, 2015 at 11:40 AM, Aman Sinha <asinha@maprtech.com> wrote:
>>>> 
>>>>> Hi Mahesh,
>>>>> Please see https://issues.apache.org/jira/browse/DRILL-951 for the issue
>>>>> of CSV headers. It is a feature that will be addressed in an upcoming
>>>>> release (currently tagged for 1.0).
>>>>> 
>>>>> Aman
>>>>> 
>>>>> On Wed, Apr 1, 2015 at 10:52 PM, Mahesh Sankaran <sankarmahesh37@gmail.com>
>>>>> wrote:
>>>>> 
>>>>>> Hi,
>>>>>> I am currently using Apache Drill to analyse CSV files. My problem is
>>>>>> that if a CSV file has a header row, a sum query fails with the
>>>>>> following errors.
>>>>>> 
>>>>>> 0: jdbc:drill:zk=nn01:2181,dn02:2181,dn03:218> select sum(cast(columns[6] as int)) from HDFS.`/test.csv` limit 10;
>>>>>> Query failed: RemoteRpcException: Failure while running fragment., rcvdbyte
>>>>>> [ 584925d6-dab6-42ce-8eb3-fa7abfb0e0f2 on nn01:31010 ]
>>>>>> [ 584925d6-dab6-42ce-8eb3-fa7abfb0e0f2 on nn01:31010 ]
>>>>>> 
>>>>>> 
>>>>>> Error: exception while executing query: Failure while executing query.
>>>>>> (state=,code=0)
>>>>>> 
>>>>>> *The same query works fine when the file has no header. Is there any way
>>>>>> to sum the columns of CSV files with headers in Apache Drill?*
>>>>>> 
>>>>>> *This is our example file:*
>>>>>> 0: jdbc:drill:zk=nn01:2181,dn02:2181,dn03:218> select * from HDFS.`/test.csv` limit 10;
>>>>>> +------------+------------+
>>>>>> |  columns   |    dir0    |
>>>>>> +------------+------------+
>>>>>> | ["date1","time1","srcip","dstip","service","sentbyte","rcvdbyte"] | nn01:9000  |
>>>>>> | ["2015-01-01","00:00:00","10.10.100.74","192.168.0.12","DNS","0","193"] | nn01:9000  |
>>>>>> | ["2015-01-01","00:00:00","10.10.100.74","192.168.0.12","DNS","0","166"] | nn01:9000  |
>>>>>> | ["2015-01-01","00:00:00","10.10.100.74","192.168.0.12","DNS","60","359"] | nn01:9000  |
>>>>>> | ["2015-01-01","00:00:00","10.10.50.195","106.10.193.45","php","717","359","0","0"] | nn01:9000  |
>>>>>> | ["2015-01-01","00:00:00","111.123.180.44","117.239.67.36","9064","0","0"] | nn01:9000  |
>>>>>> | ["2015-01-01","00:00:00","111.123.180.44","117.239.67.37","9064","0","0"] | nn01:9000  |
>>>>>> | ["2015-01-01","00:00:00","111.123.180.44","117.239.67.38","9064","0","0"] | nn01:9000  |
>>>>>> | ["2015-01-01","00:00:00","111.123.180.44","117.239.67.34","9064","0","0"] | nn01:9000  |
>>>>>> | ["2015-01-01","00:00:00","111.123.180.44","117.239.67.44","9064","0","0"] | nn01:9000  |
>>>>>> 
>>>>>> 
>>>>>> Thanks and Regards,
>>>>>> 
>>>>>> Mahesh Sankaran
>>>>>> 
>>>>> 
>>>> 
>>> 
>> 
