Or you can use a simple predicate to filter out the header row.
Something like:
select ….. from …. where columns[0] <> '"date1"';
In case it doesn't display correctly, that is a single quote ', then a double quote ", then date1,
followed by a double quote and then a single quote.
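
Applied to the sum query from the original question, a minimal sketch of that predicate could look like the one below. Whether the stored header value includes the double quotes depends on how the text reader handles quoted fields, so this sketch excludes both forms. The alias total_rcvdbyte is only an illustrative name.

```sql
-- Sum the rcvdbyte column (columns[6]) while skipping the header row.
-- Assumption: the header cell in columns[0] is either date1 or "date1"
-- (with literal double quotes), so both forms are filtered out.
select sum(cast(columns[6] as int)) as total_rcvdbyte
from HDFS.`/test.csv`
where columns[0] not in ('date1', '"date1"');
```

If other non-numeric values can appear in columns[6], this predicate alone will not protect the cast.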
—Andries
On Apr 1, 2015, at 11:58 PM, Junjun Olympia <romeo.olympia@gmail.com> wrote:
> While waiting for DRILL-951
> <https://issues.apache.org/jira/browse/DRILL-951>, maybe you can use
> something like this:
>
> select sum(cast(trim(columns[6]) as int)) from HDFS.`/test.csv` where
> trim(columns[6]) similar to '^(\+|-)?[0-9]+(\.[0-9]+)?';
>
> Cheers,
>
> Junjun
>
>
> On Thu, Apr 2, 2015 at 2:43 PM, Mahesh Sankaran <sankarmahesh37@gmail.com>
> wrote:
>
>> We are waiting for Apache Drill 1.0. Thanks for the information.
>>
>> On Thu, Apr 2, 2015 at 12:04 PM, Aman Sinha <asinha@maprtech.com> wrote:
>>
>>> The exact release date depends on a variety of factors - I will let folks
>>> who manage the release timeline chime in.
>>>
>>> On Wed, Apr 1, 2015 at 11:19 PM, Mahesh Sankaran <sankarmahesh37@gmail.com>
>>> wrote:
>>>
>>>> Thank you, Aman. May I know the release date of Apache Drill 1.0?
>>>>
>>>> On Thu, Apr 2, 2015 at 11:40 AM, Aman Sinha <asinha@maprtech.com>
>>>> wrote:
>>>>
>>>>> Hi Mahesh,
>>>>> Please see https://issues.apache.org/jira/browse/DRILL-951 for the issue
>>>>> of CSV headers. It is a feature that will be addressed in an upcoming
>>>>> release (currently tagged for 1.0).
>>>>>
>>>>> Aman
>>>>>
>>>>> On Wed, Apr 1, 2015 at 10:52 PM, Mahesh Sankaran <sankarmahesh37@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hi,
>>>>>> I am currently working with Apache Drill to analyse CSV files. My
>>>>>> problem is that if the CSV file has headers, we can't run any sum query.
>>>>>> It shows the following error.
>>>>>>
>>>>>> 0: jdbc:drill:zk=nn01:2181,dn02:2181,dn03:218> select sum(cast(columns[6]
>>>>>> as int)) from HDFS.`/test.csv` limit 10;
>>>>>> Query failed: RemoteRpcException: Failure while running fragment., rcvdbyte
>>>>>> [ 584925d6-dab6-42ce-8eb3-fa7abfb0e0f2 on nn01:31010 ]
>>>>>> [ 584925d6-dab6-42ce-8eb3-fa7abfb0e0f2 on nn01:31010 ]
>>>>>>
>>>>>>
>>>>>> Error: exception while executing query: Failure while executing query.
>>>>>> (state=,code=0)
>>>>>>
>>>>>> *But the above query works well without headers. Is there any way to
>>>>>> sum the columns in CSV files with headers in Apache Drill?*
>>>>>>
>>>>>> *This is our example file:*
>>>>>> 0: jdbc:drill:zk=nn01:2181,dn02:2181,dn03:218> select * from
>>>>>> HDFS.`/test.csv` limit 10;
>>>>>> +------------+------------+
>>>>>> | columns | dir0 |
>>>>>> +------------+------------+
>>>>>> | ["date1","time1","srcip","dstip","service","sentbyte","rcvdbyte"] | nn01:9000 |
>>>>>> | ["2015-01-01","00:00:00","10.10.100.74","192.168.0.12","DNS","0","193"] | nn01:9000 |
>>>>>> | ["2015-01-01","00:00:00","10.10.100.74","192.168.0.12","DNS","0","166"] | nn01:9000 |
>>>>>> | ["2015-01-01","00:00:00","10.10.100.74","192.168.0.12","DNS","60","359"] | nn01:9000 |
>>>>>> | ["2015-01-01","00:00:00","10.10.50.195","106.10.193.45","php","717","359","0","0"] | nn01:9000 |
>>>>>> | ["2015-01-01","00:00:00","111.123.180.44","117.239.67.36","9064","0","0"] | nn01:9000 |
>>>>>> | ["2015-01-01","00:00:00","111.123.180.44","117.239.67.37","9064","0","0"] | nn01:9000 |
>>>>>> | ["2015-01-01","00:00:00","111.123.180.44","117.239.67.38","9064","0","0"] | nn01:9000 |
>>>>>> | ["2015-01-01","00:00:00","111.123.180.44","117.239.67.34","9064","0","0"] | nn01:9000 |
>>>>>> | ["2015-01-01","00:00:00","111.123.180.44","117.239.67.44","9064","0","0"] | nn01:9000 |
>>>>>>
>>>>>>
>>>>>> Thanks and Regards,
>>>>>>
>>>>>> Mahesh Sankaran
>>>>>>
>>>>>
>>>>
>>>
>>