drill-user mailing list archives

From Matt <bsg...@gmail.com>
Subject Re: CTAS error with CSV data
Date Thu, 04 Feb 2016 22:55:06 GMT
Is there any more information I can supply on this issue?

It's a blocker for Drill adoption for us, and my ability to diagnose 
exceptions in Java-based systems is very limited ;)
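
For the structural details requested in the quoted reply below (number of records, structure of the CSV), a rough shell sketch follows. The file name `sample.csv` and its rows are made up for illustration and are not the real data, and the awk split assumes no quoted fields containing embedded commas or newlines.

```shell
# Hypothetical sample standing in for the real CSV (illustrative rows only).
cat > sample.csv <<'EOF'
2015-10-17 00:00:00,k1,k2,k3,1.5,2,3
2015-10-17 00:00:00,k1,k2,k3,,,
2015-10-17 00:01:00,k1,k2,k3,4,,
2015-10-17 00:02:00,k1,k2
EOF

# Total number of records (one record per line).
records=$(wc -l < sample.csv | tr -d ' ')

# Distribution of fields per record, printed as "<field count> <records>".
# Empty fields between consecutive delimiters still count as fields.
field_counts=$(awk -F',' '{ n[NF]++ } END { for (k in n) print k, n[k] }' sample.csv | sort -n)

echo "records: $records"
echo "fields-per-record distribution:"
echo "$field_counts"
```

A record whose field count differs from the rest (the 3-field row in this sample) would be a candidate for closer inspection before rerunning the CTAS.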


On 27 Jan 2016, at 14:30, Matt wrote:

> https://issues.apache.org/jira/browse/DRILL-4317
>
> On 26 Jan 2016, at 23:50, Abdel Hakim Deneche wrote:
>
>> This definitely looks like a bug. Could you open a JIRA and share as
>> much information as possible about the structure of the CSV file and
>> the number of records?
>>
>> On Tue, Jan 26, 2016 at 7:38 PM, Matt <bsg075@gmail.com> wrote:
>>
>>> The CTAS fails with:
>>>
>>> ~~~
>>> Error: SYSTEM ERROR: IllegalArgumentException: length: -260 (expected: >= 0)
>>>
>>> Fragment 1:2
>>>
>>> [Error Id: 1807615e-4385-4f85-8402-5900aaa568e9 on es07:31010]
>>>
>>> (java.lang.IllegalArgumentException) length: -260 (expected: >= 0)
>>> io.netty.buffer.AbstractByteBuf.checkIndex():1131
>>> io.netty.buffer.PooledUnsafeDirectByteBuf.nioBuffer():344
>>> io.netty.buffer.WrappedByteBuf.nioBuffer():727
>>> io.netty.buffer.UnsafeDirectLittleEndian.nioBuffer():26
>>> io.netty.buffer.DrillBuf.nioBuffer():356
>>> org.apache.drill.exec.store.ParquetOutputRecordWriter$VarCharParquetConverter.writeField():1842
>>> org.apache.drill.exec.store.EventBasedRecordWriter.write():62
>>> org.apache.drill.exec.physical.impl.WriterRecordBatch.innerNext():106
>>> org.apache.drill.exec.record.AbstractRecordBatch.next():162
>>> org.apache.drill.exec.physical.impl.BaseRootExec.next():104
>>> org.apache.drill.exec.physical.impl.SingleSenderCreator$SingleSenderRootExec.innerNext():93
>>> org.apache.drill.exec.physical.impl.BaseRootExec.next():94
>>> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():256
>>> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():250
>>> java.security.AccessController.doPrivileged():-2
>>> javax.security.auth.Subject.doAs():415
>>> org.apache.hadoop.security.UserGroupInformation.doAs():1657
>>> org.apache.drill.exec.work.fragment.FragmentExecutor.run():250
>>> org.apache.drill.common.SelfCleaningRunnable.run():38
>>> java.util.concurrent.ThreadPoolExecutor.runWorker():1145
>>> java.util.concurrent.ThreadPoolExecutor$Worker.run():615
>>> java.lang.Thread.run():745 (state=,code=0)
>>> ~~~
>>>
>>> And a simple SELECT * fails with:
>>>
>>> ~~~
>>> java.lang.IndexOutOfBoundsException: index: 547681, length: 1 (expected: range(0, 547681))
>>>  at io.netty.buffer.AbstractByteBuf.checkIndex(AbstractByteBuf.java:1134)
>>>  at io.netty.buffer.PooledUnsafeDirectByteBuf.getBytes(PooledUnsafeDirectByteBuf.java:136)
>>>  at io.netty.buffer.WrappedByteBuf.getBytes(WrappedByteBuf.java:289)
>>>  at io.netty.buffer.UnsafeDirectLittleEndian.getBytes(UnsafeDirectLittleEndian.java:26)
>>>  at io.netty.buffer.DrillBuf.getBytes(DrillBuf.java:586)
>>>  at io.netty.buffer.DrillBuf.getBytes(DrillBuf.java:586)
>>>  at io.netty.buffer.DrillBuf.getBytes(DrillBuf.java:586)
>>>  at io.netty.buffer.DrillBuf.getBytes(DrillBuf.java:586)
>>>  at org.apache.drill.exec.vector.VarCharVector$Accessor.get(VarCharVector.java:443)
>>>  at org.apache.drill.exec.vector.accessor.VarCharAccessor.getBytes(VarCharAccessor.java:125)
>>>  at org.apache.drill.exec.vector.accessor.VarCharAccessor.getString(VarCharAccessor.java:146)
>>>  at org.apache.drill.exec.vector.accessor.VarCharAccessor.getObject(VarCharAccessor.java:136)
>>>  at org.apache.drill.exec.vector.accessor.VarCharAccessor.getObject(VarCharAccessor.java:94)
>>>  at org.apache.drill.exec.vector.accessor.BoundCheckingAccessor.getObject(BoundCheckingAccessor.java:148)
>>>  at org.apache.drill.jdbc.impl.TypeConvertingSqlAccessor.getObject(TypeConvertingSqlAccessor.java:795)
>>>  at org.apache.drill.jdbc.impl.AvaticaDrillSqlAccessor.getObject(AvaticaDrillSqlAccessor.java:179)
>>>  at net.hydromatic.avatica.AvaticaResultSet.getObject(AvaticaResultSet.java:351)
>>>  at org.apache.drill.jdbc.impl.DrillResultSetImpl.getObject(DrillResultSetImpl.java:420)
>>>  at sqlline.Rows$Row.<init>(Rows.java:157)
>>>  at sqlline.IncrementalRows.hasNext(IncrementalRows.java:63)
>>>  at sqlline.TableOutputFormat$ResizingRowsProvider.next(TableOutputFormat.java:87)
>>>  at sqlline.TableOutputFormat.print(TableOutputFormat.java:118)
>>>  at sqlline.SqlLine.print(SqlLine.java:1593)
>>>  at sqlline.Commands.execute(Commands.java:852)
>>>  at sqlline.Commands.sql(Commands.java:751)
>>>  at sqlline.SqlLine.dispatch(SqlLine.java:746)
>>>  at sqlline.SqlLine.begin(SqlLine.java:621)
>>>  at sqlline.SqlLine.start(SqlLine.java:375)
>>>  at sqlline.SqlLine.main(SqlLine.java:268)
>>> ~~~
>>>
>>> It also looks like if I run the SELECT from a bash shell as
>>> "sqlline -u ... -f test.sql 2>&1 > test.out", upon the error the
>>> sqlline session "locks up". No errors spool to the out file and the
>>> Java thread can only be terminated with a kill -9. It can be
>>> backgrounded with ^z, but won't respond to a ^c.
>>>
>>>
>>> On 26 Jan 2016, at 14:07, Abdel Hakim Deneche wrote:
>>>
>>>> It's an internal buffer index. Can you try enabling verbose errors
>>>> and running the query again? This should provide us with more
>>>> details about the error. You can enable verbose errors by running
>>>> the following before the select *:
>>>>
>>>> alter session set `exec.errors.verbose`=true;
>>>>
>>>> thanks
>>>>
>>>> On Tue, Jan 26, 2016 at 11:01 AM, Matt <bsg075@gmail.com> wrote:
>>>>
>>>>> Putting the "select * from
>>>>> `/csv/customer/hourly/customer_201510170000.csv`;" in a local .sql
>>>>> file, and executing it with sqlline > /dev/null (to avoid a ton of
>>>>> scrolling) results in:
>>>>>
>>>>> ~~~
>>>>> index: 418719, length: 2 (expected: range(0, 418719))
>>>>>                                                  Aborting command
>>>>> set because "force" is false and command failed: "select * from
>>>>> `/csv/customer/hourly/customer_201510170000.csv`;"
>>>>> Closing: org.apache.drill.jdbc.impl.DrillConnectionImpl
>>>>> ~~~
>>>>>
>>>>> Is that index a byte or line offset?
>>>>>
>>>>>
>>>>> On 26 Jan 2016, at 12:55, Abdel Hakim Deneche wrote:
>>>>>
>>>>>> Does a select * on the same data also fail?
>>>>>>
>>>>>> On Tue, Jan 26, 2016 at 9:44 AM, Matt <bsg075@gmail.com> wrote:
>>>>>>
>>>>>>> Getting some errors when attempting to create Parquet files from CSV
>>>>>>> data, and trying to determine if it is due to the format of the
>>>>>>> source data.
>>>>>>>
>>>>>>> It's a fairly simple format of
>>>>>>> "datetime,key,key,key,numeric,numeric,numeric, ..." with 32 of those
>>>>>>> numeric columns in total.
>>>>>>>
>>>>>>> The source data does contain a lot of missing values for the numeric
>>>>>>> columns, and those are represented by consecutive delimiters:
>>>>>>> "datetime,key,key,key,numeric,,,,,,..."
>>>>>>>
>>>>>>> Could this be causing the CTAS to fail with these types of errors? Or
>>>>>>> is there another cause to look for?
>>>>>>>
>>>>>>> ~~~
>>>>>>> Error: SYSTEM ERROR: IllegalArgumentException: length: -260 (expected: >= 0)
>>>>>>>
>>>>>>> Fragment 1:2
>>>>>>> ~~~
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>> --
>>>>>>
>>>>>> Abdelhakim Deneche
>>>>>>
>>>>>> Software Engineer
>>>>>>
>>>>>> <http://www.mapr.com/>
>>>>>>
>>>>>>
>>>>>> Now Available - Free Hadoop On-Demand Training
>>>>>> <http://www.mapr.com/training?utm_source=Email&utm_medium=Signature&utm_campaign=Free%20available>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>>
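
A note on the redirection in the quoted sqlline command: in a POSIX shell, `2>&1 > test.out` duplicates stderr onto the current stdout first and only afterwards points stdout at the file, so anything written to stderr does not land in `test.out`. Assuming sqlline reports the failure on stderr (not confirmed in this thread), that ordering could be why no errors spooled to the out file. It would not account for the session locking up. A minimal sketch, with a hypothetical `emit` function standing in for sqlline:

```shell
# Hypothetical stand-in for a command that writes results to stdout
# and failures to stderr.
emit() {
  echo "result row"        # stdout
  echo "error text" >&2    # stderr
}

# Order used in the quoted command: stderr is duplicated onto the
# current stdout first, then stdout is pointed at the file, so the
# file receives stdout alone. The outer redirect captures what would
# otherwise have gone to the terminal.
{ emit 2>&1 > first.out; } > first.term

# Reversed order: stdout is pointed at the file first, then stderr is
# duplicated onto it, so the file receives both streams.
emit > second.out 2>&1
```

With the reversed order, a rerun written as `sqlline -u ... -f test.sql > test.out 2>&1` should send both streams to the file.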
