drill-user mailing list archives

From Ted Dunning <ted.dunn...@gmail.com>
Subject Re: ***UNCHECKED*** Re: Query Error on PCAP over MapR FS
Date Tue, 12 Sep 2017 23:30:41 GMT
PCAP is a binary format that cannot easily be split.
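For context, a classic pcap file announces itself with a four-byte magic number in its global header, and a reader dropped at an arbitrary offset sees packet bytes there instead. The sketch below (a hypothetical illustration, not Drill's actual decoder; the magic values are the standard libpcap/pcapng ones) shows why a mid-file read reports a "bad magic number". Notably, 0x0A0D0D0A, the value in the error later in this thread, is also the pcapng section-header block type, so that particular failure can equally mean the file is pcapng rather than classic pcap.

```python
import struct

# Known file-level magic numbers (assumption: the standard libpcap and
# pcapng values; Drill's decoder may check a different subset).
PCAP_MAGICS = {
    0xA1B2C3D4: "pcap (microsecond timestamps)",
    0xD4C3B2A1: "pcap (microsecond timestamps, byte-swapped)",
    0xA1B23C4D: "pcap (nanosecond timestamps)",
    0x4D3CB2A1: "pcap (nanosecond timestamps, byte-swapped)",
    0x0A0D0D0A: "pcapng section header block",
}

def identify_capture(first4: bytes) -> str:
    """Classify a capture file from its first four bytes.

    A reader handed an offset in the middle of a pcap file sees packet
    bytes here instead of a magic number, which is exactly the
    'Bad magic number' failure mode discussed in this thread.
    """
    (magic,) = struct.unpack(">I", first4)
    return PCAP_MAGICS.get(magic, "unknown magic %08x" % magic)
```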



On Wed, Sep 13, 2017 at 1:15 AM, Robert Hou <rhou@mapr.com> wrote:

> Hi Ted,
>
>
> My understanding is that, by default, Drill should not have multiple
> threads reading a PCAP file in parallel.  But if the records in a PCAP file
> can be read in parallel (e.g. rows are restricted to one line each), then
> the plugin can be designed to read the file in parallel.
>
>
> Are PCAP records single-line records?
>
>
> Thanks.
>
>
> --Robert
>
> ________________________________
> From: Ted Dunning <ted.dunning@gmail.com>
> Sent: Tuesday, September 12, 2017 3:54 PM
> To: user
> Cc: jni@apache.org
> Subject: Re: ***UNCHECKED*** Re: Query Error on PCAP over MapR FS
>
> Robert,
>
> Thanks for looking at this. I think this is a bug in the way we wrote the
> format plugin: it isn't telling Drill not to split the file.
>
>
>
> On Wed, Sep 13, 2017 at 12:52 AM, Robert Hou <rhou@mapr.com> wrote:
>
> > I asked a couple of Drill developers.  We don't have much experience with
> > PCAP yet.  Takeo, can you file a Jira for this, and include the
> information
> > below?  The error message mentions a bad magic number, which Drill
> > sometimes uses to help determine the file format.
> >
> >
> > Also, it appears that you have queried your data as many small files
> > rather than one large file.  That is the preferred approach, and it seems
> > to work for you.  Please let me know if that is not the case and you do
> > need to access your data as one large PCAP file.
> >
> >
> > Thanks.
> >
> >
> > --Robert
> >
> >
> > ________________________________
> > From: Ted Dunning <ted.dunning@gmail.com>
> > Sent: Monday, September 11, 2017 8:15 PM
> > To: user; jni@apache.org
> > Subject: Re: ***UNCHECKED*** Re: Query Error on PCAP over MapR FS
> >
> > This stack trace makes it clear that this is a bug in the PCAP decoder,
> > caused by a misunderstanding of how to force large files to be read in
> > one batch on a single Drillbit.
> >
> > Are there some real Drill experts out there who can provide hints about
> how
> > to avoid this?
> >
> >
> >
> > On Tue, Sep 12, 2017 at 5:03 AM, Takeo Ogawara <
> > ta-ogawara@kddi-research.jp>
> > wrote:
> >
> > > Sorry,
> > >
> > > I am pasting the logs as plain text.
> > >
> > > > 2017-09-11 15:06:52,390 [BitServer-2] WARN  o.a.d.exec.rpc.control.
> > WorkEventBus
> > > - A fragment message arrived but there was no registered listener for
> > that
> > > message: profile {
> > > >   state: FAILED
> > > >   error {
> > > >     error_id: "bbf284b6-9da4-4869-ac20-fa100eed11b9"
> > > >     endpoint {
> > > >       address: "node22"
> > > >       user_port: 31010
> > > >       control_port: 31011
> > > >       data_port: 31012
> > > >       version: "1.11.0"
> > > >     }
> > > >     error_type: SYSTEM
> > > >     message: "SYSTEM ERROR: IllegalStateException: Bad magic number =
> > > 0a0d0d0a\n\nFragment 1:200\n\n[Error Id: bbf284b6-9da4-4869-ac20-
> > fa100eed11b9
> > > on node22:31010]"
> > > >     exception {
> > > >       exception_class: "java.lang.IllegalStateException"
> > > >       message: "Bad magic number = 0a0d0d0a"
> > > >       stack_trace {
> > > >         class_name: "com.google.common.base.Preconditions"
> > > >         file_name: "Preconditions.java"
> > > >         line_number: 173
> > > >         method_name: "checkState"
> > > >         is_native_method: false
> > > >       }
> > > >       stack_trace {
> > > >         class_name: "org.apache.drill.exec.store.
> > > pcap.decoder.PacketDecoder"
> > > >         file_name: "PacketDecoder.java"
> > > >         line_number: 84
> > > >         method_name: "<init>"
> > > >         is_native_method: false
> > > >       }
> > > >       stack_trace {
> > > >         class_name: "org.apache.drill.exec.store.
> > pcap.PcapRecordReader"
> > > >         file_name: "PcapRecordReader.java"
> > > >         line_number: 104
> > > >         method_name: "setup"
> > > >         is_native_method: false
> > > >       }
> > > >       stack_trace {
> > > >         class_name: "org.apache.drill.exec.physical.impl.ScanBatch"
> > > >         file_name: "ScanBatch.java"
> > > >         line_number: 104
> > > >         method_name: "<init>"
> > > >         is_native_method: false
> > > >       }
> > > >       stack_trace {
> > > >         class_name: "org.apache.drill.exec.store.
> > > dfs.easy.EasyFormatPlugin"
> > > >         file_name: "EasyFormatPlugin.java"
> > > >         line_number: 166
> > > >         method_name: "getReaderBatch"
> > > >         is_native_method: false
> > > >       }
> > > >       stack_trace {
> > > >         class_name: "org.apache.drill.exec.store.dfs.easy.
> > > EasyReaderBatchCreator"
> > > >         file_name: "EasyReaderBatchCreator.java"
> > > >         line_number: 35
> > > >         method_name: "getBatch"
> > > >         is_native_method: false
> > > >       }
> > > >       stack_trace {
> > > >         class_name: "org.apache.drill.exec.store.dfs.easy.
> > > EasyReaderBatchCreator"
> > > >         file_name: "EasyReaderBatchCreator.java"
> > > >         line_number: 28
> > > >         method_name: "getBatch"
> > > >         is_native_method: false
> > > >       }
> > > >       stack_trace {
> > > >         class_name: "org.apache.drill.exec.
> physical.impl.ImplCreator"
> > > >         file_name: "ImplCreator.java"
> > > >         line_number: 156
> > > >         method_name: "getRecordBatch"
> > > >         is_native_method: false
> > > >       }
> > > >       stack_trace {
> > > >         class_name: "org.apache.drill.exec.
> physical.impl.ImplCreator"
> > > >         file_name: "ImplCreator.java"
> > > >         line_number: 179
> > > >         method_name: "getChildren"
> > > >         is_native_method: false
> > > >       }
> > > >       stack_trace {
> > > >         class_name: "org.apache.drill.exec.
> physical.impl.ImplCreator"
> > > >         file_name: "ImplCreator.java"
> > > >         line_number: 136
> > > >         method_name: "getRecordBatch"
> > > >         is_native_method: false
> > > >       }
> > > >       stack_trace {
> > > >         class_name: "org.apache.drill.exec.
> physical.impl.ImplCreator"
> > > >         file_name: "ImplCreator.java"
> > > >         line_number: 179
> > > >         method_name: "getChildren"
> > > >         is_native_method: false
> > > >       }
> > > >       stack_trace {
> > > >         class_name: "org.apache.drill.exec.
> physical.impl.ImplCreator"
> > > >         file_name: "ImplCreator.java"
> > > >         line_number: 136
> > > >         method_name: "getRecordBatch"
> > > >         is_native_method: false
> > > >       }
> > > >       stack_trace {
> > > >         class_name: "org.apache.drill.exec.
> physical.impl.ImplCreator"
> > > >         file_name: "ImplCreator.java"
> > > >         line_number: 179
> > > >         method_name: "getChildren"
> > > >         is_native_method: false
> > > >       }
> > > >       stack_trace {
> > > >         class_name: "org.apache.drill.exec.
> physical.impl.ImplCreator"
> > > >         file_name: "ImplCreator.java"
> > > >         line_number: 109
> > > >         method_name: "getRootExec"
> > > >         is_native_method: false
> > > >       }
> > > >       stack_trace {
> > > >         class_name: "org.apache.drill.exec.
> physical.impl.ImplCreator"
> > > >         file_name: "ImplCreator.java"
> > > >         line_number: 87
> > > >         method_name: "getExec"
> > > >         is_native_method: false
> > > >       }
> > > >       stack_trace {
> > > >         class_name: "org.apache.drill.exec.work.
> > > fragment.FragmentExecutor"
> > > >         file_name: "FragmentExecutor.java"
> > > >         line_number: 207
> > > >         method_name: "run"
> > > >         is_native_method: false
> > > >       }
> > > >       stack_trace {
> > > >         class_name: "org.apache.drill.common.SelfCleaningRunnable"
> > > >         file_name: "SelfCleaningRunnable.java"
> > > >         line_number: 38
> > > >         method_name: "run"
> > > >         is_native_method: false
> > > >       }
> > > >       stack_trace {
> > > >         class_name: "..."
> > > >         line_number: 0
> > > >         method_name: "..."
> > > >         is_native_method: false
> > > >       }
> > > >     }
> > > >   }
> > > >   minor_fragment_id: 200
> > > >   operator_profile {
> > > >     input_profile {
> > > >       records: 0
> > > >       batches: 0
> > > >       schemas: 0
> > > >     }
> > > >     operator_id: 0
> > > >     operator_type: 37
> > > >     setup_nanos: 0
> > > >     process_nanos: 29498572
> > > >     peak_local_memory_allocated: 0
> > > >     wait_nanos: 0
> > > >   }
> > > >   start_time: 1505110011975
> > > >   end_time: 1505110012320
> > > >   memory_used: 0
> > > >   max_memory_used: 1000000
> > > >   endpoint {
> > > >     address: "node22"
> > > >     user_port: 31010
> > > >     control_port: 31011
> > > >     data_port: 31012
> > > >     version: "1.11.0"
> > > >   }
> > > > }
> > > > handle {
> > > >   query_id {
> > > >     part1: 2758973773160297386
> > > >     part2: -412723615757922113
> > > >   }
> > > >   major_fragment_id: 1
> > > >   minor_fragment_id: 200
> > > > }
> > > > .
> > >
> > >
> > > > [Error Id: c737dd8b-78e4-40c6-89b0-d53260770b11 on node21:31010]
> > > >         at org.apache.drill.exec.rpc.user.QueryResultHandler.
> > > resultArrived(QueryResultHandler.java:123)
> > [drill-java-exec-1.11.0.jar:1.
> > > 11.0]
> > > >         at org.apache.drill.exec.rpc.user.UserClient.handle(
> > UserClient.java:368)
> > > [drill-java-exec-1.11.0.jar:1.11.0]
> > > >         at org.apache.drill.exec.rpc.user.UserClient.handle(
> > UserClient.java:90)
> > > [drill-java-exec-1.11.0.jar:1.11.0]
> > > >         at org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(
> > RpcBus.java:274)
> > > [drill-rpc-1.11.0.jar:1.11.0]
> > > >         at org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(
> > RpcBus.java:244)
> > > [drill-rpc-1.11.0.jar:1.11.0]
> > > >         at io.netty.handler.codec.MessageToMessageDecoder.
> channelRead(
> > > MessageToMessageDecoder.java:89) [netty-codec-4.0.27.Final.jar:
> > > 4.0.27.Final]
> > > >         at io.netty.channel.AbstractChannelHandlerContext.
> > > invokeChannelRead(AbstractChannelHandlerContext.java:339)
> > > [netty-transport-4.0.27.Final.jar:4.0.27.Final]
> > > >         at io.netty.channel.AbstractChannelHandlerContext.
> > > fireChannelRead(AbstractChannelHandlerContext.java:324)
> > > [netty-transport-4.0.27.Final.jar:4.0.27.Final]
> > > >         at io.netty.handler.timeout.IdleStateHandler.channelRead(
> > IdleStateHandler.java:254)
> > > [netty-handler-4.0.27.Final.jar:4.0.27.Final]
> > > >         at io.netty.channel.AbstractChannelHandlerContext.
> > > invokeChannelRead(AbstractChannelHandlerContext.java:339)
> > > [netty-transport-4.0.27.Final.jar:4.0.27.Final]
> > > >         at io.netty.channel.AbstractChannelHandlerContext.
> > > fireChannelRead(AbstractChannelHandlerContext.java:324)
> > > [netty-transport-4.0.27.Final.jar:4.0.27.Final]
> > > >         at io.netty.handler.codec.MessageToMessageDecoder.
> channelRead(
> > > MessageToMessageDecoder.java:103) [netty-codec-4.0.27.Final.jar:
> > > 4.0.27.Final]
> > > >         at io.netty.channel.AbstractChannelHandlerContext.
> > > invokeChannelRead(AbstractChannelHandlerContext.java:339)
> > > [netty-transport-4.0.27.Final.jar:4.0.27.Final]
> > > >         at io.netty.channel.AbstractChannelHandlerContext.
> > > fireChannelRead(AbstractChannelHandlerContext.java:324)
> > > [netty-transport-4.0.27.Final.jar:4.0.27.Final]
> > > >         at io.netty.handler.codec.ByteToMessageDecoder.channelRead(
> > ByteToMessageDecoder.java:242)
> > > [netty-codec-4.0.27.Final.jar:4.0.27.Final]
> > > >         at io.netty.channel.AbstractChannelHandlerContext.
> > > invokeChannelRead(AbstractChannelHandlerContext.java:339)
> > > [netty-transport-4.0.27.Final.jar:4.0.27.Final]
> > > >         at io.netty.channel.AbstractChannelHandlerContext.
> > > fireChannelRead(AbstractChannelHandlerContext.java:324)
> > > [netty-transport-4.0.27.Final.jar:4.0.27.Final]
> > > >         at io.netty.channel.ChannelInboundHandlerAdapter.
> channelRead(
> > > ChannelInboundHandlerAdapter.java:86) [netty-transport-4.0.27.Final.
> > > jar:4.0.27.Final]
> > > >         at io.netty.channel.AbstractChannelHandlerContext.
> > > invokeChannelRead(AbstractChannelHandlerContext.java:339)
> > > [netty-transport-4.0.27.Final.jar:4.0.27.Final]
> > > >         at io.netty.channel.AbstractChannelHandlerContext.
> > > fireChannelRead(AbstractChannelHandlerContext.java:324)
> > > [netty-transport-4.0.27.Final.jar:4.0.27.Final]
> > > >         at io.netty.channel.DefaultChannelPipeline.fireChannelRead(
> > > DefaultChannelPipeline.java:847) [netty-transport-4.0.27.Final.
> > > jar:4.0.27.Final]
> > > >         at io.netty.channel.nio.AbstractNioByteChannel$
> > > NioByteUnsafe.read(AbstractNioByteChannel.java:131)
> > > [netty-transport-4.0.27.Final.jar:4.0.27.Final]
> > > >         at io.netty.channel.nio.NioEventLoop.processSelectedKey(
> > NioEventLoop.java:511)
> > > [netty-transport-4.0.27.Final.jar:4.0.27.Final]
> > > >         at io.netty.channel.nio.NioEventLoop.
> > > processSelectedKeysOptimized(NioEventLoop.java:468)
> > > [netty-transport-4.0.27.Final.jar:4.0.27.Final]
> > > >         at io.netty.channel.nio.NioEventLoop.processSelectedKeys(
> > NioEventLoop.java:382)
> > > [netty-transport-4.0.27.Final.jar:4.0.27.Final]
> > > >         at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.
> > java:354)
> > > [netty-transport-4.0.27.Final.jar:4.0.27.Final]
> > > >         at io.netty.util.concurrent.SingleThreadEventExecutor$2.
> > > run(SingleThreadEventExecutor.java:111) [netty-common-4.0.27.Final.
> > > jar:4.0.27.Final]
> > > >         at java.lang.Thread.run(Thread.java:748) [na:1.7.0_141]
> > > > 2017-09-11 15:32:36,406 [Client-1] INFO  o.a.d.j.i.DrillCursor$
> > ResultsListener
> > > - [#5] Query failed:
> > > > org.apache.drill.common.exceptions.UserRemoteException: SYSTEM
> ERROR:
> > > IllegalStateException: Bad magic number = 0a0d0d0a
> > > >
> > > >
> > > >
> > >
> > >
> > >
> > > > On 2017/09/12 11:53, Takeo Ogawara <ta-ogawara@kddi-research.jp> wrote:
> > > >
> > > > Thank you for replies.
> > > >
> > > >> Instead of "location": "/mapr/cluster3",   use "location": "/",
> > > > I’ll use this config.
> > > >
> > > >
> > > >> Can you provide the stack trace from the Drillbit that hit the
> > problem?
> > > >
> > > > You can find logs in attached files.
> > > >
> > > >> Is it absolutely required to query large files like this? Would it
> be
> > > >> acceptable to split the file first by making a quick scan over it?
> > > > No, loading one large file isn’t strictly required.
> > > > In fact, this large PCAP file was created by concatenating small PCAP
> > > > files with the mergecap command, so feeding the small PCAP files into
> > > > Drill directly is not a problem.
> > > >
> > > > How can I analyze a number of PCAP files together?
> > > > Can I concatenate the parsed packet records of small PCAP files inside
> > > > a Drill query?
> > > > Or should I export the parsed records into a database and then merge
> > > > them?
> > > >
> > > >
> > > >> On 2017/09/12 5:07, Ted Dunning <ted.dunning@gmail.com> wrote:
> > > >>
> > > >> On Mon, Sep 11, 2017 at 11:23 AM, Takeo Ogawara <
> > > ta-ogawara@kddi-research.jp
> > > >>> wrote:
> > > >>
> > > >>> ...
> > > >>>
> > > >>> 1. Query error when cluster-name is not specified
> > > >>> ...
> > > >>>
> > > >>> With this setting, the following query failed.
> > > >>>> select * from mfs.`x.pcap` ;
> > > >>>> Error: DATA_READ ERROR: /x.pcap (No such file or directory)
> > > >>>>
> > > >>>> File name: /x.pcap
> > > >>>> Fragment 0:0
> > > >>>>
> > > >>>> [Error Id: 70b73062-c3ed-4a10-9a88-034b4e6d039a on node21:31010]
> > > >>> (state=,code=0)
> > > >>>
> > > >>> But these queries passed.
> > > >>>> select * from mfs.root.`x.pcap` ;
> > > >>>> select * from mfs.`x.csv`;
> > > >>>> select * from mfs.root.`x.csv`;
> > > >>>
> > > >>
> > > >> As Andries mentioned, the problem here has to do with understanding
> > > >> how Drill manipulates paths; it has nothing to do with the PCAP
> > > >> capabilities.
> > > >>
> > > >> Usually, what I do is put entries into the configuration which point
> > > >> directly to the directory above my data, but I can't add anything to
> > > >> Andries' comment.
> > > >>
> > > >>
> > > >>> 2. Large PCAP file
> > > >>> A query on a very large PCAP file (larger than 100GB) failed with
> > > >>> the following error message.
> > > >>>> Error: SYSTEM ERROR: IllegalStateException: Bad magic number =
> > > >>>> 0a0d0d0a
> > > >>>>
> > > >>>> Fragment 1:169
> > > >>>>
> > > >>>> [Error Id: 8882c359-c253-40c0-866c-417ef1ce5aa3 on node22:31010]
> > > >>> (state=,code=0)
> > > >>>
> > > >>> This happens even on a local Linux FS, not only on MapR FS.
> > > >>>
> > > >>
> > > >> Can you provide the stack trace from the Drillbit that hit the
> > problem?
> > > >>
> > > >> I suspect that this has to do with splitting of the PCAP file.
> > > Normally, it
> > > >> is assumed that parallelism will be achieved by having lots of
> smaller
> > > >> files since it is difficult to jump into the middle of a PCAP file
> and
> > > get
> > > >> good results.
> > > >>
> > > >> Even if we disable splitting to avoid this error, you will have the
> > > >> complementary problem of slow queries due to single-threading. That
> > > doesn't
> > > >> seem very satisfactory either.
> > > >>
> > > >> A similar problem is that splitting a PCAP file pretty much
> requires a
> > > >> single-threaded read of the file in question. The read doesn't need
> to
> > > >> process very much data, but it does need to touch the whole file.
> > > >>
> > > >> Is it absolutely required to query large files like this? Would it
> be
> > > >> acceptable to split the file first by making a quick scan over it?
> > > >
> > > >
> > > > <sa1153582.zip>
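Ted's suggestion in the quoted thread above, splitting a big capture by making a quick single-threaded scan over it, can be sketched as follows. This is a hypothetical illustration for a classic little-endian pcap file, not Drill's implementation; the names `split_points` and `chunk_size` are invented for the example. It walks only the 16-byte per-record headers and emits byte offsets that fall on packet boundaries, roughly `chunk_size` bytes apart, so the scan touches the whole file but processes very little of it.

```python
import struct

PCAP_FILE_HEADER_LEN = 24    # classic pcap global header
PCAP_RECORD_HEADER_LEN = 16  # per-packet header: ts_sec, ts_usec, incl_len, orig_len

def split_points(data: bytes, chunk_size: int) -> list:
    """One quick pass over a little-endian classic pcap, returning byte
    offsets that land on packet boundaries, roughly chunk_size apart.

    Only the per-record headers are read; packet payloads are skipped.
    The resulting offsets could then be handed to parallel readers.
    """
    offsets = [PCAP_FILE_HEADER_LEN]
    pos = PCAP_FILE_HEADER_LEN
    last_cut = pos
    while pos + PCAP_RECORD_HEADER_LEN <= len(data):
        # incl_len (captured length) is the third uint32 of the record header
        incl_len = struct.unpack_from("<I", data, pos + 8)[0]
        pos += PCAP_RECORD_HEADER_LEN + incl_len
        if pos - last_cut >= chunk_size:
            offsets.append(pos)
            last_cut = pos
    return offsets
```

Each offset after the first begins a self-contained run of whole packet records, which sidesteps the mid-file "bad magic" problem at the cost of the extra scan.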
> > >
> >
> > > ———————————————————————
> > > Takeo Ogawara (小河原 健生)
> > > KDDI Research, Inc.
> > > Connected Car 1G
> > >
> > > TEL: 049-278-7495 / 070-3623-9914
> > >
> > >
> >
>
