drill-user mailing list archives

From Takeo Ogawara <ta-ogaw...@kddi-research.jp>
Subject Re: Query Error on PCAP over MapR FS
Date Tue, 12 Sep 2017 02:53:58 GMT
Thank you for the replies.

> Instead of "location": "/mapr/cluster3",   use "location": "/",
I’ll use this config.


> Can you provide the stack trace from the Drillbit that hit the problem?

You can find the logs in the attached files.

> Is it absolutely required to query large files like this? Would it be
> acceptable to split the file first by making a quick scan over it?
No, loading a large file isn’t strictly required.
In fact, this large PCAP file was created by concatenating small PCAP files with the mergecap command.
So there is no problem with feeding the small PCAP files into Drill instead.
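
One thing that may be worth checking, as an assumption rather than something the logs confirm: 0a0d0d0a is the block type that begins a pcapng Section Header Block, whereas classic PCAP files begin with a1b2c3d4 or its byte-swapped form. If the merged file starts with those bytes, it may be pcapng rather than classic PCAP (mergecap has a -F option for choosing the output format). Below is a minimal Python sketch for inspecting the first four bytes; the function names are hypothetical and it is not part of Drill.

```python
# Leading 4-byte signatures as they appear on disk.
SIGNATURES = {
    bytes.fromhex("a1b2c3d4"): "pcap (big-endian, microseconds)",
    bytes.fromhex("d4c3b2a1"): "pcap (little-endian, microseconds)",
    bytes.fromhex("a1b23c4d"): "pcap (big-endian, nanoseconds)",
    bytes.fromhex("4d3cb2a1"): "pcap (little-endian, nanoseconds)",
    bytes.fromhex("0a0d0d0a"): "pcapng (Section Header Block)",
}


def capture_format(first_bytes):
    """Classify a capture file from its first four bytes."""
    return SIGNATURES.get(bytes(first_bytes[:4]), "unknown")


def capture_format_of_file(path):
    """Hypothetical helper: read the signature from a file on disk."""
    with open(path, "rb") as f:
        return capture_format(f.read(4))
```

Running capture_format_of_file on the merged file and on one of the small input files would show whether the two are in the same format.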

How can I analyze a number of PCAP files together?
Can I combine the parsed packet records of several small PCAP files inside a single Drill query?
Or should I export the parsed records into a database and merge them there?


> On 2017/09/12 5:07, Ted Dunning <ted.dunning@gmail.com> wrote:
> 
> On Mon, Sep 11, 2017 at 11:23 AM, Takeo Ogawara <ta-ogawara@kddi-research.jp> wrote:
> 
>> ...
>> 
>> 1. Query error when cluster-name is not specified
>> ...
>> 
>> With this setting, the following query failed.
>>> select * from mfs.`x.pcap` ;
>>> Error: DATA_READ ERROR: /x.pcap (No such file or directory)
>>> 
>>> File name: /x.pcap
>>> Fragment 0:0
>>> 
>>> [Error Id: 70b73062-c3ed-4a10-9a88-034b4e6d039a on node21:31010] (state=,code=0)
>> 
>> But these queries passed.
>>> select * from mfs.root.`x.pcap` ;
>>> select * from mfs.`x.csv`;
>>> select * from mfs.root.`x.csv`;
>> 
> 
> As Andries mentioned, the problem here has to do with understanding what
> Drill is thinking about how paths are manipulated. Nothing to do with the
> PCAP capabilities.
> 
> Usually, what I do is put entries into the configuration which directly
> point to the directory above my data, but I can't add anything to Andries'
> comment.
> 
> 
>> 2. Large PCAP file
>> A query on a very large PCAP file (larger than 100GB) failed with the
>> following error message.
>>> Error: SYSTEM ERROR: IllegalStateException: Bad magic number = 0a0d0d0a
>>> 
>>> Fragment 1:169
>>> 
>>> [Error Id: 8882c359-c253-40c0-866c-417ef1ce5aa3 on node22:31010] (state=,code=0)
>> 
>> This happens even on the Linux FS, not only on MapR FS.
>> 
> 
> Can you provide the stack trace from the Drillbit that hit the problem?
> 
> I suspect that this has to do with splitting of the PCAP file. Normally, it
> is assumed that parallelism will be achieved by having lots of smaller
> files since it is difficult to jump into the middle of a PCAP file and get
> good results.
> 
> Even if we disable splitting to avoid this error, you will have the
> complementary problem of slow queries due to single-threading. That doesn't
> seem very satisfactory either.
> 
> A similar problem is that splitting a PCAP file pretty much requires a
> single-threaded read of the file in question. The read doesn't need to
> process very much data, but it does need to touch the whole file.
> 
> Is it absolutely required to query large files like this? Would it be
> acceptable to split the file first by making a quick scan over it?
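
The "quick scan" split mentioned in the quoted message could look roughly like the following Python sketch. It assumes a classic PCAP file (24-byte global header followed by records with 16-byte headers), not pcapng; split_pcap, dst_prefix and packets_per_file are hypothetical names, and this is not how Drill itself reads PCAP.

```python
import struct


def split_pcap(src_path, dst_prefix, packets_per_file=100000):
    """Split a classic PCAP file in one sequential pass.

    Each output file gets a copy of the original global header.
    Returns the list of paths written.
    """
    paths = []
    with open(src_path, "rb") as src:
        global_header = src.read(24)
        if len(global_header) < 24:
            raise ValueError("file too short for a PCAP global header")
        magic = global_header[:4]
        # The on-disk byte order of the magic number gives the endianness
        # of the record headers.
        if magic in (b"\xd4\xc3\xb2\xa1", b"\x4d\x3c\xb2\xa1"):
            endian = "<"
        elif magic in (b"\xa1\xb2\xc3\xd4", b"\xa1\xb2\x3c\x4d"):
            endian = ">"
        else:
            raise ValueError("not a classic PCAP file: magic = " + magic.hex())

        out = None
        count = 0
        try:
            while True:
                rec_header = src.read(16)
                if len(rec_header) < 16:
                    break  # end of file, or a truncated trailing record
                # Record header: ts_sec, ts_frac, incl_len, orig_len
                incl_len = struct.unpack(endian + "IIII", rec_header)[2]
                data = src.read(incl_len)
                if len(data) < incl_len:
                    break  # truncated packet body; stop here
                if out is None or count == packets_per_file:
                    if out is not None:
                        out.close()
                    path = "%s-%05d.pcap" % (dst_prefix, len(paths))
                    out = open(path, "wb")
                    out.write(global_header)
                    paths.append(path)
                    count = 0
                out.write(rec_header)
                out.write(data)
                count += 1
        finally:
            if out is not None:
                out.close()
    return paths
```

The scan only decodes each record header to learn how many bytes to copy, so it does little work per packet, but it is single-threaded and has to touch the whole file, as noted in the quoted message.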


