If I can get some more examples of corrupted files, I’ll test more thoroughly. Also, we’ll
need to apply the same methodology to PCAP-NG, so I’ll need some examples there as well.
My strategy is going to be to get as much data as possible out of the corrupt packets.
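A rough sketch of that strategy follows. It assumes a classic (non-NG) pcap file with Ethernet framing, and it is written in Python rather than Drill's Java. `read_rows`, `PacketRow` and `eth_type` are hypothetical names, not the code in the PR. Each record gets an `is_corrupt` flag instead of raising an exception. The record-header fields (timestamps, original length) are always returned. Decoded fields, here just the EtherType, are only filled in when the row is not corrupt, which follows Ted's suggestion of conditionalizing field access:

```python
import struct
from dataclasses import dataclass
from typing import Iterator, Optional

GLOBAL_HEADER_LEN = 24  # classic pcap global header
RECORD_HEADER_LEN = 16  # ts_sec, ts_usec, incl_len, orig_len
ETH_HEADER_LEN = 14     # dst MAC, src MAC, EtherType


@dataclass
class PacketRow:
    # Hypothetical row shape; not Drill's actual schema.
    ts_sec: int
    ts_usec: int
    orig_len: int
    is_corrupt: bool = False
    eth_type: Optional[int] = None  # only set when the row is not corrupt


def read_rows(data: bytes) -> Iterator[PacketRow]:
    """Yield one row per record, flagging bad records instead of raising."""
    magic = data[:4]
    if magic == b"\xd4\xc3\xb2\xa1":
        endian = "<"  # little-endian file
    elif magic == b"\xa1\xb2\xc3\xd4":
        endian = ">"  # big-endian file
    else:
        raise ValueError("not a classic microsecond pcap file")

    offset = GLOBAL_HEADER_LEN
    # Stop once fewer than 16 bytes remain (no complete record header).
    while offset + RECORD_HEADER_LEN <= len(data):
        ts_sec, ts_usec, incl_len, orig_len = struct.unpack_from(
            endian + "IIII", data, offset)
        offset += RECORD_HEADER_LEN
        row = PacketRow(ts_sec, ts_usec, orig_len)

        if offset + incl_len > len(data):
            # Record claims more bytes than the file holds: keep the header
            # fields already read, flag the row, and stop (nothing follows).
            row.is_corrupt = True
            yield row
            return

        body = data[offset:offset + incl_len]
        offset += incl_len
        # Assumed corruption check: too short to contain an Ethernet header.
        if len(body) < ETH_HEADER_LEN:
            row.is_corrupt = True
        else:
            # Only touch decoded fields when the packet is not corrupt.
            row.eth_type = struct.unpack_from("!H", body, 12)[0]
        yield row
```

A truncated record at the end of the file still yields its timestamps and original length, so it can be counted and queried. A real reader would need more checks, for example an incl_len larger than the snapshot length, and a way to resynchronize after a bogus length field.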
— C
> On Feb 10, 2019, at 10:54, Ted Dunning <ted.dunning@gmail.com> wrote:
>
> I think that accessing fields in corrupted packets will also cause
> exceptions. But this is a great start. Conditionalizing field access on
> !is_corrupt() might be sufficient for the next step.
>
> On Sun, Feb 10, 2019 at 4:58 AM Charles Givre <cgivre@gmail.com> wrote:
>
>> All,
>> I posted the following PR for this issue:
>> https://github.com/apache/drill/pull/1637
>>
>> Basically, this PR does two things:
>> 1. It creates a boolean column called is_corrupt.
>> 2. If the PCAP file has a corrupt row, it marks that row as corrupt by
>>    setting is_corrupt to true and keeps going.
>>
>> With the example from Giovanni, I was able to find 590 or so corrupt rows
>> out of 7000 in that PCAP file. It was late and I don’t know if that was
>> what it was supposed to find, but it worked and I was able to query the file.
>> If you guys could send a few more examples, I’d like to test this on other
>> files to make sure it works with them. We’re also going to have to do the
>> same thing for the PCAP-NG format, I would assume.
>>
>>> On Feb 10, 2019, at 03:07, Ted Dunning <ted.dunning@gmail.com> wrote:
>>>
>>> On Sat, Feb 9, 2019 at 2:25 PM Bob Rudis <bob@rud.is> wrote:
>>>
>>>> ...
>>>> And, I did indeed find a few and am just waiting for a formal review so I
>>>> can submit them for the Drill dev & tests.
>>>>
>>>
>>> Awesome!