From user-return-10132-apmail-drill-user-archive=drill.apache.org@drill.apache.org Sun Feb 10 17:01:19 2019 Return-Path: X-Original-To: apmail-drill-user-archive@www.apache.org Delivered-To: apmail-drill-user-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 1ECC01856D for ; Sun, 10 Feb 2019 17:01:19 +0000 (UTC) Received: (qmail 9579 invoked by uid 500); 10 Feb 2019 17:01:18 -0000 Delivered-To: apmail-drill-user-archive@drill.apache.org Received: (qmail 9489 invoked by uid 500); 10 Feb 2019 17:01:18 -0000 Mailing-List: contact user-help@drill.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: user@drill.apache.org Delivered-To: mailing list user@drill.apache.org Received: (qmail 9472 invoked by uid 99); 10 Feb 2019 17:01:17 -0000 Received: from pnap-us-west-generic-nat.apache.org (HELO spamd3-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Sun, 10 Feb 2019 17:01:17 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd3-us-west.apache.org (ASF Mail Server at spamd3-us-west.apache.org) with ESMTP id 828BC1805DF for ; Sun, 10 Feb 2019 17:01:17 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd3-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: 0.599 X-Spam-Level: X-Spam-Status: No, score=0.599 tagged_above=-999 required=6.31 tests=[DKIMWL_WL_MED=-0.001, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, KAM_ASCII_DIVIDERS=0.8, RCVD_IN_DNSWL_NONE=-0.0001, SPF_PASS=-0.001, URIBL_BLOCKED=0.001] autolearn=disabled Authentication-Results: spamd3-us-west.apache.org (amavisd-new); dkim=pass (2048-bit key) header.d=gmail.com Received: from mx1-lw-eu.apache.org ([10.40.0.8]) by localhost (spamd3-us-west.apache.org [10.40.0.10]) (amavisd-new, port 10024) with ESMTP id Uk_iZHfj0DXa for ; Sun, 10 Feb 2019 17:01:15 +0000 (UTC) Received: from mail-qt1-f177.google.com (mail-qt1-f177.google.com [209.85.160.177]) by mx1-lw-eu.apache.org (ASF Mail Server at mx1-lw-eu.apache.org) with ESMTPS id 04B8560CDD for ; Sun, 10 Feb 2019 17:01:14 +0000 (UTC) Received: by mail-qt1-f177.google.com with SMTP id o6so9626276qtk.6 for ; Sun, 10 Feb 2019 09:01:14 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:content-transfer-encoding:mime-version:subject:date:references :to:in-reply-to:message-id; bh=Gyy5l0vH0G7h39dGGxJL+3L8A2TlMBCQ+T+4BnnBnGI=; b=iArdD2QAGoCI7rEMivG/jR2crRP0C43NcpmoWx2/1T2AE5ER78OSIs2rKN2Wba21nP eJG3vLAf0myWs4ObwGHM5Y3QC0hK9MN/otnMzTDXjsaRDI6aOerHrRHlon23kMpuUxvw RTJGC6fyMeEypA6+8Ts2AraoFImZUvOPDoYL5irrARYQkxHZGteElKm1k46iI57om7te 4GyQPt16cHGWtgF6d6d4PKru/GJFLxgJXvwUgASCNCcb76iBqSYZBiPXNoo4PMa2rHD3 nPZPhjCttIAcZV/aXa3NhGx8ikKFzkAREwIqhuPeru2gEMYjZ2s/AHnaJdwFWspoDaqS ELIw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:content-transfer-encoding:mime-version :subject:date:references:to:in-reply-to:message-id; bh=Gyy5l0vH0G7h39dGGxJL+3L8A2TlMBCQ+T+4BnnBnGI=; b=LfoivKEbuxwQK8xRrGlugFlJt82/DkJjCA8NYpZ0GSDrGZ4GVOPrjreZu43XZdCTdF +HRiWA+SUCtI0HGsQNW5N0UShCEshrWloT+DNePr9wfasUdYFmr7P0uB9sOVNh8iHod1 pEjn5weW3ncox1e7oVB7XpNrhXvG1ET5Qb5AMjqcqpS0QgavcpX9dBeKTw68W07sCHsN l1pLBkSn/YHcD85n1Yq40TQvLkhjkt+B8yjj7h1OScUNbSPHoA4B3GL/R3RvveKty4il LGzDR4x1rhr99cJphf+usgZolN9o+YzTrb9+3AHBLWYYIKgxNw95LcTNz5UijaGFkIe8 ldPw== X-Gm-Message-State: AHQUAuaXxUJpyUNkmZmOL5SKzErfXN9uH90HdyV3OiiqQHRW0D9LAoUm 0/n8hYqb7b9isPyl0obYJzOPmLEd X-Google-Smtp-Source: AHgI3IYrRJGkiuB/LD6wiN9t0qbYYbWAYuCt5YRP+JDePKSSiOErFW1Beet6g1mjmj1KHCkRu8JPFQ== X-Received: by 2002:ac8:43d5:: with SMTP id w21mr22566460qtn.98.1549818067363; Sun, 10 Feb 2019 09:01:07 -0800 (PST) Received: from [192.168.1.21] (pool-108-22-224-35.bltmmd.east.verizon.net. [108.22.224.35]) by smtp.gmail.com with ESMTPSA id g25sm7655382qki.29.2019.02.10.09.01.05 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Sun, 10 Feb 2019 09:01:05 -0800 (PST) From: Charles Givre Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Mime-Version: 1.0 (Mac OS X Mail 12.2 \(3445.102.3\)) Subject: Re: Drill fails to query pcap files Date: Sun, 10 Feb 2019 12:01:04 -0500 References: <1002D02D-FC78-456F-9F91-E1FA763AD2D6@gmail.com> <58C84879-B414-4FFC-91BA-0A43224C9FD8@gmail.com> <0DEBCA2A-EB3E-48B2-BF6D-1EB821880FBC@gmail.com> <977B672B-F3F6-464C-B231-59F7C777CB21@gmail.com> <9514D86E-5B72-4D7B-9B77-DCD4A99EC177@gmail.com> <9C723473-44F7-403A-91FA-A30D0700CDB6@gmail.com> <2114B4E5-918D-46D4-BB0C-2D0E3A73C60B@gmail.com> To: user@drill.apache.org In-Reply-To: <2114B4E5-918D-46D4-BB0C-2D0E3A73C60B@gmail.com> Message-Id: X-Mailer: Apple Mail (2.3445.102.3) Actually, some good news here=E2=80=A6 =20 I ran some test queries on the corrupted file and it seemed to work = pretty well. I didn=E2=80=99t get any exceptions! jdbc:drill:zk=3Dlocal> select src_ip, COUNT(*) as packet_count from = dfs.test.`testv1.pcap`WHERE is_corrupt=3D1 GROUP BY src_ip ORDER BY = packet_count DESC . . . . . . .semicolon> LIMIT 10; +-----------------------------------------+---------------+ | src_ip | packet_count | +-----------------------------------------+---------------+ | 150.249.255.161 | 176 | | 150.249.255.24 | 28 | | 131.38.3.15 | 26 | | 111.248.196.128 | 25 | | 202.13.230.242 | 20 | | 163.28.217.199 | 19 | | 27.18.36.151 | 18 | | 2001:320f:c2ed:8693:1dff:f8f8:500:f1ed | 17 | | 203.70.190.81 | 16 | | 203.70.182.104 | 13 | +-----------------------------------------+---------------+ 10 rows selected (0.944 seconds) select src_ip, dst_ip from dfs.test.`testv1.pcap`WHERE is_corrupt=3D1 = LIMIT 10; +------------------+------------------+ | src_ip | dst_ip | +------------------+------------------+ | 118.233.244.60 | 150.249.255.161 | | 150.249.255.161 | 165.63.110.188 | | 150.249.255.161 | 165.63.110.188 | | 172.40.96.180 | 131.39.133.22 | | 150.249.255.161 | 165.63.110.188 | | 150.249.255.161 | 165.63.110.188 | | 150.249.255.161 | 165.63.110.188 | | 150.249.255.161 | 165.63.110.188 | | 150.249.162.60 | 180.32.119.25 | | 150.249.255.161 | 165.63.110.188 | +------------------+------------------+ 10 rows selected (1.031 seconds) 0: jdbc:drill:zk=3Dlocal> SELECT src_port , dst_port , src_mac_address = , dst_mac_address . . . . . . .semicolon> FROM dfs.test.`testv1.pcap` . . . . . . .semicolon> WHERE is_corrupt =3D1 LIMIT 10; +-----------+-----------+--------------------+--------------------+ | src_port | dst_port | src_mac_address | dst_mac_address | +-----------+-----------+--------------------+--------------------+ | 57058 | 443 | 00:0C:DB:1F:72:41 | 88:E0:F3:7A:66:F0 | | 80 | 20706 | 00:0C:DB:1F:72:41 | 00:12:E2:C0:3F:09 | | 80 | 20706 | 00:0C:DB:1F:72:41 | 00:12:E2:C0:3F:09 | | 443 | 55972 | 00:0C:DB:1F:72:41 | CC:4E:24:1F:4E:00 | | 80 | 20706 | 00:0C:DB:1F:72:41 | 00:12:E2:C0:3F:09 | | 80 | 20706 | 00:0C:DB:1F:72:41 | 00:12:E2:C0:3F:09 | | 80 | 20706 | 00:0C:DB:1F:72:41 | 00:12:E2:C0:3F:09 | | 80 | 20706 | 00:0C:DB:1F:72:41 | 00:12:E2:C0:3F:09 | | 4016 | 7699 | 00:0C:DB:1F:72:41 | 00:12:E2:C0:3F:09 | | 80 | 20706 | 00:0C:DB:1F:72:41 | 00:12:E2:C0:3F:09 | +-----------+-----------+--------------------+--------------------+ 10 rows selected (0.751 seconds) SELECT getCountryName(src_ip) AS country, COUNT(*) as packet_count FROM = dfs.test.`testv1.pcap` WHERE is_corrupt=3D1 GROUP BY = getCountryName(src_ip) ORDER BY packet_count DESC LIMIT 10; +----------------+---------------+ | country | packet_count | +----------------+---------------+ | Japan | 269 | | Taiwan | 124 | | United States | 105 | | Unknown | 49 | | China | 26 | | South Korea | 8 | | Australia | 4 | | Germany | 3 | | Hong Kong | 2 | | Italy | 1 | +----------------+---------------+ 10 rows selected (1.519 seconds) SELECT is_corrupt, COUNT(*) as packet_count FROM dfs.test.`testv1.pcap` = GROUP BY is_corrupt; +-------------+---------------+ | is_corrupt | packet_count | +-------------+---------------+ | 0 | 6408 | | 1 | 592 | +-------------+---------------+ 2 rows selected (0.931 seconds) This PCAP file worked well with Superset also.=20 > On Feb 10, 2019, at 10:59, Charles Givre wrote: >=20 > If I can get some more examples of corrupted files I=E2=80=99ll test = more thoroughly. Also, we=E2=80=99ll need to apply the same methodology = to PCAP-NG, so I=E2=80=99ll need some examples there as well. My = strategy is going to be get as much data as possible out of the corrupt = packet.=20 > =E2=80=94 C >=20 >=20 >=20 >> On Feb 10, 2019, at 10:54, Ted Dunning wrote: >>=20 >> I think that accessing fields in corrupted packets will also cause >> exceptions. But this is a great start. Conditionalizing field access = on >> !is_corrupt() might be sufficient for the next step. >>=20 >>=20 >>=20 >> On Sun, Feb 10, 2019 at 4:58 AM Charles Givre = wrote: >>=20 >>> All, >>> I posted the following PR for this issue: >>> https://github.com/apache/drill/pull/1637 < >>> https://github.com/apache/drill/pull/1637> >>>=20 >>> Basically this PR does two things. >>> 1. It creates a boolean column called is_corrupt and >>> 2. If the PCAP file has a corrupt row, it marks that row as corrupt = by >>> setting is_corrupt to true and keeps going >>>=20 >>> WIth the example from Giovanni, I was able to find 590 or so corrupt = rows >>> out of 7000 in that PCAP file. It was late and I don=E2=80=99t know = if that was >>> what ti was supposed to find, but it worked and was able to query = that. >>> If you guys could send a few more examples, I=E2=80=99d like to test = this on other >>> files to make sure it works with them. We=E2=80=99re also going to = have to do the >>> same thing for the PCAP-NG format I would assume. >>>=20 >>>> On Feb 10, 2019, at 03:07, Ted Dunning = wrote: >>>>=20 >>>> On Sat, Feb 9, 2019 at 2:25 PM Bob Rudis wrote: >>>>=20 >>>>> ... >>>>> And, I did indeed find a few and am just waiting for a formal = review so >>> I >>>>> can submit them for the Drill dev & tests. >>>>>=20 >>>>=20 >>>> Awesome! >>>=20 >>>=20 >=20