Subject: Re: DRILL 1.4 - newline in strings not supported
From: Nicolas Paris
To: user@drill.apache.org
Date: Mon, 1 Feb 2016 16:26:59 +0100
Hello Abdel,

I am creating Parquet files from those CSV files (with CREATE TABLE syntax). Basically, I have a text column of up to 50k characters that contains newlines (the texts were extracted from PDFs). I have several million rows of text.

I am subsetting the texts that contain certain patterns (LIKE '%foo%', or a regex; sadly, I haven't found any mention of regex support in the documentation, i.e. an equivalent of the PostgreSQL "~" operator).

Usually I use PostgreSQL or MonetDB to mine the texts, but I am benchmarking/studying Apache Drill too.

Thanks,

2016-02-01 15:54 GMT+01:00 Abdel Hakim Deneche:

> Hey Nicolas,
>
> What kind of queries are you running on your CSV file?
>
> On Sun, Jan 31, 2016 at 12:14 PM, Nicolas Paris wrote:
>
> > Hello,
> >
> > I am trying to import a CSV containing large texts. They contain newline
> > characters ("\n").
> > Apache Drill complains about that. There is an open JIRA issue about it:
> >
> > http://mail-archives.apache.org/mod_mbox/drill-dev/201505.mbox/%3CJIRA.12832322.1432356299000.15684.1432356317225@Atlassian.JIRA%3E
> >
> > Is there a workaround (other than removing "\n" from the texts)?
> >
> > Thanks in advance
>
> --
> Abdelhakim Deneche
> Software Engineer
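Neither message in this thread gives a workaround. One possible approach, offered here as an untested assumption rather than a confirmed Drill recipe, is to convert the CSV to JSON Lines before querying it. A JSON encoder writes each embedded newline as the two-character escape `\n`, so every record fits on one physical line, and a JSON reader decodes the escape back into a real newline. Unlike stripping `\n` from the texts, this keeps the newlines in the data. The sketch below assumes the CSV quotes any field containing a newline (RFC 4180 style) and has a header row. The function name `csv_to_json_lines` is hypothetical, and whether Drill's JSON reader handles this data better than its text reader should be checked against the Drill documentation.

```python
import csv
import io
import json


def csv_to_json_lines(src, dst, delimiter=","):
    """Hypothetical helper: copy CSV records from src to dst as JSON Lines.

    Assumes the first CSV row is a header and that fields containing
    newlines are wrapped in double quotes. For real files, open src with
    newline="" so the csv module handles embedded newlines correctly.
    Returns the number of records written.
    """
    reader = csv.DictReader(src, delimiter=delimiter)
    count = 0
    for record in reader:
        # json.dumps escapes "\n" inside strings, so each record is
        # written as exactly one physical line.
        dst.write(json.dumps(record, ensure_ascii=False))
        dst.write("\n")
        count += 1
    return count


if __name__ == "__main__":
    sample = 'id,body\n1,"first line\nsecond line"\n2,plain text\n'
    out = io.StringIO()
    written = csv_to_json_lines(io.StringIO(sample), out)
    print(written)
    print(out.getvalue())
```

For a real file, a call might look like `csv_to_json_lines(open("texts.csv", newline="", encoding="utf-8"), open("texts.json", "w", encoding="utf-8"))`, where both file names are placeholders. Streaming one record at a time keeps memory flat even with several million rows of 50k-character texts.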