From user-return-10144-apmail-drill-user-archive=drill.apache.org@drill.apache.org Tue Feb 12 18:19:52 2019 Return-Path: X-Original-To: apmail-drill-user-archive@www.apache.org Delivered-To: apmail-drill-user-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 89E5518199 for ; Tue, 12 Feb 2019 18:19:52 +0000 (UTC) Received: (qmail 81992 invoked by uid 500); 12 Feb 2019 18:19:52 -0000 Delivered-To: apmail-drill-user-archive@drill.apache.org Received: (qmail 81905 invoked by uid 500); 12 Feb 2019 18:19:52 -0000 Mailing-List: contact user-help@drill.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: user@drill.apache.org Delivered-To: mailing list user@drill.apache.org Received: (qmail 81891 invoked by uid 99); 12 Feb 2019 18:19:51 -0000 Received: from pnap-us-west-generic-nat.apache.org (HELO spamd1-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 12 Feb 2019 18:19:51 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd1-us-west.apache.org (ASF Mail Server at spamd1-us-west.apache.org) with ESMTP id 16D70C1F2B for ; Tue, 12 Feb 2019 18:19:51 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd1-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: 1.799 X-Spam-Level: * X-Spam-Status: No, score=1.799 tagged_above=-999 required=6.31 tests=[DKIMWL_WL_MED=-0.001, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, HTML_MESSAGE=2, RCVD_IN_DNSWL_NONE=-0.0001, SPF_PASS=-0.001, URIBL_BLOCKED=0.001] autolearn=disabled Authentication-Results: spamd1-us-west.apache.org (amavisd-new); dkim=pass (2048-bit key) header.d=gmail.com Received: from mx1-lw-us.apache.org ([10.40.0.8]) by localhost (spamd1-us-west.apache.org [10.40.0.7]) (amavisd-new, port 10024) with ESMTP id J28CjTIR_81G for ; Tue, 12 Feb 2019 18:19:49 +0000 (UTC) Received: from mail-lf1-f53.google.com (mail-lf1-f53.google.com [209.85.167.53]) by mx1-lw-us.apache.org (ASF Mail Server at mx1-lw-us.apache.org) with ESMTPS id 8645A5F4A3 for ; Tue, 12 Feb 2019 18:19:48 +0000 (UTC) Received: by mail-lf1-f53.google.com with SMTP id t14so2762513lfk.7 for ; Tue, 12 Feb 2019 10:19:48 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:references:in-reply-to:from:date:message-id:subject:to; bh=kj4eTgasW2uqM5ZBiChqaUcFeib2ByiGI5m80SZJiG8=; b=k5pfiYowNRiAUvlbKZkNAaKuOyeeHMvjybyjaG7V98DHc8hD31czOB9I2E6j/TDwgk PG0Y0NFMuHLWghftBA+wGdV7717/Cm3EusqfJbhZFgEyBh0JTdnn+dTpGR8CYpBFDqgw spaEkyi0fpI8IEooRehO7D4s3/Q40d705YojotTKHPv0semI3sbW/TZKAyW1TZ2M6BCJ 1uthv9WgW0qQC/R9021X8dTS/6CHbbcfJgM+MDWrZDcTZpdPfnvTzZ7Lr0femPBjEEMy G0KSyIt5GwHFP5QqB+wiKfPkAsp0IewG3V7C1KUTJfLroEpP23mZUhZnnw1sLcmz6FSN Mb9g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to; bh=kj4eTgasW2uqM5ZBiChqaUcFeib2ByiGI5m80SZJiG8=; b=HrM7lU+9HRe/+JzX2RxFjPfBH20erOQJwp/9anC2Vz5isg8+qbMIa/5VZm1yvYGzEu N4g6Tdnj4iMvNtLSNMBrYsZiAyrshXPr+UbJgQfORMSDv72fqpltvAtZL7hf1+xFhenH VR+MNPg22iFhxGLTxJTillqgNgepFXxeyrARZycBPKbz0c5of2r0gyPSvpKfswBdpPZx ArHO/DY6ZX5NKkAoA8QgSzVwEI5kvMzUoJjEjxJREjogr9F9bjrtRuw/YkoxOJXB1awS HFH6m41LXO7EG8NX8hQdPXs/6EncdW2q3SlMpWGEJP71+f66plpCSb3Z4bhq6/un7yuo dzrA== X-Gm-Message-State: AHQUAuaz/Tou1TEQyApEIaoqJKy/As2SgpRGVYCG59M1r79wke457Bfv 0Dn+jZeMSpQ1tvajyB+C3nAHAuFAj4OPI5SWQj0LYg== X-Google-Smtp-Source: AHgI3IYFXpjCli2+BMfl3qREC986ngRU/BMpePGirNQY4h9jDOT2JtivZmyrM2ID87j2gQQnZ+hQlBlu6kq50O1rZu0= X-Received: by 2002:ac2:43c6:: with SMTP id u6mr251250lfl.151.1549995586604; Tue, 12 Feb 2019 10:19:46 -0800 (PST) MIME-Version: 1.0 References: <93589e0d-5928-4371-b88c-75ce02dde54f@getmailbird.com> <77eb9a33-5b9e-4937-a8c0-9d394fb1708c@getmailbird.com> <0013d3a6-ad65-498f-8c44-03b5a44ebecb@getmailbird.com> In-Reply-To: <0013d3a6-ad65-498f-8c44-03b5a44ebecb@getmailbird.com> From: Sorabh Hamirwasia Date: Tue, 12 Feb 2019 10:19:35 -0800 Message-ID: Subject: Re: Performace issue To: user@drill.apache.org Content-Type: multipart/alternative; boundary="00000000000057d2270581b678ba" --00000000000057d2270581b678ba Content-Type: text/plain; charset="UTF-8" Hi Praveen, Can you also share what is the schema of your entire dataset and in what format it's stored? Thanks, Sorabh On Tue, Feb 12, 2019 at 10:02 AM Kunal Khatua wrote: > You'll need to edit the memory settings in DRILL_HOME/conf/drill-env.sh > I suspect that your 5MB JSON data might be having a lot of objects, which > need to be serialized in memory. > > FLATTEN has the problem that it replicates the data parent data for each > child node that is being flattened into a row... so the resulting data > being constructed in memory can grow significantly. > One way to work around (not elegant, but worth trying) would be to > generate intermediate flatten data and write temporary (if not using WebUI) > tables and keep flattening out those records until you have a fully > flattened dataset to work with directly. > > On 2/11/2019 10:37:58 PM, PRAVEEN DEVERACHETTY > wrote: > Thnks a lot Kunal. I am looking into that. I have one observation. > > With out flatten also, i tried to run a query of size 5MB, it is taking 5GB > of heap? how do i control heap? Are there any settings i can modify. i am > reading a lot, but nothing is working for me. It would be helpful how to > control heap, i modified memory parameters based on the documentation, it > is not working yet. it would be really helpful if i get some help in this > regard. Thanks in advance. > > Regards > Praveen > > On Tue, Feb 12, 2019 at 11:18 AM Kunal Khatua wrote: > > > This is a good starting point for understanding LATERAL-UNNEST and how it > > compares to the FLATTEN operator. > > > > https://drill.apache.org/docs/lateral-join/ > > > > > > On 2/11/2019 9:03:42 PM, PRAVEEN DEVERACHETTY wrote: > > Thanks Kunal. > > i am not getting how to use lateral-unrest as dataset does not have child > > rows. All data is in array of json objects(as mentioned below). There are > > two json objects separated by comma and enclosed in squre bracket. > > > > > [{"Location":"100","FirstName":"test1"},{"Location":"100","FirstName":"test2"},{"Location":"101","FirstName":"test3"}] > > > > We are using drill from Java. Through a rest invocation. Not using json > > files. All data is sent over post as string. We are using convert_from > > function in the query to convert into json objects. As we are sending > array > > of json objects, using FLATTEN operator to convert into multiple rows. is > > there any way to avoid Flatten, as we see huge spike for 54MB data, going > > to 24GB and still failing with heap error. not sure what is wrong. Can i > > use FLATTEN on the entire data set? There are almost 54K records that is > > getting FLATTENED. > > > > example query: 1)first converted into array of json objects 2) flatten to > > convert into multiple rows > > select ems.* from (select flatten(t.jdata) as record from (select > > > > > convert_from('[{"Location":"100","FirstName":"test1"},{"Location":"100","FirstName":"test2"},{"Location":"101","FirstName":"test3"}..]') > > as jdata) as t) ems > > > > > > On Sat, Feb 9, 2019 at 1:37 AM Kunal Khatua wrote: > > > > > The memory (heap) would climb as it tries to flatten the JSON data. > Have > > > you tried looking at Drill's LateralJoin-Unnest feature? It was meant > to > > > address memory issues for some use cases of the FLATTEN operator. > > > > > > On 2/8/2019 5:17:01 AM, PRAVEEN DEVERACHETTY wrote: > > > I am running a query with UNION ALL. as below > > > > > > select > > > from ( select FLATTEN(t.jdata) as record from > > > ((select convert_from(json string, json) union all > > > (select conver_from(json_string,json) union all > > > ... > > > ) as jdata) ) as t) ems > > > > > > Reason for giving union all is because we are invoking a call using > rest > > > app, there is limitation of 20,000 when we use convert_from function. > Our > > > heap size is 8GB, server is 8core. From profiling, it shows this > > perticula > > > query spikes from 100MB to 8GB continuously. is there anything i am > > > doing wrong?. > > > > > > Thanks, > > > Prveen > > > > > > --00000000000057d2270581b678ba--