Subject: Re: Enumerable groupBy() take advantage of input collation?
From: Julian Hyde
Date: Fri, 21 Aug 2015 21:10:07 -0700
To: dev@calcite.incubator.apache.org

Thanks!

> On Aug 21, 2015, at 4:01 PM, Li Yang wrote:
>
> https://issues.apache.org/jira/browse/CALCITE-853
>
> On Fri, Aug 21, 2015 at 2:20 PM, Julian Hyde wrote:
>
>> Yes, that would be useful. Please log a jira.
>>
>> Enumerable.groupBy doesn't know its input's collation so can't make that
>> decision, but EnumerableAggregate does. I think that EnumerableAggregate
>> should have a "trigger key", a subset of its group key, and if the trigger
>> key changes it will emit and flush its hash table.
>>
>> As well as for your use case, it will be useful for streaming queries.
>>
>> Julian
>>
>>> On Aug 20, 2015, at 2:35 AM, Li Yang wrote:
>>>
>>> I encountered an out-of-memory exception when a huge result set was
>>> passed into EnumerableAggregate and aggregated in memory. I'm thinking
>>> that if the input is sorted by the group-by key, then groupBy() doesn't
>>> have to hold all the data in memory any more.
>>>
>>> So does the Enumerable groupBy() take advantage of input collation
>>> currently? Should I open a JIRA for it?
>>>
>>>
>>> Cheers
>>> Yang
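
For readers of the archive, the "trigger key" idea discussed above can be sketched roughly as follows. This is only an illustration under stated assumptions, not Calcite code: the class TriggerKeyAggregator and the method countByGroup are hypothetical names, the aggregate is a simple COUNT, and the input is assumed to arrive already sorted (or at least clustered) by the trigger key, which is assumed to be derived from a subset of the group key. The hash table then only ever holds the groups belonging to one trigger-key value.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.function.BiConsumer;
import java.util.function.Function;

/** Hypothetical sketch of a "trigger key" aggregate; not Calcite's EnumerableAggregate. */
final class TriggerKeyAggregator {
  private TriggerKeyAggregator() {}

  /**
   * Counts rows per group key, emitting results to {@code sink}.
   * Assumption: rows with the same trigger key are adjacent in the input.
   */
  static <R, T, G> void countByGroup(
      Iterator<R> rows,
      Function<R, T> triggerKey,
      Function<R, G> groupKey,
      BiConsumer<G, Long> sink) {
    // Holds only the groups that share the current trigger-key value.
    Map<G, Long> table = new LinkedHashMap<>();
    T current = null;
    boolean first = true;
    while (rows.hasNext()) {
      R row = rows.next();
      T t = triggerKey.apply(row);
      if (!first && !Objects.equals(t, current)) {
        // Trigger key changed: emit the finished groups and flush the table.
        table.forEach(sink);
        table.clear();
      }
      current = t;
      first = false;
      table.merge(groupKey.apply(row), 1L, Long::sum);
    }
    // Emit whatever remains for the last trigger-key value.
    table.forEach(sink);
  }

  public static void main(String[] args) {
    // Rows are {day, user}; the input is sorted by day (the trigger key).
    List<String[]> rows = Arrays.asList(
        new String[] {"d1", "a"},
        new String[] {"d1", "b"},
        new String[] {"d1", "a"},
        new String[] {"d2", "a"});
    TriggerKeyAggregator.<String[], String, String>countByGroup(
        rows.iterator(),
        r -> r[0],               // trigger key: day
        r -> r[0] + "/" + r[1],  // group key: day and user
        (k, c) -> System.out.println(k + "=" + c));
  }
}
```

With the sample rows in main, the groups for d1 are emitted as soon as the first d2 row is seen, so memory use is bounded by the number of distinct groups per trigger-key value rather than by the whole result set. If the input is not actually clustered by the trigger key, the same group can be emitted more than once, so a real implementation would need the planner to guarantee the collation.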