From: Gabriel Reid
To: user@crunch.apache.org
Date: Fri, 5 Apr 2013 17:48:55 +0200
Subject: Re: PGroupedTable.combineValues question

Hi Dave,


On Fri, Apr 5, 2013 at 5:05 PM, Dave Beech <dave@paraliatech.com> wrote:
> I have a PGroupedTable<A,B> and I want to aggregate / combine the values to produce a PCollection<C> - in other words, I need the type of the aggregate to be different to the original value type.
>
> What's the best approach? The combineValues method takes either an Aggregator or a CombineFn but as far as I can see, both of these assume the end result will be of the same type as the values.


The approach that I always use for this is just creating a custom DoFn to operate on the PGroupedTable and construct the instance of type C based on the incoming Iterable from the PGroupedTable. This basically works out to the same as an Aggregator.
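
As a rough illustration of that DoFn approach, here is a minimal plain-Java sketch of the per-key logic, with no Crunch dependency so it can be run on its own. The names GroupedValuesSketch, Stats and summarize are hypothetical, and Integer stands in for the value type B while Stats stands in for the aggregate type C. It only shows the fold from an Iterable of values to a different result type, which is the part that would sit inside the DoFn's process method.

```java
import java.util.Arrays;
import java.util.List;

/** Plain-Java stand-in (no Crunch dependency) for the body of a custom DoFn. */
final class GroupedValuesSketch {

    /** Hypothetical aggregate type C: a count and a sum, distinct from the value type B (Integer). */
    static final class Stats {
        final long count;
        final long sum;

        Stats(long count, long sum) {
            this.count = count;
            this.sum = sum;
        }

        double mean() {
            return count == 0 ? 0.0 : (double) sum / count;
        }
    }

    /** Folds all grouped values for one key into a single Stats in one pass. */
    static Stats summarize(Iterable<Integer> values) {
        long count = 0;
        long sum = 0;
        for (int v : values) {
            count++;
            sum += v;
        }
        return new Stats(count, sum);
    }

    public static void main(String[] args) {
        // Assumed sample data: the values grouped under a single key.
        List<Integer> valuesForOneKey = Arrays.asList(3, 5, 10);
        Stats s = summarize(valuesForOneKey);
        System.out.println(s.count + " values, sum " + s.sum + ", mean " + s.mean());
    }
}
```

In Crunch itself, a DoFn applied to a PGroupedTable<A,B> receives one Pair of key and Iterable<B> per key, so a method like summarize would be called on the Iterable and its result emitted. The loop reads the values only once, which seems the safer assumption for a reduce-side Iterable.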

I don't think that this scenario would be technically applicable to a CombineFn, because the CombineFn can be called any number of times on an incoming set of values, on both the map and reduce sides of a job. In order to map values to another type, the intermediate value of type C would somehow need to be given to the CombineFn each time it was used.
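
To make the repeated-application point concrete, here is a small plain-Java simulation, again without Crunch. CombinerSketch, sum and combineInStages are hypothetical names, and the chunking is only an assumed stand-in for what happens on the map side. The combining function is applied to chunks of the values first and then to its own partial results, which only type-checks because its output type equals its input type.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Plain-Java simulation of a combiner being applied in more than one stage. */
final class CombinerSketch {

    /** A combiner in the map/reduce sense: many values of one type in, one value of that type out. */
    static long sum(List<Long> values) {
        long total = 0;
        for (long v : values) {
            total += v;
        }
        return total;
    }

    /** Combines in chunks (as on the map side), then combines the partial results (as on the reduce side). */
    static long combineInStages(List<Long> values, int chunkSize) {
        List<Long> partials = new ArrayList<>();
        for (int i = 0; i < values.size(); i += chunkSize) {
            int end = Math.min(i + chunkSize, values.size());
            partials.add(sum(values.subList(i, end)));
        }
        // The second application consumes the output of the first,
        // so the input and output types have to be the same.
        return sum(partials);
    }

    public static void main(String[] args) {
        List<Long> values = Arrays.asList(1L, 2L, 3L, 4L, 5L);
        System.out.println("single pass: " + sum(values));
        System.out.println("two stages:  " + combineInStages(values, 2));
    }
}
```

If sum returned some other type C instead, the partial results could not be passed back into it, which is the situation described above.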

- Gabriel
