crunch-dev mailing list archives

From "Gabriel Reid (JIRA)" <>
Subject [jira] [Commented] (CRUNCH-167) Sort.sortTuples and related methods write out duplicate values
Date Fri, 22 Feb 2013 16:06:12 GMT


Gabriel Reid commented on CRUNCH-167:

Sounds very cool. I'm going to try to take a closer look at this this weekend. It would
be great if we could take advantage of this to get CRUNCH-51 taken care of.
> Sort.sortTuples and related methods write out duplicate values
> --------------------------------------------------------------
>                 Key: CRUNCH-167
>                 URL:
>             Project: Crunch
>          Issue Type: Bug
>    Affects Versions: 0.4.0, 0.5.0
>            Reporter: Josh Wills
>             Fix For: 0.6.0
>         Attachments: CRUNCH-167.patch
> I noticed when I was debugging CRUNCH-166 that the strategy the Sort.sortPairs,
> sortTriples, etc. methods use can write out duplicate values when we sort/group on
> only a subset of the fields. All of the records that share the same values for those
> fields are passed to the same reduce() call, but only one of those records is used as
> the key, so that record is written out repeatedly and the rest of the values are
> thrown away.
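To make the failure mode concrete, here is a small plain-Java simulation. It is not Crunch code: the class, the `Rec` record and the method names are hypothetical. It assumes, as the report describes, that the shuffle sorts on the full pair but groups only on the first field, and that the reducer sees the group's first record as its key. The `reduceWriteValues` variant, which carries each full record as the value, is just one way to avoid the problem and is not necessarily what CRUNCH-167.patch does.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class SortDuplicateDemo {

    // Hypothetical record standing in for a Pair<String, Integer>.
    public record Rec(String first, int second) {}

    // Full ordering on (first, second), as in the sort phase.
    static final Comparator<Rec> FULL_ORDER =
        Comparator.comparing(Rec::first).thenComparingInt(Rec::second);

    // Sorts on all fields but groups only on "first"; each inner list holds
    // the records that would reach a single reduce() call.
    public static List<List<Rec>> shuffle(List<Rec> input) {
        List<Rec> sorted = new ArrayList<>(input);
        sorted.sort(FULL_ORDER);
        Map<String, List<Rec>> groups = new TreeMap<>();
        for (Rec r : sorted) {
            groups.computeIfAbsent(r.first(), k -> new ArrayList<>()).add(r);
        }
        return new ArrayList<>(groups.values());
    }

    // Problematic pattern: one key per group (the first record), written once
    // per value, so that key is duplicated and the other records are lost.
    public static List<Rec> reduceKeyPerValue(List<List<Rec>> groups) {
        List<Rec> out = new ArrayList<>();
        for (List<Rec> group : groups) {
            Rec key = group.get(0);
            for (int i = 0; i < group.size(); i++) {
                out.add(key);
            }
        }
        return out;
    }

    // Alternative: each full record travels as a value and is written as-is.
    public static List<Rec> reduceWriteValues(List<List<Rec>> groups) {
        List<Rec> out = new ArrayList<>();
        for (List<Rec> group : groups) {
            out.addAll(group);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Rec> input = List.of(new Rec("a", 2), new Rec("b", 1), new Rec("a", 1));
        List<List<Rec>> groups = shuffle(input);
        // [Rec[first=a, second=1], Rec[first=a, second=1], Rec[first=b, second=1]]
        System.out.println(reduceKeyPerValue(groups));
        // [Rec[first=a, second=1], Rec[first=a, second=2], Rec[first=b, second=1]]
        System.out.println(reduceWriteValues(groups));
    }
}
```

With grouping on the first field only, ("a", 2) never appears in the first output while ("a", 1) appears twice, which matches the duplicate-values symptom in the report.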
