From: "Josh Wills (JIRA)"
To: crunch-dev@incubator.apache.org
Reply-To: dev@crunch.apache.org
Date: Sun, 24 Mar 2013 17:13:15 +0000 (UTC)
Subject: [jira] [Commented] (CRUNCH-183) Reservoir sampling functions don't take object reuse into account

[ https://issues.apache.org/jira/browse/CRUNCH-183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13612154#comment-13612154 ]

Josh Wills commented on CRUNCH-183:
-----------------------------------

I am the worst about that-- thanks Gabriel. +1.

> Reservoir sampling functions don't take object reuse into account
> -----------------------------------------------------------------
>
>                 Key: CRUNCH-183
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-183
>             Project: Crunch
>          Issue Type: Bug
>            Reporter: Gabriel Reid
>            Assignee: Gabriel Reid
>         Attachments: CRUNCH-183.patch
>
>
> ReservoirSampleFn and WRSCombineFn in o.a.c.lib.SampleUtils both hold onto references of processed values, but don't make deep copies of them. For complex objects such as Avro objects, this leads to incorrect results, with the same value being returned for all samples.
> This can be resolved by making use of PType#getDetachedValue before storing a reference to the object.
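
To make the failure mode in the issue description concrete, here is a minimal, self-contained sketch. It is not the Crunch code and not the attached patch: `Record`, `sample`, and the `detach` parameter are hypothetical names made up for illustration, the sampler is plain unweighted reservoir sampling (Algorithm R), and `Record::copy` merely stands in for what a call to PType#getDetachedValue would provide. The single `reused` instance imitates a framework that overwrites one object for every input value.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.function.UnaryOperator;

public class ReservoirReuseDemo {

    /** Hypothetical mutable record, standing in for a reused Avro object. */
    static final class Record {
        int id;
        Record(int id) { this.id = id; }
        Record copy() { return new Record(id); }
    }

    /**
     * Reservoir-samples k of the values 0..n-1. Every value is delivered
     * through the same Record instance, which is overwritten on each step.
     * The detach function decides what gets stored: the shared reference
     * (identity) or an independent copy.
     */
    static List<Integer> sample(int n, int k, long seed, UnaryOperator<Record> detach) {
        Random random = new Random(seed);
        List<Record> reservoir = new ArrayList<>(k);
        Record reused = new Record(-1);
        for (int i = 0; i < n; i++) {
            reused.id = i; // the "framework" reuses the object for the next value
            if (i < k) {
                reservoir.add(detach.apply(reused));
            } else {
                int j = random.nextInt(i + 1);
                if (j < k) {
                    reservoir.set(j, detach.apply(reused));
                }
            }
        }
        List<Integer> ids = new ArrayList<>(k);
        for (Record r : reservoir) {
            ids.add(r.id);
        }
        return ids;
    }

    public static void main(String[] args) {
        // Storing the shared reference: every slot aliases the same object,
        // so every sample reports the last value written (99 here).
        System.out.println("by reference: " + sample(100, 3, 42L, UnaryOperator.identity()));
        // Storing a detached copy: each slot keeps the value it was sampled with.
        System.out.println("detached:     " + sample(100, 3, 42L, Record::copy));
    }
}
```

With the identity function, all three samples come back as the same value, which matches the symptom reported in the issue. With a copy taken before the reference is stored, the samples stay distinct.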