crunch-dev mailing list archives

From "Josh Wills (JIRA)" <>
Subject [jira] [Updated] (CRUNCH-268) Crunch's internal Avro tuple schemas should have stable names
Date Fri, 20 Sep 2013 22:12:54 GMT


Josh Wills updated CRUNCH-268:

    Attachment: CRUNCH-268.patch

Attaching the patch I came up with; it uses MD5 hashes to keep the tuple names consistent,
unique, and relatively short.
> Crunch's internal Avro tuple schemas should have stable names
> -------------------------------------------------------------
>                 Key: CRUNCH-268
>                 URL:
>             Project: Crunch
>          Issue Type: Bug
>          Components: Core, IO
>    Affects Versions: 0.7.0
>            Reporter: Josh Wills
>            Assignee: Josh Wills
>         Attachments: CRUNCH-268.patch
> A long time ago, I made a change that used random names for the custom Avro schemas that
> Crunch generates for processing tuple types (pairs, triples, etc.). I recently hit a use case
> where that randomization burned me: I was re-running some pipelines over checkpointed
> data that I had serialized using Crunch's Avro schemas (Pair, in particular). I think
> we should change the tuple schemas to have stable names, derived via an MD5 hash of their
> constituent field schemas.
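Why random names break re-runs: the Avro specification's schema-resolution rules match record schemas by name (aliases aside), so a reader schema carrying a freshly randomized name cannot resolve checkpointed data written under the old one. The following is a minimal sketch of the proposed naming scheme, assuming each field schema is available as its JSON string (for example via Avro's `Schema.toString()`). The class and method names (`StableTupleNames`, `stableTupleName`) and the `tuple` prefix are hypothetical illustrations, not the code in CRUNCH-268.patch.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.List;

public class StableTupleNames {

  // Hypothetical helper: derives a deterministic Avro record name from the
  // JSON strings of a tuple's field schemas, taken in field order.
  public static String stableTupleName(String prefix, List<String> fieldSchemaJson) {
    try {
      MessageDigest md5 = MessageDigest.getInstance("MD5");
      for (String json : fieldSchemaJson) {
        md5.update(json.getBytes(StandardCharsets.UTF_8));
        // Zero-byte separator so ["ab", "c"] and ["a", "bc"] hash differently.
        md5.update((byte) 0);
      }
      byte[] digest = md5.digest(); // always 16 bytes
      StringBuilder name = new StringBuilder(prefix);
      for (byte b : digest) {
        name.append(String.format("%02x", b)); // two lowercase hex digits per byte
      }
      return name.toString();
    } catch (NoSuchAlgorithmException e) {
      // Every Java platform is required to provide MD5.
      throw new IllegalStateException("MD5 not available", e);
    }
  }

  public static void main(String[] args) {
    List<String> pairFields = Arrays.asList("\"string\"", "\"long\"");
    String first = stableTupleName("tuple", pairFields);
    String second = stableTupleName("tuple", pairFields);
    System.out.println(first);
    // Same field schemas always yield the same name, across runs and JVMs.
    System.out.println(first.equals(second));
  }
}
```

Hashing the field schemas in order means `Pair<String, Long>` and `Pair<Long, String>` get different names, while re-running a pipeline with the same types reproduces the same name. The hex digest is always 32 characters regardless of tuple arity, which keeps names short, and the letter prefix keeps the result a legal Avro name (`[A-Za-z_][A-Za-z0-9_]*`).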

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: