crunch-dev mailing list archives

From "Ben Roling (JIRA)" <>
Subject [jira] [Commented] (CRUNCH-663) Expose Record-level File Path to Processing Functions
Date Wed, 31 Jan 2018 14:29:00 GMT


Ben Roling commented on CRUNCH-663:

Thanks for the feedback [~jwills].  I'll upload a new patch to replace my modifications to
the WordCount example with a proper unit test.

> Expose Record-level File Path to Processing Functions
> -----------------------------------------------------
>                 Key: CRUNCH-663
>                 URL:
>             Project: Crunch
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Ben Roling
>            Assignee: Josh Wills
>            Priority: Major
>         Attachments: CRUNCH-663.patch
> We have some processing pipelines where we want to know the file path that each record
> being processed came from.  It would be nice if this could be exposed to the DoFns in our
> pipelines.
> This same desire was expressed a little over 1 year ago on the mailing list:
> []
> Unfortunately, that thread dead-ended.
> I will use the comments section and a patch to propose a simple, albeit slightly hacky
> solution.  Another alternative would be to create a new Source that provides a
> PCollection<Pair<Path, Record>>, but I'm not sure of the effort it would take to create that.
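
For illustration only, the shape of data that the Pair<Path, Record> alternative describes can be sketched in plain Java with no Crunch APIs at all. This is not the attached patch and not how a Crunch Source would be written; the class name PathTaggedRecords and the method readWithPaths are hypothetical, and each "record" is assumed to be one line of text.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical illustration: plain JDK only, no Crunch or Hadoop classes.
public class PathTaggedRecords {

    // Reads every input file line by line and pairs each line (the "record")
    // with the path of the file it came from, preserving input order.
    static List<Map.Entry<Path, String>> readWithPaths(List<Path> inputs) throws IOException {
        List<Map.Entry<Path, String>> tagged = new ArrayList<>();
        for (Path input : inputs) {
            for (String line : Files.readAllLines(input)) {
                tagged.add(new SimpleImmutableEntry<>(input, line));
            }
        }
        return tagged;
    }
}
```

A downstream function that receives such pairs can branch on the path (for example, on a date directory in it) without any side channel, which is the appeal of this alternative over exposing the path through the processing context.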

This message was sent by Atlassian JIRA
