crunch-dev mailing list archives

From "Ben Roling (JIRA)" <>
Subject [jira] [Commented] (CRUNCH-663) Expose Record-level File Path to Processing Functions
Date Tue, 30 Jan 2018 20:21:00 GMT


Ben Roling commented on CRUNCH-663:

The attached patch is a quick proof of concept; I wouldn't expect it to be merged directly.
The patch includes a modified WordCount example that demonstrates leveraging this property.
I should have just added a unit test to show it, but I haven't done that yet.  If I get feedback
that the general approach is acceptable, I would certainly be happy to add one or more tests.

> Expose Record-level File Path to Processing Functions
> -----------------------------------------------------
>                 Key: CRUNCH-663
>                 URL:
>             Project: Crunch
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Ben Roling
>            Assignee: Josh Wills
>            Priority: Major
>         Attachments: CRUNCH-663.patch
> We have some processing pipelines where we want to know the file path that each record
> being processed came from.  It would be nice if this could be exposed to the DoFns in our
> This same desire was expressed a little over 1 year ago on the mailing list:
> []
> Unfortunately, that thread dead-ended.
> I will use the comments section and a patch to propose a simple, albeit slightly hacky,
> solution.  Another alternative would be to create a new Source that provides a
> PCollection<Pair<Path, Record>>, but I'm not sure of the effort it would take to create that.

This message was sent by Atlassian JIRA
