crunch-dev mailing list archives

From "Josh Wills (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CRUNCH-663) Expose Record-level File Path to Processing Functions
Date Wed, 31 Jan 2018 05:43:00 GMT

    [ https://issues.apache.org/jira/browse/CRUNCH-663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16346286#comment-16346286 ]

Josh Wills commented on CRUNCH-663:
-----------------------------------

So I like this; it's backwards-compatible with existing APIs, but does something that is most
certainly useful for a relatively small fraction of pipelines. I'm +1 and will be happy to
commit the patch to master if no one has any objections in the next day or so.

> Expose Record-level File Path to Processing Functions
> -----------------------------------------------------
>
>                 Key: CRUNCH-663
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-663
>             Project: Crunch
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Ben Roling
>            Assignee: Josh Wills
>            Priority: Major
>         Attachments: CRUNCH-663.patch
>
>
> We have some processing pipelines where we want to know the file path that each record
> being processed came from. It would be nice if this could be exposed to the DoFns in our
> pipelines.
>  
> This same desire was expressed a little over 1 year ago on the mailing list:
> [http://mail-archives.apache.org/mod_mbox/crunch-user/201611.mbox/%3CCAG-tO+Y42KRFiocg1RJT4qFcyvkPjFSfZa4z=wk34AriP4weTw@mail.gmail.com%3E]
>  
> Unfortunately, that thread dead-ended.
>  
> I will use the comments section and a patch to propose a simple, albeit slightly hacky
> solution. Another alternative would be to create a new Source that provides a
> PCollection<Pair<Path, Record>>, but I'm not sure of the effort it would take to create that.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
