crunch-user mailing list archives

From Josh Wills <>
Subject Re: Finding Input Split from DoFn
Date Thu, 22 Nov 2012 10:35:46 GMT
Calling getContext() from inside a DoFn, during or after initialize(), returns
the TaskInputOutputContext. When the DoFn runs inside a Mapper, that context
is a MapContext, and MapContext has a getInputSplit() method. We don't
normally want a DoFn to care whether it's on the map side or the reduce side
of a MapReduce job, so the API doesn't expose the distinction by default,
which means you need to do something like:

if (getContext() instanceof MapContext) {
  InputSplit split = ((MapContext) getContext()).getInputSplit();
}

which is a little ugly-- sorry about that.
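
To see the shape of that check in isolation, here is a minimal sketch that
compiles without Hadoop or Crunch on the classpath. StubInputSplit,
StubTaskContext, StubMapContext, describe() and findInputSplit() are all
hypothetical stand-ins invented for this illustration; they are not the real
Hadoop types, which carry type parameters and many more methods. In a real
DoFn you would run the same instanceof test against getContext().

```java
import java.util.Optional;

public class SplitLookupSketch {
    // Hypothetical stand-in for Hadoop's InputSplit (not the real class).
    public interface StubInputSplit { String describe(); }

    // Hypothetical stand-in for the context type a DoFn sees on either side.
    public interface StubTaskContext { }

    // Hypothetical stand-in for the map-side context, which alone has a split.
    public interface StubMapContext extends StubTaskContext {
        StubInputSplit getInputSplit();
    }

    // Returns the split on the map side and an empty Optional elsewhere.
    public static Optional<StubInputSplit> findInputSplit(StubTaskContext context) {
        if (context instanceof StubMapContext) {
            return Optional.of(((StubMapContext) context).getInputSplit());
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        StubInputSplit split = () -> "part-00000";
        StubTaskContext mapSide = (StubMapContext) () -> split;
        StubTaskContext reduceSide = new StubTaskContext() { };

        // Map side: the split is reachable after the instanceof check.
        System.out.println(findInputSplit(mapSide)
                .map(StubInputSplit::describe).orElse("no split"));
        // Reduce side: there is no split, so the caller gets the fallback.
        System.out.println(findInputSplit(reduceSide)
                .map(StubInputSplit::describe).orElse("no split"));
    }
}
```

Returning an Optional rather than null is just one way to make the
reduce-side case explicit to the caller; the check itself is the same
instanceof plus cast shown above.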


On Thu, Nov 22, 2012 at 1:45 AM, Ashish <> wrote:

> Hi All,
> Is there a way to find the InputSplit from within an implementation of
> DoFn?
> I am trying to implement the Inverted Index example using Crunch. I have
> tried peeking in the DoFn code, but couldn't find a way to retrieve the
> InputSplit. Can someone point me in the right direction?
> --
> thanks
> ashish
> Blog:
> My Photo Galleries:

Director of Data Science
Cloudera <>
Twitter: @josh_wills <>
