tez-dev mailing list archives

From Siddharth Seth <ss...@apache.org>
Subject Re: Partition input
Date Fri, 31 Jul 2015 16:42:22 GMT
At the moment, the best way to do this is to use either
OrderedPartitionedKVOutput or UnorderedPartitionedKVOutput on the first
vertex, followed by a second vertex that writes with MROutput (assuming
you want the data on HDFS).
There's no variant of MROutput that supports partitioning. If something
like this were to be added, it would need to figure out how to generate
the partitioned files correctly, since each task, and therefore each
output file, would end up holding data for multiple partitions.
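
To illustrate what the partitioned output adds, here is a minimal sketch in
plain Java. It does not use the Tez API: the class PartitionSketch and its
methods are hypothetical names made up for this example. The only part taken
from real code is the arithmetic in partitionFor, which mirrors Hadoop's
HashPartitioner. The sketch shows records being routed into per-partition
buckets, which is the step that is skipped when a single vertex writes
straight to a data sink.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; none of these names come from Tez.
public class PartitionSketch {

    // Same arithmetic as Hadoop's HashPartitioner: clear the sign bit,
    // then take the remainder so the result lies in [0, numPartitions).
    public static int partitionFor(Object key, int numPartitions) {
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    // Stand-in for a partitioned output: groups records by partition number.
    // In a real DAG, each bucket would be fetched by one downstream task,
    // which would then write it out as its own file.
    public static Map<Integer, List<String>> partition(List<String> records,
                                                       int numPartitions) {
        Map<Integer, List<String>> buckets = new TreeMap<>();
        for (String record : records) {
            int p = partitionFor(record, numPartitions);
            buckets.computeIfAbsent(p, k -> new ArrayList<>()).add(record);
        }
        return buckets;
    }

    public static void main(String[] args) {
        List<String> records = Arrays.asList("apple", "banana", "cherry", "date");
        // Print each partition number with the records routed to it.
        partition(records, 3).forEach(
            (p, rs) -> System.out.println("partition " + p + ": " + rs));
    }
}
```

With a two-vertex layout, the number of downstream tasks matches the number
of partitions, so each output file holds exactly one partition.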

On Mon, Jul 27, 2015 at 9:47 AM, Oleg Zhurakousky <
ozhurakousky@hortonworks.com> wrote:

> Guys
>
> I have a simple DAG where I simply want to partition the input data. In
> theory this should not require more than a single Vertex (read splits and
> write them to individual partitions). In other words, a Vertex with a
> DataSource and a DataSink.
> However, it appears that unless I have a vertex sending its output to an
> OrderedPartitionedKVOutput, the partitioner is not called and the output
> goes to a single partition.
>
> Any pointers?
> Cheers
> Oleg
>
>
