tez-dev mailing list archives

From Gopal Vijayaraghavan <gop...@apache.org>
Subject Re: Partition input
Date Fri, 31 Jul 2015 19:44:32 GMT

There is a way around this: because the data doesn't move over a Tez edge,
there's no reason to actually use an edge partitioner.


But that's the terminal Vertex case.
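
A minimal sketch of that terminal-Vertex idea, assuming the task itself picks a partition for each record as it writes instead of relying on an edge. This is plain Java, not Tez API: the PartitionSketch class and its method names are hypothetical, and the in-memory buckets stand in for one output file per partition. The only borrowed detail is the arithmetic, which is the same as Hadoop's HashPartitioner.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; none of these names come from Tez.
class PartitionSketch {

    // Same arithmetic as Hadoop's HashPartitioner: clear the sign bit so the
    // result is never negative, then take the remainder.
    static int partitionFor(Object key, int numPartitions) {
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    // Groups records into per-partition buckets. A real task would more
    // likely keep one open writer (one file) per partition instead.
    static Map<Integer, List<String>> partition(List<String> keys, int numPartitions) {
        Map<Integer, List<String>> buckets = new TreeMap<>();
        for (String key : keys) {
            int p = partitionFor(key, numPartitions);
            buckets.computeIfAbsent(p, unused -> new ArrayList<>()).add(key);
        }
        return buckets;
    }

    public static void main(String[] args) {
        // Prints {0=[b, d], 1=[a, c]}
        System.out.println(partition(Arrays.asList("a", "b", "c", "d"), 2));
    }
}
```

Each task ends up holding records for several partitions at once, which is the file-layout problem Sid raises in the quoted message below.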

If you want to send data via HDFS between vertices, you still need an HCFS
Edge, which sends data via a filesystem & URI locations via events.


On 7/31/15, 9:42 AM, "Siddharth Seth" <sseth@apache.org> wrote:

>At the moment, using either the OrderedPartitionedKVOutput or
>UnorderedPartitionedKVOutput along with MROutput (assuming you want the data
>on HDFS) is the best way to do this.
>There's no variant of MROutput which supports partitioning. If something
>like this were to be added - it would need to figure out how to generate
>the partitioned files correctly - since each task and output file would
>end up with multiple partitions.
>On Mon, Jul 27, 2015 at 9:47 AM, Oleg Zhurakousky <
>ozhurakousky@hortonworks.com> wrote:
>> Guys
>> I have a simple DAG where I simply want to partition the input data. In
>> theory this should not require more than a single Vertex (read splits,
>> write them to individual partitions). In other words, a Vertex with
>> Datasource and DataSink.
>> However, it appears that unless I have a vertex sending its output to an
>> OrderedPartitionedKVOutput, the partitioner is not called and the data
>> goes to a single partition.
>> Any pointers?
>> Cheers
>> Oleg
