tez-dev mailing list archives

From Johannes Zillmann <jzillm...@googlemail.com>
Subject Re: Which computation model does Tez supports
Date Wed, 25 Mar 2015 09:02:23 GMT
Hey Gopal,

> On 25 Mar 2015, at 05:26, Gopal Vijayaraghavan <gopalv@apache.org> wrote:
> 
> Hi,
> 
> Iterative algorithms are expressed as DAGs in a loop.
> 
> The acyclic nature of DAGs, whether in Tez or Spark (since you mention the
> paper), makes that the natural way to implement iteration: repeated
> application of the same operation over the same data, with a decision
> condition determining whether to stay in the loop or not.

Can you point to a piece of code that implements this approach?
If each loop iteration is a single DAG, how would that avoid the HDFS barrier?

Johannes
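
Gopal's description of iterative algorithms as "DAGs in a loop" can be pictured as a driver loop. This is a minimal, Tez-agnostic sketch: `run_dag` is a hypothetical stand-in for building, submitting and waiting on one DAG, and `converged` is the decision condition, evaluated in the driver. It also shows where the barrier Johannes asks about sits: each iteration's result has to come out of one DAG (typically through a persistent sink such as HDFS) before the next DAG can read it.

```python
from typing import Callable, Tuple, TypeVar

S = TypeVar("S")


def run_iterative(
    state: S,
    run_dag: Callable[[S], S],
    converged: Callable[[S, S], bool],
    max_iters: int = 100,
) -> Tuple[S, int]:
    """Drive an iterative algorithm as one (hypothetical) DAG per iteration.

    run_dag stands in for submitting one DAG and reading back its output;
    in a real cluster that output is materialized between iterations.
    """
    for i in range(1, max_iters + 1):
        new_state = run_dag(state)       # one DAG submission per loop iteration
        if converged(state, new_state):  # decision condition, checked in the driver
            return new_state, i
        state = new_state
    return state, max_iters


# Example: Newton's method for sqrt(2) stands in for the per-iteration DAG.
root, iterations = run_iterative(
    1.0,
    lambda x: 0.5 * (x + 2.0 / x),
    lambda old, new: abs(new - old) < 1e-12,
)
```

Keeping the convergence test in the driver is what lets each DAG stay acyclic; the cost is one DAG submission, and one round of materialized output, per iteration.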

> 
> You might want to look at last year's Hadoop Summit presentations for a
> direct example of iterative algorithms with Tez.
> 
> http://www.slideshare.net/Hadoop_Summit/pig-on-tez-low-latency-etl-with-big-data/25
> 
> 
> Logistic regression needs you to use a library which implements that
> specific algorithm [1].
> 
> On that note, something that needs incremental iteration can probably be
> even more efficient in Tez than these approaches if you unroll the
> iterations as a chain of 1-1 edges, with all of the final tasks ending up
> generating the outputs.
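
Gopal's unrolling suggestion can be pictured as follows. This is a minimal sketch, assuming the number of iterations is fixed in advance and that each partition's update depends only on that partition's own data. The loop becomes a single DAG with one vertex per iteration, chained by one-to-one edges (in Tez terms, the `ONE_TO_ONE` data movement type), so task i of one iteration feeds only task i of the next and nothing is shuffled or written out until the last vertex. The vertex names and helper functions are hypothetical, and execution is simulated locally rather than run on Tez.

```python
from typing import Callable, List, Tuple

Partition = List[float]


def unroll_plan(num_iters: int) -> Tuple[List[str], List[Tuple[str, str, str]]]:
    """Hypothetical plan for one DAG: one vertex per unrolled iteration,
    each connected to the next by a one-to-one edge."""
    vertices = [f"iter-{k}" for k in range(num_iters)]
    edges = [(vertices[k], vertices[k + 1], "ONE_TO_ONE") for k in range(num_iters - 1)]
    return vertices, edges


def run_unrolled(
    partitions: List[Partition],
    step: Callable[[Partition], Partition],
    num_iters: int,
) -> List[Partition]:
    """Simulate the unrolled chain: each partition flows through every
    iteration on its own, with no shuffle or driver round-trip in between."""
    outputs = []
    for part in partitions:          # one task "lane" per partition
        for _ in range(num_iters):   # one vertex per unrolled iteration
            part = step(part)
        outputs.append(part)         # only the final vertex's tasks emit output
    return outputs


vertices, edges = unroll_plan(3)
halved = run_unrolled([[8.0, 4.0], [2.0]], lambda p: [x * 0.5 for x in p], 3)
# halved == [[1.0, 0.5], [0.25]]
```

The trade-off is that a global convergence test can no longer run between iterations inside the chain, so this fits only algorithms whose iteration count can be chosen up front.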
> 
> Cheers,
> Gopal
> [1] - https://github.com/myui/hivemall#regression
> 
> 
> On 3/24/15, 8:43 PM, "Chang Chen" <baibaichen@gmail.com> wrote:
> 
>> Hi
>> 
>> From the PhD dissertation
>> <http://www.eecs.berkeley.edu/Pubs/TechRpts/2014/EECS-2014-12.pdf> of
>> Matei Zaharia, there are four computation models in large-scale clusters:
>> 
>> 
>>  1. *Iterative algorithms*, such as graph processing and machine learning
>>  algorithms
>>  2. *Relational query*
>>  3. *MapReduce*, a general parallel computation model
>>  4. *Stream processing*
>> 
>> Obviously, Tez supports #2 and #3, but for #1 and #4, I don't see any
>> examples.
>> 
>> As for streaming, I guess that if we implement an appropriate input, there
>> is no reason Tez can't support it, in theory.
>> 
>> But for machine learning, how do we use vertices and edges to express
>> *Logistic Regression*?
>> 
>> Thanks
>> Chang
> 
> 
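
Chang's logistic regression question can be sketched with the same DAG-in-a-loop idea. This is a minimal sketch assuming plain batch gradient descent, which is not necessarily how Hivemall [1] implements it. Each iteration would be one DAG with a hypothetical `gradient` vertex (one task per data partition, computing a partial gradient) feeding a hypothetical `update` vertex that gathers all partial gradients and takes one step. All names are illustrative, and the DAG is simulated with ordinary Python calls rather than the Tez API.

```python
import math
from typing import List, Sequence, Tuple

# (features, label) with label in {0, 1}; a constant 1.0 feature first acts as the bias.
Example = Tuple[Sequence[float], int]


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def partial_gradient(weights: List[float], partition: List[Example]) -> List[float]:
    """Hypothetical 'gradient' vertex task: log-loss gradient summed over one partition."""
    grad = [0.0] * len(weights)
    for x, y in partition:
        err = sigmoid(sum(w * xi for w, xi in zip(weights, x))) - y
        for j, xj in enumerate(x):
            grad[j] += err * xj
    return grad


def update(weights: List[float], partials: List[List[float]], n: int, lr: float) -> List[float]:
    """Hypothetical 'update' vertex: sum all partial gradients and take one step."""
    total = [sum(g[j] for g in partials) for j in range(len(weights))]
    return [w - lr * t / n for w, t in zip(weights, total)]


def train(partitions: List[List[Example]], dim: int, lr: float = 0.5, iters: int = 200) -> List[float]:
    n = sum(len(p) for p in partitions)
    weights = [0.0] * dim
    for _ in range(iters):  # driver loop: each pass stands in for one DAG submission
        partials = [partial_gradient(weights, p) for p in partitions]  # one task per partition
        weights = update(weights, partials, n, lr)  # all partials gathered into one update
    return weights


parts = [
    [([1.0, -2.0], 0), ([1.0, -1.0], 0)],
    [([1.0, 1.0], 1), ([1.0, 2.0], 1)],
]
w = train(parts, dim=2)
```

Only the small weight vector has to cross from one iteration to the next; each iteration re-reads the partitioned training data.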

