tez-dev mailing list archives

From Gopal Vijayaraghavan <gop...@apache.org>
Subject Re: Which computation model does Tez support
Date Wed, 25 Mar 2015 04:26:18 GMT
Hi,

Iterative algorithms are expressed as DAGs in a loop.

Because DAGs are acyclic, whether in Tez or in Spark (since you mention the
paper), the natural way to implement iteration is to keep the loop outside
the DAG: the same operation is applied repeatedly over the same data, and a
decision condition determines whether to stay in the loop or not.

You might want to look at last year's Hadoop Summit presentations for a
direct example of iterative algorithms with Tez.

http://www.slideshare.net/Hadoop_Summit/pig-on-tez-low-latency-etl-with-big-data/25


For logistic regression, you need a library that implements that specific
algorithm [1].
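
To make the vertex-and-edge question concrete, here is one possible shape, offered as an assumption and not as how the library in [1] works: batch gradient descent for logistic regression, where each iteration is a map-like vertex computing a partial gradient per data partition, a reduce-like vertex summing them, and a driver loop updating the weights. All function names are hypothetical, and plain Python stands in for Tez tasks.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def partial_gradient(partition, w):
    """'Map' vertex (hypothetical): log-loss gradient over one partition of (x, y) rows."""
    g = [0.0] * len(w)
    for x, y in partition:
        p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
        for j, xj in enumerate(x):
            g[j] += (p - y) * xj
    return g


def sum_gradients(partials):
    """'Reduce' vertex (hypothetical): element-wise sum of per-partition gradients."""
    return [sum(col) for col in zip(*partials)]


def train(partitions, dim, lr=0.5, max_iters=200, tol=1e-6):
    """Driver loop: one map -> reduce DAG per iteration, then a weight update."""
    w = [0.0] * dim
    n = sum(len(p) for p in partitions)
    for _ in range(max_iters):
        grad = sum_gradients([partial_gradient(p, w) for p in partitions])
        # Decision condition: stop once the average gradient is tiny.
        if max(abs(gi) for gi in grad) / n < tol:
            break
        w = [wi - lr * gi / n for wi, gi in zip(w, grad)]
    return w
```

Each `x` carries a constant 1.0 as its first feature so that `w[0]` acts as the bias term; the per-iteration shuffle here is a single summed gradient vector, which stays small regardless of the data size.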

On that note, a workload that needs incremental iteration can probably run
even more efficiently in Tez than with these approaches if you unroll the
iterations into a chain connected by 1-1 edges, with all of the final tasks
generating outputs.
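
One reading of the unrolling suggestion, sketched under assumptions: when each iteration needs only its own partition's data, a fixed number of iterations can be laid out as a chain of vertices joined by one-to-one edges (Tez calls this data movement type `ONE_TO_ONE`; here it is only a string label), so task j of one vertex feeds only task j of the next and every final task writes its own output. `step`, `build_unrolled_dag` and `run_unrolled` are hypothetical helpers, not Tez APIs.

```python
def step(records):
    """Hypothetical per-partition iteration step that needs no data from other partitions."""
    return [r * 2 + 1 for r in records]


def build_unrolled_dag(num_iterations):
    """Describe the unrolled chain as plain data (not the Tez DAG API)."""
    vertices = [f"iter_{i}" for i in range(num_iterations)]
    # Each consecutive pair of vertices is joined by a one-to-one edge.
    edges = [(vertices[i], vertices[i + 1], "ONE_TO_ONE")
             for i in range(num_iterations - 1)]
    return vertices, edges


def run_unrolled(partitions, num_iterations):
    """Simulate the chain: task j of every vertex only sees partition j."""
    outputs = []
    for records in partitions:           # one independent task chain per partition
        for _ in range(num_iterations):  # one vertex per unrolled iteration
            records = step(records)
        outputs.append(records)          # the final task writes its own output
    return outputs
```

The trade-off is that the number of iterations must be fixed when the DAG is built; a data-dependent exit condition still needs the driver-side loop.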

Cheers,
Gopal
[1] - https://github.com/myui/hivemall#regression


On 3/24/15, 8:43 PM, "Chang Chen" <baibaichen@gmail.com> wrote:

>Hi
>
>From the PhD dissertation
><http://www.eecs.berkeley.edu/Pubs/TechRpts/2014/EECS-2014-12.pdf> of
>Matei Zaharia, there are four computation models in large-scale clusters:
>
>   1. *Iterative algorithms*, such as graph processing and machine learning
>   algorithms
>   2. *Relational queries*
>   3. *MapReduce*, a general parallel computation model
>   4. *Stream processing*
>
>Obviously, Tez supports #2 and #3, but for #1 and #4, I don't see any
>examples.
>
>As for streaming, I guess that if we implement an appropriate input, there
>is no reason Tez can't support it, at least in theory.
>
>But for machine learning, how do we use vertices and edges to express
>*Logistic Regression*?
>
>Thanks
>Chang


