tez-dev mailing list archives

From Tsuyoshi Ozawa <oz...@apache.org>
Subject Re: Which computation models does Tez support
Date Wed, 25 Mar 2015 09:20:33 GMT
Hivemall has MixServer, an external key-value store, for exchanging
messages between map tasks.

https://github.com/myui/hivemall/tree/master/src/main/java/hivemall/mix
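As a rough illustration of tasks exchanging state through a shared store, here is a minimal in-memory sketch. It assumes the store simply averages the weights that tasks push for each feature. The class `MixStoreSketch` and its `mix` method are hypothetical and do not reproduce Hivemall's actual MixServer protocol, which runs outside the map tasks.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical stand-in for an external key-value store used to exchange
 * messages between map tasks. NOT Hivemall's MixServer API: here each task
 * pushes its local weight for a feature and gets back the running average.
 */
public class MixStoreSketch {
    // feature -> {sum of pushed weights, number of pushes}
    private final Map<String, double[]> store = new HashMap<>();

    /** A task publishes its local weight and receives the mixed (averaged) value. */
    public synchronized double mix(String feature, double localWeight) {
        double[] acc = store.computeIfAbsent(feature, k -> new double[2]);
        acc[0] += localWeight;
        acc[1] += 1;
        return acc[0] / acc[1];
    }

    public static void main(String[] args) {
        MixStoreSketch server = new MixStoreSketch();
        // Two "map tasks" trained on different splits report weights for "x1".
        double afterTask1 = server.mix("x1", 0.75);
        double afterTask2 = server.mix("x1", 0.25); // (0.75 + 0.25) / 2
        System.out.println(afterTask1 + " " + afterTask2);
    }
}
```

Running `main` prints `0.75 0.5`: the second task sees the average of both tasks' weights rather than only its own.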


FYI, Optimus tries to express iteration by rewriting DAGs at runtime.

http://research.microsoft.com/en-us/projects/optimus/
http://research.microsoft.com/pubs/185714/Optimus.pptx
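A minimal sketch of the runtime-rewriting idea, under stated assumptions: the plan starts with a single iteration stage, and after each stage runs, a rewriter appends another stage unless successive results differ by less than a tolerance. `RuntimeUnrollSketch`, the stage names, and the convergence test are hypothetical; this is not the Optimus API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.DoubleUnaryOperator;

/**
 * Hypothetical illustration of expressing a loop by rewriting an acyclic plan
 * at runtime: after each stage completes, the plan is extended with another
 * iteration stage unless the result has converged (or maxStages is reached).
 */
public class RuntimeUnrollSketch {
    public static List<String> run(double start, DoubleUnaryOperator step,
                                   double tolerance, int maxStages) {
        List<String> plan = new ArrayList<>();
        plan.add("iter-0");
        double value = start;
        // The plan may grow while we walk it; that growth is the "rewrite".
        for (int i = 0; i < plan.size(); i++) {
            double next = step.applyAsDouble(value); // execute stage i
            boolean converged = Math.abs(next - value) < tolerance;
            value = next;
            if (!converged && plan.size() < maxStages) {
                plan.add("iter-" + plan.size()); // append one more iteration stage
            }
        }
        plan.add("output"); // sink stage, added once the loop has been unrolled
        return plan;
    }

    public static void main(String[] args) {
        // Toy stage: halve the value; stop once successive values differ by < 0.1.
        System.out.println(run(1.0, x -> x / 2, 0.1, 20));
    }
}
```

Running `main` prints `[iter-0, iter-1, iter-2, iter-3, output]`: the DAG stays acyclic, yet its length is decided while it executes.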

On Wed, Mar 25, 2015 at 6:02 PM, Johannes Zillmann
<jzillmann@googlemail.com> wrote:
> Hey Gopal,
>
>> On 25 Mar 2015, at 05:26, Gopal Vijayaraghavan <gopalv@apache.org> wrote:
>>
>> Hi,
>>
>> Iterative algorithms are expressed as DAGs in a loop.
>>
>> The acyclic nature of DAGs, whether in Tez or Spark (since you mention the
>> paper), makes a loop the natural way to implement them: repeated application
>> of the same operation over the same data, with a decision condition
>> determining whether to stay in the loop or not.
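A minimal sketch of that client-side loop, assuming each call to a hypothetical `runDag` function stands in for building and submitting one DAG and waiting for it to finish. The toy step is gradient descent toward the mean of a fixed dataset, so the same operation is applied to the same data each time, and the decision condition stops once successive results agree. No Tez classes are used; `DagLoopSketch` is illustrative only.

```java
import java.util.function.BiPredicate;
import java.util.function.UnaryOperator;

public class DagLoopSketch {
    /**
     * Driver loop: runDag is a stand-in for submitting one DAG per iteration;
     * converged is the decision condition that ends the loop.
     */
    public static <S> S iterate(S initial, UnaryOperator<S> runDag,
                                BiPredicate<S, S> converged, int maxIterations) {
        S current = initial;
        for (int i = 0; i < maxIterations; i++) {
            S next = runDag.apply(current); // one "DAG" execution
            if (converged.test(current, next)) {
                return next;
            }
            current = next;
        }
        return current; // give up after maxIterations
    }

    public static void main(String[] args) {
        double[] data = {1, 2, 3, 6}; // the same input is re-read every iteration
        double learningRate = 0.25;
        Double w = iterate(0.0, cur -> {
            // Gradient of (1/n) * sum (w - x_i)^2 is (2/n) * sum (w - x_i).
            double grad = 0;
            for (double x : data) grad += 2 * (cur - x) / data.length;
            return cur - learningRate * grad;
        }, (a, b) -> Math.abs(a - b) < 1e-12, 100);
        System.out.println(w); // converges toward the mean, 3.0
    }
}
```

Keeping the loop in the client means every iteration is a separate DAG, which is exactly where the question below about intermediate HDFS writes comes from.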
>
> Can you point to a piece of code which implements this approach?
> If each loop operation is a single DAG, how would that avoid the HDFS barrier?
>
> Johannes
>
>>
>> You might want to look at last year's Hadoop Summit presentations for a
>> direct example of iterative algorithms with Tez.
>>
>> http://www.slideshare.net/Hadoop_Summit/pig-on-tez-low-latency-etl-with-big-data/25
>>
>>
>> Logistic regression needs you to use a library which implements that
>> specific algorithm [1].
>>
>> On that note, something which needs incremental iteration can probably be
>> even more efficient in Tez than these approaches if you unroll the
>> iterations into a chain of vertices connected by 1-1 edges, with the tasks
>> of the final vertex generating the outputs.
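The unrolling idea can be modeled in plain Java, assuming the number of iterations is fixed up front: each stage has one task per partition, task i of the next stage reads only the output of task i from the previous stage (the one-to-one pattern, so nothing is shuffled between iterations), and only the last stage's results are returned. `OneToOneChainSketch` and the toy doubling step are hypothetical; no Tez classes are involved.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.UnaryOperator;

/**
 * Hypothetical model of an iteration unrolled into a chain of vertices joined
 * by one-to-one edges. Intermediate stages hand data straight to the matching
 * task of the next stage; only the final stage "generates outputs".
 */
public class OneToOneChainSketch {
    public static List<double[]> runChain(List<double[]> partitions,
                                          UnaryOperator<double[]> perTaskStep,
                                          int unrolledIterations) {
        List<double[]> current = partitions;
        for (int stage = 0; stage < unrolledIterations; stage++) {
            List<double[]> next = new ArrayList<>(current.size());
            for (double[] taskInput : current) {
                // 1-1 edge: task i of this stage feeds task i of the next stage.
                next.add(perTaskStep.apply(taskInput));
            }
            current = next; // intermediate results are never written out
        }
        return current; // outputs of the final stage's tasks
    }

    public static void main(String[] args) {
        List<double[]> parts = List.of(new double[] {1, 2}, new double[] {3, 4});
        // Toy per-task step: double every value; unrolled 3 times means x * 8.
        List<double[]> out = runChain(parts, p -> Arrays.stream(p).map(v -> v * 2).toArray(), 3);
        out.forEach(p -> System.out.println(Arrays.toString(p)));
    }
}
```

With two partitions unrolled three times, `[1, 2]` becomes `[8, 16]` and `[3, 4]` becomes `[24, 32]`. The trade-off is that the iteration count must be known when the DAG is built.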
>>
>> Cheers,
>> Gopal
>> [1] - https://github.com/myui/hivemall#regression
>>
>>
>> On 3/24/15, 8:43 PM, "Chang Chen" <baibaichen@gmail.com> wrote:
>>
>>> Hi
>>>
>>> From the PhD dissertation
>>> <http://www.eecs.berkeley.edu/Pubs/TechRpts/2014/EECS-2014-12.pdf> of
>>> Matei Zaharia, there are four computation models for large-scale
>>> clusters:
>>>
>>>
>>>  1. *Iterative algorithms*, such as graph processing and machine learning
>>>  algorithms
>>>  2. *Relational queries*
>>>  3. *MapReduce*, a general parallel computation model
>>>  4. *Stream processing*
>>>
>>> Obviously, Tez supports #2 and #3, but for #1 and #4, I don't see any
>>> examples.
>>>
>>> As for streaming, I guess that if we implement an appropriate input, there
>>> is no reason Tez can't support it, in theory.
>>>
>>> But for machine learning, how do we use vertices and edges to express
>>> *Logistic Regression*?
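One way to express this with vertices and edges, sketched under assumptions: batch gradient descent where a map vertex (one task per data partition) computes partial gradients of the log-loss, a scatter-gather style edge sends them to a single reduce task that sums them and updates the weights, and the client resubmits that two-vertex DAG each iteration. `LogisticRegressionDagSketch`, the learning rate, and the fixed iteration count are hypothetical; everything runs in plain Java without the Tez API.

```java
/**
 * Hypothetical mapping of batch-gradient logistic regression onto a
 * map vertex (partial gradients) and a reduce vertex (sum + weight update).
 */
public class LogisticRegressionDagSketch {
    static double sigmoid(double z) { return 1.0 / (1.0 + Math.exp(-z)); }

    /** Map-vertex task: partial gradient of the log-loss over one partition. */
    public static double[] partialGradient(double[][] x, int[] y, double[] w) {
        double[] g = new double[w.length];
        for (int i = 0; i < x.length; i++) {
            double z = 0;
            for (int j = 0; j < w.length; j++) z += w[j] * x[i][j];
            double err = sigmoid(z) - y[i]; // prediction error for row i
            for (int j = 0; j < w.length; j++) g[j] += err * x[i][j];
        }
        return g;
    }

    /** Reduce-vertex task: sum the partial gradients and take one step. */
    public static double[] updateWeights(double[] w, double[][] partials, double lr, int n) {
        double[] next = w.clone();
        for (double[] g : partials)
            for (int j = 0; j < w.length; j++) next[j] -= lr * g[j] / n;
        return next;
    }

    /** Client loop: one simulated two-vertex DAG per iteration. */
    public static double[] train(double[][][] xParts, int[][] yParts, int dims,
                                 double lr, int iters) {
        int n = 0;
        for (int[] yp : yParts) n += yp.length;
        double[] w = new double[dims];
        for (int it = 0; it < iters; it++) {
            double[][] partials = new double[xParts.length][];
            for (int p = 0; p < xParts.length; p++)
                partials[p] = partialGradient(xParts[p], yParts[p], w);
            w = updateWeights(w, partials, lr, n);
        }
        return w;
    }

    public static void main(String[] args) {
        // Two partitions; feature 0 is a bias term, feature 1 separates the classes.
        double[][][] x = {{{1, -2}, {1, -1}}, {{1, 1}, {1, 2}}};
        int[][] y = {{0, 0}, {1, 1}};
        double[] w = train(x, y, 2, 0.5, 200);
        System.out.printf("w = [%.3f, %.3f]%n", w[0], w[1]);
    }
}
```

On this symmetric toy data the bias stays near zero and the second weight grows positive, so points with a positive feature are classified as 1. In a real job the weights would also need to reach the map tasks each iteration, for example as DAG configuration or a broadcast input.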
>>>
>>> Thanks
>>> Chang
>>
>>
>
