Hivemall has a MixServer, an external key-value store, for exchanging
messages between map tasks.
https://github.com/myui/hivemall/tree/master/src/main/java/hivemall/mix
FYI, Optimus tries to express iteration by rewriting DAGs at runtime.
http://research.microsoft.com/en-us/projects/optimus/
http://research.microsoft.com/pubs/185714/Optimus.pptx
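
For what it's worth, here is a minimal sketch of the "DAGs in a loop"
pattern described in the quoted thread below, applied to logistic
regression. It is plain Python, not Tez or Hivemall code: run_iteration
stands in for submitting one acyclic DAG (a map over partitions followed
by a sum), and train is the client-side driver that decides whether to
stay in the loop. All function names, the learning rate, and the
tolerance are assumptions made for illustration.

```python
import math


def partial_gradient(partition, weights):
    """One 'map task': log-loss gradient over a single data partition."""
    grad = [0.0] * len(weights)
    for features, label in partition:
        z = sum(w * x for w, x in zip(weights, features))
        pred = 1.0 / (1.0 + math.exp(-z))
        for i, x in enumerate(features):
            grad[i] += (pred - label) * x
    return grad


def run_iteration(partitions, weights, learning_rate):
    """Stand-in for one acyclic DAG: map over partitions, then reduce by summing."""
    partials = [partial_gradient(p, weights) for p in partitions]
    total = [sum(column) for column in zip(*partials)]
    n = sum(len(p) for p in partitions)
    return [w - learning_rate * g / n for w, g in zip(weights, total)]


def train(partitions, dim, learning_rate=0.5, tol=1e-6, max_iters=1000):
    """Driver loop: resubmit the same DAG until the weights stop changing."""
    weights = [0.0] * dim
    iteration = 0
    for iteration in range(1, max_iters + 1):
        new_weights = run_iteration(partitions, weights, learning_rate)
        delta = max(abs(a - b) for a, b in zip(new_weights, weights))
        weights = new_weights
        if delta < tol:  # decision condition: leave the loop
            break
    return weights, iteration
```

Calling train(partitions, dim=2) on a list of partitions of
(features, label) pairs returns the fitted weights and the number of
iterations that were run. In this sketch the weights simply live in the
driver's memory between iterations; how they are handed from one DAG to
the next on a real cluster is a separate question that the sketch does
not address.
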
On Wed, Mar 25, 2015 at 6:02 PM, Johannes Zillmann
<jzillmann@googlemail.com> wrote:
> Hey Gopal,
>
>> On 25 Mar 2015, at 05:26, Gopal Vijayaraghavan <gopalv@apache.org> wrote:
>>
>> Hi,
>>
>> Iterative algorithms are expressed as DAGs in a loop.
>>
>> The acyclic nature of DAGs, whether in Tez or Spark (since you mention the
>> paper), makes that the natural way to implement repeated application
>> of the same operation over the same data, with a decision condition
>> determining whether to stay in the loop or not.
>
> Can you point to a piece of code which implements this approach?
> If each loop operation is a single DAG, how would that avoid the HDFS barrier?
>
> Johannes
>
>>
>> You might want to look at last year's Hadoop Summit presentations for a
>> direct example of iterative algorithms with Tez.
>>
>> http://www.slideshare.net/Hadoop_Summit/pig-on-tez-low-latency-etl-with-big-data/25
>>
>>
>> Logistic regression needs you to use a library which implements that
>> specific algorithm [1].
>>
>> On that note, something which needs incremental iteration can probably be
>> even more efficient in Tez than these approaches if you unroll the
>> iteration as 1-1 edges, with all of the final tasks ending up generating
>> outputs.
>>
>> Cheers,
>> Gopal
>> [1]  https://github.com/myui/hivemall#regression
>>
>>
>> On 3/24/15, 8:43 PM, "Chang Chen" <baibaichen@gmail.com> wrote:
>>
>>> Hi
>>>
>>> According to the PhD dissertation
>>> <http://www.eecs.berkeley.edu/Pubs/TechRpts/2014/EECS-2014-12.pdf> of
>>> Matei
>>> Zaharia, there are four computation models in large-scale clusters:
>>>
>>>
>>> 1. *Iterative algorithms*, such as graph processing and machine learning
>>> algorithms
>>> 2. *Relational query*
>>> 3. *MapReduce*, a general parallel computation model
>>> 4. *Stream processing*
>>>
>>> Obviously, Tez supports #2 and #3, but for #1 and #4, I don't see any
>>> examples.
>>>
>>> As for streaming, I guess if we implement an appropriate input, there is
>>> no reason in theory that Tez can't support it.
>>>
>>> But for machine learning, how do we use vertices and edges to express
>>> *Logistic Regression*?
>>>
>>> Thanks
>>> Chang
>>
>>
>
