spark-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From guoxu1231 <guoxu1...@gmail.com>
Subject Re: Join operation on DStreams
Date Mon, 21 Sep 2015 08:13:57 GMT
Thanks for the prompt reply. 

May I ask why the keyBy(f) is not supported in DStreams? any particular
reason?
or is it possible to add it in future release since that "stream.map(record
=> (keyFunction(record), record))" looks tedious.

I checked the python source code, KeyBy looks like a "shortcut" method. 
maybe people are more familiar with it.

def keyBy(self, f):
        """
        Creates tuples of the elements in this RDD by applying C{f}.
        >>> x = sc.parallelize(range(0,3)).keyBy(lambda x: x*x)
        >>> y = sc.parallelize(zip(range(0,5), range(0,5)))
        >>> [(x, list(map(list, y))) for x, y in
sorted(x.cogroup(y).collect())]
        [(0, [[0], [0]]), (1, [[1], [1]]), (2, [[], [2]]), (3, [[], [3]]),
(4, [[2], [4]])]
        """
        return self.map(lambda x: (f(x), x))



--
View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/Join-operation-on-DStreams-tp14228p14232.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
For additional commands, e-mail: dev-help@spark.apache.org


Mime
View raw message