spark-user mailing list archives

From Alec Ten Harmsel <a...@alectenharmsel.com>
Subject Re: complexity of each action / transformation
Date Fri, 17 Oct 2014 19:35:57 GMT

On 10/17/2014 02:08 PM, ll wrote:
> hello... is there a list that shows the complexity of each
> action/transformation?  for example, what is the complexity of
> RDD.map()/filter() or RowMatrix.multiply() etc?  that would be really
> helpful.
>
> thanks!

I'm pretty new to Spark, so I only know about the RDD functions. The
complexity of map() and filter() depends on the function that you pass to
them, but there is more to it than that. If you view the RDD as a single
array in memory, any element-wise operation (such as map or filter) is
linear, i.e. O(n). If you split the RDD across h hosts, the work per host
becomes O(n/h).
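
As a concrete toy sketch (assuming a SparkContext named 'sc'; the
variable names and the choice of 8 partitions are made up):

    // assume 'sc' is an existing SparkContext; 8 is an arbitrary partition count
    val nums = sc.parallelize(1 to 1000000, 8)
    // map() and filter() each make one pass over every element:
    // O(n) total work, roughly O(n/h) per host
    val doubled = nums.map(_ * 2)
    val evens = doubled.filter(_ % 2 == 0)
    // transformations are lazy; an action like count() triggers the work
    evens.count()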

Now factor in the function that you pass. Say, for example, that you pass
a function 'f' that takes O(f) time per element. The final complexity is
then O(n/h * f). If 'f' is a call to String.split() and each String in the
RDD has 'l' characters, split() is (I think) O(l). So the complexity of
calling:

    myRdd.map(line => line.split(','))

is something along the lines of O(n/h * l). The complexity of a filter()
call should be found the same way.
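
For instance, with a hypothetical predicate that scans each line once,
you get the same bound:

    // contains() scans the line, so this is also roughly O(n/h * l)
    myRdd.filter(line => line.contains("ERROR"))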

The complexity is mostly dependent on the complexity of the function
being passed to map()/filter()/whatever, which is where the power of
Spark comes in: per O(n/h * f), the runtime increases linearly with RDD
size (which makes sense) and *decreases* linearly with the number of
hosts, which is exactly why people use Spark.
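
If you want a feel for the 'h'-ish factor of a given RDD, you can check
how many partitions it was split into (Spark really parallelizes over
partitions spread across the hosts), with something like:

    // number of partitions the RDD is split into
    myRdd.partitions.size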

Hope this helps, and explains why the complexities of various functions
are not posted; they depend on the function you pass in.

Alec



