spark-user mailing list archives
From Sean Owen <so...@cloudera.com>
Subject Re: Running a task over a single input
Date Wed, 28 Jan 2015 10:19:06 GMT
Processing one object isn't a distributed operation and doesn't
really involve Spark. Just invoke your function on your object in the
driver; there's no magic to it.

You can make an RDD of one object and invoke a distributed Spark
operation on it, but if the object is already on the driver, that's
wasteful: it just copies the object to another machine to invoke the
function.
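
For contrast, that route would look something like this (a sketch
assuming an existing SparkContext `sc` and the hypothetical `process`
above):

  // Wrap the single datum in a one-element RDD and apply the same
  // function; this ships the object to an executor and the result
  // back, gaining nothing over the local call.
  val rdd = sc.parallelize(Seq(datum))
  val result = rdd.map(process).first()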

On Wed, Jan 28, 2015 at 10:14 AM, Matan Safriel <dev.matan@gmail.com> wrote:
> Hi,
>
> How would I run a given function in Spark over a single input object?
> Would I first add the input to the file system and then somehow invoke
> the Spark function on just that input? Or should I rather twist the
> Spark Streaming API for it?
>
> Assume I'd like to run a piece of computation that normally runs over a
> large dataset over just one newly added datum. I'm a bit reluctant to
> adapt my code to Spark without knowing the limits of this scenario.
>
> Many thanks!
> Matan


