spark-user mailing list archives

From: Craig Vanderborgh <craigvanderbo...@gmail.com>
Subject: Spark Streaming on a Cluster
Date: Tue, 03 Dec 2013 21:16:41 GMT
Hi All,

I am working toward running some of our Spark Streaming jobs on a cluster.
However, I have not found documentation on best practices for this. Here
and there I have picked up some lore, though:

1. Keeping task latency low is paramount. The Spark standalone master has
lower task latency than Mesos, but "local" mode is the best.

2. It is possible to configure range partitioning so that ranges of keys
for incoming events are sent to the same node for processing.  This allows
Spark Streaming to perform parallel computation using multiple nodes.
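
To make #2 concrete, here is roughly the shape I imagine for the job,
sketched against the classic DStream API. The master URL, host names,
port, partition count, and the comma-separated input format are all
placeholders of mine, not details of our actual job:

  import org.apache.spark.RangePartitioner
  import org.apache.spark.SparkContext._   // pair-RDD implicits on older Spark releases
  import org.apache.spark.streaming.{Seconds, StreamingContext}

  object RangePartitionedJob {
    def main(args: Array[String]): Unit = {
      // Placeholder master URL, app name, and batch interval.
      val ssc = new StreamingContext("spark://master-host:7077",
        "RangePartitionedJob", Seconds(1))

      // Placeholder source: "key,value" lines arriving over a socket.
      val events = ssc.socketTextStream("event-source-host", 9999).map { line =>
        val Array(key, value) = line.split(",", 2)
        (key, value)
      }

      // transform() exposes each batch's RDD, so a RangePartitioner can be
      // built from it and applied with partitionBy(); keys that fall in the
      // same range then land in the same partition (and so on the same node)
      // for that batch.
      val partitioned = events.transform { rdd =>
        rdd.partitionBy(new RangePartitioner(4, rdd))
      }

      partitioned.foreachRDD { rdd =>
        // Whatever per-range processing the job actually needs would go here.
        rdd.foreachPartition(iter => iter.foreach(println))
      }

      ssc.start()
      ssc.awaitTermination()
    }
  }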

Here's what I need: what is the best way to configure a Spark Streaming
job to use range partitioning, a la #2 above? I need the details: what has
to be changed in the job's source code, whether to use the standalone
"spark://" master, and so on.
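
On the master question specifically, I assume the choice just comes down to
the URL passed when the streaming context is created, something like the
following, where the host names and the SPARK_MASTER environment variable
are again placeholder conventions of mine:

  import org.apache.spark.streaming.{Seconds, StreamingContext}

  // Pick the master from an environment variable so the same job can run
  // in-process for testing or against a real cluster:
  //   "local[4]"                  -> single JVM, lowest task latency
  //   "spark://master-host:7077"  -> standalone cluster
  //   "mesos://mesos-host:5050"   -> Mesos cluster
  val master = sys.env.getOrElse("SPARK_MASTER", "local[4]")
  val ssc = new StreamingContext(master, "RangePartitionedJob", Seconds(1))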

Thanks in advance,
Craig Vanderborgh
