Hi David,

My company uses Lambda to do simple data moving and processing with Python scripts. I can see that using Spark for the data processing instead would turn it into a real production-level platform. Does this pave the way to removing the need for a pre-instantiated cluster in AWS or purchased hardware in a datacenter? If so, that would be a great efficiency gain and make for an easier entry point to Spark. I hope the vision is to get rid of all cluster management when using Spark.


On Feb 1, 2016, at 4:23 AM, David Russell <themarchoffolly@gmail.com> wrote:

Hi all,

Just sharing news of the release of a newly available Spark package, SAMBA.


SAMBA is an Apache Spark package offering seamless integration with the AWS Lambda compute service for Spark batch and streaming applications on the JVM.

Within traditional Spark deployments, RDD tasks are executed using fixed compute resources on worker nodes within the Spark cluster. With SAMBA, application developers can delegate selected RDD tasks to execute on on-demand AWS Lambda compute infrastructure in the cloud.

Not unlike the recently released ROSE package, which extends the capabilities of traditional Spark applications with support for CRAN R analytics, SAMBA provides another (hopefully) useful extension for Spark application developers on the JVM.

Questions, suggestions, and feedback are welcome.


"All that is gold does not glitter, Not all those who wander are lost."