My company uses AWS Lambda for simple data movement and processing with Python scripts. I can see that using Spark for the data processing instead would turn this into a real production-level platform. Does this pave the way toward removing the need for a pre-provisioned cluster in AWS or purchased hardware in a data center? If so, this would be a big efficiency gain and an easier entry point for using Spark. I hope the vision is to get rid of all cluster management when using Spark.
SAMBA is an Apache Spark package offering seamless integration with the AWS Lambda compute service for Spark batch and streaming applications on the JVM.
In traditional Spark deployments, RDD tasks are executed on fixed compute resources on worker nodes within the Spark cluster. With SAMBA, application developers can delegate selected RDD tasks to on-demand AWS Lambda compute infrastructure in the cloud.
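To make the split between fixed and on-demand capacity concrete, here is a toy model in plain Python (standard library only). It is not SAMBA's API and does not call Spark or AWS Lambda: the names `run_on_cluster`, `run_on_demand`, `run_stage` and the `delegate` predicate are hypothetical, and thread pools merely stand in for pre-provisioned executor cores and per-request serverless invocations.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple, TypeVar

T = TypeVar("T")
R = TypeVar("R")

# Hypothetical toy model, NOT SAMBA's API: thread pools stand in for
# cluster executor cores and for on-demand (Lambda-style) invocations.


def run_on_cluster(tasks: List[T], fn: Callable[[T], R], slots: int) -> List[R]:
    """Fixed capacity: at most `slots` tasks run at once, like executor
    cores provisioned ahead of time on the cluster's worker nodes."""
    with ThreadPoolExecutor(max_workers=slots) as pool:
        return list(pool.map(fn, tasks))


def run_on_demand(tasks: List[T], fn: Callable[[T], R]) -> List[R]:
    """Capacity sized to the request: a pool with one worker slot per
    delegated task, created for this call and shut down when it returns."""
    if not tasks:
        return []
    with ThreadPoolExecutor(max_workers=len(tasks)) as pool:
        return list(pool.map(fn, tasks))


def run_stage(
    tasks: List[T],
    fn: Callable[[T], R],
    delegate: Callable[[T], bool],
    slots: int = 2,
) -> List[Tuple[str, R]]:
    """Send tasks where delegate(task) is True to the on-demand backend and
    the rest to the fixed pool. Results are written back by index, so they
    keep the original task order, each tagged with the backend that ran it."""
    remote = [i for i, t in enumerate(tasks) if delegate(t)]
    local = [i for i, t in enumerate(tasks) if not delegate(t)]
    out: List = [None] * len(tasks)
    for i, r in zip(local, run_on_cluster([tasks[i] for i in local], fn, slots)):
        out[i] = ("cluster", r)
    for i, r in zip(remote, run_on_demand([tasks[i] for i in remote], fn)):
        out[i] = ("on-demand", r)
    return out


if __name__ == "__main__":
    # Delegate even-numbered tasks; odd ones stay on the fixed "cluster".
    print(run_stage([1, 2, 3, 4], lambda x: x * x, delegate=lambda x: x % 2 == 0))
```

Running it prints `[('cluster', 1), ('on-demand', 4), ('cluster', 9), ('on-demand', 16)]`: the caller gets one ordered result set, while only the tasks selected by the predicate draw on capacity that exists just for the duration of the call.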
Not unlike the recently released ROSE package that extends the capabilities of traditional Spark applications with support for CRAN R analytics, SAMBA provides another (hopefully) useful extension for Spark application developers on the JVM.
Questions, suggestions, feedback welcome.