spark-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From davidh <>
Subject iPython Notebook + Spark + Accumulo -- best practice?
Date Thu, 19 Mar 2015 00:45:59 GMT
hi all, I've been DDGing, Stack Overflowing, Twittering, RTFMing, and
scanning through this archive with only moderate success. in other words --
my way of saying sorry if this is answered somewhere obvious and I missed it

i've been tasked with figuring out how to connect Notebook, Spark, and
Accumulo together. The end user will do her work via notebook. thus far,
I've successfully setup a Vagrant image containing Spark, Accumulo, and
Hadoop. I was able to use some of the Accumulo example code to create a
table populated with data, create a simple program in scala that, when fired
off to Spark via spark-submit, connects to accumulo and prints the first ten
rows of data in the table. so w00t on that - but now I'm left with more

1) I'm still stuck on what's considered 'best practice' in terms of hooking
all this together. Let's say Sally, a  user, wants to do some analytic work
on her data. She pecks the appropriate commands into notebook and fires them
off. how does this get wired together on the back end? Do I, from notebook,
use spark-submit to send a job to spark and let spark worry about hooking
into accumulo or is it preferable to create some kind of open stream between
the two? 

2) if I want to extend spark's api, do I need to first submit an endless job
via spark-submit that does something like what this gentleman describes
<>  ? is there an
alternative (other than refactoring spark's source) that doesn't involve
extending the api via a job submission? 

ultimately what I'm looking for help locating docs, blogs, etc that may shed
some light on this. 

t/y in advance! 


View this message in context:
Sent from the Apache Spark User List mailing list archive at

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message