Spark is obviously well-suited to crunching massive amounts of data. But what about crunching massive amounts of numbers?
A few years ago I put together a little demo for some co-workers to show the dangers of using SHA1
to hash and store passwords. Part of the demo was a live brute-force of some hashes, showing how SHA1's speed makes it unsuitable for password hashing.
I think it would be cool to redo the demo, this time using a Spark-managed cluster to crunch through hashes even faster.
But how would you do that with Spark (if at all)?
I'm guessing you would create an RDD that somehow defined the search space you're going to go through, and then partition it to divide the work up equally amongst the cluster's cores. Does that sound right?
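To make the idea concrete, here is a rough sketch of what I have in mind, written in Python for PySpark. Everything in it is my own assumption rather than an established recipe: the lowercase-only `ALPHABET`, the fixed candidate length, and the helper names `candidate`, `search_partition` and `crack_with_spark` are all made up for illustration. The search space is just the integers from 0 to `len(ALPHABET) ** length`, so the RDD never has to hold the candidate strings themselves; each worker turns its slice of integers into strings and hashes them.

```python
import hashlib
from typing import Iterable, Iterator

# Assumed search alphabet (lowercase ASCII only), purely for illustration.
ALPHABET = "abcdefghijklmnopqrstuvwxyz"


def candidate(index: int, length: int, alphabet: str = ALPHABET) -> str:
    """Map an integer in [0, len(alphabet) ** length) to a fixed-length string."""
    base = len(alphabet)
    chars = []
    for _ in range(length):
        index, digit = divmod(index, base)
        chars.append(alphabet[digit])
    # Digits come out least-significant first, so reverse them.
    return "".join(reversed(chars))


def search_partition(indices: Iterable[int], length: int, target_hex: str) -> Iterator[str]:
    """Hash every candidate in one slice of the search space; yield any match."""
    for i in indices:
        word = candidate(i, length)
        if hashlib.sha1(word.encode("utf-8")).hexdigest() == target_hex:
            yield word


def crack_with_spark(sc, target_hex: str, length: int, num_slices: int) -> list:
    """Hypothetical driver: `sc` is assumed to be an existing SparkContext."""
    total = len(ALPHABET) ** length
    return (
        sc.range(0, total, numSlices=num_slices)  # RDD of candidate indices
        .mapPartitions(lambda part: search_partition(part, length, target_hex))
        .collect()  # scans the whole space; no early exit once a match is found
    )
```

Presumably `num_slices` would be some multiple of the total core count so the work spreads evenly, and a real demo would also loop over several lengths and use a bigger alphabet. The two helper functions can be tried without a cluster, e.g. `list(search_partition(range(26 ** 3), 3, some_sha1_hex))`.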
I wonder if others have already used Spark for computationally intensive workloads like this, as opposed to just data-intensive ones.