spark-user mailing list archives

From Rob Emanuele <>
Subject Re: Spark and geospatial data
Date Thu, 07 Nov 2013 20:32:13 GMT
Hi Andy,

There would be a large architectural design effort if we decided to support
Spark, or to replace our current internal actor system with Spark. My thinking
is that the Spark DAG would be fully used for tracking lineage and scheduling
tasks in the Spark backend, while our current Actor system would keep routing
operations through its own mechanisms. A lot of thought will have to go into
where exactly the API splits between the Spark backend and our own dedicated
Actor-system backend, and some harmonization will be needed: we'd love to
incorporate many of the great ideas Spark has for scheduling tasks, while
making sure that local, high-speed use cases don't have to pass through
unnecessary machinery, so performance at small scale doesn't suffer. This is
all in the early stages of consideration, so any design input is very welcome!
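
As a rough sketch of that split, assuming a hypothetical size threshold and hypothetical `LocalBackend` / `DistributedBackend` classes (neither is a GeoTrellis or Spark API), an operation could be routed by input size so that small, latency-sensitive requests skip cluster scheduling entirely. The "distributed" class here only simulates fan-out with a local thread pool:

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

Tile = List[List[int]]
TileOp = Callable[[Tile], Tile]


class LocalBackend:
    """Hypothetical fast path: applies the op in-process, tile by tile,
    with no task scheduling or lineage tracking."""
    name = "local"

    def run(self, op: TileOp, tiles: List[Tile]) -> List[Tile]:
        return [op(t) for t in tiles]


class DistributedBackend:
    """Hypothetical stand-in for a cluster backend such as Spark.
    It only simulates fan-out with a local thread pool; a real backend
    would turn each tile into a scheduled task with tracked lineage."""
    name = "distributed"

    def run(self, op: TileOp, tiles: List[Tile]) -> List[Tile]:
        with ThreadPoolExecutor() as pool:
            # pool.map returns results in input order, like the local path.
            return list(pool.map(op, tiles))


def choose_backend(tiles: List[Tile], max_local_cells: int = 1_000_000):
    """Route small jobs to the local path and large ones to the cluster.
    max_local_cells is an illustrative threshold, not a tuned value."""
    cells = sum(len(row) for tile in tiles for row in tile)
    return LocalBackend() if cells <= max_local_cells else DistributedBackend()
```

Either path returns the same result for `choose_backend(tiles).run(op, tiles)`; only the execution machinery differs. Where that threshold, and the API boundary generally, should sit is exactly the open design question.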

From the start, the aim of Spark support would be for every GeoTrellis
operation that currently supports distribution over tiled rasters to run in
the Spark environment as well, so Map Algebra operations like Classification
would be carried over as a first step. Feature extraction and pyramid
generation are operations GeoTrellis does not currently have (besides basic
vectorization capabilities), since our focus has been on implementing fast
Map Algebra operations, but they would certainly be great additions to any
geospatial data analysis library.
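
Classification is a local Map Algebra operation: each output cell depends only on the matching input cell, so a tiled raster can be classified one tile at a time with no data exchanged between tiles. That is what makes it a natural first operation to carry over. A minimal sketch in plain Python, assuming a raster stored as nested lists and a hypothetical break-point table (this is not the GeoTrellis API):

```python
from bisect import bisect_right
from typing import List, Sequence

Tile = List[List[int]]


def classify_cell(value: int, breaks: Sequence[int]) -> int:
    """Class index for one cell: 0 if value < breaks[0],
    1 if breaks[0] <= value < breaks[1], ..., len(breaks) at the top."""
    return bisect_right(breaks, value)


def classify_tile(tile: Tile, breaks: Sequence[int]) -> Tile:
    """Local operation: each output cell depends only on the same input cell."""
    return [[classify_cell(v, breaks) for v in row] for row in tile]


def split_into_tiles(raster: Tile, size: int) -> List[Tile]:
    """Cut a raster into size x size tiles in row-major order;
    tiles on the right and bottom edges may be smaller."""
    tiles = []
    for r0 in range(0, len(raster), size):
        for c0 in range(0, len(raster[0]), size):
            tiles.append([row[c0:c0 + size] for row in raster[r0:r0 + size]])
    return tiles


raster = [[1, 12, 25],
          [9, 10, 30],
          [19, 20, 5]]
breaks = [10, 20]

# Classifying tile by tile (one task per tile on a cluster) gives the
# same cells as classifying the whole raster at once.
per_tile = [classify_tile(t, breaks) for t in split_into_tiles(raster, 2)]
assert per_tile == split_into_tiles(classify_tile(raster, breaks), 2)
print(classify_tile(raster, breaks))  # [[0, 1, 2], [0, 1, 2], [1, 2, 0]]
```

On Spark, the comprehension over tiles is the part that would become a map over a distributed collection of tiles. Operations that read neighboring cells (focal operations) would additionally need tile borders, which this sketch does not handle.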

Thanks for your ideas, and looking forward to your participation.


On Thu, Nov 7, 2013 at 3:05 PM, andy petrella <> wrote:

> Hello Rob,
> As you may know, I have long experience with geospatial data, and I'm now
> investigating Spark... So I'll be very interested in further answers, and
> also in participating in taking this great idea forward!
> For instance, I'd say that implementing classical geospatial algorithms
> like classification, feature extraction, pyramid generation and so on as
> a geo-extension lib for Spark would be easier using the GeoTrellis API.
> My only question, for now, is that GeoTrellis has its own notion of
> lineage and so does Spark, so maybe some harmonization work will have to be
> done to serialize and schedule them? Maybe Pickles could help for the
> serialization part...
> Sorry if I missed something (or even said something silly ^^)... I'm going
> to the thread you mentioned now!
> Looking forward ;)
> Cheers
> andy
> On Thu, Nov 7, 2013 at 8:49 PM, Rob Emanuele <> wrote:
>> Hello,
>> I'm a developer on the GeoTrellis project (
>> We do fast raster processing over large data sets, from web-time
>> (sub-100ms) processing for live endpoints to distributed raster analysis
>> over clusters using Akka clustering.
>> There's currently discussion underway about moving to support a Spark
>> backend for doing large scale distributed raster analysis. You can see the
>> discussion here:
>>!topic/geotrellis-user/wkUOhFwYAvc. Any
>> contributions to the discussion would be welcome.
>> My question to the list is: is there currently any development towards a
>> geospatial data story for Spark, that is, using Spark for large-scale
>> raster/vector spatial data analysis? Is anyone currently using Spark
>> for this sort of work?
>> Thanks,
>> Rob Emanuele

Rob Emanuele, GIS Software Engineer

Azavea |  340 N 12th St, Ste 402, Philadelphia, PA  | T 215.701.7692  | F 215.925.2663
Web <>  |  Blog <>  |  Twitter @azavea <>
