spark-user mailing list archives

From Jörn Franke <jornfra...@gmail.com>
Subject Re: Evaluating spark + Cassandra for our use cases
Date Tue, 18 Aug 2015 20:14:34 GMT
Hi,

First you need to make your SLAs clear. It does not sound to me as though
they are very well defined, or that your solution is necessary for this
scenario. I also find it hard to believe that one customer has 100 million
transactions per month.

Time series data is easy to precalculate - you do not necessarily need
in-memory technology here.
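
To illustrate what precalculation could look like for the example query in
the quoted message below (accounts that spent more than $300 on product X in
the past 3 months and bought product Y in the past month), here is a minimal
Python sketch. It is only an assumption-laden toy: the sample transactions,
the integer month index, the helper names (build_rollup, spend,
count_matching) and the reading of "past month" as the current month bucket
are all hypothetical, and a plain dictionary stands in for whatever store
would actually hold the rollup.

```python
from collections import defaultdict

# Hypothetical raw transactions: (account, product, month index, amount in dollars).
transactions = [
    ("a1", "X", 5, 200.0),
    ("a1", "X", 6, 150.0),
    ("a1", "Y", 7, 20.0),
    ("a2", "X", 7, 400.0),
    ("a3", "X", 6, 310.0),
    ("a3", "Y", 7, 5.0),
    ("a4", "X", 2, 500.0),
    ("a4", "Y", 7, 15.0),
]


def build_rollup(rows):
    """Precompute total spend per (account, product, month); run once, offline."""
    rollup = defaultdict(float)
    for account, product, month, amount in rows:
        rollup[(account, product, month)] += amount
    return rollup


def spend(rollup, account, product, months):
    """Sum the precomputed monthly totals for one account and product."""
    return sum(rollup.get((account, product, m), 0.0) for m in months)


def count_matching(rollup, current_month):
    """Count accounts with > $300 on X in the last 3 months and any Y this month."""
    accounts = {account for account, _, _ in rollup}
    last_three = range(current_month - 2, current_month + 1)
    return sum(
        1
        for a in accounts
        if spend(rollup, a, "X", last_three) > 300
        and spend(rollup, a, "Y", [current_month]) > 0
    )


rollup = build_rollup(transactions)
result = count_matching(rollup, current_month=7)
print(result)
```

The point of the sketch is that the query only ever touches the small
monthly totals, never the raw transactions, so the expensive aggregation
happens ahead of time rather than at query time.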

I recommend that your company do a proof of concept and get more
details/clarification on the requirements before risking millions of dollars
of investment.

On Tue, 18 Aug 2015 at 21:18, Benjamin Ross <bross@lattice-engines.com>
wrote:

> My company is interested in building a real-time time-series querying
> solution using Spark and Cassandra.  Specifically, we’re interested in
> setting up a Spark system against Cassandra running a Hive Thrift server.
> We need to be able to perform real-time queries on time-series data –
> things like, how many accounts have spent in total more than $300 on
> product X in the past 3 months, and purchased product Y in the past month.
>
> These queries need to be fast – preferably sub-second but we can deal with
> a few seconds if absolutely necessary.  When rolled up to monthly records,
> the data sizes are in the millions of records: something on the order of
> 100M per customer.
>
> My question is, based on experience, how hard would it be to get Cassandra
> and Spark working together to give us sub-second response times in this use
> case?  Note that we’ll need to use DataStax Enterprise (which is
> unappealing from a cost standpoint) because it’s the only thing that
> provides the Hive/Spark Thrift server for Cassandra.
>
> The two top contenders for our solution are Spark+Cassandra and Druid.
>
> Neither of these solutions works perfectly out of the box:
>
> - Druid would need to be modified, possibly hacked, to support the queries
>   we require.  I’m also not clear how operationally ready it is.
>
> - Cassandra and Spark would require paying money for DataStax Enterprise.
>   It really feels like it’s going to be tricky to configure Cassandra and
>   Spark to be lightning fast for our use case.  Finally, window functions
>   (which we need – see above) are not supported unless we use a
>   pre-release milestone of the DataStax Spark Cassandra Connector.
>
> I was wondering if anyone had any thoughts.  How easy is it to get Spark
> and Cassandra down to sub-second speeds in our use case?
>
> Thanks,
>
> Ben
>
