spark-user mailing list archives

From Imre Nagi <imre.nagi2...@gmail.com>
Subject Re: transformation - spark vs cassandra
Date Thu, 31 Mar 2016 15:32:41 GMT
I think querying with the Cassandra Query Language (CQL) will perform better
if you want to pull filtered data from your database, rather than pulling all
of the data and doing the filtering and transformation with a Spark
DataFrame.
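To make the reasoning concrete, here is a toy, in-memory model. It is not Cassandra or Spark code, and `PushdownVsScan`, `Row`, `queryByKey` and `scanThenFilter` are hypothetical names. It assumes, as in the question below, that `cdate` is the partition key and `country` is a clustering key. It only counts how many rows each strategy has to read to answer the same count. In practice the Spark Cassandra connector can push some predicates down to Cassandra, so the real gap depends on the schema and connector version.

```scala
// Toy model only: a "table" stored as partitions keyed by the partition key
// (cdate), with rows in each partition ordered by the clustering key (country).
// Not the Cassandra or Spark API; all names here are hypothetical.
object PushdownVsScan {
  final case class Row(cdate: String, country: String, value: Int)

  // Partition key -> rows of that partition, sorted by clustering key.
  type Table = Map[String, Vector[Row]]

  def build(rows: Seq[Row]): Table =
    rows.groupBy(_.cdate).map { case (k, rs) => k -> rs.sortBy(_.country).toVector }

  // "CQL-style": jump straight to one partition, read only its rows,
  // and filter on the clustering key. Returns (matching rows, rows read).
  def queryByKey(t: Table, cdate: String, country: String): (Long, Long) = {
    val partition = t.getOrElse(cdate, Vector.empty)
    val hits = partition.count(_.country == country)
    (hits.toLong, partition.size.toLong)
  }

  // "Pull everything, then filter": read every row of every partition,
  // then apply both predicates. Returns (matching rows, rows read).
  def scanThenFilter(t: Table, cdate: String, country: String): (Long, Long) = {
    val all = t.valuesIterator.flatten.toVector
    val hits = all.count(r => r.cdate == cdate && r.country == country)
    (hits.toLong, all.size.toLong)
  }

  def main(args: Array[String]): Unit = {
    val dates = Seq("2016-06-06", "2016-06-07", "2016-06-08")
    val countries = Seq("FRA", "IND", "USA")
    // 3 dates x 3 countries x 10 rows = 90 rows in total.
    val rows = for {
      d <- dates
      c <- countries
      i <- 1 to 10
    } yield Row(d, c, i)
    val table = build(rows)
    println(queryByKey(table, "2016-06-07", "USA"))     // (10,30)
    println(scanThenFilter(table, "2016-06-07", "USA")) // (10,90)
  }
}
```

Both strategies return the same count; the difference is that the keyed lookup touches one partition while the full scan touches every row, and with billions of rows that difference is what dominates.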


On 31 Mar 2016 22:19, "asethia" <sethia.arun@gmail.com> wrote:

> Hi,
>
> I am working with Cassandra and Spark, and would like to know which performs
> better: a Cassandra filter based on the primary key and clustering key, or
> Spark DataFrame transformations/filters.
>
> For example, in Spark:
>
>  // load() returns a DataFrame (not an RDD) backed by the table test.test
>  val df = sqlContext.read.format("org.apache.spark.sql.cassandra")
>       .options(Map("keyspace" -> "test", "table" -> "test"))
>       .load()
>
> and then df.filter("cdate = '2016-06-07'").filter("country = 'USA'").count()
>
> vs
>
> using Cassandra directly (where cdate is part of the primary key and country
> is the clustering key):
>
> SELECT count(*) FROM test WHERE cdate = '2016-06-07' AND country = 'USA';
>
> I would like to know when we should use a simple Cassandra query vs. a
> DataFrame, in terms of performance, with billions of rows.
>
> Thanks
> arun
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/transformation-spark-vs-cassandra-tp26647.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
