spark-user mailing list archives

From pouryas <>
Subject Spark Cassandra Connector Issue and performance
Date Wed, 24 Sep 2014 14:10:43 GMT
Hey all

I tried the Spark Cassandra connector and ran into a problem that blocked me
for a couple of weeks. I managed to find a solution, but I am not sure
whether the problem was a bug in the connector or in Spark.

I had three tables in Cassandra (running on a 5-node cluster) and a large
Spark cluster (5 worker nodes, each with 32 cores and 240 GB of memory).
When I ran my job, which extracts data from S3 and writes it to the 3 tables
in Cassandra using around 1 TB of memory and 160 cores, it sometimes got
stuck on the last few tasks of a stage.

After experimenting for a while, I realised that reducing the number of cores
to 2 per machine (10 in total) made the job stable. I then gradually increased
the number of cores, and the job hung again once I reached about 50 cores in
total.
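For anyone trying to reproduce the workaround: one way to cap an application's total cores on a Spark standalone cluster is the `spark.cores.max` property. The fragment below is only a sketch and not the configuration used in the original job; the value 10 simply mirrors the 2 cores x 5 workers reported as stable.

```properties
# conf/spark-defaults.conf (standalone mode)
# Cap the total number of cores this application may claim across the cluster.
# 10 = 2 cores x 5 worker nodes, the setting reported as stable.
spark.cores.max    10
```

Note that `spark.cores.max` limits the total, not the per-machine count. With the standalone master's default spread-out scheduling the cores are usually distributed across the workers, but an even 2-per-node split is not guaranteed.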

I would like to know whether anyone else has experienced this and whether this is

On another note, I would like to know whether people are seeing good
performance reading from Cassandra with Spark, as opposed to reading data from
HDFS. It is something of an open question, but I would like to hear how others
are using it.
