spark-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Tian Zhang <tzhang...@yahoo.com>
Subject spark 1.1.0 RDD and Calliope 1.1.0-CTP-U2-H2
Date Tue, 21 Oct 2014 23:33:31 GMT
Hi, I am using the latest calliope library from tuplejump.com to create RDD
for cassandra table.
I am on a 3 nodes spark 1.1.0 with yarn.

My cassandra table is defined as below and I have about 2000 rows of data
inserted.
CREATE TABLE top_shows (
  program_id varchar,
  view_minute timestamp,
  view_count counter,
  PRIMARY KEY (view_minute, program_id)   //note that view_minute is the
partition key
);

Here are the simple steps I ran from spark-shell on master node

spark-shell --master yarn-client --jars
rna/rna-spark-streaming-assembly-1.0-SNAPSHOT.jar --driver-memory 512m 
--executor-memory 512m --num-executors 3 --executor-cores 1

// Import the necessary 
import org.apache.spark.rdd.RDD
import org.apache.spark.SparkContext
import com.tuplejump.calliope.utils.RichByteBuffer._
import com.tuplejump.calliope.Implicits._
import com.tuplejump.calliope.CasBuilder
import com.tuplejump.calliope.Types.{CQLRowKeyMap, CQLRowMap}

// Define my class and the implicit cast 
case class ProgramViewCount(viewMinute:Long, program:String, viewCount:Long)
implicit def keyValtoProgramViewCount(key:CQLRowKeyMap,
values:CQLRowMap):ProgramViewCount =
   ProgramViewCount(key.get("view_minute").get.getLong,
key.get("program_id").toString, values.get("view_count").get.getLong)

// Use the cql3 interface to read from table with WHERE predicate.
val cas = CasBuilder.cql3.withColumnFamily("streaming_qa",
"top_shows").onHost("23.22.120.96")
.where("view_minute = 1413861780000")
val allPrograms = sc.cql3Cassandra[ProgramViewCount](cas)

// Lazy  evaluation till this point
val rowCount = allPrograms.count

I hit the following exception. It seems that it does not like my where
clause. If I do not have the 
WHERE CLAUSE, it works fine. But with the WHERE CLAUSE, no matter the
predicate is on 
partition key or not, it will fail with the following exception.

Anyone else using calliope package can share some lights? Thanks a lot.

Tian

scala> val rowCount = allPrograms.count
....
14/10/21 23:26:07 WARN scheduler.TaskSetManager: Lost task 2.0 in stage 0.0
(TID 2, ip-10-187-51-136.ec2.internal): java.lang.RuntimeException: 
       
com.tuplejump.calliope.hadoop.cql3.CqlPagingRecordReader$RowIterator.executeQuery(CqlPagingRecordReader.java:665)
       
com.tuplejump.calliope.hadoop.cql3.CqlPagingRecordReader$RowIterator.<init>(CqlPagingRecordReader.java:301)
       
com.tuplejump.calliope.hadoop.cql3.CqlPagingRecordReader.initialize(CqlPagingRecordReader.java:167)
       
com.tuplejump.calliope.cql3.Cql3CassandraRDD$$anon$1.<init>(Cql3CassandraRDD.scala:75)
       
com.tuplejump.calliope.cql3.Cql3CassandraRDD.compute(Cql3CassandraRDD.scala:64)
        org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
        org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
        org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
        org.apache.spark.scheduler.Task.run(Task.scala:54)
       
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177)
       
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
       
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        java.lang.Thread.run(Thread.java:745)






--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/spark-1-1-0-RDD-and-Calliope-1-1-0-CTP-U2-H2-tp16975.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org


Mime
View raw message