spark-user mailing list archives

From shahab <shahab.mok...@gmail.com>
Subject Re: Accessing Cassandra with SparkSQL, Does not work?
Date Fri, 31 Oct 2014 17:59:25 GMT
Thanks Helena.
I tried setting the keyspace, but I got the same result. I also removed the other
Cassandra dependencies, but still the same exception!
I also wanted to verify whether this setting actually appears in the
CassandraSQLContext, so I printed out the configuration:

val cc = new CassandraSQLContext(sc)
cc.setKeyspace("mydb")
cc.conf.getAll.foreach(f => println(f._1 + " : " + f._2))

printout:

spark.tachyonStore.folderName : spark-ec8ecb6a-1485-4d39-a93c-6f91711804a2
spark.driver.host : 192.168.1.111
spark.cassandra.connection.host : localhost
spark.cassandra.input.split.size : 10000
spark.app.name : SomethingElse
spark.fileserver.uri : http://192.168.1.111:51463
spark.driver.port : 51461
spark.master : local

Could it have anything to do with the version of Apache Cassandra that I
use? I am running apache-cassandra-2.1.0.
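For reference, here is Helena's suggestion put together as one minimal, self-contained sketch (a rough outline only: `mydb`/`mytable` stand in for the real keyspace and table, and the Spark 1.1 / connector 1.1.0-beta1 APIs from this thread are assumed):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SchemaRDD
import org.apache.spark.sql.cassandra.CassandraSQLContext

object CassandraSqlSketch extends App {
  // Hypothetical local setup; adjust master/host for a real cluster.
  val conf = new SparkConf()
    .setAppName("CassandraSqlSketch")
    .setMaster("local")
    .set("spark.cassandra.connection.host", "localhost")
  val sc = new SparkContext(conf)

  val cc = new CassandraSQLContext(sc)
  cc.setKeyspace("mydb")

  // Per Helena's note: the keyspace is declared once above, so the
  // query names only the table -- not "mydb.mytable".
  val srdd: SchemaRDD = cc.sql("SELECT * FROM mytable")
  println("count : " + srdd.count)

  sc.stop()
}
```

This needs a running Cassandra instance on localhost to actually execute, so treat it as a shape to compare against rather than something to paste verbatim.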


best,
/Shahab

The shortened SBT:

    "com.datastax.spark" %% "spark-cassandra-connector" % "1.1.0-beta1" withSources() withJavadoc(),
    "net.jpountz.lz4" % "lz4" % "1.2.0",
    "org.apache.spark" %% "spark-core" % "1.1.0" % "provided" exclude("org.apache.hadoop", "hadoop-core"),
    "org.apache.spark" %% "spark-streaming" % "1.1.0" % "provided",
    "org.apache.hadoop" % "hadoop-client" % "1.0.4" % "provided",
    "com.github.nscala-time" %% "nscala-time" % "1.0.0",
    "org.scalatest" %% "scalatest" % "1.9.1" % "test",
    "org.apache.spark" %% "spark-sql" % "1.1.0" % "provided",
    "org.apache.spark" %% "spark-hive" % "1.1.0" % "provided",
    "org.json4s" %% "json4s-jackson" % "3.2.5",
    "junit" % "junit" % "4.8.1" % "test",
    "org.slf4j" % "slf4j-api" % "1.7.7",
    "org.slf4j" % "slf4j-simple" % "1.7.7",
    "org.clapper" %% "grizzled-slf4j" % "1.0.2",
    "log4j" % "log4j" % "1.2.17"
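Following Helena's point that the connector alone pulls in what it needs (no thrift or cassandra-all), the dependency list could plausibly be trimmed further to something like the fragment below. The versions are the ones already used in this thread; whether the logging and time/json libraries are still needed depends on the rest of the build:

```scala
// build.sbt fragment (sketch) -- connector plus "provided" Spark deps only
libraryDependencies ++= Seq(
  "com.datastax.spark" %% "spark-cassandra-connector" % "1.1.0-beta1",
  "org.apache.spark"   %% "spark-core" % "1.1.0" % "provided",
  "org.apache.spark"   %% "spark-sql"  % "1.1.0" % "provided",
  "org.apache.spark"   %% "spark-hive" % "1.1.0" % "provided",
  "org.scalatest"      %% "scalatest"  % "1.9.1" % "test"
)
```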

On Fri, Oct 31, 2014 at 6:42 PM, Helena Edelson <helena.edelson@datastax.com
> wrote:

> Hi Shahab,
>
> I’m just curious, are you explicitly needing to use thrift? Just using the
> connector with spark does not require any thrift dependencies.
> Simply: "com.datastax.spark" %% "spark-cassandra-connector" %
> "1.1.0-beta1"
>
> But to your question, you declare the keyspace but also unnecessarily
> repeat the keyspace.table in your select.
> Try this instead:
>
> val cc = new CassandraSQLContext(sc)
>     cc.setKeyspace("keyspaceName")
>     val result = cc.sql("SELECT * FROM tableName") // etc.
>
> - Helena
> @helenaedelson
>
> On Oct 31, 2014, at 1:25 PM, shahab <shahab.mokari@gmail.com> wrote:
>
> Hi,
>
> I am using the latest Cassandra-Spark Connector to access Cassandra
> tables from Spark. While I successfully managed to connect to Cassandra
> using CassandraRDD, the analogous SparkSQL approach does not work. Here is
> my code for both methods:
>
> import com.datastax.spark.connector._
> import org.apache.spark.{SparkConf, SparkContext}
> import org.apache.spark.sql._
> import org.apache.spark.SparkContext._
> import org.apache.spark.sql.catalyst.expressions._
> import com.datastax.spark.connector.cql.CassandraConnector
> import org.apache.spark.sql.cassandra.CassandraSQLContext
>
>
>   val conf = new SparkConf().setAppName("SomethingElse")
>     .setMaster("local")
>     .set("spark.cassandra.connection.host", "localhost")
>   val sc: SparkContext = new SparkContext(conf)
>   val rdd = sc.cassandraTable("mydb", "mytable")  // this works
>
> But:
>
> val cc = new CassandraSQLContext(sc)
>     cc.setKeyspace("mydb")
>     val srdd: SchemaRDD = cc.sql("select * from mydb.mytable")
>     println("count : " + srdd.count) // does not work
>
> This exception is thrown:
>
> Exception in thread "main"
> com.google.common.util.concurrent.UncheckedExecutionException:
> java.util.NoSuchElementException: key not found: mydb3.inverseeventtype
>     at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2201)
>     at com.google.common.cache.LocalCache.get(LocalCache.java:3934)
>     at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3938)
>     ....
>
>
> In fact mydb3 is another keyspace, which I did not even try to connect to!
>
>
> Any idea?
>
>
> best,
>
> /Shahab
>
>
> Here is how my SBT looks like:
>
> libraryDependencies ++= Seq(
>     "com.datastax.spark" %% "spark-cassandra-connector" % "1.1.0-beta1" withSources() withJavadoc(),
>     "org.apache.cassandra" % "cassandra-all" % "2.0.9" intransitive(),
>     "org.apache.cassandra" % "cassandra-thrift" % "2.0.9" intransitive(),
>     "net.jpountz.lz4" % "lz4" % "1.2.0",
>     "org.apache.thrift" % "libthrift" % "0.9.1" exclude("org.slf4j", "slf4j-api") exclude("javax.servlet", "servlet-api"),
>     "com.datastax.cassandra" % "cassandra-driver-core" % "2.0.4" intransitive(),
>     "org.apache.spark" %% "spark-core" % "1.1.0" % "provided" exclude("org.apache.hadoop", "hadoop-core"),
>     "org.apache.spark" %% "spark-streaming" % "1.1.0" % "provided",
>     "org.apache.hadoop" % "hadoop-client" % "1.0.4" % "provided",
>     "com.github.nscala-time" %% "nscala-time" % "1.0.0",
>     "org.scalatest" %% "scalatest" % "1.9.1" % "test",
>     "org.apache.spark" %% "spark-sql" % "1.1.0" % "provided",
>     "org.apache.spark" %% "spark-hive" % "1.1.0" % "provided",
>     "org.json4s" %% "json4s-jackson" % "3.2.5",
>     "junit" % "junit" % "4.8.1" % "test",
>     "org.slf4j" % "slf4j-api" % "1.7.7",
>     "org.slf4j" % "slf4j-simple" % "1.7.7",
>     "org.clapper" %% "grizzled-slf4j" % "1.0.2",
>     "log4j" % "log4j" % "1.2.17")
>
>
>
