From Helena Edelson <helena.edel...@datastax.com>
Subject Re: Accessing Cassandra with SparkSQL, Does not work?
Date Fri, 31 Oct 2014 18:05:50 GMT
Hi Shahab,
The Apache Cassandra version looks fine.

I think the difference between

  cc.setKeyspace("mydb")
  cc.sql("SELECT * FROM mytable")

and

  cc.setKeyspace("mydb")
  cc.sql("select * from mydb.mytable")

is the problem. And if not, would you mind creating a ticket off-list so we can help further?
You can create one here:
https://github.com/datastax/spark-cassandra-connector/issues
with tag: help wanted :)
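
For reference, here is a minimal end-to-end sketch of the first pattern (names taken from your snippets; a sketch, not tested against your schema):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SchemaRDD
    import org.apache.spark.sql.cassandra.CassandraSQLContext

    val conf = new SparkConf()
      .setAppName("SomethingElse")
      .setMaster("local")
      .set("spark.cassandra.connection.host", "localhost")
    val sc = new SparkContext(conf)

    val cc = new CassandraSQLContext(sc)
    cc.setKeyspace("mydb")                                // set the keyspace once
    val srdd: SchemaRDD = cc.sql("SELECT * FROM mytable") // bare table name, no keyspace prefix
    println("count : " + srdd.count)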

Cheers,

- Helena
@helenaedelson

On Oct 31, 2014, at 1:59 PM, shahab <shahab.mokari@gmail.com> wrote:

> Thanks Helena.
> I tried setting the keyspace, but I got the same result. I also removed the other Cassandra dependencies, but still the same exception!
> I also tried to check whether this setting appears in the CassandraSQLContext, so I printed out the configuration:
> 
> val cc = new CassandraSQLContext(sc)
> cc.setKeyspace("mydb")
> cc.conf.getAll.foreach(f => println(f._1 + " : " + f._2))
> 
> printout:
> 
> spark.tachyonStore.folderName : spark-ec8ecb6a-1485-4d39-a93c-6f91711804a2
> spark.driver.host : 192.168.1.111
> spark.cassandra.connection.host : localhost
> spark.cassandra.input.split.size : 10000
> spark.app.name : SomethingElse
> spark.fileserver.uri : http://192.168.1.111:51463
> spark.driver.port : 51461
> spark.master : local
> 
> 
> Could this have anything to do with the version of Apache Cassandra I use? I am running apache-cassandra-2.1.0.
> 
> 
> best,
> /Shahab
> 
> The shortened SBT:
> 
>     "com.datastax.spark" %% "spark-cassandra-connector" % "1.1.0-beta1" withSources() withJavadoc(),
>     "net.jpountz.lz4" % "lz4" % "1.2.0",
>     "org.apache.spark" %% "spark-core" % "1.1.0" % "provided" exclude("org.apache.hadoop", "hadoop-core"),
>     "org.apache.spark" %% "spark-streaming" % "1.1.0" % "provided",
>     "org.apache.hadoop" % "hadoop-client" % "1.0.4" % "provided",
>     "com.github.nscala-time" %% "nscala-time" % "1.0.0",
>     "org.scalatest" %% "scalatest" % "1.9.1" % "test",
>     "org.apache.spark" %% "spark-sql" % "1.1.0" % "provided",
>     "org.apache.spark" %% "spark-hive" % "1.1.0" % "provided",
>     "org.json4s" %% "json4s-jackson" % "3.2.5",
>     "junit" % "junit" % "4.8.1" % "test",
>     "org.slf4j" % "slf4j-api" % "1.7.7",
>     "org.slf4j" % "slf4j-simple" % "1.7.7",
>     "org.clapper" %% "grizzled-slf4j" % "1.0.2",
>     "log4j" % "log4j" % "1.2.17"
> 
> 
> On Fri, Oct 31, 2014 at 6:42 PM, Helena Edelson <helena.edelson@datastax.com> wrote:
> Hi Shahab,
> 
> I'm just curious: do you explicitly need to use thrift? Just using the connector with Spark does not require any thrift dependencies.
> Simply: "com.datastax.spark" %% "spark-cassandra-connector" % "1.1.0-beta1"
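> 
> For reference, a trimmed build along those lines might look like this (versions copied from your SBT; a sketch, not a tested build):
> 
>     libraryDependencies ++= Seq(
>       "com.datastax.spark" %% "spark-cassandra-connector" % "1.1.0-beta1",
>       "org.apache.spark" %% "spark-core" % "1.1.0" % "provided",
>       "org.apache.spark" %% "spark-sql" % "1.1.0" % "provided")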
> 
> But to your question: you declare the keyspace but then unnecessarily repeat it as keyspace.table in your select.
> Try this instead:
> 
> val cc = new CassandraSQLContext(sc)
>     cc.setKeyspace("keyspaceName")
>     val result = cc.sql("SELECT * FROM tableName")  // etc.
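> (With setKeyspace, unqualified table names should resolve against that keyspace, so the select needs only the table name.)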
> 
> - Helena
> @helenaedelson
> 
> On Oct 31, 2014, at 1:25 PM, shahab <shahab.mokari@gmail.com> wrote:
> 
>> Hi,
>> 
>> I am using the latest Cassandra-Spark Connector to access Cassandra tables from Spark. While I successfully managed to connect to Cassandra using a CassandraRDD, the equivalent SparkSQL approach does not work. Here is my code for both methods:
>> 
>> import com.datastax.spark.connector._
>> import org.apache.spark.{SparkConf, SparkContext}
>> import org.apache.spark.sql._
>> import org.apache.spark.SparkContext._
>> import org.apache.spark.sql.catalyst.expressions._
>> import com.datastax.spark.connector.cql.CassandraConnector
>> import org.apache.spark.sql.cassandra.CassandraSQLContext
>> 
>> 
>> val conf = new SparkConf().setAppName("SomethingElse")
>>   .setMaster("local")
>>   .set("spark.cassandra.connection.host", "localhost")
>> val sc: SparkContext = new SparkContext(conf)
>> 
>> val rdd = sc.cassandraTable("mydb", "mytable")  // this works
>> 
>> But:
>> 
>> val cc = new CassandraSQLContext(sc)
>> cc.setKeyspace("mydb")
>> val srdd: SchemaRDD = cc.sql("select * from mydb.mytable")
>> println("count : " + srdd.count)  // does not work
>> 
>> An exception is thrown:
>> 
>> Exception in thread "main" com.google.common.util.concurrent.UncheckedExecutionException: java.util.NoSuchElementException: key not found: mydb3.inverseeventtype
>>   at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2201)
>>   at com.google.common.cache.LocalCache.get(LocalCache.java:3934)
>>   at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3938)
>>   ....
>> 
>> 
>> 
>> In fact, mydb3 is another keyspace, which I did not even try to connect to!
>> 
>> Any idea?
>> 
>> best,
>> /Shahab
>> 
>> Here is what my SBT looks like:
>> 
>> libraryDependencies ++= Seq(
>>     "com.datastax.spark" %% "spark-cassandra-connector" % "1.1.0-beta1" withSources() withJavadoc(),
>>     "org.apache.cassandra" % "cassandra-all" % "2.0.9" intransitive(),
>>     "org.apache.cassandra" % "cassandra-thrift" % "2.0.9" intransitive(),
>>     "net.jpountz.lz4" % "lz4" % "1.2.0",
>>     "org.apache.thrift" % "libthrift" % "0.9.1" exclude("org.slf4j", "slf4j-api") exclude("javax.servlet", "servlet-api"),
>>     "com.datastax.cassandra" % "cassandra-driver-core" % "2.0.4" intransitive(),
>>     "org.apache.spark" %% "spark-core" % "1.1.0" % "provided" exclude("org.apache.hadoop", "hadoop-core"),
>>     "org.apache.spark" %% "spark-streaming" % "1.1.0" % "provided",
>>     "org.apache.hadoop" % "hadoop-client" % "1.0.4" % "provided",
>>     "com.github.nscala-time" %% "nscala-time" % "1.0.0",
>>     "org.scalatest" %% "scalatest" % "1.9.1" % "test",
>>     "org.apache.spark" %% "spark-sql" % "1.1.0" % "provided",
>>     "org.apache.spark" %% "spark-hive" % "1.1.0" % "provided",
>>     "org.json4s" %% "json4s-jackson" % "3.2.5",
>>     "junit" % "junit" % "4.8.1" % "test",
>>     "org.slf4j" % "slf4j-api" % "1.7.7",
>>     "org.slf4j" % "slf4j-simple" % "1.7.7",
>>     "org.clapper" %% "grizzled-slf4j" % "1.0.2",
>>     "log4j" % "log4j" % "1.2.17")
>> 
> 
> 

