spark-user mailing list archives

From Benjamin Ross <br...@Lattice-Engines.com>
Subject RE: Is there any external dependencies for lag() and lead() when using data frames?
Date Tue, 11 Aug 2015 14:21:41 GMT
I forgot to mention, my setup was:

- Spark 1.4.1 running in standalone mode
- Datastax spark cassandra connector 1.4.0-M1
- Cassandra DB
- Scala version 2.10.4


From: Benjamin Ross
Sent: Tuesday, August 11, 2015 10:16 AM
To: Jerry; Michael Armbrust
Cc: user
Subject: RE: Is there any external dependencies for lag() and lead() when using data frames?

Jerry,
I was able to use window functions without the Hive thrift server. Using a HiveContext does
not require the Hive thrift server to be running.

Here’s what I used to test this out:
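    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.expressions.Window
    import org.apache.spark.sql.hive.HiveContext

    // Point the Cassandra connector at a local node.
    val conf = new SparkConf(true).set("spark.cassandra.connection.host", "127.0.0.1")

    val sc = new SparkContext(conf)
    // A HiveContext is enough for window functions; no Hive thrift server is involved.
    val sqlContext = new HiveContext(sc)
    // Load the Cassandra table test.kv as a DataFrame.
    val df = sqlContext
      .read
      .format("org.apache.spark.sql.cassandra")
      .options(Map("table" -> "kv", "keyspace" -> "test"))
      .load()
    // Window ordered by "value", spanning the two preceding rows and the current row.
    val w = Window.orderBy("value").rowsBetween(-2, 0)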


I then submitted this using spark-submit.
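
For reference, a window specification like the one above might be applied to lag() along the following lines. This is a minimal sketch, not the code from the test: the helper name withPreviousValue and the output column prev_value are hypothetical, and it assumes the DataFrame has a "value" column and was created from a HiveContext, as advised elsewhere in this thread for Spark 1.4.x. The sketch orders the window but leaves out rowsBetween, since lag() is driven by its offset argument.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.lag

object LagSketch {
  // Hypothetical helper: adds a column holding the previous row's "value".
  // Assumes df was created from a HiveContext (needed for window functions in Spark 1.4.x).
  def withPreviousValue(df: DataFrame): DataFrame = {
    // Rows are ordered by "value"; lag looks one row back in that ordering,
    // so the first row has no predecessor and gets null.
    val ordered = Window.orderBy("value")
    df.withColumn("prev_value", lag("value", 1).over(ordered))
  }
}
```

With the df loaded above, LagSketch.withPreviousValue(df).show() would print the table with the extra column.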



From: Jerry [mailto:jerry.comp@gmail.com]
Sent: Monday, August 10, 2015 10:55 PM
To: Michael Armbrust
Cc: user
Subject: Re: Is there any external dependencies for lag() and lead() when using data frames?

By the way, if Hive is present in the Spark install, does it show up in the text when you start
the spark shell? Are there any commands I can run to check whether it exists? I didn't set up
the Spark machine that I use, so I don't know what's present or absent.
Thanks,
        Jerry
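
One possible way to check from the spark shell whether Hive support is present is to test whether the HiveContext class can be loaded. This is a sketch under assumptions: the object and method names (HiveCheck, classAvailable, hiveAvailable) are hypothetical, and a successful lookup only shows that the spark-hive classes are on the classpath, not that a metastore is configured correctly.

```scala
import scala.util.Try

object HiveCheck {
  // Returns true when the named class can be found on the current classpath.
  // The class is looked up without being initialized.
  def classAvailable(name: String): Boolean =
    Try(Class.forName(name, false, Thread.currentThread().getContextClassLoader)).isSuccess

  // HiveContext lives in the spark-hive module, so finding it suggests
  // that this Spark build includes Hive support.
  def hiveAvailable: Boolean =
    classAvailable("org.apache.spark.sql.hive.HiveContext")
}

println(s"Hive support on classpath: ${HiveCheck.hiveAvailable}")
```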

On Mon, Aug 10, 2015 at 2:38 PM, Jerry <jerry.comp@gmail.com> wrote:
Thanks... it looks like I have now hit that bug about HiveMetaStoreClient, as I now get the
message about being unable to instantiate it. On a side note, does anyone know where
hive-site.xml is typically located?
Thanks,
        Jerry

On Mon, Aug 10, 2015 at 2:03 PM, Michael Armbrust <michael@databricks.com> wrote:
You will need to use a HiveContext for window functions to work.

On Mon, Aug 10, 2015 at 1:26 PM, Jerry <jerry.comp@gmail.com> wrote:
Hello,
Using Apache Spark 1.4.1, I'm unable to use lag or lead when querying a data frame, and I'm
trying to figure out whether I just have a bad setup or whether this is a bug. As for the
exceptions I get: when using selectExpr() with a string as an argument, I get
"NoSuchElementException: key not found: lag", and when using the select method and
...spark.sql.functions.lag, I get an AnalysisException. If I replace lag with abs in the first
case, Spark runs without exception, so none of the other syntax is incorrect.
As for how I'm running it: the code is written in Java, with a static method that takes the
SparkContext as an argument. That is used to create a JavaSparkContext, which is then used to
create an SQLContext, which loads a JSON file from the local disk and runs those queries on
that data frame object. FYI: the Java code is compiled, jarred, and then pointed to with -cp
when starting the spark shell, so all I do is "Test.run(sc)" in the shell.
Let me know what to look for to debug this problem; I'm not sure where to start.
Thanks,
        Jerry


