spark-user mailing list archives

From Ashok Kumar <>
Subject Re: calling individual columns from spark temporary table
Date Thu, 24 Mar 2016 01:17:10 GMT
Thank you again
val r = df.filter(col("paid") > "").map(x => (x.getString(0),x.getString(1).....)

Can you give an example of a column expression, please? Something like:
df.filter(col("paid") > "").col("firstcolumn").getString ?


    On Thursday, 24 March 2016, 0:45, Michael Armbrust <> wrote:

 You can only use `as` on a Column expression, not inside a lambda function. The reason is that the lambda function is compiled into opaque bytecode that Spark SQL cannot inspect; we just blindly execute it.
However, there are a couple of ways to name the columns that come out of a map: either use a case class instead of a tuple, or call .toDF("name1", "name2", ...) after the map.
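A minimal sketch of both naming options, assuming Spark 2.x or later (SparkSession and typed Dataset.map; the thread itself predates this API). The sample data, the input column names `name`, `code`, `paid`, and the case class `Item` are all hypothetical stand-ins for the CSV-backed DataFrame in the thread.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical case class: Spark takes the output column names from its fields.
case class Item(firstcolumn: String, secondcolumn: String)

object NamedMapColumns {
  // Returns (case-class version, toDF version) of the same filtered map.
  def build(spark: SparkSession): (DataFrame, DataFrame) = {
    import spark.implicits._

    // Stand-in for the CSV-backed DataFrame (hypothetical data and column names).
    val df = Seq(("a", "x", "10"), ("b", "y", "")).toDF("name", "code", "paid")
    val paidRows = df.filter($"paid" > "")

    // Option 1: map each Row to a case class; the columns are named after its fields.
    val viaCaseClass = paidRows.map(r => Item(r.getString(0), r.getString(1))).toDF()

    // Option 2: map to a tuple (columns _1, _2), then rename them with toDF.
    val viaToDF = paidRows
      .map(r => (r.getString(0), r.getString(1)))
      .toDF("firstcolumn", "secondcolumn")

    (viaCaseClass, viaToDF)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("named-map-columns").getOrCreate()
    val (viaCaseClass, viaToDF) = build(spark)
    viaCaseClass.printSchema()
    viaToDF.show()
    spark.stop()
  }
}
```

Both results carry the labels firstcolumn and secondcolumn; the case class keeps names and types together, while toDF is the lighter touch when a tuple is already in place.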
From a performance perspective, it's even better if you can avoid maps and stick to Column expressions. The reason is that for maps, we have to actually materialize an object to pass to your function. If you stick to Column expressions, however, we can work directly on the serialized data.
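A sketch of the same filter-and-label step written only with Column expressions, so no lambda is involved and `as` is applied to a Column rather than inside a map. It assumes Spark 2.x or later; the input column names `name`, `code`, `paid` and the sample data are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object ColumnExpressions {
  // Filter and label columns with Column expressions only; Spark SQL can see the whole plan.
  def firstTwoColumns(df: DataFrame): DataFrame =
    df.filter(col("paid") > "")
      .select(col("name").as("firstcolumn"), col("code").as("secondcolumn"))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("column-expressions").getOrCreate()
    import spark.implicits._

    // Hypothetical stand-in for the CSV-backed DataFrame.
    val df = Seq(("a", "x", "10"), ("b", "y", "")).toDF("name", "code", "paid")

    val r = firstTwoColumns(df)
    r.explain() // prints the physical plan; compare it with the map-based version
    r.show()
    spark.stop()
  }
}
```

If the input columns keep their CSV header names, a plain select already preserves those names, and `as` is only needed to rename them.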
On Wed, Mar 23, 2016 at 5:27 PM, Ashok Kumar <> wrote:

Thank you, sir.
sql("select `_1` as firstcolumn from items")

Is there any way to keep the CSV column names (using Databricks) when mapping, as in:
val r = df.filter(col("paid") > "").map(x => (x.getString(0),x.getString(1).....)

Can I call, for example, x.getString(0).as(firstcolumn) in the map above, if possible, so that the columns have labels?


    On Thursday, 24 March 2016, 0:18, Michael Armbrust <> wrote:

 You probably need to use `backticks` to escape `_1`, since I don't think it's a valid SQL identifier.
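A minimal sketch of the backtick fix, assuming Spark 2.x or later, where the temporary table is registered with createOrReplaceTempView (the thread's own registration call is not shown). The sample rows are hypothetical; converting an unnamed tuple Seq with toDF() yields the default column names _1, _2, _3, as in the items table above. Newer SQL parsers may accept a bare _1, but quoting it with backticks works either way.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object BacktickQuery {
  def firstColumn(spark: SparkSession): DataFrame = {
    import spark.implicits._

    // Unnamed tuples get the default column names _1, _2, _3 (hypothetical data).
    val r = Seq(("a", "x", 1.0), ("b", "y", 2.0)).toDF()
    r.createOrReplaceTempView("items")

    // Backticks quote _1 as a column identifier; `as` gives the result a readable name.
    spark.sql("select `_1` as firstcolumn from items")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("backtick-query").getOrCreate()
    firstColumn(spark).show()
    spark.stop()
  }
}
```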
On Wed, Mar 23, 2016 at 5:10 PM, Ashok Kumar <> wrote:

If I register a temporary table as below:

r.toDF
res58: org.apache.spark.sql.DataFrame = [_1: string, _2: string, _3: double, _4: double, _5: double]

sql("select * from items")
res60: org.apache.spark.sql.DataFrame = [_1: string, _2: string, _3: double, _4: double, _5: double]

Is there any way I can select only the first column?
sql("select _1 from items") throws an error.
Thanking you

