spark-user mailing list archives

From Michael Armbrust <mich...@databricks.com>
Subject Re: calling individual columns from spark temporary table
Date Thu, 24 Mar 2016 00:44:36 GMT
You can only use `as` on a Column expression, not inside a lambda function.
The reason is that the lambda function is compiled into opaque bytecode that
Spark SQL is not able to see; we just blindly execute it.
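A minimal sketch of where `as` does work, assuming Spark 2.x or later (`SparkSession`, `createOrReplaceTempView`; the thread itself uses the older `registerTempTable`). The sample rows and the object and method names are hypothetical stand-ins for the CSV data in the thread. In both variants the alias is attached to a Column expression that Spark SQL can analyze, never to a value inside a lambda.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object ColumnAliasSketch {
  // Hypothetical sample data shaped like the five-column `items` table in this thread
  // (columns _1.._5: string, string, double, double, double).
  def items(spark: SparkSession): DataFrame = {
    import spark.implicits._
    Seq(("apple", "fruit", 1.0, 2.0, 3.0), ("kale", "veg", 4.0, 5.0, 6.0)).toDF()
  }

  // `as` is a method on Column, so the alias is part of an expression Spark SQL can see.
  def viaColumnApi(df: DataFrame): DataFrame =
    df.select(col("_1").as("firstcolumn"))

  // The same query in SQL; backticks quote the `_1` identifier, as suggested in this thread.
  def viaSql(spark: SparkSession, df: DataFrame): DataFrame = {
    df.createOrReplaceTempView("items")
    spark.sql("select `_1` as firstcolumn from items")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("alias-sketch").getOrCreate()
    val df = items(spark)
    viaColumnApi(df).show()
    viaSql(spark, df).show()
    spark.stop()
  }
}
```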

However, there are a couple of ways to name the columns that come out of a
map: either use a case class instead of a tuple, or use .toDF("name1",
"name2", ...) after the map.
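The two naming options can be sketched as follows, assuming Spark 2.x or later, where `map` on a DataFrame returns a typed Dataset and `import spark.implicits._` supplies the encoders. The `Item` case class, the column names and the sample rows are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical record type: its field names become the column names.
case class Item(firstcolumn: String, secondcolumn: String)

object NamedMapSketch {
  // Hypothetical stand-in for the CSV-backed DataFrame in the thread.
  def items(spark: SparkSession): DataFrame = {
    import spark.implicits._
    Seq(("apple", "fruit", "yes"), ("kale", "veg", "")).toDF("name", "category", "paid")
  }

  // Option 1: map to a case class instead of a tuple; the fields name the columns.
  def viaCaseClass(spark: SparkSession, df: DataFrame): DataFrame = {
    import spark.implicits._
    df.map(x => Item(x.getString(0), x.getString(1))).toDF()
  }

  // Option 2: keep the tuple (columns _1, _2) and rename them with toDF(...) after the map.
  def viaToDF(spark: SparkSession, df: DataFrame): DataFrame = {
    import spark.implicits._
    df.map(x => (x.getString(0), x.getString(1))).toDF("firstcolumn", "secondcolumn")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("named-map-sketch").getOrCreate()
    val df = items(spark)
    viaCaseClass(spark, df).printSchema()
    viaToDF(spark, df).printSchema()
    spark.stop()
  }
}
```

On Spark 1.6, which this thread was written against, `DataFrame.map` returned an RDD, so the equivalent there would be calling `toDF(...)` on that RDD with `sqlContext.implicits._` imported.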

From a performance perspective, it's even better if you can avoid maps and
stick to Column expressions. The reason is that for maps we have to actually
materialize an object to pass to your function, whereas if you stick to
Column expressions we can work directly on the serialized data.
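As a sketch of that advice, here is the filter-then-map pipeline from the question rewritten with Column expressions only, assuming Spark 2.x or later. The column names `name` and `category` and the sample rows are hypothetical; only `paid` comes from the thread. Both versions return the same rows, but in the second one every step stays visible to Spark SQL.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object ColumnExprSketch {
  // Hypothetical stand-in for the CSV-backed DataFrame in the thread.
  def items(spark: SparkSession): DataFrame = {
    import spark.implicits._
    Seq(("apple", "fruit", "yes"), ("kale", "veg", "")).toDF("name", "category", "paid")
  }

  // Lambda version: each row must be handed to the function as an object.
  def withMap(spark: SparkSession, df: DataFrame): DataFrame = {
    import spark.implicits._
    df.filter(col("paid") > "")
      .map(x => (x.getString(0), x.getString(1)))
      .toDF("firstcolumn", "secondcolumn")
  }

  // Column-expression version: same output, but Spark SQL sees every step.
  def withColumns(df: DataFrame): DataFrame =
    df.filter(col("paid") > "")
      .select(col("name").as("firstcolumn"), col("category").as("secondcolumn"))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("column-expr-sketch").getOrCreate()
    val df = items(spark)
    // explain() prints the physical plan, one way to compare what Spark runs for each version.
    withMap(spark, df).explain()
    withColumns(df).explain()
    spark.stop()
  }
}
```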

On Wed, Mar 23, 2016 at 5:27 PM, Ashok Kumar <ashok34668@yahoo.com> wrote:

> Thank you, sir.
>
> sql("select `_1` as firstcolumn from items")
>
> Is there any way to keep the CSV column names, using Databricks, when
> mapping?
>
> val r = df.filter(col("paid") > "").map(x =>
> (x.getString(0),x.getString(1).....)
>
> Can I call, for example, x.getString(0).as(firstcolumn) in the map above,
> so that the columns have labels?
>
> On Thursday, 24 March 2016, 0:18, Michael Armbrust <michael@databricks.com> wrote:
>
> You probably need to use `backticks` to escape `_1`, since I don't think
> it's a valid SQL identifier.
>
> On Wed, Mar 23, 2016 at 5:10 PM, Ashok Kumar <ashok34668@yahoo.com.invalid> wrote:
>
> Gurus,
>
> If I register a temporary table as below:
>
> r.toDF
> res58: org.apache.spark.sql.DataFrame = [_1: string, _2: string, _3: double, _4: double, _5: double]
>
> r.toDF.registerTempTable("items")
>
> sql("select * from items")
> res60: org.apache.spark.sql.DataFrame = [_1: string, _2: string, _3: double, _4: double, _5: double]
>
> Is there any way I can select only the first column?
>
> sql("select _1 from items") throws an error.
>
> Thanking you
