df.filter(col("paid") > "").select(col("name1").as("newName"), ...) 
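To make that concrete, a slightly fuller sketch of the Column-expression approach (the column names name1, name2 and paid here are just placeholders, not your actual schema):

import org.apache.spark.sql.functions.col

val renamed = df
  .filter(col("paid") > "")             // same filter as before
  .select(
    col("name1").as("newName"),         // .as(...) labels the output column
    col("name2").as("secondName"))

renamed.printSchema()                   // schema now shows newName and secondName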

On Wed, Mar 23, 2016 at 6:17 PM, Ashok Kumar <ashok34668@yahoo.com> wrote:
Thank you again

For

val r = df.filter(col("paid") > "").map(x => (x.getString(0), x.getString(1), ...))

Can you give an example of a column expression please,
 like

df.filter(col("paid") > "").col("firstcolumn").getString ?




On Thursday, 24 March 2016, 0:45, Michael Armbrust <michael@databricks.com> wrote:


You can only use as on a Column expression, not inside a lambda function.  The reason is that the lambda function is compiled into opaque bytecode that Spark SQL is not able to see; we just blindly execute it.

However, there are a couple of ways to name the columns that come out of a map: either use a case class instead of a tuple, or use .toDF("name1", "name2", ...) after the map.  A sketch of both follows.
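A rough sketch of the two options, assuming Spark 1.6-style code with import sqlContext.implicits._ in scope and a made-up case class name (Item):

case class Item(name: String, category: String)

// Option 1: map into a case class; its field names become the column names
val withCaseClass = df.filter(col("paid") > "")
  .map(x => Item(x.getString(0), x.getString(1)))
  .toDF()

// Option 2: keep the tuple and rename the columns afterwards with toDF
val withToDF = df.filter(col("paid") > "")
  .map(x => (x.getString(0), x.getString(1)))
  .toDF("name1", "name2")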

From a performance perspective, it's even better if you can avoid maps and stick to Column expressions.  The reason is that for maps we have to actually materialize an object to pass to your function, whereas if you stick to Column expressions we can work directly on the serialized data.

On Wed, Mar 23, 2016 at 5:27 PM, Ashok Kumar <ashok34668@yahoo.com> wrote:
thank you sir

sql("select `_1` as firstcolumn from items")

is there any way one can keep the CSV column names when using the Databricks CSV reader and then mapping

val r = df.filter(col("paid") > "").map(x => (x.getString(0), x.getString(1), ...))

can I call, for example, x.getString(0).as(firstcolumn) in the above map, if possible, so the columns will have labels





On Thursday, 24 March 2016, 0:18, Michael Armbrust <michael@databricks.com> wrote:


You probably need to use `backticks` to escape `_1`, since I don't think it's a valid SQL identifier.
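For example, against the "items" table from your snippet below:

sql("select `_1` from items")

or, to give it a friendlier label:

sql("select `_1` as firstcolumn from items")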

On Wed, Mar 23, 2016 at 5:10 PM, Ashok Kumar <ashok34668@yahoo.com.invalid> wrote:
Gurus,

If I register a temporary table as below

 r.toDF
res58: org.apache.spark.sql.DataFrame = [_1: string, _2: string, _3: double, _4: double, _5: double]

r.toDF.registerTempTable("items")

sql("select * from items")
res60: org.apache.spark.sql.DataFrame = [_1: string, _2: string, _3: double, _4: double, _5: double]

Is there any way I can do a select on the first column only?

sql("select _1 from items" throws error

Thanking you