spark-user mailing list archives

From "lk_spark"<lk_sp...@163.com>
Subject Re: Re: Re: how to add colum to dataframe
Date Tue, 06 Dec 2016 10:02:07 GMT
I have figured out the right way to do it:
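import org.apache.spark.sql.functions._
val df = spark.read.parquet("/parquetdata/weixin/page/month=201607")
val df2 = df.withColumn("pa_bid",
  when(isnull($"url"), "AAAA")                                   // placeholder when url is null
    .otherwise(split(split(col("url"), "_biz=")(1), "&mid")(1))) // split url on "_biz=", then split that part on "&mid"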
scala> df2.select("pa_bid","url").show
+----------------+--------------------+
|          pa_bid|                 url|
+----------------+--------------------+
|MjM5MjEyNTk2MA==|http://mp.weixin....|
|MzAxODIwMDcwNA==|http://mp.weixin....|
|MzIzMjQ4NzQwOA==|http://mp.weixin....|
|MzAwOTIxMTcyMQ==|http://mp.weixin....|
|MzA3OTAyNzY2OQ==|http://mp.weixin....|
|MjM5NDAzMDAwMA==|http://mp.weixin....|
|MzAwMjE4MzU0Nw==|http://mp.weixin....|
|MzA4NzcyNjI0Mw==|http://mp.weixin....|
|MzI5OTE5Nzc5Ng==|http://mp.weixin....|
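A regular expression can pull out a value like this in one step instead of two nested splits. The sketch below swaps in regexp_extract (a different technique from the nested split above) and assumes the wanted value runs from just after "_biz=" up to the next "&"; the sample URL, the null row and the app name are made up for illustration and are not from the thread's data.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, regexp_extract, when}

// In spark-shell a session named `spark` already exists; getOrCreate() then returns it.
val spark = SparkSession.builder().master("local[*]").appName("pa-bid-sketch").getOrCreate()
import spark.implicits._

// Hypothetical sample data: one made-up URL and one null url.
val sample = Seq("http://mp.weixin.qq.com/s?__biz=EXAMPLE==&mid=1&idx=1", null).toDF("url")

// Capture group 1 = everything after "_biz=" up to (not including) the next '&'.
// Null urls are handled explicitly so they get the "AAAA" placeholder.
val withBid = sample.withColumn("pa_bid",
  when(col("url").isNull, "AAAA")
    .otherwise(regexp_extract(col("url"), "_biz=([^&]*)", 1)))

withBid.show(false)
```

Unlike indexing into the result of split, which gives null when the index is out of range, regexp_extract returns an empty string when the pattern is absent, so rows without a _biz parameter end up with "" rather than null.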


2016-12-06 

lk_spark 



From: "lk_spark"<lk_spark@163.com>
Sent: 2016-12-06 17:44
Subject: Re: Re: how to add colum to dataframe
To: "Pankaj Wahane"<pankajwahane@live.com>,"user.spark"<user@spark.apache.org>
Cc:

Thanks for the reply. I will look into how to use na.fill. But I still don't know how to get
the value of the column and apply an operation such as substr or split to it.
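In Spark SQL you usually do not fetch a column's value row by row; you build a Column expression (substr, split, and so on) and withColumn evaluates it for every row. A minimal sketch, assuming a single made-up URL; the column names scheme and query are hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, split}

// In spark-shell a session named `spark` already exists; getOrCreate() then returns it.
val spark = SparkSession.builder().master("local[*]").appName("column-ops-sketch").getOrCreate()
import spark.implicits._

// Hypothetical single-row DataFrame with a url column.
val pages = Seq("http://mp.weixin.qq.com/s?__biz=EXAMPLE==&mid=1").toDF("url")

val derived = pages
  // substr(startPos, len) is 1-based: characters 1 to 4 of url
  .withColumn("scheme", col("url").substr(1, 4))
  // split(column, regex) yields an array column; (1) extracts element 1 (0-based)
  .withColumn("query", split(col("url"), "\\?")(1))

derived.show(false)
```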

2016-12-06 

lk_spark 



From: Pankaj Wahane <pankajwahane@live.com>
Sent: 2016-12-06 17:39
Subject: Re: how to add colum to dataframe
To: "lk_spark"<lk_spark@163.com>,"user.spark"<user@spark.apache.org>
Cc:

You may want to try using df2.na.fill(…)
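na.fill replaces nulls with a constant; it does not compute a substring, but it can give the rows where pa_bid came out null a default value. A minimal sketch, assuming "unknown" as a hypothetical placeholder and made-up rows.

```scala
import org.apache.spark.sql.SparkSession

// In spark-shell a session named `spark` already exists; getOrCreate() then returns it.
val spark = SparkSession.builder().master("local[*]").appName("na-fill-sketch").getOrCreate()
import spark.implicits._

// Hypothetical data: pa_bid is null in the second row.
val withNulls = Seq(("EXAMPLE==", "http://a"), (null, "http://b")).toDF("pa_bid", "url")

// fill(value: String, cols: Seq[String]) only replaces nulls in the listed columns.
val filled = withNulls.na.fill("unknown", Seq("pa_bid"))
filled.show(false)
```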
 
From: lk_spark <lk_spark@163.com>
Date: Tuesday, 6 December 2016 at 3:05 PM
To: "user.spark" <user@spark.apache.org>
Subject: how to add colum to dataframe
 
Hi all,
   My Spark version is 2.0.
   I have a Parquet file with a string column named url. I want to extract a substring from
the url and add it to the DataFrame as a new column:
   val df = spark.read.parquet("/parquetdata/weixin/page/month=201607")
   val df2 = df.withColumn("pa_bid",when($"url".isNull,col("url").substr(3, 5)))
   df2.select("pa_bid","url").show
+------+--------------------+
|pa_bid|                 url|
+------+--------------------+
|  null|http://mp.weixin....|
|  null|http://mp.weixin....|
|  null|http://mp.weixin....|
|  null|http://mp.weixin....|
|  null|http://mp.weixin....|
|  null|http://mp.weixin....|
|  null|http://mp.weixin....|
|  null|http://mp.weixin....|
 
Why do I get null in every row?
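The nulls in the question come from two things in when($"url".isNull, ...): the condition is true only for rows whose url is null, where substr of a null url is itself null, and when without otherwise returns null for every row where the condition is false. A sketch of the difference, assuming made-up rows and reusing the "AAAA" placeholder that the author uses in the follow-up message.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, when}

// In spark-shell a session named `spark` already exists; getOrCreate() then returns it.
val spark = SparkSession.builder().master("local[*]").appName("when-null-sketch").getOrCreate()
import spark.implicits._

// Hypothetical data: one non-null url and one null url.
val df = Seq("http://mp.weixin.qq.com/s?x", null).toDF("url")

// As in the question: condition true only when url is null, and no otherwise(),
// so every row ends up null.
val original = df.withColumn("pa_bid", when(col("url").isNull, col("url").substr(3, 5)))

// Condition inverted and a fallback supplied: substring when url is present, else "AAAA".
val fixed = df.withColumn("pa_bid",
  when(col("url").isNotNull, col("url").substr(3, 5)).otherwise("AAAA"))

original.show(false)
fixed.show(false)
```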
 
2016-12-06



lk_spark 