spark-user mailing list archives

From Yanbo Liang <yblia...@gmail.com>
Subject Re: How to ignore case in dataframe groupby?
Date Fri, 25 Dec 2015 05:45:14 GMT
You can use DF.groupBy(upper(col("a"))).agg(sum(col("b"))).
The "upper" function in org.apache.spark.sql.functions converts a string
column to uppercase, so both of your questions can be answered with it.
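
For the countrycode case from the original question, a minimal sketch might
look like the following. It assumes a DataFrame with a string column named
"countrycode"; the helper names countIgnoringCase and withUpperCountryCode
are made up for illustration and are not part of Spark.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, upper}

// Hypothetical helper for question 1: group on the uppercased value
// without modifying the source column, then count rows per group.
def countIgnoringCase(df: DataFrame): DataFrame =
  df.groupBy(upper(col("countrycode")).as("countrycode")).count()

// Hypothetical helper for question 2: replace the column with its
// uppercase form so that any later groupBy sees a single casing.
def withUpperCountryCode(df: DataFrame): DataFrame =
  df.withColumn("countrycode", upper(col("countrycode")))
```

With either approach, rows holding "US" and "us" should fall into the same
group. The first leaves the stored data as it is; the second is probably
more convenient if several later operations need the normalized value.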

2015-12-24 20:47 GMT+08:00 Eran Witkon <eranwitkon@gmail.com>:

> Use df.withColumn("upper-code", upper(df("countrycode")))
> or just run a map function that does the same
>
> On Thu, Dec 24, 2015 at 2:05 PM Bharathi Raja <rajakbv@yahoo.com.invalid>
> wrote:
>
>> Hi,
>> Values in a dataframe column named countrycode are in different cases.
>> Eg: (US, us).  groupBy & count gives two rows but the requirement is to
>> ignore case for this operation.
>> 1) Is there a way to ignore case in groupBy? Or
>> 2) Is there a way to update the dataframe column countrycode to uppercase?
>>
>> Thanks in advance.
>>
>> Regards,
>> Raja
>>
>
