spark-user mailing list archives

From Gourav Sengupta <gourav.sengu...@gmail.com>
Subject Re: Sorting the dataframe
Date Fri, 04 Mar 2016 10:31:50 GMT
Hi,

I completely agree with using DataFrames for most operations in Spark,
unless you are writing custom algorithms or algorithms that need RDDs.
Databricks has taken a cue from Apache Flink (I think) and rewritten
Tungsten as the base engine that drives DataFrames, so you get its
performance optimizations.
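
For the table in the original question, a minimal sketch of the DataFrame
approach could look like the following. It assumes a Spark 1.6 spark-shell
where `sqlContext` is predefined, and it rebuilds the sample data under the
name `table` purely for illustration; the column names "value" and "count"
are taken from the question.

```scala
import org.apache.spark.sql.functions.col

// Assumption: running in spark-shell, so `sqlContext` already exists.
// The rows below are the sample data from the original question.
val table = sqlContext.createDataFrame(Seq(
  (52639, 22),
  (75243, 4),
  (13, 55),
  (56, 5),
  (185463, 45),
  (324364, 32)
)).toDF("value", "count")

// Sort ascending by the "count" column.
val sortedAsc = table.sort(col("count"))

// Sort descending by the "count" column.
val sortedDesc = table.sort(col("count").desc)

sortedDesc.show()
```

This keeps everything in the DataFrame API, so there is no need to convert
to an RDD or to call take() before sorting.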


Regards,
Gourav Sengupta

On Fri, Mar 4, 2016 at 8:35 AM, Mohammad Tariq <dontariq@gmail.com> wrote:

> You could try DataFrame.sort() to sort your data based on a column.
>
> Tariq, Mohammad
> about.me/mti
>
>
> On Fri, Mar 4, 2016 at 1:48 PM, Angel Angel <areyouangel90@gmail.com>
> wrote:
>
>> Hello sir,
>>
>> I want to sort the following table by *count*:
>>
>> value count
>> 52639 22
>> 75243 4
>> 13 55
>> 56 5
>> 185463 45
>> 324364 32
>>
>>
>> So first I convert my DataFrame to an RDD to sort the table.
>>
>> val k = table.rdd
>>
>> Then I convert the RDD array into key-value pairs.
>>
>> val s =k.take(6)
>>
>> val rdd = s.map(x => (x(1), x(0)))
>> rdd.sortByKey
>>
>>
>>
>> These are all the operations I did to sort the table.
>>
>> Could you please suggest a better way to sort the table?
>>
>
>
