spark-dev mailing list archives

From Wenchen Fan <cloud0...@gmail.com>
Subject Re: Select top (100) percent equivalent in spark
Date Wed, 05 Sep 2018 03:59:46 GMT
+ Liang-Chi and Herman,

I think getting the top N records is a common requirement. For now we
guarantee it with the `TakeOrderedAndProject` operator. However, this
operator may not be used if the
spark.sql.execution.topKSortFallbackThreshold config is set to a small value.

Shall we reconsider
https://github.com/apache/spark/commit/5c27b0d4f8d378bd7889d26fb358f478479b9996
? Or do we not expect users to set a small value for
spark.sql.execution.topKSortFallbackThreshold?
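
To make the top-N requirement concrete, here is a minimal sketch in plain
Python (not Spark code, and not a description of how any Spark operator is
implemented). It compares a full sort followed by head(n) with a bounded
per-partition selection that is merged afterwards. The function names and the
list-of-lists "partitions" are hypothetical, chosen only for illustration.

```python
import heapq
import random


def top_n_by_sort(rows, n, key):
    """Sort everything in descending order, then keep the first n rows."""
    return sorted(rows, key=key, reverse=True)[:n]


def top_n_bounded(partitions, n, key):
    """Keep at most n rows per partition, then merge the partial results."""
    partials = [heapq.nlargest(n, part, key=key) for part in partitions]
    merged = [row for part in partials for row in part]
    return heapq.nlargest(n, merged, key=key)


if __name__ == "__main__":
    random.seed(0)
    rows = random.sample(range(1000), 100)  # distinct values, so no ties
    partitions = [rows[i::4] for i in range(4)]  # four hypothetical partitions
    by_sort = top_n_by_sort(rows, 5, key=lambda r: r)
    bounded = top_n_bounded(partitions, 5, key=lambda r: r)
    assert by_sort == bounded
```

Both functions return the same rows when the sort key has no ties. With ties,
which of the tied rows make it into the top N can depend on how the data is
partitioned, so a deterministic result needs a tie-breaking key.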


On Wed, Sep 5, 2018 at 11:24 AM Chetan Khatri <chetan.opensource@gmail.com>
wrote:

> Thanks
>
> On Wed 5 Sep, 2018, 2:15 AM Russell Spitzer, <russell.spitzer@gmail.com>
> wrote:
>
>> RDD: Top
>>
>> http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.rdd.RDD@top(num:Int)(implicitord:Ordering[T]):Array[T]
>> Which is pretty much what Sean suggested
>>
>> For DataFrames I think doing an order and limit would be equivalent after
>> optimizations.
>>
>> On Tue, Sep 4, 2018 at 2:28 PM Sean Owen <srowen@gmail.com> wrote:
>>
>>> Sort and take head(n)?
>>>
>>> On Tue, Sep 4, 2018 at 12:07 PM Chetan Khatri <
>>> chetan.opensource@gmail.com> wrote:
>>>
>>>> Dear Spark dev, is there anything equivalent in Spark?
>>>>
>>>
