spark-dev mailing list archives

From: Liang-Chi Hsieh <vii...@gmail.com>
Subject: Re: Select top (100) percent equivalent in spark
Date: Wed, 05 Sep 2018 14:58:26 GMT

Thanks for pinging me.

It seems to me we should not make assumptions about the value of the
spark.sql.execution.topKSortFallbackThreshold config. If it is changed, the
global sort + limit path can currently produce wrong results. I will make a PR
for this.
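
As a rough illustration (a minimal sketch, assuming a Spark build that has this
config; the local DataFrame, column names, and threshold value below are made up,
only the config name and the TakeOrderedAndProject operator come from this
thread), the physical operator chosen for an orderBy + limit can be checked with
explain():

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().master("local[*]").appName("topk-sketch").getOrCreate()
    import spark.implicits._

    val df = (1 to 1000).map(i => (i, s"row_$i")).toDF("id", "value")

    // With the default (large) threshold, orderBy + limit should plan as
    // TakeOrderedAndProject, which guarantees top-N semantics.
    df.orderBy($"id".desc).limit(100).explain()

    // With a threshold smaller than the limit, the planner falls back to a
    // global sort followed by a limit, the case discussed above.
    spark.conf.set("spark.sql.execution.topKSortFallbackThreshold", "10")
    df.orderBy($"id".desc).limit(100).explain()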


cloud0fan wrote
> + Liang-Chi and Herman,
> 
> I think getting the top N records is a common requirement. For now we
> guarantee it via the `TakeOrderedAndProject` operator. However, this
> operator may not be used if the
> spark.sql.execution.topKSortFallbackThreshold config has a small value.
> 
> Shall we reconsider
> https://github.com/apache/spark/commit/5c27b0d4f8d378bd7889d26fb358f478479b9996
> ? Or do we not expect users to set a small value for
> spark.sql.execution.topKSortFallbackThreshold?
> 
> 
> On Wed, Sep 5, 2018 at 11:24 AM Chetan Khatri <chetan.opensource@> wrote:
> 
>> Thanks
>>
>> On Wed 5 Sep, 2018, 2:15 AM Russell Spitzer, <russell.spitzer@> wrote:
>>
>>> RDD: Top
>>>
>>> http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.rdd.RDD@top(num:Int)(implicitord:Ordering[T]):Array[T]
>>> Which is pretty much what Sean suggested
>>>
>>> For DataFrames I think doing an orderBy and limit would be equivalent after
>>> optimizations.
>>>
>>> On Tue, Sep 4, 2018 at 2:28 PM Sean Owen <srowen@> wrote:
>>>
>>>> Sort and take head(n)?
>>>>
>>>> On Tue, Sep 4, 2018 at 12:07 PM Chetan Khatri <chetan.opensource@> wrote:
>>>>
>>>>> Dear Spark dev, is there anything equivalent in Spark?
>>>>>
>>>>
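
For reference, a minimal sketch of the two approaches mentioned above (RDD.top
and a DataFrame orderBy + limit); the data and column names are made up for
illustration:

    import org.apache.spark.sql.{Row, SparkSession}

    val spark = SparkSession.builder().master("local[*]").appName("top-n-sketch").getOrCreate()
    import spark.implicits._

    val df = Seq((1, "a"), (5, "b"), (3, "c"), (4, "d"), (2, "e")).toDF("score", "name")

    // RDD API: top(n) returns the n largest elements to the driver, using the
    // supplied Ordering.
    val top3Rows: Array[Row] = df.rdd.top(3)(Ordering.by[Row, Int](_.getInt(0)))

    // DataFrame API: orderBy + limit; when the limit is below
    // spark.sql.execution.topKSortFallbackThreshold this is planned as
    // TakeOrderedAndProject rather than a global sort.
    val top3Df = df.orderBy($"score".desc).limit(3).collect()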

--
Sent from: http://apache-spark-developers-list.1001551.n3.nabble.com/

