spark-dev mailing list archives

From Reynold Xin <r...@databricks.com>
Subject Re: dataframe.groupby.agg vs sql("select from groupby)")
Date Thu, 10 Mar 2016 08:20:44 GMT
They should be identical. Can you paste the detailed explain output for both?

On Thursday, March 10, 2016, FangFang Chen <lulynn_2015_spark@163.com>
wrote:

> Hi,
> Based on my testing, the memory cost is very different between
> 1. sql("select * from ...").groupBy(...).agg(...)
> 2. sql("select ... from ... group by ...")
>
> For a table partition larger than 500 GB, #2 runs fine, while #1 fails
> with an out-of-memory error. I am using the same Spark configuration for both.
> Could somebody explain why this happens?
