spark-issues mailing list archives

From "Apache Spark (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-15392) The default value of size estimation is not good
Date Wed, 18 May 2016 23:15:13 GMT

    [ https://issues.apache.org/jira/browse/SPARK-15392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15290056#comment-15290056 ]

Apache Spark commented on SPARK-15392:
--------------------------------------

User 'davies' has created a pull request for this issue:
https://github.com/apache/spark/pull/13183

> The default value of size estimation is not good
> ------------------------------------------------
>
>                 Key: SPARK-15392
>                 URL: https://issues.apache.org/jira/browse/SPARK-15392
>             Project: Spark
>          Issue Type: Bug
>    Affects Versions: 2.0.0
>            Reporter: Davies Liu
>            Assignee: Davies Liu
>             Fix For: 2.0.0
>
>
> We use autoBroadcastJoinThreshold + 1L as the default value for size estimation, which is not good in 2.0: the size is now calculated from the size of the schema, so the estimate can fall below autoBroadcastJoinThreshold if you have a SELECT on top of a DataFrame created from an RDD.
> We should use an even bigger default value, for example MaxLong.
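
Below is a minimal Scala sketch (not part of the issue or the pull request) illustrating the failure mode described above: a DataFrame created from an RDD carries no precise statistics, and in 2.0 a projection on top of it can be estimated from the schema width alone, letting a huge table slip under autoBroadcastJoinThreshold and get picked for a broadcast join. The object name, column names, and row counts are hypothetical.

import org.apache.spark.sql.SparkSession

object SizeEstimationSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("size-estimation-sketch")
      .master("local[*]")
      // Tables estimated below this threshold (default 10 MB) are broadcast.
      .config("spark.sql.autoBroadcastJoinThreshold", 10L * 1024 * 1024)
      .getOrCreate()
    import spark.implicits._

    // A DataFrame created from an RDD has no file-based statistics,
    // so the planner falls back to an estimated size.
    val fromRdd = spark.sparkContext
      .parallelize(1 to 100000000)   // large, but its size is unknown to the planner
      .map(i => (i, s"value_$i"))
      .toDF("id", "value")

    // A SELECT (projection) on top of it: in 2.0 the estimate is derived from
    // the schema width and can drop below autoBroadcastJoinThreshold even
    // though the underlying data is huge.
    val projected = fromRdd.select($"id")

    val small = spark.range(0, 1000).toDF("id")

    // If the estimate for `projected` lands under the threshold, the planner
    // may choose BroadcastHashJoin and try to ship the huge side to every
    // executor. Inspect the physical plan to see which join was picked.
    small.join(projected, "id").explain()

    spark.stop()
  }
}

Running explain() on the join shows whether the planner chose BroadcastHashJoin for the RDD-backed side; with a sufficiently large default estimate (for example MaxLong, as proposed above) it would fall back to a shuffle-based join instead.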





