spark-issues mailing list archives

From "Yin Huai (JIRA)" <j...@apache.org>
Subject [jira] [Created] (SPARK-15530) Partitioning discovery logic in HadoopFsRelation should use a higher degree of parallelism
Date Wed, 25 May 2016 17:10:12 GMT
Yin Huai created SPARK-15530:
--------------------------------

             Summary: Partitioning discovery logic in HadoopFsRelation should use a higher degree of parallelism
                 Key: SPARK-15530
                 URL: https://issues.apache.org/jira/browse/SPARK-15530
             Project: Spark
          Issue Type: Bug
          Components: SQL
            Reporter: Yin Huai


At https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/fileSourceInterfaces.scala#L418,
we launch a Spark job to list files in parallel in order to discover partitions. However,
we do not set the number of partitions there, so the job falls back to the cluster's default
parallelism. It is better to set the number of partitions explicitly to generate
smaller tasks, which helps load balancing.
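As a rough illustration (not the actual fileSourceInterfaces.scala code), the sketch below shows the idea: pass an explicit numSlices to SparkContext.parallelize() instead of relying on defaultParallelism. The object name, the maxListingParallelism cap, and the listing stub inside mapPartitions are hypothetical placeholders.

{code:scala}
import org.apache.hadoop.fs.Path
import org.apache.spark.SparkContext

object ParallelPartitionDiscoverySketch {

  // Hypothetical cap on the number of listing tasks (not from the Spark source).
  val maxListingParallelism = 10000

  def listPathsInParallel(sc: SparkContext, paths: Seq[Path]): Seq[String] = {
    // Hadoop Path is not serializable, so ship plain strings to the executors.
    val serializedPaths = paths.map(_.toString)

    // Without the explicit second argument, parallelize() uses
    // sc.defaultParallelism, which can be much smaller than paths.size and
    // yields a few large listing tasks instead of many small ones.
    val numParallelism = math.max(1, math.min(serializedPaths.size, maxListingParallelism))

    sc.parallelize(serializedPaths, numParallelism)
      .mapPartitions { pathStrings =>
        // Each task would list the files under its slice of directories,
        // e.g. via FileSystem.listStatus; the real listing logic is omitted here.
        pathStrings
      }
      .collect()
      .toSeq
  }
}
{code}

With an explicit partition count, each listing task covers only a handful of directories, so stragglers are cheaper and work spreads more evenly across executors.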



