spark-issues mailing list archives

From "Wenchen Fan (JIRA)" <j...@apache.org>
Subject [jira] [Assigned] (SPARK-26161) Ignore empty files in load
Date Sun, 02 Dec 2018 02:30:00 GMT

     [ https://issues.apache.org/jira/browse/SPARK-26161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wenchen Fan reassigned SPARK-26161:
-----------------------------------

    Assignee: Maxim Gekk

> Ignore empty files in load
> --------------------------
>
>                 Key: SPARK-26161
>                 URL: https://issues.apache.org/jira/browse/SPARK-26161
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.4.0
>            Reporter: Maxim Gekk
>            Assignee: Maxim Gekk
>            Priority: Minor
>
> Currently, empty files are opened in load, and Spark tries to read data from them. In
> some cases, such empty files produce empty partitions, for example with the *wholetext*
> option in the Text datasource and in *multiLine* mode in the CSV/JSON datasources. This
> behaviour is unnecessary: empty files can be skipped on read, which can reduce the number
> of tasks submitted for loading empty files.
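
The idea of skipping empty files before any read task is created can be sketched in plain Python. This is only an illustration under stated assumptions, not Spark's implementation: the helper names non_empty_files and plan_tasks are hypothetical, and a "task" here is just a tuple standing in for a unit of read work.

```python
import os
import tempfile


def non_empty_files(paths):
    """Return only the paths whose size on disk is greater than zero bytes."""
    return [p for p in paths if os.path.getsize(p) > 0]


def plan_tasks(paths):
    """Hypothetical planner: create one read task per file that contains data.

    Empty files are filtered out first, so no task (and no empty
    partition) is produced for them.
    """
    return [("read", p) for p in non_empty_files(paths)]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        empty = os.path.join(d, "empty.json")
        full = os.path.join(d, "full.json")
        open(empty, "w").close()          # zero-byte file
        with open(full, "w") as f:
            f.write('{"a": 1}\n')         # file with one record

        tasks = plan_tasks([empty, full])
        print(len(tasks))                 # one task, for the non-empty file only
```

Checking the file size from metadata is enough to decide whether to skip a file, so the empty file never has to be opened.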



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

