spark-issues mailing list archives

From "Reynold Xin (JIRA)" <j...@apache.org>
Subject [jira] [Created] (SPARK-18021) Refactor file name specification for data sources
Date Thu, 20 Oct 2016 05:36:58 GMT
Reynold Xin created SPARK-18021:
-----------------------------------

             Summary: Refactor file name specification for data sources
                 Key: SPARK-18021
                 URL: https://issues.apache.org/jira/browse/SPARK-18021
             Project: Spark
          Issue Type: Sub-task
          Components: SQL
            Reporter: Reynold Xin
            Assignee: Reynold Xin


Currently each data source's OutputWriter is responsible for specifying the entire file name
for each output file. This is problematic because Spark SQL relies on the file name for
certain behaviors, e.g. the bucket id. The current approach therefore allows an individual
data source to break the implementation of bucketing.

We also don't want to move file naming entirely out of the data sources, because different
data sources need to specify different file extensions.

A good compromise is for the OutputWriter to take in the prefix for a file and append its
own suffix.
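
The split described above can be illustrated with a minimal sketch. It is written in Python
for brevity and is not Spark's actual API: the names make_prefix, OutputWriter.extension,
ParquetLikeWriter, JsonLikeWriter and bucket_id_of, as well as the exact file name layout,
are assumptions made up for this example. The point it shows is that the framework owns the
prefix (including the bucket id), while each data source only contributes an extension.

```python
import re
from typing import Optional


def make_prefix(task_id: int, job_id: str, bucket_id: Optional[int] = None) -> str:
    """Framework side (hypothetical layout): build the part of the file name
    that the engine itself depends on, including the bucket id if present.
    Assumes job_id contains no underscore."""
    prefix = f"part-{task_id:05d}-{job_id}"
    if bucket_id is not None:
        prefix += f"_{bucket_id:05d}"
    return prefix


class OutputWriter:
    """Data source side: receives the prefix and may only append a suffix."""

    extension = ""  # subclasses override this with their own suffix

    def __init__(self, prefix: str) -> None:
        self.path = prefix + self.extension


class ParquetLikeWriter(OutputWriter):
    extension = ".snappy.parquet"


class JsonLikeWriter(OutputWriter):
    extension = ".json"


# The bucket id is the digit run after the last underscore, just before the
# (optional) extension. Because writers cannot alter the prefix, this parse
# works the same way for every data source.
_BUCKET_ID = re.compile(r"_(\d+)(?:\..*)?$")


def bucket_id_of(file_name: str) -> Optional[int]:
    """Recover the bucket id from a file name, or None if it is not bucketed."""
    match = _BUCKET_ID.search(file_name)
    return int(match.group(1)) if match else None


if __name__ == "__main__":
    prefix = make_prefix(task_id=3, job_id="job42", bucket_id=7)
    for writer in (ParquetLikeWriter(prefix), JsonLikeWriter(prefix)):
        print(writer.path, bucket_id_of(writer.path))
```

With this arrangement a data source cannot accidentally drop or reorder the bucket id,
since it never sees the file name as anything other than an opaque prefix.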




