spark-issues mailing list archives

From "Hu Fuwang (Jira)" <j...@apache.org>
Subject [jira] [Created] (SPARK-29615) Add insertInto method with byName parameter in DataFrameWriter
Date Sun, 27 Oct 2019 23:56:00 GMT
Hu Fuwang created SPARK-29615:
---------------------------------

             Summary: Add insertInto method with byName parameter in DataFrameWriter
                 Key: SPARK-29615
                 URL: https://issues.apache.org/jira/browse/SPARK-29615
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 3.0.0
            Reporter: Hu Fuwang


Currently, insertion through the DataFrameWriter.insertInto method ignores column names
and uses position-based resolution only. Because DataFrameWriter has only one public insertInto
method, Spark users may not check its description and may assume that Spark matches
columns by name. In that case, the wrong column may be used as the partition column, which can
cause problems (e.g. a huge number of files/folders may be created in the Hive table's tmp location).

We propose adding a new insertInto method to DataFrameWriter that takes a byName parameter, allowing
Spark users to specify whether columns are matched by name.
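
To make the difference concrete, here is a minimal toy model in plain Python. It does not use
Spark and is not Spark's implementation. The function align_row, its by_name argument, and the
table and column names are all hypothetical, chosen only to mirror the proposed byName
parameter. The sketch assumes the partition column ("dt") is the last column of the table.

```python
from typing import Any, Dict, Sequence


def align_row(
    table_columns: Sequence[str],
    df_columns: Sequence[str],
    row: Sequence[Any],
    by_name: bool = False,
) -> Dict[str, Any]:
    """Map one DataFrame row onto the target table's columns.

    Toy model only: `by_name` mirrors the proposed parameter and is not an
    existing Spark argument.
    """
    if len(table_columns) != len(df_columns) or len(row) != len(df_columns):
        raise ValueError("column counts must match")
    if by_name:
        # Name-based: look each table column up among the DataFrame columns.
        if sorted(df_columns) != sorted(table_columns):
            raise ValueError("column names must match when by_name=True")
        values = dict(zip(df_columns, row))
        return {col: values[col] for col in table_columns}
    # Position-based: the DataFrame's column names are ignored entirely.
    return dict(zip(table_columns, row))


# Hypothetical table whose last column, "dt", is the partition column.
table_columns = ["id", "value", "dt"]
# The DataFrame has the same column names, but in a different order.
df_columns = ["dt", "id", "value"]
row = ["2019-10-27", 1, "a"]

by_position = align_row(table_columns, df_columns, row)
by_name = align_row(table_columns, df_columns, row, by_name=True)

print(by_position)  # "dt" receives the DataFrame's "value" data
print(by_name)      # "dt" receives the DataFrame's "dt" data
```

With position-based matching, the partition column "dt" is filled from whatever happens to be
the DataFrame's last column, so every distinct value of "value" would become its own partition.
With name-based matching, the same row lands in the intended partition.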




