spark-issues mailing list archives

From "Joseph K. Bradley (JIRA)" <j...@apache.org>
Subject [jira] [Created] (SPARK-6189) Pandas to DataFrame conversion should check field names for periods
Date Thu, 05 Mar 2015 17:52:38 GMT
Joseph K. Bradley created SPARK-6189:
----------------------------------------

             Summary: Pandas to DataFrame conversion should check field names for periods
                 Key: SPARK-6189
                 URL: https://issues.apache.org/jira/browse/SPARK-6189
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 1.3.0
            Reporter: Joseph K. Bradley
            Priority: Minor


Issue I ran into:  I imported an R dataset in CSV format into a Pandas DataFrame and then
used toDF() to convert it into a Spark DataFrame.  The R dataset had a column with a period
in its name (column "GNP.deflator" in the "longley" dataset).  When I tried to select it using the
Spark DataFrame DSL, I could not, because the DSL interpreted the period as selecting a field
within GNP.

Also, since "GNP" is the name of another field, the lookup fails with an error that could be
obscure to users:
{code}
org.apache.spark.sql.AnalysisException: GetField is not valid on fields of type DoubleType;
{code}

We should either handle periods in column names or check during loading and warn/fail gracefully.
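
As a rough illustration of the "check during loading" option, here is a minimal sketch in plain Python. It is not Spark's implementation: the helper names (find_dotted_columns, check_column_names) and the on_error modes are hypothetical, chosen only to show the warn, fail, and rename behaviours on a list of column names.

```python
import warnings


def find_dotted_columns(columns):
    """Return the column names that contain a period."""
    return [str(name) for name in columns if "." in str(name)]


def check_column_names(columns, on_error="warn", replacement="_"):
    """Hypothetical pre-conversion check for periods in column names.

    on_error="warn"   -> emit a warning and return the names unchanged
    on_error="fail"   -> raise ValueError listing the offending names
    on_error="rename" -> return the names with "." replaced by `replacement`
    """
    names = [str(name) for name in columns]
    dotted = find_dotted_columns(names)
    if not dotted:
        return names

    message = (
        "Column names contain periods, which the DataFrame DSL may "
        "interpret as field access: " + ", ".join(dotted)
    )
    if on_error == "fail":
        raise ValueError(message)
    if on_error == "warn":
        warnings.warn(message)
        return names
    if on_error == "rename":
        return [name.replace(".", replacement) for name in names]
    raise ValueError("unknown on_error mode: %r" % (on_error,))


# The two column names mentioned in the report above.
longley_columns = ["GNP.deflator", "GNP"]
renamed = check_column_names(longley_columns, on_error="rename")
```

With a Pandas DataFrame pdf, the same check could be applied before conversion by passing list(pdf.columns) and, in "rename" mode, assigning the result back to pdf.columns.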



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


