spark-issues mailing list archives

From "Kazuaki Ishizaki (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-22935) Dataset with Java Beans for java.sql.Date throws CompileException
Date Tue, 02 Jan 2018 15:16:00 GMT

    [ https://issues.apache.org/jira/browse/SPARK-22935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16308194#comment-16308194 ]

Kazuaki Ishizaki commented on SPARK-22935:
------------------------------------------

[~jlaskowski]

If you look at the schema of this Dataset, {{timestamp}} has type {{timestamp}}, not {{date}}.
With {{inferSchema}} enabled, time values are always inferred as {{timestamp}}.
If you change the declaration of {{timestamp}} in the {{CDR}} class from {{java.sql.Date}} to {{java.sql.Timestamp}}, as below, it works well.

{code}
    Dataset<Row> df = spark
            .read()
            .format("csv")
            .option("header", "true")
            .option("inferSchema", "true")
            .option("delimiter", ";")
            .csv("CDR_SAMPLE.csv");
    df.printSchema();
    Dataset<CDR> cdr = df
            .as(Encoders.bean(CDR.class));
    cdr.printSchema();
    Dataset<CDR> ds = cdr.filter((FilterFunction<CDR>) x -> (x.timestamp != null));
    ...

// result
root
 |-- timestamp: timestamp (nullable = true)
{code}

{code}
// CDR.java
public class CDR implements java.io.Serializable {
  public java.sql.Timestamp timestamp;
  public java.sql.Timestamp getTimestamp() { return this.timestamp; }
  public void setTimestamp(java.sql.Timestamp  timestamp) { this.timestamp = timestamp; }
}
{code}
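
The {{CompileException}} quoted in the issue description can be read as a plain Java type mismatch: the generated code passes a {{long}} to a method that only accepts an {{int}}. The sketch below is a minimal, Spark-free illustration, assuming that a timestamp column is carried internally as microseconds since the epoch (a {{long}}) and a date column as days since the epoch (an {{int}}). The class {{DateVsTimestampSketch}} and its methods are hypothetical stand-ins written for this example, not Spark's actual {{DateTimeUtils}} code.

```java
import java.sql.Date;
import java.sql.Timestamp;
import java.time.Instant;
import java.time.LocalDate;

// Hypothetical illustration only; these helpers are not Spark APIs.
class DateVsTimestampSketch {
    static final long MICROS_PER_SECOND = 1_000_000L;
    static final long MICROS_PER_DAY = 86_400L * MICROS_PER_SECOND;

    // Stand-in for a date conversion: takes days since the epoch as an int.
    static Date toJavaDate(int daysSinceEpoch) {
        return Date.valueOf(LocalDate.ofEpochDay(daysSinceEpoch));
    }

    // Stand-in for a timestamp conversion: takes microseconds since the epoch as a long.
    static Timestamp toJavaTimestamp(long microsSinceEpoch) {
        long seconds = Math.floorDiv(microsSinceEpoch, MICROS_PER_SECOND);
        long nanos = Math.floorMod(microsSinceEpoch, MICROS_PER_SECOND) * 1_000L;
        return Timestamp.from(Instant.ofEpochSecond(seconds, nanos));
    }

    // Explicit narrowing from a microsecond timestamp to a UTC day count.
    static int microsToDays(long microsSinceEpoch) {
        return Math.toIntExact(Math.floorDiv(microsSinceEpoch, MICROS_PER_DAY));
    }

    public static void main(String[] args) {
        // First value from CDR_SAMPLE.csv, expressed as microseconds since the epoch.
        long micros = Instant.parse("2017-10-29T02:37:07.815Z").toEpochMilli() * 1_000L;

        // toJavaDate(micros);  // would not compile: a long is not accepted where an int is required

        System.out.println(toJavaTimestamp(micros).toInstant());  // 2017-10-29T02:37:07.815Z
        System.out.println(toJavaDate(microsToDays(micros)));     // 2017-10-29
    }
}
```

Under these assumptions, keeping {{java.sql.Date}} in the bean would need an explicit narrowing step such as {{microsToDays}}, whereas declaring the field as {{java.sql.Timestamp}}, as suggested above, matches the inferred column type directly.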


> Dataset with Java Beans for java.sql.Date throws CompileException
> -----------------------------------------------------------------
>
>                 Key: SPARK-22935
>                 URL: https://issues.apache.org/jira/browse/SPARK-22935
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.2.1, 2.3.0
>            Reporter: Kazuaki Ishizaki
>
> The following code can throw an exception with or without whole-stage codegen.
> {code}
>   public void SPARK22935() {
>     Dataset<CDR> cdr = spark
>             .read()
>             .format("csv")
>             .option("header", "true")
>             .option("inferSchema", "true")
>             .option("delimiter", ";")
>             .csv("CDR_SAMPLE.csv")
>             .as(Encoders.bean(CDR.class));
>     Dataset<CDR> ds = cdr.filter((FilterFunction<CDR>) x -> (x.timestamp != null));
>     long c = ds.count();
>     cdr.show(2);
>     ds.show(2);
>     System.out.println("cnt=" + c);
>   }
> // CDR.java
> public class CDR implements java.io.Serializable {
>   public java.sql.Date timestamp;
>   public java.sql.Date getTimestamp() { return this.timestamp; }
>   public void setTimestamp(java.sql.Date timestamp) { this.timestamp = timestamp; }
> }
> // CDR_SAMPLE.csv
> timestamp
> 2017-10-29T02:37:07.815Z
> 2017-10-29T02:38:07.815Z
> {code}
> result
> {code}
> 12:17:10.352 ERROR org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator: failed to compile: org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 61, Column 70: No applicable constructor/method found for actual parameters "long"; candidates are: "public static java.sql.Date org.apache.spark.sql.catalyst.util.DateTimeUtils.toJavaDate(int)"
> org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 61, Column 70: No applicable constructor/method found for actual parameters "long"; candidates are: "public static java.sql.Date org.apache.spark.sql.catalyst.util.DateTimeUtils.toJavaDate(int)"
> 	at org.codehaus.janino.UnitCompiler.compileError(UnitCompiler.java:11821)
> ...
> {code}



