spark-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-26233) Incorrect decimal value with java beans and first/last/max... functions
Date Tue, 11 Dec 2018 01:19:00 GMT

    [ https://issues.apache.org/jira/browse/SPARK-26233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16715897#comment-16715897 ]

ASF GitHub Bot commented on SPARK-26233:
----------------------------------------

cloud-fan commented on issue #23232: [SPARK-26233][SQL][BACKPORT-2.4] CheckOverflow when encoding a decimal value
URL: https://github.com/apache/spark/pull/23232#issuecomment-446036494
 
 
   Hi @mgaido91, just one more question. Without this patch, does Spark always return a wrong result if the actual decimal doesn't fit the precision? Even for simple operations like `df.show`?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


> Incorrect decimal value with java beans and first/last/max... functions
> -----------------------------------------------------------------------
>
>                 Key: SPARK-26233
>                 URL: https://issues.apache.org/jira/browse/SPARK-26233
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.0.2, 2.1.3, 2.2.2, 2.3.1, 2.4.0
>            Reporter: Miquel Canes
>            Assignee: Marco Gaido
>            Priority: Blocker
>              Labels: correctness
>             Fix For: 2.2.3, 2.3.3, 2.4.1, 3.0.0
>
>
> Decimal values from Java beans are incorrectly scaled when used with functions like first/last/max...
> This problem arises because Encoders.bean always sets decimal values as _DecimalType(this.MAX_PRECISION(), 18)._
> Usually this is not a problem with numeric functions like *sum*, but for functions like *first*/*last*/*max*... it is.
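> The reported E-notation values are consistent with the stored unscaled digits being reinterpreted under the encoder's fixed scale. A minimal, self-contained sketch of that reinterpretation (plain `java.math.BigDecimal`, no Spark; the class name is illustrative), assuming the unscaled value is kept as-is while the scale is forced to 18:

```java
import java.math.BigDecimal;

public class ScaleMismatchDemo {
    public static void main(String[] args) {
        // The bean value 0.1111 has unscaled value 1111 and scale 4.
        BigDecimal original = BigDecimal.valueOf(0.1111);
        System.out.println(original.unscaledValue() + " / scale " + original.scale());

        // Reinterpreting the same unscaled digits under the encoder's
        // fixed scale of 18 (DecimalType(38, 18)) yields
        // 1111 * 10^-18 = 1.111E-15, matching the corrupted first() output.
        BigDecimal misread = new BigDecimal(original.unscaledValue(), 18);
        System.out.println(misread); // prints 1.111E-15
    }
}
```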
> How to reproduce this error:
> Using this class as an example:
> {code:java}
> public class Foo implements Serializable {
>   private String group;
>   private BigDecimal var;
>   public BigDecimal getVar() {
>     return var;
>   }
>   public void setVar(BigDecimal var) {
>     this.var = var;
>   }
>   public String getGroup() {
>     return group;
>   }
>   public void setGroup(String group) {
>     this.group = group;
>   }
> }
> {code}
>  
> And a dummy code to create some objects:
> {code:java}
> Dataset<Foo> ds = spark.range(5)
>     .map(l -> {
>       Foo foo = new Foo();
>       foo.setGroup("" + l);
>       foo.setVar(BigDecimal.valueOf(l + 0.1111));
>       return foo;
>     }, Encoders.bean(Foo.class));
> ds.printSchema();
> ds.show();
> +-----+------+
> |group|   var|
> +-----+------+
> |    0|0.1111|
> |    1|1.1111|
> |    2|2.1111|
> |    3|3.1111|
> |    4|4.1111|
> +-----+------+
> {code}
> We can see that the DecimalType has precision 38 and scale 18, and all values are shown correctly.
> But if we use the *first* function, the values are scaled incorrectly:
> {code:java}
> ds.groupBy(col("group"))
>     .agg(
>         first("var")
>     )
>     .show();
> +-----+-----------------+
> |group|first(var, false)|
> +-----+-----------------+
> |    3|       3.1111E-14|
> |    0|        1.111E-15|
> |    1|       1.1111E-14|
> |    4|       4.1111E-14|
> |    2|       2.1111E-14|
> +-----+-----------------+
> {code}
> This incorrect behavior cannot be reproduced if we use "numerical" functions like *sum*, or if the column is cast to a new DecimalType:
> {code:java}
> ds.groupBy(col("group"))
>     .agg(
>         sum("var")
>     )
>     .show();
> +-----+--------------------+
> |group|            sum(var)|
> +-----+--------------------+
> |    3|3.111100000000000000|
> |    0|0.111100000000000000|
> |    1|1.111100000000000000|
> |    4|4.111100000000000000|
> |    2|2.111100000000000000|
> +-----+--------------------+
> ds.groupBy(col("group"))
>     .agg(
>         first(col("var").cast(new DecimalType(38, 8)))
>     )
>     .show();
> +-----+----------------------------------------+
> |group|first(CAST(var AS DECIMAL(38,8)), false)|
> +-----+----------------------------------------+
> |    3|                              3.11110000|
> |    0|                              0.11110000|
> |    1|                              1.11110000|
> |    4|                              4.11110000|
> |    2|                              2.11110000|
> +-----+----------------------------------------+
> {code}
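> Besides casting the column, a hypothetical workaround (not part of the reported fix) would be to normalize each BigDecimal to the encoder's fixed scale of 18 before it reaches Encoders.bean, so the stored unscaled value already matches DecimalType(38, 18). A sketch with an illustrative helper name:

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

public class RescaleWorkaround {
    // Widen the scale to 18 so the value round-trips through
    // DecimalType(38, 18) unchanged. Widening a scale never rounds,
    // so RoundingMode.UNNECESSARY is safe for scales <= 18.
    public static BigDecimal toEncoderScale(BigDecimal value) {
        return value.setScale(18, RoundingMode.UNNECESSARY);
    }

    public static void main(String[] args) {
        System.out.println(toEncoderScale(BigDecimal.valueOf(0.1111)));
        // prints 0.111100000000000000
    }
}
```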



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

