spark-issues mailing list archives

From "Davies Liu (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-13753) Column nullable is derived incorrectly
Date Mon, 02 May 2016 23:00:16 GMT

    [ https://issues.apache.org/jira/browse/SPARK-13753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15267689#comment-15267689
] 

Davies Liu commented on SPARK-13753:
------------------------------------

After looking at the query, the bug is caused by our assumption that the key of a MapType is
always non-nullable; when a map is created using map(), however, we do not check the nullability of the keys.
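
The chain of events described above can be sketched as a toy model in plain Python. This is not Spark's code: `Column`, `explode_map_output`, and `simplify_is_not_null` are hypothetical names that only illustrate how a "keys are never null" assumption can lead an optimizer to drop an IS NOT NULL filter.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    """A column together with the nullability derived for it (toy model)."""
    name: str
    nullable: bool


def explode_map_output(key_name, value_name, value_contains_null):
    """Derive output columns of explode(map(...)) in this toy model.

    Assumption being illustrated: map keys are treated as never null, so the
    key column is marked non-nullable no matter what expressions built the map.
    """
    return (Column(key_name, nullable=False),
            Column(value_name, nullable=value_contains_null))


def simplify_is_not_null(col):
    """Toy optimizer rule: IS NOT NULL on a non-nullable column folds to True."""
    if not col.nullable:
        return True  # the predicate is considered always true and is dropped
    return f"{col.name} IS NOT NULL"


# The query names the exploded key "value" and the exploded map value "tags".
value_col, tags_col = explode_map_output("value", "tags", value_contains_null=False)
predicate = simplify_is_not_null(value_col)

# Rows as they might come out of explode(); the first key is actually null.
rows = [(None, ["metric:frontend", ".p50"]), (12.5, ["metric:frontend", ".p90"])]

# With the predicate folded to True, the null row is no longer filtered out.
kept = [r for r in rows if predicate is True or r[0] is not None]
```

In this sketch `kept` still contains the row whose key is null, which mirrors the incorrect result reported in the issue.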

So the solution could be either 1) to enforce the nullability check in map(), which will break
this use case, or 2) to allow `null` as a key in MapType, which may require more API changes.
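
As a rough illustration of option 1, the following plain-Python sketch rejects a map() call at analysis time when any key expression may be null. The names `Expr` and `create_map` are hypothetical and only model the idea; they are not Spark APIs, and the real check could equally be done at runtime.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Expr:
    """A toy expression: its SQL text and whether it may evaluate to null."""
    sql: str
    nullable: bool


def create_map(*children):
    """Build a toy map expression, rejecting key expressions that may be null."""
    if len(children) % 2 != 0:
        raise ValueError("map() expects an even number of arguments")
    keys, values = children[0::2], children[1::2]
    nullable_keys = [k.sql for k in keys if k.nullable]
    if nullable_keys:
        raise ValueError(f"map() keys must not be nullable: {nullable_keys}")
    # Only now is it safe to mark the key type as non-nullable.
    return {"keys": keys, "values": values, "key_nullable": False}


# A literal key passes the check.
ok = create_map(Expr("'p50'", nullable=False), Expr("1", nullable=False))

# A key such as a.frontend[0] may be null, so the reported query is rejected.
try:
    create_map(Expr("a.frontend[0]", nullable=True),
               Expr("ARRAY(...)", nullable=False))
    rejected = False
except ValueError:
    rejected = True
```

The second call shows why this option would break the use case from the report: every key in that query is an array element lookup that may be null.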

cc [~rxin] [~marmbrus] [~yhuai]

> Column nullable is derived incorrectly
> --------------------------------------
>
>                 Key: SPARK-13753
>                 URL: https://issues.apache.org/jira/browse/SPARK-13753
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.5.2
>            Reporter: Jingwei Lu
>            Priority: Critical
>
> There is a problem in Spark SQL: column nullability is derived incorrectly, and the
> optimizer then relies on it. In the following query:
> {code}
> select concat("perf.realtime.web", b.tags[1]) as metric, b.value, b.tags[0]
>               from (
>                 select explode(map(a.frontend[0], ARRAY(concat("metric:frontend", ",controller:", COALESCE(controller, "null"), ",action:", COALESCE(action, "null")), ".p50"),
>                                  a.frontend[1], ARRAY(concat("metric:frontend", ",controller:", COALESCE(controller, "null"), ",action:", COALESCE(action, "null")), ".p90"),
>                                  a.backend[0], ARRAY(concat("metric:backend", ",controller:", COALESCE(controller, "null"), ",action:", COALESCE(action, "null")), ".p50"),
>                                  a.backend[1], ARRAY(concat("metric:backend", ",controller:", COALESCE(controller, "null"), ",action:", COALESCE(action, "null")), ".p90"),
>                                  a.render[0], ARRAY(concat("metric:render", ",controller:", COALESCE(controller, "null"), ",action:", COALESCE(action, "null")), ".p50"),
>                                  a.render[1], ARRAY(concat("metric:render", ",controller:", COALESCE(controller, "null"), ",action:", COALESCE(action, "null")), ".p90"),
>                                  a.page_load_time[0], ARRAY(concat("metric:page_load_time", ",controller:", COALESCE(controller, "null"), ",action:", COALESCE(action, "null")), ".p50"),
>                                  a.page_load_time[1], ARRAY(concat("metric:page_load_time", ",controller:", COALESCE(controller, "null"), ",action:", COALESCE(action, "null")), ".p90"),
>                                  a.total_load_time[0], ARRAY(concat("metric:total_load_time", ",controller:", COALESCE(controller, "null"), ",action:", COALESCE(action, "null")), ".p50"),
>                                  a.total_load_time[1], ARRAY(concat("metric:total_load_time", ",controller:", COALESCE(controller, "null"), ",action:", COALESCE(action, "null")), ".p90"))) as (value, tags)
>                 from (
>                   select  data.controller as controller, data.action as action,
>                           percentile(data.frontend, array(0.5, 0.9)) as frontend,
>                           percentile(data.backend, array(0.5, 0.9)) as backend,
>                           percentile(data.render, array(0.5, 0.9)) as render,
>                           percentile(data.page_load_time, array(0.5, 0.9)) as page_load_time,
>                           percentile(data.total_load_time, array(0.5, 0.9)) as total_load_time
>                   from air_events_rt
>                   where type='air_events' and data.event_name='pageload'
>                   group by data.controller, data.action
>                 ) a
>               ) b
>               where b.value is not null
> {code}
> b.value is incorrectly derived as not nullable. The "b.value is not null" predicate is
> then ignored by the optimizer, which causes the query to return incorrect results.


