spark-issues mailing list archives

From "Yin Huai (JIRA)" <>
Subject [jira] [Commented] (SPARK-13753) Column nullable is derived incorrectly
Date Mon, 02 May 2016 23:33:12 GMT


Yin Huai commented on SPARK-13753:

[~davies] Thank you for looking at it. Yes, we do require that the keys of a map
never contain null. For now, I think that option 1 is reasonable.

> Column nullable is derived incorrectly
> --------------------------------------
>                 Key: SPARK-13753
>                 URL:
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.5.2
>            Reporter: Jingwei Lu
>            Priority: Critical
> Spark SQL derives column nullability incorrectly and then relies on it during optimization, as in the following query:
> {code}
> select concat("perf.realtime.web", b.tags[1]) as metric, b.value, b.tags[0]
>               from (
>                 select explode(map(a.frontend[0], ARRAY(concat("metric:frontend", ",controller:", COALESCE(controller, "null"), ",action:", COALESCE(action, "null")), ".p50"),
>                                  a.frontend[1], ARRAY(concat("metric:frontend", ",controller:", COALESCE(controller, "null"), ",action:", COALESCE(action, "null")), ".p90"),
>                                  a.backend[0], ARRAY(concat("metric:backend", ",controller:", COALESCE(controller, "null"), ",action:", COALESCE(action, "null")), ".p50"),
>                                  a.backend[1], ARRAY(concat("metric:backend", ",controller:", COALESCE(controller, "null"), ",action:", COALESCE(action, "null")), ".p90"),
>                                  a.render[0], ARRAY(concat("metric:render", ",controller:", COALESCE(controller, "null"), ",action:", COALESCE(action, "null")), ".p50"),
>                                  a.render[1], ARRAY(concat("metric:render", ",controller:", COALESCE(controller, "null"), ",action:", COALESCE(action, "null")), ".p90"),
>                                  a.page_load_time[0], ARRAY(concat("metric:page_load_time", ",controller:", COALESCE(controller, "null"), ",action:", COALESCE(action, "null")), ".p50"),
>                                  a.page_load_time[1], ARRAY(concat("metric:page_load_time", ",controller:", COALESCE(controller, "null"), ",action:", COALESCE(action, "null")), ".p90"),
>                                  a.total_load_time[0], ARRAY(concat("metric:total_load_time", ",controller:", COALESCE(controller, "null"), ",action:", COALESCE(action, "null")), ".p50"),
>                                  a.total_load_time[1], ARRAY(concat("metric:total_load_time", ",controller:", COALESCE(controller, "null"), ",action:", COALESCE(action, "null")), ".p90"))) as (value, tags)
>                 from (
>                   select  data.controller as controller, data.action as action,
>                           percentile(data.frontend, array(0.5, 0.9)) as frontend,
>                           percentile(data.backend, array(0.5, 0.9)) as backend,
>                           percentile(data.render, array(0.5, 0.9)) as render,
>                           percentile(data.page_load_time, array(0.5, 0.9)) as page_load_time,
>                           percentile(data.total_load_time, array(0.5, 0.9)) as total_load_time
>                   from air_events_rt
>                   where type='air_events' and data.event_name='pageload'
>                   group by data.controller, data.action
>                 ) a
>               ) b
>               where b.value is not null
> {code}
> b.value is incorrectly derived as not nullable, so the optimizer ignores the "b.value is not null" predicate, which causes the query to return incorrect results.
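
The failure mode described above can be illustrated with a small standalone sketch. This is not Spark's optimizer code: the `Column` class and the `simplify_is_not_null` and `run_filter` helpers are hypothetical names invented for illustration. The sketch only assumes that an optimizer may fold an "is not null" check to a constant when it believes a column is not nullable, which is the behavior the report describes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    """Hypothetical column metadata: what the planner believes about a column."""
    name: str
    nullable: bool  # the derived flag, which may not match the actual data


def simplify_is_not_null(column: Column) -> str:
    """Return the predicate that remains after a toy simplification rule."""
    if not column.nullable:
        # The planner believes the column never holds NULL, so the check
        # is folded to a constant and the filter becomes a no-op.
        return "true"
    return f"{column.name} IS NOT NULL"


def run_filter(rows, column: Column):
    """Apply the (possibly simplified) predicate to a list of dict rows."""
    predicate = simplify_is_not_null(column)
    if predicate == "true":
        return list(rows)  # filter was optimized away
    return [row for row in rows if row[column.name] is not None]


rows = [{"value": 1.5}, {"value": None}]

# Nullability derived incorrectly: the null row leaks through the filter.
wrong = run_filter(rows, Column("value", nullable=False))

# Nullability derived correctly: the null row is removed as intended.
right = run_filter(rows, Column("value", nullable=True))
```

With the incorrect flag, `wrong` still contains the row whose `value` is null, which mirrors the incorrect result reported for `where b.value is not null`.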
