spark-issues mailing list archives

From "Nick Dimiduk (JIRA)" <>
Subject [jira] [Commented] (SPARK-19638) Filter pushdown not working for struct fields
Date Fri, 17 Feb 2017 19:53:41 GMT


Nick Dimiduk commented on SPARK-19638:

Debugging. I'm looking at the match expression in {{DataSourceStrategy#translateFilter(Expression)}}.
The predicate comes in as an {{EqualTo(GetStructField, Literal)}}, which doesn't match any of
the cases. I was expecting it to step into the
{{case expressions.EqualTo(a: Attribute, Literal(v, t)) =>}} case on line 511, but execution
steps past this point. Upon investigation, {{GetStructField}} does not extend {{Attribute}}.
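
The mismatch described above can be sketched in plain Scala. This is only a toy model, not Spark's code: the types below are simplified stand-ins that borrow the Catalyst names, {{EqualToFilter}} and {{PushdownSketch}} are hypothetical names made up for illustration, and the toy {{Literal}} carries only a value, not a data type. It shows why a typed pattern on {{Attribute}} skips a predicate whose left side is a {{GetStructField}}.

```scala
object PushdownSketch {
  // Toy stand-ins for Catalyst expression classes (assumed shapes, not the real Spark types).
  sealed trait Expression
  sealed trait Attribute extends Expression { def name: String }
  final case class AttributeReference(name: String) extends Attribute
  // Like the real class it imitates, this extends Expression but NOT Attribute.
  final case class GetStructField(child: Expression, fieldName: String) extends Expression
  final case class Literal(value: Any) extends Expression
  final case class EqualTo(left: Expression, right: Expression) extends Expression

  // Hypothetical stand-in for a data source filter handed to a connector.
  final case class EqualToFilter(attribute: String, value: Any)

  // Same shape as the case quoted above: only a bare Attribute on the left matches.
  def translateFilter(predicate: Expression): Option[EqualToFilter] = predicate match {
    case EqualTo(a: Attribute, Literal(v)) => Some(EqualToFilter(a.name, v))
    case _                                 => None // predicate is not pushed down
  }

  // foo == '1': top-level column, so the typed pattern matches.
  val flat: Expression = EqualTo(AttributeReference("foo"), Literal("1"))
  // bar.baz == '1': the left side is a GetStructField, so no case matches.
  val nested: Expression =
    EqualTo(GetStructField(AttributeReference("bar"), "baz"), Literal("1"))
}
```

In this sketch, {{PushdownSketch.translateFilter(PushdownSketch.flat)}} yields a filter on {{foo}}, while the same call on {{PushdownSketch.nested}} yields {{None}}, which corresponds to the struct-field condition silently falling out of the pushed-down set.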

From this point, the {{EqualTo}} condition involving the struct field is dropped from
the filter set pushed down to the ES connector. Thus I believe this is an issue in Spark,
not in the connector.

> Filter pushdown not working for struct fields
> ---------------------------------------------
>                 Key: SPARK-19638
>                 URL:
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.1.0
>            Reporter: Nick Dimiduk
> Working with a dataset containing struct fields, and enabling debug logging in the ES
> connector, I'm seeing the following behavior. The dataframe is created over the ES connector
> and then the schema is extended with a couple of column aliases, such as:
> {noformat}
> df.withColumn("f2", df("foo"))
> {noformat}
> Queries against those alias columns work as expected for fields that are not struct members.
> {noformat}
> scala> df.withColumn("f2", df("foo")).where("f2 == '1'").limit(0).show
> 17/02/16 15:06:49 DEBUG DataSource: Pushing down filters [IsNotNull(foo),EqualTo(foo,1)]
> 17/02/16 15:06:49 TRACE DataSource: Transformed filters into DSL [{"exists":{"field":"foo"}},{"match":{"foo":"1"}}]
> {noformat}
> However, try the same with an alias over a struct field, and no filters are pushed down.
> {noformat}
> scala> df.withColumn("bar_baz", df("bar.baz")).where("bar_baz == '1'").limit(1).show
> {noformat}
> In fact, this is the case even when no alias is used at all.
> {noformat}
> scala> df.where("bar.baz == '1'").limit(1).show
> {noformat}
> Basically, pushdown for structs doesn't work at all.
> Maybe this is specific to the ES connector?
