spark-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Pınar Ersoy <>
Subject Spark 3 - Predicate/Projection Pushdown Feature
Date Tue, 06 Oct 2020 07:36:49 GMT

I am working as a Data Scientist and using Spark with Python while
developing my models.

Our data model has multiple nested structures (*Ex.* For the first two-level, I can read them
as the following with PySpark:

*data.where(col('attributes.product') != "")*

However, I cannot read the three-level nested structure:

*data.where(col('attributes.product.feature') != "")*

I was hoping to overcome this problem with Spark 3 with the advancements of
Predicate & Projection Pushdown; however, after I tested them I still
cannot read the third-level *without flattening the data*.

I would like to ask whether there will be an improvement in reading nested
data (JSON/Parquet) that has *more than two levels *in the upcoming
versions of Spark 3.

Or if I am missing something in the existing Spark versions released,
please let me know how to proceed.

Kindest Regards,
Pınar Ersoy

View raw message