spark-user mailing list archives

From Junfeng Chen
Subject How to delete empty columns in df when writing to parquet?
Date Tue, 03 Apr 2018 03:28:56 GMT
I am trying to read data from Kafka and write it out in Parquet format via
Spark Streaming.
The problem is that the data from Kafka has a variable structure. For
example, app one has columns A, B, C and app two has columns B, C, D, so the
DataFrame I read from Kafka has all four columns A, B, C, D. When I write the
DataFrame to Parquet files partitioned by app name, the Parquet file for app
one also contains column D, even though D is empty there and holds no data.
How can I drop the empty columns when writing the DataFrame to Parquet?
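
To make the question concrete, here is a rough sketch of one possible approach, not a confirmed solution: write each app's rows separately and keep only the columns that have at least one non-null value for that app. It assumes PySpark, a batch DataFrame (for instance, a single micro-batch), and a column holding the app name. The names `columns_to_keep`, `write_per_app`, `base_path` and `app_col` are hypothetical and chosen only for illustration.

```python
def columns_to_keep(non_null_counts):
    """Return the names of columns that have at least one non-null value."""
    return [name for name, count in non_null_counts.items() if count > 0]


def write_per_app(df, base_path, app_col="app"):
    """Write each app's rows to its own Parquet directory, without empty columns.

    Assumption: `df` is a batch DataFrame and `app_col` names the app column.
    """
    # Imported here so that columns_to_keep can be used without Spark installed.
    from pyspark.sql import functions as F

    apps = [row[0] for row in df.select(app_col).distinct().collect()]
    for app in apps:
        subset = df.filter(F.col(app_col) == app)
        # F.count(column) counts only the non-null values of that column.
        counts = subset.agg(
            *[F.count(F.col(c)).alias(c) for c in subset.columns]
        ).first().asDict()
        keep = columns_to_keep(counts)
        # The app name is encoded in the directory name, so drop it from the data.
        (subset.select(*keep)
               .drop(app_col)
               .write.mode("append")
               .parquet(f"{base_path}/{app_col}={app}"))
```

This costs one extra aggregation per app, and the resulting directories have different schemas, so reading them back together would probably need the Parquet `mergeSchema` read option.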


Junfeng Chen
