spark-user mailing list archives

From Artur Sukhenko <artur.sukhe...@gmail.com>
Subject Spark locking Hive partition
Date Mon, 24 Jun 2019 14:45:41 GMT
Hi,
I have a Spark Streaming app (1-minute batches) writing Parquet data to a
partition, e.g.:
val hdfsPath = s"$dbPath/$tableName/year=$year/month=$month/day=$day"

df.write.mode(SaveMode.Append).parquet(hdfsPath)

I wonder whether I would lose data if I overwrite this partition from Hive
(for compaction/deduplication) while Spark keeps appending more data to it
every minute. (The Hive query can take more than 2 minutes.)
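[Editor's note: for context, one common way to avoid this kind of race is a staging-and-swap pattern: the compaction job writes its deduplicated output to a staging directory and then replaces the live partition with a single atomic rename, so appenders and readers never observe a half-overwritten directory. The sketch below is not from this thread; it demonstrates the idea on a local filesystem with `java.nio.file`, and the object/method names and paths are hypothetical. On HDFS the same idea would use `FileSystem.rename`.]

```scala
import java.nio.file.{Files, Path, StandardCopyOption}

object PartitionSwap {
  // Hypothetical helper: atomically replace the live partition directory
  // with the staged (compacted) one. Both paths must be on the same
  // filesystem for ATOMIC_MOVE to succeed.
  def swapIn(staging: Path, live: Path): Unit = {
    val retired = live.resolveSibling(live.getFileName.toString + ".old")
    // Step the current partition aside first, if it exists.
    if (Files.exists(live))
      Files.move(live, retired, StandardCopyOption.ATOMIC_MOVE)
    // Single rename: readers see either the old or the new directory,
    // never a partially overwritten one.
    Files.move(staging, live, StandardCopyOption.ATOMIC_MOVE)
    // `retired` can be cleaned up once no query still reads the old files.
  }
}
```

Note that this only addresses readers racing with the overwrite; files the streaming job appends between the start of compaction and the swap would still be lost, so the streaming writes also need to be paused or redirected for the duration of the swap.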

Thanks,
Artur Sukhenko
