spark-user mailing list archives

From Jungtaek Lim <kabhwan.opensou...@gmail.com>
Subject Re: [SPARK-STRUCTURED-STREAMING] IllegalStateException: Race while writing batch 4
Date Thu, 13 Aug 2020 03:02:40 GMT
File stream sink doesn't support multiple queries writing to the same output
path. There are several approaches to work around this:

1) have both queries write to Kafka (or any other intermediate storage that
allows concurrent writes), and let the next Spark application read from it
and write to the final path
2) have the two queries write to two different directories, and let the next
Spark application read both and write to the final path
3) use alternative data sources that support concurrent writes to files (you
may want to check Delta Lake, Apache Hudi, or Apache Iceberg for such
functionality, though you'd probably need to learn many other things to keep
the table in good shape)
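Approach 2 can be sketched with a toy Python model (standard library only, no Spark). This is an assumption-laden illustration, not Spark's implementation: each query gets its own output directory with its own `_spark_metadata`-style commit log, so both can commit batch 4 without colliding, and a separate downstream step copies only the files recorded in each log into the final path. The helper names (`ToySinkLog`, `write_batch`, `merge_to_final`) are hypothetical; in a real deployment the downstream step would be another Spark application reading the two directories.

```python
import os
import shutil
import tempfile


class ToySinkLog:
    """Hypothetical per-directory commit log (a simplification, not Spark's code)."""

    def __init__(self, output_dir):
        self.log_dir = os.path.join(output_dir, "_spark_metadata")
        os.makedirs(self.log_dir, exist_ok=True)

    def commit(self, batch_id, file_names):
        # Exclusive create: a batch id can be committed only once per directory.
        with open(os.path.join(self.log_dir, str(batch_id)), "x") as f:
            f.write("\n".join(file_names))

    def committed_files(self):
        """Return data file names recorded in the log, in batch-id order."""
        names = []
        for entry in sorted(os.listdir(self.log_dir), key=int):
            with open(os.path.join(self.log_dir, entry)) as f:
                names.extend(line for line in f.read().splitlines() if line)
        return names


def write_batch(output_dir, log, batch_id, rows, prefix):
    # Write the data file first, then record it in the log to "commit" it.
    name = f"{prefix}-{batch_id}.txt"
    with open(os.path.join(output_dir, name), "w") as f:
        f.write("\n".join(rows))
    log.commit(batch_id, [name])


def merge_to_final(source_dirs, final_dir):
    """Downstream step: copy only committed files from each source into final_dir."""
    os.makedirs(final_dir, exist_ok=True)
    copied = []
    for src in source_dirs:
        for name in ToySinkLog(src).committed_files():
            shutil.copy(os.path.join(src, name), os.path.join(final_dir, name))
            copied.append(name)
    return sorted(copied)


def demo_separate_dirs_then_merge():
    with tempfile.TemporaryDirectory() as root:
        dir_a = os.path.join(root, "query_a")
        dir_b = os.path.join(root, "query_b")
        final = os.path.join(root, "final")
        log_a, log_b = ToySinkLog(dir_a), ToySinkLog(dir_b)
        # Both queries commit their own "batch 4" without colliding: separate logs.
        write_batch(dir_a, log_a, 4, ["a1", "a2"], "part-a")
        write_batch(dir_b, log_b, 4, ["b1"], "part-b")
        # A leftover file with no log entry is ignored by the merge step.
        open(os.path.join(dir_a, "part-a-5.txt"), "w").close()
        return merge_to_final([dir_a, dir_b], final)
```

The merge here is a one-shot copy; a real downstream job would also need to remember which batches it has already processed (for example by running as its own streaming query with its own checkpoint), which this sketch does not attempt.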

Thanks,
Jungtaek Lim (HeartSaVioR)

On Sat, Aug 8, 2020 at 4:19 AM Amit Joshi <mailtojoshiamit@gmail.com> wrote:

> Hi,
>
> I have two Spark Structured Streaming queries writing to the same output
> path in object storage.
> Once in a while I am getting the "IllegalStateException: Race while
> writing batch 4".
> I found that this error occurs because two writers are writing to the
> same output path. The file stream sink doesn't support multiple writers:
> it assumes there is only one writer per path, so each query needs to use
> its own output directory.
>
> Is there a way for both queries to write their output to the same path,
> since I need the output at a single path?
>
> Regards
> Amit Joshi
>
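The failure described above can be reproduced with a toy Python model, loosely based on how the file sink commits batches: each output directory has a `_spark_metadata` log, and a batch is committed by creating a log entry named after the batch id, which can succeed only once. Because each query numbers its own micro-batches from its own checkpoint, two queries sharing one output path will eventually both try to commit the same batch id (here 4). The class names and the exclusive-create mechanism are simplifications assumed for illustration, not Spark's actual code.

```python
import os
import tempfile


class RaceError(Exception):
    """Stands in for Spark's IllegalStateException in this toy model."""


class ToyMetadataLog:
    """Hypothetical per-directory commit log, loosely modeled on the file sink's."""

    def __init__(self, log_dir):
        self.log_dir = log_dir
        os.makedirs(log_dir, exist_ok=True)

    def add(self, batch_id, file_names):
        """Commit a batch; return False if this batch id is already in the log."""
        path = os.path.join(self.log_dir, str(batch_id))
        try:
            # Mode "x" is exclusive create: it fails if the file already exists.
            with open(path, "x") as f:
                f.write("\n".join(file_names))
            return True
        except FileExistsError:
            return False


class ToyFileSink:
    """Every sink pointed at the same output_dir shares one log directory."""

    def __init__(self, output_dir):
        self.log = ToyMetadataLog(os.path.join(output_dir, "_spark_metadata"))

    def add_batch(self, batch_id, file_names):
        if not self.log.add(batch_id, file_names):
            raise RaceError(f"Race while writing batch {batch_id}")


def demo_shared_path():
    """Two queries, each with its own batch counter, share one output path."""
    with tempfile.TemporaryDirectory() as out:
        query_a, query_b = ToyFileSink(out), ToyFileSink(out)
        query_a.add_batch(4, ["part-a-4.parquet"])
        try:
            query_b.add_batch(4, ["part-b-4.parquet"])
        except RaceError as err:
            return str(err)
    return "no race"


def demo_separate_paths():
    """Same two queries, but each writes to its own output directory."""
    with tempfile.TemporaryDirectory() as root:
        ToyFileSink(os.path.join(root, "query_a")).add_batch(4, ["part-a-4.parquet"])
        ToyFileSink(os.path.join(root, "query_b")).add_batch(4, ["part-b-4.parquet"])
    return "no race"
```

In this model the second commit of batch 4 to a shared path raises "Race while writing batch 4", while giving each query its own directory avoids the collision entirely.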
