spark-user mailing list archives

From Gabor Somogyi <gabor.g.somo...@gmail.com>
Subject Re: Parquet File Output Sink - Spark Structured Streaming
Date Wed, 27 Mar 2019 16:20:18 GMT
Hi Matt,

Maybe you could set maxFilesPerTrigger to 1.
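A minimal sketch of what that could look like, assuming a file-based input source; the schema, paths, and directory names here are hypothetical placeholders, not from your setup:

```scala
// Sketch: capping each micro-batch at one input file with maxFilesPerTrigger,
// so every new file should produce its own write to the Parquet sink.
// All paths and the schema below are illustrative assumptions.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types._

val spark = SparkSession.builder
  .appName("ParquetSinkExample")
  .getOrCreate()

// File sources require an explicit schema in streaming mode.
val schema = new StructType()
  .add("id", LongType)
  .add("value", StringType)

val input = spark.readStream
  .schema(schema)
  .option("maxFilesPerTrigger", 1) // process at most one new file per micro-batch
  .json("/path/to/input")          // hypothetical input directory

val query = input.writeStream
  .format("parquet")
  .option("path", "/path/to/output")              // hypothetical output directory
  .option("checkpointLocation", "/path/to/ckpt")  // required for the file sink
  .start()
```

Note that the file sink still only writes when a micro-batch actually fires, so this caps the batch size rather than forcing a fixed schedule.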

BR,
G


On Wed, Mar 27, 2019 at 4:45 PM Matt Kuiper <matt.kuiper@polarisalpha.com>
wrote:

> Hello,
>
> I am new to Spark and Structured Streaming and have the following File
> Output Sink question:
>
> Wondering what triggers a Spark Structured Streaming query (with the
> Parquet file output sink configured) to write data to the Parquet files,
> and how to modify that behavior.  I periodically feed the stream input
> data (using a stream reader to read in files), but it does not write
> output to a Parquet file for each file provided as input.  Once I have
> given it a few files, it tends to write a Parquet file just fine.
>
> I am wondering how to control the threshold for writing.  I would like to
> be able to force a new write to a Parquet file for every new file provided
> as input (at least for initial testing).  Any tips appreciated!
>
> Thanks,
> Matt
>
>
