spark-user mailing list archives

From "Dipl.-Inf. Rico Bergmann" <i...@ricobergmann.de>
Subject Re: Spark DataSets and multiple write(.) calls
Date Mon, 19 Nov 2018 12:51:34 GMT
Thanks for your advice. But I'm using batch processing. Does anyone have
a solution for the batch processing case?

Best,

Rico.
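
A minimal sketch of one possible workaround for the batch case, not from
the thread itself: persist the shared, compute-intensive subresult to
disk rather than memory (the thread rules out in-memory caching), so each
write reuses it instead of re-executing the full plan. The paths and the
groupBy below are placeholders.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.storage.StorageLevel

    val spark = SparkSession.builder().appName("multi-write").getOrCreate()
    import spark.implicits._

    // Placeholder input; the groupBy stands in for the expensive shared subplan.
    val shared = spark.read.parquet("/data/input")
      .groupBy($"key").count()

    // Spill to disk instead of RAM, since the dataset is too large to cache.
    shared.persist(StorageLevel.DISK_ONLY)

    // Each write now reads back the persisted result instead of recomputing it.
    shared.filter($"count" > 1).write.parquet("/out/a")
    shared.select($"key").write.parquet("/out/b")
    // ... four more writes ...

    shared.unpersist()

An alternative with the same effect is to write the shared subresult out
once (e.g. as Parquet) and read it back before issuing the six writes.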


On 19.11.2018 at 09:43, Magnus Nilsson wrote:
>
> I had the same requirements. As far as I know, the only way is to
> extend ForeachWriter, cache the micro-batch result, and write to
> each output (a sketch follows after the quoted thread).
>
> https://docs.databricks.com/spark/latest/structured-streaming/foreach.html
>
> Unfortunately, it seems you have to make a new connection per batch
> instead of creating one long-lived connection for the pipeline as
> such, i.e. you might have to implement some sort of connection
> pooling yourself, depending on the sink.
>
> Regards,
>
> Magnus
>
>
> On Mon, Nov 19, 2018 at 9:13 AM Dipl.-Inf. Rico Bergmann
> <info@ricobergmann.de> wrote:
>
>     Hi!
>
>     I have a SparkSQL program with one input and six outputs (writes).
>     When executing this program, every call to write(...) executes the
>     plan. My problem is that I want all these writes to happen in
>     parallel (inside one execution plan), because all writes share a
>     common, compute-intensive subpart that could be reused across the
>     plans. Is there a way to do this? (Caching is not a solution,
>     because the input dataset is way too large...)
>
>     Hoping for advice ...
>
>     Best, Rico B.
>
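
For reference, a minimal sketch of the ForeachWriter approach Magnus
describes above, using local files as stand-in sinks (the class name and
paths are illustrative, not from the thread):

    import org.apache.spark.sql.{ForeachWriter, Row}

    class MultiSinkWriter extends ForeachWriter[Row] {
      // Connections are created in open(), not in the constructor, because
      // the writer is serialized to the executors. open() runs once per
      // partition per epoch, which is the "per batch" connection cost
      // Magnus mentions.
      private var sinkA: java.io.PrintWriter = _
      private var sinkB: java.io.PrintWriter = _

      override def open(partitionId: Long, epochId: Long): Boolean = {
        sinkA = new java.io.PrintWriter(s"/tmp/out-a-$partitionId-$epochId")
        sinkB = new java.io.PrintWriter(s"/tmp/out-b-$partitionId-$epochId")
        true // true means: process this partition
      }

      override def process(row: Row): Unit = {
        // Send the same record to every output.
        sinkA.println(row.mkString(","))
        sinkB.println(row.mkString(","))
      }

      override def close(errorOrNull: Throwable): Unit = {
        if (sinkA != null) sinkA.close()
        if (sinkB != null) sinkB.close()
      }
    }

    // Usage, assuming streamingDF is a streaming DataFrame:
    // streamingDF.writeStream.foreach(new MultiSinkWriter).start()

Connection pooling, as Magnus suggests, would replace the PrintWriters
with clients borrowed from a pool in open() and returned in close().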

