spark-user mailing list archives

From Vadim Semenov <va...@datadoghq.com>
Subject Re: Spark DataSets and multiple write(.) calls
Date Mon, 19 Nov 2018 18:12:37 GMT
You can use checkpointing: in this case Spark will write the RDD out to
the checkpoint directory you configure, and the RDD can then be reused
from the checkpointed state, avoiding recomputation.
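
A minimal sketch of this idea in Scala (untested here; the input path, checkpoint directory, output paths, and the `expensive` transformation are placeholders, not from the thread):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("multi-write").getOrCreate()

// Checkpoint data is written to this directory (placeholder path).
spark.sparkContext.setCheckpointDir("hdfs:///tmp/checkpoints")

// The shared, compute-intensive part of the plan (placeholder logic).
val expensive = spark.read.parquet("/input")
  .groupBy("key").count()

// Dataset.checkpoint() (Spark 2.1+) eagerly materializes the result;
// subsequent actions read the checkpointed data instead of recomputing.
val shared = expensive.checkpoint()

// Each write now reuses the checkpointed state.
shared.write.parquet("/out1")
shared.write.parquet("/out2")
// ... remaining writes
```

Unlike cache(), which is best-effort and memory-bound, a checkpoint is persisted to the checkpoint directory, so it also works when the intermediate result is too large to cache.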

On Mon, Nov 19, 2018 at 7:51 AM Dipl.-Inf. Rico Bergmann <
info@ricobergmann.de> wrote:

> Thanks for your advice. But I'm using batch processing. Does anyone have a
> solution for the batch-processing case?
>
> Best,
>
> Rico.
>
> Am 19.11.2018 um 09:43 schrieb Magnus Nilsson:
>
>
> Magnus Nilsson
> 9:43 AM (0 minutes ago)
>
> to info
> I had the same requirements. As far as I know, the only way is to extend
> ForeachWriter, cache the microbatch result, and write to each output.
>
> https://docs.databricks.com/spark/latest/structured-streaming/foreach.html
>
> Unfortunately it seems as if you have to make a new connection "per batch"
> instead of creating one long-lasting connection for the pipeline as a whole.
> I.e., you might have to implement some sort of connection pooling yourself,
> depending on the sink.
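
A rough sketch of the ForeachWriter approach described above (Structured Streaming; the sink connections and write logic are placeholders, not from the original thread):

```scala
import org.apache.spark.sql.{ForeachWriter, Row}

class MultiSinkWriter extends ForeachWriter[Row] {
  // open() is invoked once per partition per epoch, which is why
  // connections end up being re-established "per batch" as noted above.
  def open(partitionId: Long, epochId: Long): Boolean = {
    // open (or borrow from a pool) a connection per downstream sink here
    true // return false to skip this partition
  }

  def process(row: Row): Unit = {
    // fan the row out to each of the outputs
  }

  def close(errorOrNull: Throwable): Unit = {
    // close the connections, or return them to a pool
  }
}

// Usage on a streaming DataFrame:
// streamingDF.writeStream.foreach(new MultiSinkWriter).start()
```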
>
> Regards,
>
> Magnus
>
>
> On Mon, Nov 19, 2018 at 9:13 AM Dipl.-Inf. Rico Bergmann <
> info@ricobergmann.de> wrote:
>
>> Hi!
>>
>> I have a SparkSQL program with one input and 6 outputs (write). When
>> executing this program, every call to write(.) executes the plan. My
>> problem is that I want all these writes to happen in parallel (inside
>> one execution plan), because all writes share a common, compute-intensive
>> subpart that could be computed once and reused by all plans. Is there a
>> way to do this? (Caching is not a solution because the input dataset is
>> way too large...)
>>
>> Hoping for advice ...
>>
>> Best, Rico B.
>>
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe e-mail: user-unsubscribe@spark.apache.org
>>
>>
>
>
>



-- 
Sent from my iPhone
