spark-user mailing list archives

From Henrique Oliveira <heso...@gmail.com>
Subject Re: [Spark SQL]: Can't write DataFrame after using explode function on multiple columns.
Date Mon, 03 Aug 2020 14:06:24 GMT
Hi Patrick, thank you for your quick response.
That's exactly what I think. In fact, the result of this processing is an
intermediate table that will be used to generate other views.
Another approach I'm trying now is to move the "explosion" step into that
view generation step, so that I don't need to explode every column, only
those needed by the final client.
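
To make the cost difference concrete, here is a minimal plain-Python sketch
(not Spark code) that models how chained explodes multiply rows: each input
row becomes one output row per combination of array elements. The record and
the column names (tags, regions, scores) and the helper explode_rows are
hypothetical, chosen only for illustration.

```python
from itertools import product


def explode_rows(row, columns):
    """Model chained explodes: one output row per combination of elements.

    This only imitates the row growth; it is not how Spark executes it.
    """
    keys = list(columns)
    out = []
    for combo in product(*(row[k] for k in keys)):
        new_row = dict(row)              # keep the untouched columns as they are
        new_row.update(zip(keys, combo))  # replace each array with one element
        out.append(new_row)
    return out


# Hypothetical record with three array columns.
row = {
    "id": 1,
    "tags": ["a", "b", "c"],
    "regions": ["us", "eu"],
    "scores": [1, 2, 3, 4],
}

# Exploding every array column: 3 * 2 * 4 combinations for this one record.
all_cols = explode_rows(row, ["tags", "regions", "scores"])

# Exploding only the column a given view needs: 3 rows, other arrays intact.
needed = explode_rows(row, ["tags"])

print(len(all_cols), len(needed))
```

The growth is multiplicative per input row, so with more array columns or
longer arrays the intermediate table grows very quickly, while exploding only
the columns a view needs keeps it proportional to those arrays alone.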

P.S. I was avoiding UDFs for now because I'm still on Spark 2.4 and the
Python UDFs I tried performed very badly, but I will give them a try in
this case. It can't be worse.
Thanks again!

On Mon, Aug 3, 2020 at 10:53, Patrick McCarthy <
pmccarthy@dstillery.com> wrote:

> This seems like a very expensive operation. Why do you want to write out
> all the exploded values? If you just want all combinations of values, could
> you instead do it at read-time with a UDF or something?
>
> On Sat, Aug 1, 2020 at 8:34 PM hesouol <hesouol@gmail.com> wrote:
>
>> I forgot to add one piece of information. By "can't write" I mean it keeps
>> processing and nothing happens. The job runs for hours even with a very
>> small file, and I have to force it to stop.
>
> --
>
>
> *Patrick McCarthy  *
>
> Senior Data Scientist, Machine Learning Engineering
>
> Dstillery
>
> 470 Park Ave South, 17th Floor, NYC 10016
>
