spark-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Marcin Tustin <marcin.tus...@bluevoyant.com.INVALID>
Subject Re: How to combine all rows into a single row in DataFrame
Date Mon, 19 Aug 2019 20:35:59 GMT
It sounds like you want to aggregate your rows in some way. I actually just
wrote a blog post about that topic:
https://medium.com/@albamus/spark-aggregating-your-data-the-fast-way-e37b53314fad

On Mon, Aug 19, 2019 at 4:24 PM Rishikesh Gawade <rishikeshg1996@gmail.com>
wrote:

> *This Message originated outside your organization.*
> ------------------------------
> Hi All,
> I have been trying to serialize a dataframe in protobuf format. So far, I
> have been able to serialize every row of the dataframe by using map
> function and the logic for serialization within the same(within the lambda
> function). The resultant dataframe consists of rows in serialized format(1
> row = 1 serialized message).
> I wish to form a single protobuf serialized message for this dataframe and
> in order to do that i need to combine all the serialized rows using some
> custom logic very similar to the one used in map operation.
> I am assuming that this would be possible by using the reduce operation on
> the dataframe, however, i am unaware of how to go about it.
> Any suggestions/approach would be much appreciated.
>
> Thanks,
> Rishikesh
>

Mime
View raw message