spark-user mailing list archives

From JF Chen <darou...@gmail.com>
Subject Re: How to deal with context dependent computing?
Date Mon, 27 Aug 2018 01:38:33 GMT
Thanks Sonal.
For example, I have data as following:
login 2018/8/27 10:00
logout 2018/8/27 10:05
login 2018/8/27 10:08
logout 2018/8/27 10:15
login 2018/8/27 11:08
logout 2018/8/27 11:32

Now I want to calculate the time between each login and logout; from the
sample data above, the results should be 5 min, 7 min, and 24 min.
I know I can calculate it with foreach, but then all the data seems to be
processed on the Spark driver node rather than distributed across the
executors.
Is there a good way to solve this problem? Thanks!

Regards,
Junfeng Chen
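[One possible approach, not from the thread: in Spark this stateful pairing can stay on the executors by using a window function such as `lag` over a `Window.orderBy` on the timestamp column, pairing each logout row with the preceding login. The pairing logic itself, sketched in plain Python under the assumption that events arrive as strictly alternating login/logout pairs:]

```python
from datetime import datetime

# Sample events from the message above: (event, timestamp) ordered by time.
events = [
    ("login",  "2018/8/27 10:00"),
    ("logout", "2018/8/27 10:05"),
    ("login",  "2018/8/27 10:08"),
    ("logout", "2018/8/27 10:15"),
    ("login",  "2018/8/27 11:08"),
    ("logout", "2018/8/27 11:32"),
]

def session_minutes(events):
    """Pair each logout with the preceding login and return durations in minutes."""
    durations = []
    last_login = None
    for kind, ts in events:
        t = datetime.strptime(ts, "%Y/%m/%d %H:%M")
        if kind == "login":
            last_login = t
        elif kind == "logout" and last_login is not None:
            durations.append(int((t - last_login).total_seconds() // 60))
            last_login = None
    return durations

print(session_minutes(events))  # → [5, 7, 24]
```

[In Spark SQL the same effect comes from `F.lag("timestamp").over(Window.orderBy("timestamp"))` to bring the previous row's timestamp onto each logout row, then filtering to logout rows and subtracting; note that a window with no `partitionBy` collapses to a single task, so a real job would partition by user or session key.]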


On Thu, Aug 23, 2018 at 6:15 PM Sonal Goyal <sonalgoyal4@gmail.com> wrote:

> Hi Junfeng,
>
> Can you please show by means of an example what you are trying to achieve?
>
> Thanks,
> Sonal
> Nube Technologies <http://www.nubetech.co>
>
> <http://in.linkedin.com/in/sonalgoyal>
>
>
>
> On Thu, Aug 23, 2018 at 8:22 AM, JF Chen <darouwan@gmail.com> wrote:
>
>> For example, I have some timestamped data marked as category A or B, and
>> ordered by time. Now I want to calculate each duration from A to B. In a
>> normal program, I can use a flag to record whether the previous record is
>> A or B, and then calculate the duration. But how can I do this with a
>> Spark DataFrame?
>>
>> Thanks!
>>
>> Regards,
>> Junfeng Chen
>>
>
>
