spark-user mailing list archives

From Yasemin Kaya <godo...@gmail.com>
Subject Re: Struggling time by data
Date Fri, 25 Dec 2015 11:29:42 GMT
That works to an extent, but what I actually want is to group the urls into sessions.

DATA (sorted by time):
(userid1_time1, url1)
(userid1_time2, url2)
(userid1_time3, url3)
(userid1_time4, url4)

RESULT:
url1 is already added to session1
time2 - time1 < 30 min, so url2 goes to session1
time3 - time2 > 30 min, so url3 goes to session2
time4 - time3 < 30 min, so url4 goes to session2

(user1, [url1, url2] [url3, url4])
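The grouping described above can be sketched as plain sessionization logic. This is only a minimal sketch: the name `sessionize`, the `gapMin` parameter, and the epoch-minute timestamps are illustrative assumptions, not something from the thread. In Spark this would run per user on each group's hits, sorted by time first (groupByKey does not preserve order):

```scala
// Hits are (time, url) pairs; times are assumed to be epoch minutes.
// A new session starts whenever the gap to the previous hit is >= 30 minutes.
def sessionize(hits: Seq[(Long, String)], gapMin: Long = 30): List[List[String]] =
  hits.sortBy(_._1)                          // restore time order within the group
    .foldLeft(List.empty[List[(Long, String)]]) {
      case (Nil, hit) => List(List(hit))     // first hit opens session1
      case (sessions @ (current :: rest), hit) =>
        if (hit._1 - current.head._1 < gapMin)
          (hit :: current) :: rest           // gap < 30 min: same session
        else
          List(hit) :: sessions              // gap >= 30 min: open a new session
    }
    .map(_.reverse.map(_._2)).reverse        // back to chronological order, urls only
```

On the example above with times 0, 10, 50, 60 this yields List(List(url1, url2), List(url3, url4)), matching (user1, [url1, url2] [url3, url4]). In Spark it could be applied after grouping, e.g. `pairs.groupByKey().mapValues(hits => sessionize(hits.toSeq))`.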

Does your solution fit my problem?

2015-12-25 12:23 GMT+02:00 Xingchi Wang <regrecall@gmail.com>:

> map { case (x, y) => val s = x.split("_"); (s(0), (s(1), y)) }
>   .groupByKey()
>   .filter { case (_, (a, b)) => math.abs(a._1 - b._1) < 30 /* minutes */ }
>
> does it work for you ?
>
> 2015-12-25 16:53 GMT+08:00 Yasemin Kaya <godot85@gmail.com>:
>
>> hi,
>>
>> I have been struggling with this data for a couple of days and I can't
>> find a solution. Could you help me?
>>
>> DATA:
>> (userid1_time1, url1)
>> (userid1_time2, url2)
>>
>>
>> I want to get the urls that occur within 30 minutes of each other.
>>
>> RESULT:
>> If time2 - time1 < 30 min:
>> (user1, [url1, url2])
>>
>> Best,
>> yasemin
>> --
>> hiç ender hiç
>>
>
>


-- 
hiç ender hiç
