spark-user mailing list archives

From Bipin Nag <bipin....@gmail.com>
Subject Re: How to group multiple row data ?
Date Fri, 01 May 2015 06:23:59 GMT
OK, consider the case where there are multiple event triggers for a given
customer/vendor/product, such as 1,1,2,2,3, arranged in order of *event
occurrence* (timestamp). The output should then be two groups, (1,2) and
(1,2,3): the doublet is made of the first occurrences of 1 and 2, and the
triplet of the later occurrences of 1, 2 and 3.
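
A minimal sketch of one way to read this rule, written in plain Python
rather than Spark so it can be run as-is. It assumes that, within each
(CustomerID, SupplierID, ProductID) key, events are scanned from newest to
oldest and each event claims the closest previous trigger with the next
lower event number. When two chains compete for the same trigger, the chain
whose earliest event is the most recent wins. This is only an
interpretation; it happens to reproduce both the 1,1,2,2,3 case above and
the sample rows quoted below. The names `group_events` and `sample_rows`
are hypothetical.

```python
from collections import defaultdict


def group_events(rows):
    """Group (customer, supplier, product, event, created_on) rows into chains.

    Returns (customer, supplier, product, first_created_on, "1,2,3") tuples,
    ordered by the timestamp of the first event in each chain.
    """
    by_key = defaultdict(list)
    for customer, supplier, product, event, created_on in rows:
        by_key[(customer, supplier, product)].append((created_on, event))

    result = []
    for key, events in by_key.items():
        chains = []  # each chain is a list of (created_on, event), oldest first
        # Assumption: walk from the newest event back to the oldest.
        for created_on, event in sorted(events, reverse=True):
            # Chains still waiting for this event as their direct predecessor.
            waiting = [c for c in chains if c[0][1] == event + 1]
            if waiting:
                # Assumption: the chain with the most recent head claims it.
                closest = max(waiting, key=lambda c: c[0][0])
                closest.insert(0, (created_on, event))
            else:
                # Nothing later needs this event, so it starts its own chain.
                chains.append([(created_on, event)])
        for chain in chains:
            start = chain[0][0]
            label = ",".join(str(event) for _, event in chain)
            result.append(key + (start, label))
    return sorted(result, key=lambda row: row[3])


# The five sample rows from the original question (ISO dates sort as text).
sample_rows = [
    (10001, 132, 2002, 1, "2012-11-23"),
    (10001, 132, 2002, 1, "2012-11-24"),
    (10031, 102, 223, 2, "2012-11-24"),
    (10001, 132, 2002, 2, "2012-11-25"),
    (10001, 132, 2002, 3, "2012-11-26"),
]

for row in group_events(sample_rows):
    print(row)
```

In Spark, the same per-key logic could presumably be applied to each key's
rows after grouping by the three IDs, but that wiring is not shown here and
would need testing against the actual data.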

On 29 April 2015 at 18:04, Manoj Awasthi <awasthi.manoj@gmail.com> wrote:

> Sorry but I didn't fully understand the grouping. This line:
>
> >> The group must only take the closest previous trigger. The first one
> >> hence shows alone.
>
> Can you please explain further?
>
>
> On Wed, Apr 29, 2015 at 4:42 PM, bipin <bipin.nag@gmail.com> wrote:
>
>> Hi, I have a ddf with schema (CustomerID, SupplierID, ProductID, Event,
>> CreatedOn). The first three are long integers, Event can only be 1, 2 or 3,
>> and CreatedOn is a timestamp. How can I group the rows into
>> triplets/doublets/singlets so that I can infer that a customer's registered
>> events went from 1 to 2 and, if present, on to 3 in time order, while
>> preserving the number of entries? For example:
>>
>> Before processing:
>> 10001, 132, 2002, 1, 2012-11-23
>> 10001, 132, 2002, 1, 2012-11-24
>> 10031, 102, 223, 2, 2012-11-24
>> 10001, 132, 2002, 2, 2012-11-25
>> 10001, 132, 2002, 3, 2012-11-26
>> (total 5 rows)
>>
>> After processing:
>> 10001, 132, 2002, 2012-11-23, "1"
>> 10031, 102, 223, 2012-11-24, "2"
>> 10001, 132, 2002, 2012-11-24, "1,2,3"
>> (total 5 in last field - comma separated!)
>>
>> The group must only take the closest previous trigger, so the first row
>> shows alone. Can this be done using Spark SQL? If it needs to be processed
>> functionally in Scala, how would I do that? I can't wrap my head around
>> this. Can anyone help?
