spark-user mailing list archives

From "boyingking@163.com" <boyingk...@163.com>
Subject Re: Re: About SpakSQL OR MLlib
Date Tue, 16 Sep 2014 04:30:53 GMT
import java.util.Date  // assumption: the original does not say which Date type is meant

case class Car(id: String, age: Int, tkm: Int, emissions: Int, date: Date, km: Int, fuel: Int)

1. Create a paired RDD of (age, Car) tuples (pairedRDD).
2. Create a new function fc:

// Returns the lower and upper bounds of the interval containing x
def fc(x: Int, interval: Int): (Int, Int) = {
  val floor = x - (x % interval)
  val ceil = floor + interval
  (floor, ceil)
}

3. Do a groupBy on the RDD from step 1, passing the function fc:

// Each element x is an (age, Car) tuple, so x._1 is the age
val myrdd = pairedRDD.groupBy(x => fc(x._1, 5))
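
The three steps above can be tried without a cluster. The following is a minimal sketch that uses plain Scala collections in place of an RDD, on the assumption that the `groupBy` call has the same shape in both cases. The `AgeBuckets` object and the `groupByAge` helper are hypothetical names introduced for illustration, and `date` is kept as a `String` to avoid date parsing.

```scala
object AgeBuckets {
  // Simplified record; `date` is a String here to avoid date parsing (assumption).
  case class Car(id: String, age: Int, tkm: Int, emissions: Int, date: String, km: Int, fuel: Int)

  // Returns the half-open interval [floor, ceil) that contains x.
  def fc(x: Int, interval: Int): (Int, Int) = {
    val floor = x - (x % interval)
    (floor, floor + interval)
  }

  // A few rows copied from the sample data in the question.
  val cars: Seq[Car] = Seq(
    Car("1-980", 34, 221926, 9, "2005-2-8", 123, 14),
    Car("1-981", 49, 271321, 15, "2005-2-8", 181, 82),
    Car("1-982", 36, 189149, 18, "2005-2-8", 162, 51),
    Car("1-985", 45, 132060, 4, "2005-2-8", 179, 98),
    Car("1-986", 40, 15751, 5, "2005-2-8", 149, 77)
  )

  // Hypothetical helper: pair each record with its age (step 1),
  // then group the pairs by the age bucket that fc returns (step 3).
  def groupByAge(records: Seq[Car], interval: Int): Map[(Int, Int), Seq[(Int, Car)]] = {
    val paired = records.map(c => (c.age, c))
    paired.groupBy { case (age, _) => fc(age, interval) }
  }

  def main(args: Array[String]): Unit = {
    // Print each bucket and the ids it contains, ordered by lower bound.
    groupByAge(cars, 5).toSeq.sortBy(_._1).foreach { case ((lo, hi), members) =>
      println(s"[$lo, $hi): " + members.map(_._2.id).mkString(", "))
    }
  }
}
```

Note that fc produces half-open buckets such as [20, 25) and [25, 30), so an age of 25 lands in the second bucket. If the groups should instead be 21~25 and 26~30, the floor computation would need to be shifted by one.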


On Mon, Sep 15, 2014 at 11:38 PM, boyingking@163.com <boyingking@163.com>
wrote:

> Hi,
> I have a dataset with the structure [id, driverAge, TotalKiloMeter, Emissions,
> date, KiloMeter, fuel], and the data looks like this:
> [1-980,34,221926,9,2005-2-8,123,14]
> [1-981,49,271321,15,2005-2-8,181,82]
> [1-982,36,189149,18,2005-2-8,162,51]
> [1-983,51,232753,5,2005-2-8,106,92]
> [1-984,56,45338,8,2005-2-8,156,98]
> [1-985,45,132060,4,2005-2-8,179,98]
> [1-986,40,15751,5,2005-2-8,149,77]
> [1-987,36,167930,17,2005-2-8,121,87]
> [1-988,53,44949,4,2005-2-8,195,72]
> [1-989,34,252867,5,2005-2-8,181,86]
> [1-990,53,152858,11,2005-2-8,130,43]
> [1-991,40,126831,11,2005-2-8,126,47]
> ………………………………………………
>
> Now, my requirement is to group by driverAge in steps of five, e.g. 20~25 is
> one group and 26~30 is another.
> How should I do this? Can anyone share some code?
>
>
> ------------------------------
>  boyingking@163.com
>
