spark-user mailing list archives

From Jeff Zhang <zjf...@gmail.com>
Subject Re: Transformation not happening for reduceByKey or GroupByKey
Date Fri, 21 Aug 2015 10:19:49 GMT
Hi Satish,

I don't see where Spark supports "-i", so I suspect it is provided by DSE.
In that case, it might be a bug in DSE.
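
One way to check whether "-i" is involved would be to run the same logic as
a small standalone application instead of a script file. This is only a
minimal sketch, assuming Spark 1.3 or later on the classpath. The names
ReduceByKeyCheck and sumByKey are made up for illustration and do not come
from your code.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object ReduceByKeyCheck {
  // Hypothetical helper: sums the values for each key and returns the
  // result as a local array sorted by key.
  def sumByKey(sc: SparkContext, data: Seq[(Int, Int)]): Array[(Int, Int)] = {
    val pairs = sc.makeRDD(data)
    // reduceByKey returns a new RDD and leaves `pairs` unchanged, so the
    // result is bound to its own val and the action is called on that val.
    val sums = pairs.reduceByKey((x, y) => x + y)
    sums.collect().sorted
  }

  def main(args: Array[String]): Unit = {
    // Local master, matching the "--master local" used in the thread.
    val conf = new SparkConf().setAppName("ReduceByKeyCheck").setMaster("local")
    val sc = new SparkContext(conf)
    try {
      sumByKey(sc, Seq((0, 1), (0, 2), (1, 20), (1, 30), (2, 40))).foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

If this prints the per-key sums while the script file still gives an empty
array with "-i", that would point at how the script is loaded rather than at
reduceByKey itself.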



On Fri, Aug 21, 2015 at 6:02 PM, satish chandra j <jsatishchandra@gmail.com>
wrote:

> Hi Robin,
> Yes, it is DSE, but the issue is related to Spark only.
>
> Regards,
> Satish Chandra
>
> On Fri, Aug 21, 2015 at 3:06 PM, Robin East <robin.east@xense.co.uk>
> wrote:
>
>> Not sure, I have never used dse. It’s part of DataStax Enterprise, right?
>>
>> On 21 Aug 2015, at 10:07, satish chandra j <jsatishchandra@gmail.com>
>> wrote:
>>
>> Hi Robin,
>> Yes, the piece of code below works fine in the Spark shell, but when the
>> same code is placed in a script file and executed with -i <file name>, it
>> creates an empty RDD.
>>
>> scala> val pairs = sc.makeRDD(Seq((0,1),(0,2),(1,20),(1,30),(2,40)))
>> pairs: org.apache.spark.rdd.RDD[(Int, Int)] = ParallelCollectionRDD[77]
>> at makeRDD at <console>:28
>>
>>
>> scala> pairs.reduceByKey((x,y) => x + y).collect
>> res43: Array[(Int, Int)] = Array((0,3), (1,50), (2,40))
>>
>> Command:
>>
>>         dse spark --master local --jars postgresql-9.4-1201.jar -i <ScriptFile>
>>
>> I understand that I am missing something here, because of which my final
>> RDD does not have the required output.
>>
>> Regards,
>> Satish Chandra
>>
>> On Thu, Aug 20, 2015 at 8:23 PM, Robin East <robin.east@xense.co.uk>
>> wrote:
>>
>>> This works for me:
>>>
>>> scala> val pairs = sc.makeRDD(Seq((0,1),(0,2),(1,20),(1,30),(2,40)))
>>> pairs: org.apache.spark.rdd.RDD[(Int, Int)] = ParallelCollectionRDD[77]
>>> at makeRDD at <console>:28
>>>
>>>
>>> scala> pairs.reduceByKey((x,y) => x + y).collect
>>> res43: Array[(Int, Int)] = Array((0,3), (1,50), (2,40))
>>>
>>> On 20 Aug 2015, at 11:05, satish chandra j <jsatishchandra@gmail.com>
>>> wrote:
>>>
>>> Hi All,
>>> I have data in an RDD as shown below:
>>>
>>> RDD : Array[(Int, Int)] = Array((0,1), (0,2), (1,20), (1,30), (2,40))
>>>
>>>
>>> I am expecting the output Array((0,3),(1,50),(2,40)), which is just the
>>> sum of the values for each key.
>>>
>>> Code:
>>> RDD.reduceByKey((x,y) => x+y)
>>> RDD.take(3)
>>>
>>> Result in console:
>>> RDD: org.apache.spark.rdd.RDD[(Int,Int)]= ShuffledRDD[1] at reduceByKey
>>> at <console>:73
>>> res:Array[(Int,Int)] = Array()
>>>
>>> Command used:
>>>
>>> dse spark --master local --jars postgresql-9.4-1201.jar -i  <ScriptFile>
>>>
>>>
>>> Please let me know what is missing in my code, as my resulting array is
>>> empty.
>>>
>>>
>>>
>>> Regards,
>>> Satish
>>>
>>>
>>>
>>
>>
>


-- 
Best Regards

Jeff Zhang
