spark-user mailing list archives

From Selvam Raman <sel...@gmail.com>
Subject Re: Data frame Performance
Date Wed, 17 Aug 2016 04:04:35 GMT
Hi Mich,

The input and output are just examples; the column names are not the exact ones.
ColC is not needed.
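Here is a minimal plain-Scala sketch (no Spark) of the intended transformation, using the sample rows from the original message quoted below. It groups by ColA, keeps the distinct ColB values, renders each one as "v:v", and joins them with "|". ColC is read but ignored. The object and method names are hypothetical. Sorting the ColB values is an assumption made so the output matches the expected result.

```scala
// Hypothetical reference implementation on local collections (no Spark),
// used only to pin down what the SQL query should return.
object ExpectedResult {
  // (ColA, ColB, ColC) sample rows from the original message; ColC is unused.
  val input: Seq[(Int, Int, Int)] = Seq(
    (1, 2, 56), (1, 2, 46), (1, 3, 45), (1, 5, 34), (1, 5, 90),
    (2, 1, 89), (2, 5, 45)
  )

  // For each ColA: distinct ColB values, sorted, rendered as "v:v", joined by "|".
  def summarize(rows: Seq[(Int, Int, Int)]): Seq[(Int, String)] =
    rows
      .groupBy(_._1)
      .toSeq
      .sortBy(_._1)
      .map { case (a, grp) =>
        val bs = grp.map(_._2).distinct.sorted
        (a, bs.map(b => s"$b:$b").mkString("|"))
      }

  def main(args: Array[String]): Unit =
    summarize(input).foreach { case (a, res) => println(s"$a\t$res") }
}
```

Running `main` prints one line per ColA value, matching the ResA/ResB table in the original message.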

The code I shared works, but I need to confirm whether it is the right
approach and whether it affects performance.
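The UDF body is an ordinary Scala function, so you can check it before registering it with Spark. The sketch below uses hypothetical names and compares the original logic with a variant that sorts first. Because collect_set does not guarantee element order, the unsorted version can return different strings for the same group.

```scala
// Hypothetical check of the UDF logic as plain Scala, outside Spark.
object ValsplitCheck {
  // Original logic: output follows whatever order the input arrives in.
  def valsplitOriginal(elem: Seq[String]): String =
    elem.map(e => e + ":" + e).mkString("|")

  // Variant that sorts first, so the result does not depend on the
  // (unspecified) order in which collect_set returns elements.
  def valsplitSorted(elem: Seq[String]): String =
    elem.sorted.map(e => e + ":" + e).mkString("|")

  def main(args: Array[String]): Unit = {
    // Two orders in which collect_set could return site 1's values.
    val a = Seq("2", "3", "5")
    val b = Seq("5", "2", "3")
    println(valsplitOriginal(a)) // 2:2|3:3|5:5
    println(valsplitOriginal(b)) // 5:5|2:2|3:3
    println(valsplitSorted(b))   // 2:2|3:3|5:5
  }
}
```

Sorting Strings is lexicographic, so "10" sorts before "2". If requests is numeric and numeric order matters, sort on the numeric values instead.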

Thanks,
Selvam R
+91-97877-87724
On Aug 16, 2016 5:18 PM, "Mich Talebzadeh" <mich.talebzadeh@gmail.com>
wrote:

> Hi Selvam,
>
> Is the table called sel?
>
> And are these assumptions correct?
>
> site -> ColA
> requests -> ColB
>
> I don't think you are using ColC here?
>
> HTH
>
>
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
>
>
> http://talebzadehmich.wordpress.com
>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
>
>
> On 16 August 2016 at 12:06, Selvam Raman <selmna@gmail.com> wrote:
>
>> Hi All,
>>
>> Please suggest the best approach to achieve this result. [Please comment
>> on whether the existing logic is fine.]
>>
>> Input records:
>>
>> ColA ColB ColC
>> 1 2 56
>> 1 2 46
>> 1 3 45
>> 1 5 34
>> 1 5 90
>> 2 1 89
>> 2 5 45
>>
>> Expected result:
>>
>> ResA ResB
>> 1    2:2|3:3|5:5
>> 2    1:1|5:5
>>
>> I followed the Spark steps below
>>
>> (Spark version - 1.5.0)
>>
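>> // In Spark 1.5, collect_set is a Hive UDAF, so sqlContext must be a HiveContext.
>>
>> // Render each collected value v as "v:v" and join the entries with "|".
>> // collect_set returns elements in no guaranteed order, so sort them first.
>> def valsplit(elem: Seq[String]): String =
>>   elem.sorted.map(e => e + ":" + e).mkString("|")
>>
>> sqlContext.udf.register("valudf", valsplit _)
>>
>> // The cast yields array<string> for the UDF even if requests is numeric.
>> // .first returns only one group's row; use .collect() to get every site.
>> val x = sqlContext.sql(
>>   "select site, valudf(collect_set(cast(requests as string))) as test " +
>>   "from sel group by site").first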
>>
>>
>>
>> --
>> Selvam Raman
>> "Shun bribery, stand tall"
>>
>
>
