spark-user mailing list archives

From Alonso Isidoro Roman <alons...@gmail.com>
Subject Re: Code optimization
Date Tue, 19 Apr 2016 08:22:25 GMT
Hi Angel,

How about storing this:

k.filter(k("WT_ID")

in a val? I think that way you can avoid repeating it. And do not forget
to use System.nanoTime to measure the gain...
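
A minimal sketch of the timing part, assuming a small helper named timed
(hypothetical, not a Spark API) that can be pasted into the spark-shell.
Spark transformations are lazy, so the timed block has to end in an action
such as show() or count() for the number to mean anything. The example
below times a plain Scala sum as a stand-in for a Spark action.

```scala
// Hypothetical helper, not part of Spark: runs a block once and prints
// the wall-clock time measured with System.nanoTime.
def timed[A](label: String)(block: => A): A = {
  val start = System.nanoTime()
  val result = block                          // the by-name block is evaluated here
  val elapsedMs = (System.nanoTime() - start) / 1e6
  println(f"$label took $elapsedMs%.1f ms")
  result
}

// Plain Scala stand-in for a Spark action.
val total = timed("sum") { (1 to 1000).sum }
```

In the shell the same helper could wrap one of the queries from the quoted
code, for example timed("Table0") { ... .show() }, once before and once
after the change, to compare the two timings.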

Alonso Isidoro Roman.

My favorite quotes (today):
"If debugging is the process of removing software bugs, then programming
must be the process of putting them in..."
  - Edsger Dijkstra

"If you pay peanuts you get monkeys"


2016-04-19 9:46 GMT+02:00 Angel Angel <areyouangel90@gmail.com>:

> Hello,
>
> I am writing a Spark application. It runs correctly but takes a long
> time to execute. Can anyone help me optimize my query to increase the
> processing speed?
>
>
> In this application I have to construct histograms and compare them
> in order to find the final candidate.
>
>
> My code reads a text file, matches the first field, subtracts the
> second field from the matched candidates, and updates the table.
>
> Here is my code. Please help me optimize it.
>
>
> val sqlContext = new org.apache.spark.sql.SQLContext(sc)
>
>
> import sqlContext.implicits._
>
>
> val Array_Ele =
> sc.textFile("/root/Desktop/database_200/patch_time_All_20_modified_1.txt").flatMap(line=>line.split(" ")).take(900)
>
>
> val df1=
> sqlContext.read.parquet("hdfs://hadoopm0:8020/tmp/input1/database_modified_No_name_400.parquet")
>
>
> var k = df1.filter(df1("Address").equalTo(Array_Ele(0) ))
>
> var a= 0
>
>
> for( a <-2 until 900 by 2){
>
> k=k.unionAll(
> df1.filter(df1("Address").equalTo(Array_Ele(a))).select(df1("Address"),df1("Couple_time")-Array_Ele(a+1),df1("WT_ID")))}
>
>
> k.cache()
>
>
> val WT_ID_Sort  = k.groupBy("WT_ID").count().sort(desc("count"))
>
>
> val temp = WT_ID_Sort.select("WT_ID").rdd.map(r=>r(0)).take(10)
>
>
> val Table0=
> k.filter(k("WT_ID").equalTo(temp(0))).groupBy("Couple_time").count().select(max($"count")).show()
>
> val Table1=
> k.filter(k("WT_ID").equalTo(temp(1))).groupBy("Couple_time").count().select(max($"count")).show()
>
> val Table2=
> k.filter(k("WT_ID").equalTo(temp(2))).groupBy("Couple_time").count().select(max($"count")).show()
>
> val Table3=
> k.filter(k("WT_ID").equalTo(temp(3))).groupBy("Couple_time").count().select(max($"count")).show()
>
> val Table4=
> k.filter(k("WT_ID").equalTo(temp(4))).groupBy("Couple_time").count().select(max($"count")).show()
>
> val Table5=
> k.filter(k("WT_ID").equalTo(temp(5))).groupBy("Couple_time").count().select(max($"count")).show()
>
> val Table6=
> k.filter(k("WT_ID").equalTo(temp(6))).groupBy("Couple_time").count().select(max($"count")).show()
>
> val Table7=
> k.filter(k("WT_ID").equalTo(temp(7))).groupBy("Couple_time").count().select(max($"count")).show()
>
> val Table8=
> k.filter(k("WT_ID").equalTo(temp(8))).groupBy("Couple_time").count().select(max($"count")).show()
>
>
>
> val Table10=
> k.filter(k("WT_ID").equalTo(temp(10))).groupBy("Couple_time").count().select(max($"count")).show()
>
>
> val Table11=
> k.filter(k("WT_ID").equalTo(temp(11))).groupBy("Couple_time").count().select(max($"count")).show()
>
>
> One last question: how can I compare all these tables to find the
> maximum value?
>
>
>
>
> Thanks,
>
>
>
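
On the last question in the quoted message (comparing all the tables to
find the maximum), one possible approach is to compute every per-WT_ID
histogram peak in a single aggregation instead of one query per Table
value. The sketch below is only an assumption about what is wanted: the
helper name histogramPeaks is hypothetical, and it assumes k has the
columns WT_ID and Couple_time as in the quoted code. Note also that
take(10) returns at most 10 elements, so temp(10) and temp(11) in the
quoted code would throw an ArrayIndexOutOfBoundsException.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{desc, max}

// Hypothetical helper: for the topN most frequent WT_ID values, returns the
// height of the tallest Couple_time histogram bin, tallest first.
// Assumes k has the columns WT_ID and Couple_time, as in the quoted code.
def histogramPeaks(k: DataFrame, topN: Int = 10): DataFrame = {
  // The topN WT_ID values by row count (ties are broken arbitrarily).
  val topIds = k.groupBy("WT_ID").count()
    .sort(desc("count"))
    .limit(topN)
    .select("WT_ID")

  k.join(topIds, "WT_ID")                 // keep only rows of the top WT_IDs
    .groupBy("WT_ID", "Couple_time")      // one histogram bin per pair
    .count()
    .groupBy("WT_ID")
    .agg(max("count").as("peak"))         // tallest bin per WT_ID
    .sort(desc("peak"))
}
```

With this, histogramPeaks(k).show() would print one row per WT_ID, and
histogramPeaks(k).first() would return the row holding the overall
maximum, since the result is sorted by peak in descending order.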
