spark-user mailing list archives

From Jörn Franke <jornfra...@gmail.com>
Subject Re: filling missing values in a sequence
Date Mon, 19 Sep 2016 05:42:51 GMT
I am not sure what you are trying to achieve here. Can you please tell us what the goal of the
program is, ideally with some example data?

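Besides this, I suspect that it will fail as soon as it runs on more than a single node,
because the map function reads and updates the global (static) counter variable. Each executor
JVM would work on its own copy of that counter, and the partitions are not processed in one
global order.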

It is also unclear why you collect the data first, only to parallelize it again.
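
To illustrate the concern about the shared counter, here is a minimal sketch in plain Java (no Spark) of a formulation that needs no mutable state: the missing ids are derived from each pair of neighbouring ids on its own, so the pairs can be evaluated in any order or in parallel. The class name `GapFinder` and both method names are made up for this example and are not part of the original program. The sketch assumes the ids are already sorted and only reports gaps between ids that are present, so, unlike the original counter that starts at 1, it does not report ids missing before the first one. In Spark the neighbouring id could probably be obtained per row with something like the `lag` window function, which would also avoid the collect/parallelize round trip, but that part is not shown here.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
import java.util.stream.LongStream;

// Hypothetical helper for illustration only; not taken from the original program.
final class GapFinder {

    // Ids strictly between two neighbouring ids.
    // Pure function: it reads no shared state and modifies nothing.
    static List<Long> missingBetween(long previous, long next) {
        return LongStream.range(previous + 1, next)
                .boxed()
                .collect(Collectors.toList());
    }

    // Applies missingBetween to every adjacent pair of an already sorted list.
    // The stream is marked parallel on purpose: because each pair is handled
    // independently, the result does not depend on evaluation order.
    static List<Long> missingIds(List<Long> sortedIds) {
        return IntStream.range(1, sortedIds.size())
                .parallel()
                .boxed()
                .flatMap(i -> missingBetween(sortedIds.get(i - 1), sortedIds.get(i)).stream())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Long> ids = Arrays.asList(1L, 2L, 5L, 6L, 9L);
        System.out.println(missingIds(ids)); // prints [3, 4, 7, 8]
    }
}
```

The point of the design is that `missingBetween` depends only on its two arguments, so there is nothing like a static counter for different threads or executors to disagree about.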

> On 18 Sep 2016, at 14:26, sudhindra <smagadi@gmail.com> wrote:
> 
> Hi, I have coded something like this. Please tell me how bad it is.
> 
> package Spark.spark;
> import java.util.List;
> import java.util.function.Function;
> 
> import org.apache.spark.SparkConf;
> import org.apache.spark.SparkContext;
> import org.apache.spark.api.java.JavaRDD;
> import org.apache.spark.api.java.JavaSparkContext;
> import org.apache.spark.sql.DataFrame;
> import org.apache.spark.sql.Dataset;
> import org.apache.spark.sql.Row;
> import org.apache.spark.sql.SQLContext;
> 
> 
> 
> public class App 
> {
>    static long counter=1;
>    public static void main( String[] args )
>    {
>        
>        
>        
>        SparkConf conf = new
> SparkConf().setAppName("sorter").setMaster("local[2]").set("spark.executor.memory","1g");
>        JavaSparkContext sc = new JavaSparkContext(conf);
>        
>        SQLContext sqlContext = new org.apache.spark.sql.SQLContext(sc);
>        
>        DataFrame df = sqlContext.read().json("path");
>        DataFrame sortedDF = df.sort("id");
>        //df.show();
>        //sortedDF.printSchema();
> 
>        System.out.println(sortedDF.collectAsList().toString());
>        JavaRDD<Row> distData = sc.parallelize(sortedDF.collectAsList());
>        
>        
>     List<String >missingNumbers=distData.map(new
> org.apache.spark.api.java.function.Function<Row, String>() {
>           
> 
>            public String call(Row arg0) throws Exception {
>                // TODO Auto-generated method stub
>                
>                
>                if(counter!=new Integer(arg0.getString(0)).intValue())
>                {
>                    StringBuffer misses = new StringBuffer();
>                    long newCounter=counter;
>                    while(newCounter!=new Integer(arg0.getString(0)).intValue())
>                    {
>                        misses.append(new String(new Integer((int) counter).toString()));
>                        newCounter++;
>                        
>                    }
>                    counter=new Integer(arg0.getString(0)).intValue()+1;
>                    return misses.toString();
>                    
>                }
>                counter++;
>                return null;
>                
>            
>                
>            }
>        }).collect();
>        
>        
>        
>        for (String name: missingNumbers) {
>              System.out.println(name);
>            }
>        
>       
>        
>    }
> }
> 
> 
> 
> 
> --
> View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/filling-missing-values-in-a-sequence-tp5708p27748.html
