spark-user mailing list archives

From Richard Xin <>
Subject Re: apache-spark: Converting List of Rows into Dataset Java
Date Wed, 29 Mar 2017 02:17:14 GMT
Maybe you could try something like this:

        SparkSession sparkSession = SparkSession
        List<Row> results = new LinkedList<Row>();
        JavaRDD<Row> jsonRDD =
                new JavaSparkContext(sparkSession.sparkContext()).parallelize(results);
        Dataset<Row> peopleDF = sparkSession.createDataFrame(jsonRDD, Row.class);
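
A caveat on passing Row.class: as far as I know, the createDataFrame overloads that take a Class treat it as a JavaBean and infer the schema from its getters, which Row does not provide, and that is a likely source of the NullPointerException. A minimal sketch of the alternative is to pass an explicit StructType together with the List<Row>. The class name RowsToDataset and the column names "name" and "value" are assumptions made up for this example, and it runs against a local master only for illustration.

```java
import java.util.Arrays;
import java.util.List;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructField;
import org.apache.spark.sql.types.StructType;

class RowsToDataset {

    // Hypothetical column names; the order and types must match the values
    // passed to RowFactory.create(...).
    static final StructType SCHEMA = DataTypes.createStructType(new StructField[] {
            DataTypes.createStructField("name", DataTypes.StringType, false),
            DataTypes.createStructField("value", DataTypes.IntegerType, false)
    });

    // Builds a DataFrame from an in-memory list of Rows and an explicit schema,
    // so the String and Integer column types are preserved.
    static Dataset<Row> toDataset(SparkSession spark, List<Row> rows) {
        return spark.createDataFrame(rows, SCHEMA);
    }

    public static void main(String[] args) {
        // Local master is an assumption for this sketch; use your own session setup.
        SparkSession spark = SparkSession.builder()
                .appName("rows-to-dataset")
                .master("local[*]")
                .getOrCreate();

        List<Row> results = Arrays.asList(
                RowFactory.create("a", 1),
                RowFactory.create("b", 2),
                RowFactory.create("a", 3));

        Dataset<Row> df = toDataset(spark, results);
        df.printSchema();

        // Grouping and sorting through the DataFrame API.
        df.groupBy("name").sum("value").orderBy("name").show();

        spark.stop();
    }
}
```

With an explicit schema there is no need to go through a JavaRDD first, since SparkSession also accepts a java.util.List<Row> directly.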

Richard Xin 

    On Tuesday, March 28, 2017 7:51 AM, Karin Valisova <> wrote:

I am running Spark from Java and have run into a problem that I can't solve, and I can't find
anything helpful among the answered questions, so I would really appreciate your help.
I am running some calculations, creating rows for each result:
List<Row> results = new LinkedList<Row>();

for (something) {
    results.add(RowFactory.create(someStringVariable, someIntegerVariable));
}
Now I have a list of rows that I need to turn into a DataFrame so I can run some Spark SQL
operations on them, such as grouping and sorting. I would like to keep the data types.
I tried: 
Dataset<Row> toShow = spark.createDataFrame(results, Row.class);

but it throws a NullPointerException (spark is the SparkSession). Is my logic wrong somewhere?
Should this operation be possible and give me what I want? Or do I have to create a custom class
that implements Serializable and build a list of those objects rather than Rows? Will I be
able to run SQL queries on a Dataset of custom class objects rather than Rows?
I'm sorry if this is a duplicate question.
Thank you for your help!
Karin
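
On the custom class question, a rough sketch of that route is below: a public JavaBean with a no-argument constructor and getters/setters, turned into a typed Dataset with Encoders.bean and then queried through a temporary view. The names BeanDatasetExample, Result, totals and the view name "results" are all made up for this illustration and are not from the original post.

```java
import java.io.Serializable;
import java.util.List;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class BeanDatasetExample {

    // Hypothetical bean; it is public and has a no-arg constructor plus
    // getters and setters so that Spark can derive a schema from it.
    public static class Result implements Serializable {
        private String name;
        private int value;

        public Result() { }

        public Result(String name, int value) {
            this.name = name;
            this.value = value;
        }

        public String getName() { return name; }
        public void setName(String name) { this.name = name; }
        public int getValue() { return value; }
        public void setValue(int value) { this.value = value; }
    }

    // Creates a typed Dataset from the beans, registers it as a temporary
    // view, and runs a grouping and sorting query over it with Spark SQL.
    public static Dataset<Row> totals(SparkSession spark, List<Result> results) {
        Dataset<Result> ds = spark.createDataset(results, Encoders.bean(Result.class));
        ds.createOrReplaceTempView("results");
        return spark.sql(
                "SELECT name, SUM(value) AS total FROM results GROUP BY name ORDER BY name");
    }
}
```

So SQL queries do work on a Dataset of custom objects; the result of spark.sql comes back as a Dataset<Row>.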
