spark-user mailing list archives

From Bedrytski Aliaksandr <sp...@bedryt.ski>
Subject Re: How to acess the WrappedArray
Date Mon, 29 Aug 2016 11:28:16 GMT
Hi,

It depends on how you want the "elements from the WrappedArray" represented.
Do you need a List[Any], a dedicated case class for each line, or a
DataFrame that holds the proper type for each column?

Will the json file always be < 100 MB, so that you can pre-process it with a
*sed* command?
If so, I would recommend transforming the file into a csv (as it is a
more structured type of file) using bash tools and then reading it with
Spark, casting the columns to the expected types (or keeping the inferred
types if they are good enough).

Or (if the file is expected to be larger than bash tools can handle) you
could iterate over the resulting WrappedArray and create a case class
instance for each line.
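
A minimal sketch of that second option in plain Java, since your snippet
is Java (a small value class plays the role of the case class). I have
not run it against your file. It assumes each inner array has already
been turned into a java.util.List<String>; with Spark the outer list
would probably come from something like Row.getList(0), and converting
each inner Scala sequence is not shown here. The class name SalaryRows,
the Salary fields and the positions 8, 9, 12 and 13 are my guesses from
the sample output you posted, not a known schema.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SalaryRows {

    // Hypothetical holder for one inner array. Field names and positions
    // are guesses based on the sample output, not a documented schema.
    static final class Salary {
        final String name;
        final String jobTitle;
        final String hireDate;
        final Double annualSalary; // null when the value is missing

        Salary(String name, String jobTitle, String hireDate, Double annualSalary) {
            this.name = name;
            this.jobTitle = jobTitle;
            this.hireDate = hireDate;
            this.annualSalary = annualSalary;
        }
    }

    // Parses a numeric string, keeping null for missing values.
    static Double parseOrNull(String s) {
        return s == null ? null : Double.valueOf(s);
    }

    // Maps one inner array (all elements read as strings) to a Salary.
    static Salary toSalary(List<String> fields) {
        return new Salary(fields.get(8), fields.get(9), fields.get(12),
                parseOrNull(fields.get(13)));
    }

    // Iterates over the outer array and builds one object per line.
    static List<Salary> toSalaries(List<List<String>> lines) {
        List<Salary> result = new ArrayList<>();
        for (List<String> line : lines) {
            if (line != null) {
                result.add(toSalary(line));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // The single line from the sample output, as a list of strings.
        List<String> line = Arrays.asList(
                "1", "B9B42DE1-E810-4489-9735-B365A47A4012", "1",
                "1467358044", "697390", "1467358044", "697390", null,
                "Aaron,Patricia G", "Facilities/Office Services II", "A03031",
                "OED-Employment Dev (031)", "1979-10-24T00:00:00",
                "56705.00", "54135.44");
        List<List<String>> lines = new ArrayList<>();
        lines.add(line);
        for (Salary s : toSalaries(lines)) {
            System.out.println(s.name + " | " + s.jobTitle + " | " + s.annualSalary);
        }
    }
}
```

Keeping the salary as a nullable Double means a missing value stays null
instead of silently becoming 0.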

PS: I wonder where the *meta* object from the json goes.

--
  Bedrytski Aliaksandr
  spark@bedryt.ski



On Mon, Aug 29, 2016, at 11:27, Sreeharsha wrote:
> Here is the snippet of code:
>
> // The entry point into all functionality in Spark is the SparkSession
> // class. To create a basic SparkSession, just use SparkSession.builder():
> SparkSession spark = SparkSession.builder()
>         .appName("Java Spark SQL Example")
>         .master("local")
>         .getOrCreate();
> // With a SparkSession, applications can create DataFrames from an
> // existing RDD, from a Hive table, or from Spark data sources.
> Dataset<Row> rows_salaries =
>         spark.read().json("/Users/sreeharsha/Downloads/rows_salaries.json");
> // Register the DataFrame as a SQL temporary view
> rows_salaries.createOrReplaceTempView("salaries");
> // SQL statements can be run by using the sql methods provided by spark
> List<Row> df = spark.sql("select * from salaries").collectAsList();
> for (Row r : df) {
>     if (r.get(0) != null)
>         System.out.println(r.get(0).toString());
> }
>
>
> Actual output:
> WrappedArray(WrappedArray(1, B9B42DE1-E810-4489-9735-B365A47A4012, 1,
> 1467358044, 697390, 1467358044, 697390, null, Aaron,Patricia G,
> Facilities/Office Services II, A03031, OED-Employment Dev (031),
> 1979-10-24T00:00:00, 56705.00, 54135.44))
> Expected output:
> I need the elements from the WrappedArray.
> The .json file is attached below.
>
>   *rows_salaries.json* (4M) Download Attachment[1]
>
> View this message in context: How to acess the WrappedArray[2]
>  Sent from the Apache Spark User List mailing list archive[3] at
>  Nabble.com.


Links:

  1. http://apache-spark-user-list.1001560.n3.nabble.com/attachment/27615/0/rows_salaries.json
  2. http://apache-spark-user-list.1001560.n3.nabble.com/How-to-acess-the-WrappedArray-tp27615.html
  3. http://apache-spark-user-list.1001560.n3.nabble.com/
