spark-user mailing list archives

From Annabel Melongo <melongo_anna...@yahoo.com.INVALID>
Subject Re: DataFrame to read json and include raw Json in DataFrame
Date Fri, 30 Dec 2016 05:21:58 GMT
Richard,
The documentation I linked has a section called "Schema Merging". Following it, you can do
what you want this way:
1. Create a schema that reads the raw JSON, line by line.
2. Create another schema that reads the JSON file and structures it into ("id", "ln", "fn", ...).
3. Merge the two schemas and you'll get what you want.
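
For reference, here is a rough sketch of one way to end up with the parsed columns and the raw line side by side. Note that it swaps in a different technique from the steps above: the "Schema Merging" section of the Spark guide is written about Parquet sources, so instead of merging two schemas this reads the file as plain text (one row per line, in a single string column named "value") and extracts each field from that string with get_json_object, keeping the original string as raw_json. It assumes the Spark 1.6 Java API used in the original snippet (DataFrame, SQLContext); on Spark 2.x the Java type would be Dataset<Row>. The class name RawJsonExample and the helper readWithRawJson are made up for illustration, and the casts assume id and age are always numeric.

```java
import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.get_json_object;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SQLContext;

public class RawJsonExample {

    // Hypothetical helper: extract the known fields from each line and
    // keep the untouched line next to them as raw_json.
    static DataFrame readWithRawJson(SQLContext sqlContext, String path) {
        // read().text() gives one row per input line, in a string column "value".
        DataFrame lines = sqlContext.read().text(path);

        return lines.select(
                // get_json_object returns a string, so numeric fields are cast.
                get_json_object(col("value"), "$.id").cast("long").alias("id"),
                get_json_object(col("value"), "$.ln").alias("ln"),
                get_json_object(col("value"), "$.fn").alias("fn"),
                get_json_object(col("value"), "$.age").cast("int").alias("age"),
                // The original line, unchanged.
                col("value").alias("raw_json"));
    }

    public static void main(String[] args) {
        SparkConf sparkConf = new SparkConf().setMaster("local[2]").setAppName("json_test");
        JavaSparkContext ctx = new JavaSparkContext(sparkConf);
        // A HiveContext would work here too, since it extends SQLContext.
        SQLContext sqlContext = new SQLContext(ctx);

        DataFrame df = readWithRawJson(sqlContext, "files/json/example2.json");
        df.printSchema();
        df.show(false); // false = do not truncate the raw_json column

        ctx.stop();
    }
}
```

The trade-off is that the field names are spelled out by hand rather than inferred, so this only fits when the set of fields is known up front, as it is in the sample data.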
Thanks 

    On Thursday, December 29, 2016 7:18 PM, Richard Xin <richardxin168@yahoo.com> wrote:
 

 Thanks, I have seen this, but it doesn't cover my question.
What I need is to read the JSON and include the raw JSON as part of my DataFrame.

    On Friday, December 30, 2016 10:23 AM, Annabel Melongo <melongo_annabel@yahoo.com.INVALID> wrote:
 

 Richard,
The documentation below shows how to create a SparkSession and how to load data
programmatically:
Spark SQL and DataFrames - Spark 2.1.0 Documentation

  

    On Thursday, December 29, 2016 5:16 PM, Richard Xin <richardxin168@yahoo.com.INVALID> wrote:
 

 Say I have the following data in a file:
{"id":1234,"ln":"Doe","fn":"John","age":25}
{"id":1235,"ln":"Doe","fn":"Jane","age":22}

Java code snippet:
        final SparkConf sparkConf = new SparkConf().setMaster("local[2]").setAppName("json_test");
        JavaSparkContext ctx = new JavaSparkContext(sparkConf);
        HiveContext hc = new HiveContext(ctx.sc());
        DataFrame df = hc.read().json("files/json/example2.json");

What I need is a DataFrame with the columns id, ln, fn, and age, plus a raw_json column holding the original JSON string.
Any advice on the best practice in Java?
Thanks,
Richard


   

   

   