In the provided documentation, under the "Schema Merging" paragraph, you can actually do what you want this way:

1. Create a schema that reads the raw JSON, line by line

2. Create another schema that reads the JSON file and structures it into columns ("id", "ln", "fn", ...)

3. Merge the two schemas and you'll get what you want.
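The steps above could be sketched roughly as follows in Java. This is a minimal sketch, not the literal Parquet-style schema merge from the docs: the sample JSON line, the class name, and the column set are assumptions, and instead of merging two separate schemas it keeps each raw line as a `raw_json` column and derives the structured columns from it with `get_json_object`, which yields the same combined result in one DataFrame.

```java
import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.functions;
import org.apache.spark.sql.hive.HiveContext;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructField;
import org.apache.spark.sql.types.StructType;

public class RawJsonSketch {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setMaster("local[2]").setAppName("json_test");
        JavaSparkContext ctx = new JavaSparkContext(conf);
        HiveContext hc = new HiveContext(ctx.sc());

        // Stand-in for the JSON file: one JSON object per line (sample data is assumed).
        JavaRDD<String> lines = ctx.parallelize(Arrays.asList(
                "{\"id\":1,\"ln\":\"Doe\",\"fn\":\"John\",\"age\":30}"));

        // Step 1: a schema over the raw lines -- each line becomes a single-column row.
        StructType rawSchema = new StructType(new StructField[]{
                DataTypes.createStructField("raw_json", DataTypes.StringType, false)});
        JavaRDD<Row> rawRows = lines.map(
                new org.apache.spark.api.java.function.Function<String, Row>() {
                    @Override
                    public Row call(String line) {
                        return RowFactory.create(line);
                    }
                });

        // Step 2 + 3: derive the structured columns from the raw line, so the
        // parsed fields and the raw JSON string live in the same DataFrame.
        DataFrame df = hc.createDataFrame(rawRows, rawSchema)
                .withColumn("id", functions.get_json_object(functions.col("raw_json"), "$.id"))
                .withColumn("ln", functions.get_json_object(functions.col("raw_json"), "$.ln"))
                .withColumn("fn", functions.get_json_object(functions.col("raw_json"), "$.fn"))
                .withColumn("age", functions.get_json_object(functions.col("raw_json"), "$.age"));

        df.show();
        ctx.stop();
    }
}
```

Note that `get_json_object` returns the extracted values as strings; cast them (e.g. `.cast("int")` on the `age` column) if typed columns are needed.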


On Thursday, December 29, 2016 7:18 PM, Richard Xin <> wrote:

Thanks, I have seen this, but it doesn't cover my question.
What I need is to read the JSON and include the raw JSON string as part of my DataFrame.

On Friday, December 30, 2016 10:23 AM, Annabel Melongo <> wrote:


The documentation below will show you how to create a SparkSession and how to programmatically load data:

On Thursday, December 29, 2016 5:16 PM, Richard Xin <> wrote:

Say I have the following data in a file:

Java code snippet:
        final SparkConf sparkConf = new SparkConf().setMaster("local[2]").setAppName("json_test");
        JavaSparkContext ctx = new JavaSparkContext(sparkConf);
        HiveContext hc = new HiveContext(ctx.sc());
        DataFrame df = hc.read().json("files/json/example2.json");

What I need is a DataFrame with the columns id, ln, fn, and age, as well as the raw JSON string (raw_json).

Any advice on the best practice in Java?