spark-user mailing list archives

From Jay <jayadeep.jayara...@gmail.com>
Subject Re: spark partitionBy with partitioned column in json output
Date Tue, 05 Jun 2018 02:44:22 GMT
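The partitionBy clause creates Hive-style partition folders (one directory
per distinct value of each partition column) so that you can point a Hive
partitioned table at the data. The partition values are stored in the
directory names, not in the files themselves.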
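
As a rough sketch of what that layout looks like, assuming a local Spark
session (the object name and the /tmp/actions_partitioned path are made up
for illustration): the partition values end up in directory names such as
bucket=B01/actionType=A1, and reading the root directory back restores them
as columns through partition discovery.

```scala
import org.apache.spark.sql.SparkSession

object PartitionLayoutSketch {
  // Writes a tiny partitioned JSON dataset to `out` and returns the column
  // names Spark sees when the root directory is read back.
  def roundTrip(spark: SparkSession, out: String): Seq[String] = {
    import spark.implicits._

    val df = Seq(
      ("B01", "A1", "NULL", "NULL"),
      ("B02", "A2", "NULL", "NULL")
    ).toDF("bucket", "actionType", "preaction", "postaction")

    // Produces directories like <out>/bucket=B01/actionType=A1/part-*.json;
    // the files inside hold only preaction and postaction.
    df.write
      .format("json")
      .mode("overwrite")
      .partitionBy("bucket", "actionType")
      .save(out)

    // Reading the root path rebuilds bucket and actionType from the
    // directory names.
    spark.read.json(out).columns.toSeq
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("partition-layout-sketch")
      .master("local[*]")
      .getOrCreate()

    // Hypothetical output location, chosen only for this example.
    println(roundTrip(spark, "/tmp/actions_partitioned").mkString(", "))
    spark.stop()
  }
}
```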

What are you using partitionBy for? What is the use case?

On Mon 4 Jun, 2018, 4:59 PM purna pradeep, <purna2pradeep@gmail.com> wrote:

> I'm reading the JSON below in Spark:
>
>     {"bucket": "B01", "actionType": "A1", "preaction": "NULL", "postaction": "NULL"}
>     {"bucket": "B02", "actionType": "A2", "preaction": "NULL", "postaction": "NULL"}
>     {"bucket": "B03", "actionType": "A3", "preaction": "NULL", "postaction": "NULL"}
>
>     val df = spark.read.json("actions.json")
>
> Now I'm writing the same data to JSON output as below:
>
>     df.write.format("json").mode("append")
>       .partitionBy("bucket", "actionType").save("output.json")
>
>
> and the output.json is as below:
>
>     {"preaction":"NULL","postaction":"NULL"}
>
> The bucket and actionType columns are missing from the JSON output. I need
> the partitionBy columns in the output as well.
>
>
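
One possible way to keep the partition columns inside the JSON records
themselves is to partition on copies of those columns, so the originals stay
in the written files. The sketch below assumes that approach; it is not
something proposed in the thread, and the bucket_part and actionType_part
column names, the object name, and the output path are hypothetical. It uses
mode("overwrite") only so the example can be rerun cleanly.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

object KeepPartitionColumnsSketch {
  // Writes the data partitioned on duplicated columns, then reads a single
  // leaf directory to show which columns the JSON files themselves contain.
  def writeAndReadLeaf(spark: SparkSession, out: String): Seq[String] = {
    import spark.implicits._

    val df = Seq(
      ("B01", "A1", "NULL", "NULL"),
      ("B02", "A2", "NULL", "NULL")
    ).toDF("bucket", "actionType", "preaction", "postaction")

    // Copy the partition columns under hypothetical names; Spark drops the
    // copies from the files and keeps the originals in each record.
    val withCopies = df
      .withColumn("bucket_part", col("bucket"))
      .withColumn("actionType_part", col("actionType"))

    withCopies.write
      .format("json")
      .mode("overwrite")
      .partitionBy("bucket_part", "actionType_part")
      .save(out)

    // Reading one leaf directory bypasses partition discovery, so only the
    // fields physically present in the JSON files are returned.
    spark.read.json(s"$out/bucket_part=B01/actionType_part=A1").columns.toSeq
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("keep-partition-columns-sketch")
      .master("local[*]")
      .getOrCreate()

    // Hypothetical output location, chosen only for this example.
    println(writeAndReadLeaf(spark, "/tmp/actions_with_columns").mkString(", "))
    spark.stop()
  }
}
```

The trade-off is that the directory names carry the copied column names
(bucket_part=..., actionType_part=...), so anything reading the whole tree
sees both the originals and the copies.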
