spark-user mailing list archives

From Mahender Sarangam <mahender.bigd...@outlook.com>
Subject Re: Unable to read multiple JSON.Gz File.
Date Thu, 18 Oct 2018 08:30:06 GMT
Hi Jyoti,

We are using HDInsight with Spark 2.2. Are there any setting differences in the latest version of the cluster?


/mahender



On 10/2/2018 1:48 PM, Jyoti Ranjan Mahapatra wrote:
Hi Mahender,
Which versions of Spark and Hadoop are you using?
I tried it on Spark 2.3.1 with Hadoop 2.7.3, and it works for a folder containing multiple .gz files.


From: Mahender Sarangam <mahender.bigdata@outlook.com>
Sent: Monday, October 1, 2018 2:00 AM
To: user@spark.apache.org
Subject: Unable to read multiple JSON.Gz File.



I’m trying to read multiple .json.gz files from a Blob storage path using the Scala code below, but I’m unable to read the data from the files or print the schema. If the files are not compressed as .gz, we are able to read all of them into the DataFrame.
I’ve even tried specifying *.gz, but with no luck.
 val df = spark.read.json("wasb://XYZ@AzureStorage.blob.core.windows.net/sourcePath/")
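One way to narrow this down is to first confirm that the files themselves are valid gzip streams containing JSON Lines (one JSON object per line), which is the layout spark.read.json expects unless the multiLine option is set. The following is a minimal sketch using only the Scala/Java standard library, not anything from Spark or HDInsight. The object and helper names (GzJsonCheck, writeGzJsonLines, readGzLines, looksLikeJsonLines) are hypothetical, and the line check is a rough heuristic rather than a real JSON parser.

```scala
import java.io.{BufferedReader, File, FileInputStream, FileOutputStream, InputStreamReader, OutputStreamWriter}
import java.nio.charset.StandardCharsets
import java.util.zip.{GZIPInputStream, GZIPOutputStream}
import scala.collection.mutable.ListBuffer

// Hypothetical helper for checking .json.gz files locally, independent of Spark.
object GzJsonCheck {
  // Write records as JSON Lines (one object per line) into a gzip-compressed file.
  def writeGzJsonLines(file: File, records: Seq[String]): Unit = {
    val out = new OutputStreamWriter(new GZIPOutputStream(new FileOutputStream(file)), StandardCharsets.UTF_8)
    try records.foreach(r => out.write(r + "\n"))
    finally out.close() // closing the writer also finishes the gzip stream
  }

  // Decompress a .gz file and return its non-empty lines.
  // GZIPInputStream throws java.util.zip.ZipException if the file is not valid gzip.
  def readGzLines(file: File): List[String] = {
    val in = new BufferedReader(
      new InputStreamReader(new GZIPInputStream(new FileInputStream(file)), StandardCharsets.UTF_8))
    try {
      val lines = ListBuffer.empty[String]
      var line = in.readLine()
      while (line != null) {
        if (line.trim.nonEmpty) lines += line
        line = in.readLine()
      }
      lines.toList
    } finally in.close()
  }

  // Heuristic only: every non-empty line should look like a standalone JSON object.
  def looksLikeJsonLines(lines: Seq[String]): Boolean =
    lines.nonEmpty && lines.forall { l =>
      val t = l.trim
      t.startsWith("{") && t.endsWith("}")
    }

  def main(args: Array[String]): Unit = {
    // Self-contained demo: write a small sample file, then read it back.
    val file = File.createTempFile("sample", ".json.gz")
    file.deleteOnExit()
    writeGzJsonLines(file, Seq("""{"id":1,"name":"a"}""", """{"id":2,"name":"b"}"""))
    val lines = readGzLines(file)
    println(s"${lines.size} lines, JSON Lines: ${looksLikeJsonLines(lines)}")
  }
}
```

To check a real file, copy one of the blobs to local disk and pass it to readGzLines. An exception there would point at the files, while a clean result would point back at the cluster or storage configuration.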