spark-dev mailing list archives

From Abhishek <smartsho...@gmail.com>
Subject Re: Skip Corrupted Parquet blocks / footer.
Date Sun, 01 Jan 2017 19:16:06 GMT
You will have to edit the metadata file under the _spark_metadata folder to remove the listing of the corrupt files.
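
If the directory is not a streaming sink output (i.e. there is no _spark_metadata log to edit), a config-based alternative is the spark.sql.files.ignoreCorruptFiles option, available in Spark 2.1 and later. A minimal sketch, assuming that option is available in your build and reusing the paths from your mail:

// Hedged sketch: assumes Spark 2.1+, where spark.sql.files.ignoreCorruptFiles exists.
// When enabled, the reader logs and skips files whose footer/blocks cannot be read,
// instead of failing the whole query.
sqlContext.setConf("spark.sql.files.ignoreCorruptFiles", "true")

val newDataDF = sqlContext.read.parquet(
  "/data/testdir/data1.parquet",
  "/data/testdir/corruptblock.0")

newDataDF.show()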

Thanks,
Shobhit G 

Sent from my iPhone

> On Dec 31, 2016, at 8:11 PM, khyati [via Apache Spark Developers List] <ml-node+s1001551n20418h77@n3.nabble.com> wrote:
> 
> Hi, 
> 
> I am trying to read multiple Parquet files in Spark SQL. In one directory there are two files, one of which is corrupted. While trying to read these files, Spark SQL throws an exception for the corrupted file.
> 
> val newDataDF = sqlContext.read.parquet("/data/testdir/data1.parquet","/data/testdir/corruptblock.0")

> newDataDF.show 
> 
> throws an exception. 
> 
> Is there any way to skip the file with the corrupted block/footer and read only the files that are intact? 
> 
> Thanks 
> 




-----
Regards, 
Abhi