spark-user mailing list archives

From Yehuda Finkelstein <yeh...@veracity-group.com>
Subject get corrupted rows using columnNameOfCorruptRecord
Date Tue, 06 Dec 2016 14:31:51 GMT
Hi all



I’m trying to parse JSON using an existing schema, but I’m getting rows full of NULLs.

//get schema

val df_schema = spark.sqlContext.sql("select c1,c2,…,cn from t1 limit 1")

//read json file

val f = sc.textFile("/tmp/x")

//load json into data frame using schema

var df = spark.sqlContext.read
  .option("columnNameOfCorruptRecord", "xxx")
  .option("mode", "PERMISSIVE")
  .schema(df_schema.schema).json(f)



The documentation says that you can query the corrupted rows via this
column → columnNameOfCorruptRecord:

“columnNameOfCorruptRecord (default is the value specified in
spark.sql.columnNameOfCorruptRecord): allows renaming the new field having
malformed string created by PERMISSIVE mode. This overrides
spark.sql.columnNameOfCorruptRecord.”
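
For what it’s worth, this is how I read the default vs. the per-read
override (just a sketch; as far as I know the built-in default name is
"_corrupt_record"):

// Session-wide default for the corrupt-record column name
// (assumption: "_corrupt_record" is Spark's built-in default).
spark.conf.set("spark.sql.columnNameOfCorruptRecord", "_corrupt_record")

// The per-read option overrides this session default, as in the snippet above:
//   .option("columnNameOfCorruptRecord", "xxx")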



The question is: how do I fetch those corrupted rows?
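
Here is a minimal sketch of what I would expect to work, assuming the
corrupt-record column has to be declared explicitly in the user-supplied
schema as a nullable string field (the field name "xxx" matches the option
above; schemaWithCorrupt, df2 and corrupted are placeholder names of mine):

import org.apache.spark.sql.types.{StringType, StructField, StructType}

// Assumption: PERMISSIVE mode can only populate the corrupt-record
// column if it exists in the schema, so append it as a StringType field.
val schemaWithCorrupt = StructType(
  df_schema.schema.fields :+ StructField("xxx", StringType, nullable = true))

val df2 = spark.sqlContext.read
  .option("columnNameOfCorruptRecord", "xxx")
  .option("mode", "PERMISSIVE")
  .schema(schemaWithCorrupt)
  .json(f)

// Malformed input lines land in "xxx"; well-formed rows leave it null.
val corrupted = df2.filter(df2("xxx").isNotNull)
corrupted.show(false)

Is that roughly the right approach, or is there another way to get at the
corrupted rows?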





Thanks

Yehuda
