spark-user mailing list archives

From Yehuda Finkelstein <>
Subject get corrupted rows using columnNameOfCorruptRecord
Date Tue, 06 Dec 2016 14:31:51 GMT
Hi all

I’m trying to parse JSON using an existing schema, and I’m getting rows full of NULLs.

//get schema

val df_schema = spark.sqlContext.sql("select c1,c2,…cn from t1 limit 1")

//read json file

val f = sc.textFile("/tmp/x")

//load json into data frame using schema

var df ="columnNameOfCorruptRecord","xxx").option("mode","PERMISSIVE").schema(df_schema.schema).json(f)

The documentation says you can query the corrupted rows via this column → columnNameOfCorruptRecord:

o    “columnNameOfCorruptRecord (default is the value specified in
spark.sql.columnNameOfCorruptRecord): allows renaming the new field having
malformed string created by PERMISSIVE mode. This overrides
spark.sql.columnNameOfCorruptRecord.”

The question is how to fetch those corrupted rows ?
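For what it’s worth, the behaviour that option describes can be sketched without Spark: in PERMISSIVE mode each input line either parses against the schema, or its raw text lands in the corrupt-record column (named "xxx" by the option above), and fetching the bad rows is then just a filter on that column being non-null. A minimal plain-Scala sketch of that idea (no Spark; a hand-rolled pattern check stands in for real JSON parsing, and the single-field record shape is my assumption):

```scala
// One parsed row: c1 holds the parsed value on success,
// xxx (the corrupt-record column) holds the raw line on failure.
case class Row(c1: Option[String], xxx: Option[String])

// Stand-in for JSON parsing: accept only lines shaped like {"c1":"<value>"}.
def parseLine(line: String): Row = {
  val Ok = """\{"c1":"([^"]*)"\}""".r
  line.trim match {
    case Ok(v) => Row(Some(v), None)      // parsed fine: corrupt column is null
    case _     => Row(None, Some(line))   // malformed: keep the raw text
  }
}

val lines = Seq("""{"c1":"a"}""", """not json at all""", """{"c1":"b"}""")
val rows  = lines.map(parseLine)

// "Fetching the corrupted rows" is a filter on the corrupt column:
val corrupted = rows.filter(_.xxx.isDefined)
```

In Spark itself the equivalent would be something along the lines of df.filter(col("xxx").isNotNull) (column name taken from the option above). One caveat worth checking: since the schema here comes from `select c1,c2,…cn`, it presumably has no "xxx" field, and with an explicit schema the corrupt-record column is only populated if the schema actually contains a field of that name, which would explain rows of NULLs.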
