spark-user mailing list archives

From Colin Williams <>
Subject inferred schemas for spark streaming from a Kafka source
Date Tue, 13 Nov 2018 20:32:21 GMT
Does anybody know how to use inferred schemas with Structured Streaming?

I have some code like (the incomplete lines are filled in below with my intent; `config.brokers` and `config.topic` stand in for my actual connection settings):

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, from_json}

object StreamingApp {

  def launch(config: Config, spark: SparkSession): Unit = {
    import spark.implicits._

    // infer a schema from the sample JSON document carried in the config
    val schemaJson = spark.sparkContext.parallelize(List(config.schema))
    val schemaDF = spark.read.json(schemaJson.toDS())

    // read text from kafka
    val df = spark
      .readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", config.brokers) // placeholder
      .option("subscribe", config.topic)                 // placeholder
      .option("startingOffsets", "earliest")
      .load()

    spark.sql("set spark.sql.streaming.schemaInference=true")

    val jsonOptions = Map[String, String]("mode" -> "FAILFAST")

    // parse the Kafka value column using the schema inferred above
    val org_store_event_df = df.select(
      from_json(col("value").cast("string"), schemaDF.schema, jsonOptions).as("value"))
  }
}

I'd like to compare an inferred schema against my provided one, to
determine what's missing from my provided schema, or why I end up
with all nulls in my value column.
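For the comparison itself, a small helper like the following could diff the two schemas by field name. This is just a sketch, not anything from the Spark API: `SchemaDiff` is a hypothetical name, and the inputs are assumed to come from something like `inferredSchema.fieldNames.toSet` and `providedSchema.fieldNames.toSet` on the two `StructType`s.

```scala
// Hypothetical helper (not part of Spark): given the field names of an
// inferred schema and a provided schema, report what each side lacks.
// Fields present in the inferred schema but absent from the provided one
// are a likely cause of all-null columns after from_json.
object SchemaDiff {
  def diff(inferred: Set[String], provided: Set[String]): (Set[String], Set[String]) =
    (inferred -- provided, provided -- inferred)
}
```

A field-name diff won't catch type mismatches (e.g. a string field declared as a struct), which can also make from_json return null rows, so a full comparison would also need to walk the nested types.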

Currently I'm using a schema read from a JSON file. But I'd like to
infer the schema from the stream itself, as suggested by the docs. I'm
not sure how to replace from_json so that the value column is read
using an inferred schema, or some other way.
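One workaround I've seen sketched (not from the docs, and purely an assumption on my part): since the Kafka source only exposes `value` as bytes, infer the schema from a bounded batch read of the same topic, then reuse it in the streaming from_json. `brokers` and `topic` are placeholders here.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, from_json}

// Sketch: batch-read a sample of the topic to infer a JSON schema,
// then apply that schema when parsing the stream.
def inferredKafkaStream(spark: SparkSession, brokers: String, topic: String): DataFrame = {
  import spark.implicits._

  // 1) Bounded batch read of the topic, values cast to JSON strings.
  val sample = spark.read
    .format("kafka")
    .option("kafka.bootstrap.servers", brokers)
    .option("subscribe", topic)
    .option("startingOffsets", "earliest")
    .option("endingOffsets", "latest")
    .load()
    .select(col("value").cast("string"))
    .as[String]

  // 2) Let the batch JSON reader infer the schema from the sample.
  val inferredSchema = spark.read.json(sample).schema

  // 3) Use the inferred schema in the streaming read.
  spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", brokers)
    .option("subscribe", topic)
    .option("startingOffsets", "earliest")
    .load()
    .select(from_json(col("value").cast("string"), inferredSchema).as("value"))
}
```

The obvious caveat is that the schema is only as good as the sample: records arriving later with new fields would still come back null.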

Maybe schema inference is not supported for Kafka streams and only for
file streams? If that's the case, why do they have different
implementations?

Also, shouldn't the documentation be clearer about this?
