spark-user mailing list archives

From Junfeng Chen <>
Subject Reading kafka and save to parquet problem
Date Thu, 08 Mar 2018 01:33:47 GMT
I am struggling to read data from Kafka and save it as Parquet files on HDFS
using Spark Streaming, following this post.

My code is similar to the following:

val df = spark
  .readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "host1:port1,host2:port2")
  .option("subscribe", "topic1")
  .load()
df.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
  .as[(String, String)]


The only difference is that I am writing it in Java.

But in practice, this code runs only once and then exits gracefully. Although
it produces the Parquet files successfully and no exception is thrown, it
behaves like a normal Spark batch program rather than a Spark Streaming program.

What should I do if I want to continuously read from Kafka and save the data
to Parquet, batch after batch?
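For reference, a minimal sketch of what a continuously running Structured Streaming query to a Parquet sink can look like in Scala: the query has to be started with writeStream and the driver has to block on awaitTermination(), otherwise the program builds the plan and exits. The output path, checkpoint location, and trigger interval below are made-up examples, not values from this thread.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.streaming.Trigger

val spark = SparkSession.builder().appName("KafkaToParquet").getOrCreate()

val df = spark
  .readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "host1:port1,host2:port2")
  .option("subscribe", "topic1")
  .load()

val query = df
  .selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
  .writeStream
  .format("parquet")
  .option("path", "hdfs:///data/topic1")               // hypothetical output dir
  .option("checkpointLocation", "hdfs:///chk/topic1")  // file sink requires a checkpoint
  .trigger(Trigger.ProcessingTime("30 seconds"))       // write a micro-batch every 30s
  .start()

// Block the driver so the streaming query keeps running instead of exiting.
query.awaitTermination()
```

The same shape applies to the Java API (SparkSession.readStream(), Dataset.writeStream()); a job that exits after one run is usually missing either readStream/writeStream or the final awaitTermination() call.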

Junfeng Chen
