spark-dev mailing list archives

From satyajit vegesna <satyajit.apas...@gmail.com>
Subject Infer JSON schema in structured streaming Kafka.
Date Mon, 11 Dec 2017 02:28:58 GMT
Hi All,

I would like to infer a JSON schema from a sample of the data that I receive
from Kafka (a specific topic). I have to infer the schema because each topic
will carry arbitrary JSON strings with a different schema, so I chose to go
ahead with the steps below:

a. readStream from Kafka (latest offset), from a single Kafka topic.
b. Somehow store the JSON strings in a val and infer the schema.
c. Stop the stream.
d. Create a new readStream (earliest offset) and use the schema inferred above
to process the JSON with Spark's built-in JSON support, such as from_json,
get_json_object and others, and run my actual business logic.
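
To make the steps above concrete, here is a rough sketch of how they might
look in Scala. It is only one possible reading, not a tested solution. It
assumes Spark 2.2 or later with the spark-sql-kafka-0-10 package on the
classpath. The broker address, topic name, sample size, and the
InferKafkaJsonSchema object with its method names are all hypothetical.
Note that it swaps steps (a) to (c) for a bounded batch read of the topic
(spark.read instead of readStream), so there is no sampling stream to stop;
the batch read starts at the earliest offset because a batch query cannot
start at "latest".

```scala
import org.apache.spark.sql.{DataFrame, Dataset, SparkSession}
import org.apache.spark.sql.functions.{col, from_json}
import org.apache.spark.sql.types.StructType

object InferKafkaJsonSchema {
  // Hypothetical connection settings; replace with real values.
  val Brokers = "localhost:9092"
  val Topic = "events"

  // Steps (a)-(c), as a batch read: pull a bounded sample of raw JSON
  // strings from the topic instead of starting and stopping a stream.
  def sampleFromKafka(spark: SparkSession, sampleSize: Int): Dataset[String] = {
    import spark.implicits._
    spark.read
      .format("kafka")
      .option("kafka.bootstrap.servers", Brokers)
      .option("subscribe", Topic)
      .option("startingOffsets", "earliest")
      .option("endingOffsets", "latest")
      .load()
      .selectExpr("CAST(value AS STRING)")
      .as[String]
      .limit(sampleSize)
  }

  // Step (b): let Spark's JSON reader infer a schema from the sample.
  def schemaOf(spark: SparkSession, sample: Dataset[String]): StructType =
    spark.read.json(sample).schema

  // Step (d): a streaming read from the earliest offset that parses each
  // record with the inferred schema and flattens the resulting struct.
  def parsedStream(spark: SparkSession, schema: StructType): DataFrame =
    spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", Brokers)
      .option("subscribe", Topic)
      .option("startingOffsets", "earliest")
      .load()
      .select(from_json(col("value").cast("string"), schema).as("data"))
      .select("data.*")
}
```

Usage would be along the lines of
schemaOf(spark, sampleFromKafka(spark, 1000)) followed by
parsedStream(spark, schema). The inferred schema only covers fields that
appear in the sample, so a field that shows up later in the topic would be
dropped by from_json.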

Now I am not sure how to accomplish step (b). Any help would be
appreciated.
I would also like to know if there is a better approach.

Regards,
Satyajit.
