spark-user mailing list archives

From Richard <fifistorm...@gmail.com>
Subject Spark dataset to explode json string
Date Fri, 19 Jul 2019 20:47:48 GMT
Let's say I use Spark to migrate some data from a Cassandra table to an Oracle
table.
Cassandra Table:
CREATE TABLE SOURCE(
id UUID PRIMARY KEY,
col1 text,
col2 text,
jsonCol text
);
example jsonCol value: {"foo": "val1", "bar": "val2"}

I am trying to extract fields from the JSON column while importing into the
Oracle table:
CREATE TABLE DESTINATION (
id varchar2(50),
col1 varchar2(128),
col2 varchar2(128),
raw_json clob,
foo varchar2(256),
bar varchar2(256)
);

What I have done:
A separate UDF for foo and for bar.
This approach works, but it also means I deserialize the raw JSON string into
a JSON object twice, and things get worse if I want to extract many fields
from the JSON.
example:
df = df.withColumn("foo", getFoo.apply(col("jsonCol")))
     .withColumn("bar", getBar.apply(col("jsonCol")));
// getFoo and getBar are UserDefinedFunction
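To make the cost concrete, here is a minimal sketch of the two approaches outside Spark, using plain Python's json stdlib (the get_foo/get_bar lambdas are hypothetical stand-ins for the UDFs): the per-field extractors each re-parse the raw string, while the parse-once version deserializes a single time and then projects every field from the resulting dict.

```python
import json

# Hypothetical stand-in for one row's jsonCol value.
json_col = '{"foo": "val1", "bar": "val2"}'

# Per-field approach (mirrors one UDF per column):
# each extractor deserializes the raw string again.
get_foo = lambda raw: json.loads(raw).get("foo")
get_bar = lambda raw: json.loads(raw).get("bar")
foo, bar = get_foo(json_col), get_bar(json_col)

# Parse-once approach: deserialize a single time,
# then pull as many fields as needed from the dict.
parsed = json.loads(json_col)
foo2, bar2 = parsed.get("foo"), parsed.get("bar")

assert (foo, bar) == (foo2, bar2) == ("val1", "val2")
```

With N fields, the per-field version deserializes the same string N times, which is the scaling problem described above.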

How do I parse the raw JSON string only once and explode the fields I need
into multiple Oracle columns in Spark?

Thanks,
