spark-user mailing list archives

From Tobias Pfeiffer <...@preferred.jp>
Subject SQL query over (Long, JSON string) tuples
Date Thu, 29 Jan 2015 09:12:06 GMT
Hi,

I have data as RDD[(Long, String)], where the Long is a timestamp and the
String is a JSON-encoded string. I want to infer the schema of the JSON and
then run a SQL query over the data (no aggregates, just column selection
and UDF application), but still have the timestamp associated with each row
of the result. I completely fail to see how that would be possible. Any
suggestions?

I can't even see how I would get an RDD[(Long, Row)] so that I *might* be
able to add the timestamp to the row after schema inference. Is there *any*
way other than string-manipulating the JSON string and adding the timestamp
to it?
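
For concreteness, the string-manipulation workaround I would like to avoid looks roughly like the sketch below. It is plain Python standing in for the per-record function I would map over the RDD; the function name `embed_timestamp` and the field name `_timestamp` are made up for illustration, and it assumes every record is a JSON object at the top level.

```python
import json


def embed_timestamp(record, field="_timestamp"):
    """Merge the timestamp of a (timestamp, json_string) pair into the JSON object.

    `field` is a hypothetical column name; it must not clash with a key
    that already exists in the record.
    """
    timestamp, json_string = record
    obj = json.loads(json_string)
    if not isinstance(obj, dict):
        raise ValueError("expected a JSON object at the top level")
    if field in obj:
        raise ValueError("field name %r already present in the record" % field)
    obj[field] = timestamp
    return json.dumps(obj)


# Sample (timestamp, JSON string) pairs, invented for this example.
records = [
    (1422522726000, '{"user": "a", "value": 1}'),
    (1422522727000, '{"user": "b", "value": 2}'),
]

# With an RDD this would be a map over the records instead of a list comprehension.
augmented = [embed_timestamp(r) for r in records]
```

After that, the timestamp would presumably show up as just another inferred column, but it costs a full parse and re-serialization of every record, which is what I was hoping to avoid.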

Thanks
Tobias
