spark-user mailing list archives

From Ayoub <>
Subject Re: SQL query over (Long, JSON string) tuples
Date Thu, 29 Jan 2015 09:23:19 GMT

SQLContext and HiveContext both have a "jsonRDD" method which accepts an
RDD[String], where each string is a JSON document, and returns a SchemaRDD.
SchemaRDD extends RDD[Row], which is the type you want.

Afterwards you should be able to do a join to keep your tuple.
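
A rough sketch of how that could look, in case it helps. It assumes the
Spark 1.2 API (SchemaRDD, jsonRDD) and made-up sample data; the object name
and field names are hypothetical. Since there is no obvious key to join on,
the sketch uses RDD.zip instead of a join to pair each timestamp with its
Row. That only works if every string holds exactly one JSON object, so that
jsonRDD yields one Row per input string and keeps the partitioning; please
treat that as an assumption to verify rather than a guarantee.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{Row, SQLContext, SchemaRDD}

// Hypothetical example object; names and data are for illustration only.
object TimestampedJson {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("timestamped-json").setMaster("local[2]")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)

    // (timestamp, JSON string) pairs, as described in the question.
    val data: RDD[(Long, String)] = sc.parallelize(Seq(
      (1422523399000L, """{"name": "a", "value": 1}"""),
      (1422523400000L, """{"name": "b", "value": 2}""")
    ))

    // Infer the schema from the JSON strings alone.
    val rows: SchemaRDD = sqlContext.jsonRDD(data.map(_._2))
    rows.printSchema()

    // Pair each timestamp with its parsed Row. zip requires both RDDs to have
    // the same number of partitions and of elements per partition, which is
    // assumed here (one JSON object per input string).
    val withTimestamp: RDD[(Long, Row)] = data.map(_._1).zip(rows)

    withTimestamp.collect().foreach(println)
    sc.stop()
  }
}
```

If a string can contain a JSON array, the element counts would no longer
match and zip would fail, so a real key inside the JSON plus a proper join
would be the safer route in that case.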


2015-01-29 10:12 GMT+01:00 Tobias Pfeiffer <>:

> Hi,
> I have data as RDD[(Long, String)], where the Long is a timestamp and the
> String is a JSON-encoded string. I want to infer the schema of the JSON and
> then do a SQL statement on the data (no aggregates, just column selection
> and UDF application), but still have the timestamp associated with each row
> of the result. I completely fail to see how that would be possible. Any
> suggestions?
> I can't even see how I would get an RDD[(Long, Row)] so that I *might* be
> able to add the timestamp to the row after schema inference. Is there *any*
> way other than string-manipulating the JSON string and adding the timestamp
> to it?
> Thanks
> Tobias
