spark-user mailing list archives

From Ayoub <benali.ayoub.i...@gmail.com>
Subject Re: SQL query over (Long, JSON string) tuples
Date Thu, 29 Jan 2015 09:23:19 GMT
Hello,

SQLContext and HiveContext have a "jsonRDD" method that accepts an
RDD[String], where each string is a JSON string, and returns a SchemaRDD.
SchemaRDD extends RDD[Row], which is the type you want.

Afterwards, you should be able to do a join with the timestamps to keep
your (Long, Row) tuples.
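A minimal plain-Python sketch of those two steps (it does not use Spark): infer a schema from the JSON strings alone, run a column selection plus a UDF, then join the results back to the timestamps. It assumes a hypothetical synthetic key (each record's position in the input) that travels next to each parsed row so the join has something to match on. In Spark, RDD.zipWithUniqueId could plausibly supply such a key, but how that key is carried through jsonRDD and the SQL step is not shown here. The helper names (infer_schema, parse_rows, query_with_timestamps) are illustrative only.

```python
import json
from typing import Callable, Dict, List, Tuple


def infer_schema(json_strings: List[str]) -> List[str]:
    # Union of top-level field names across all records, sorted for a stable column order.
    fields = set()
    for s in json_strings:
        fields.update(json.loads(s).keys())
    return sorted(fields)


def parse_rows(json_strings: List[str], schema: List[str]) -> List[tuple]:
    # One tuple per record, ordered by the schema; absent fields become None.
    return [tuple(json.loads(s).get(f) for f in schema) for s in json_strings]


def query_with_timestamps(
    data: List[Tuple[int, str]],
    column: str,
    udf: Callable = lambda v: v,
) -> List[Tuple[int, object]]:
    # Hypothetical synthetic key: the record's position in the input list.
    ids = list(range(len(data)))
    timestamps: Dict[int, int] = {i: ts for i, (ts, _) in zip(ids, data)}

    # Infer the schema from the JSON side only (the role jsonRDD plays in Spark).
    json_strings = [js for _, js in data]
    schema = infer_schema(json_strings)
    rows = parse_rows(json_strings, schema)

    # "SQL" step: select one column and apply a UDF, keeping the key alongside.
    idx = schema.index(column)
    results = {i: udf(row[idx]) for i, row in zip(ids, rows)}

    # Inner join of query results and timestamps on the synthetic key.
    return [(timestamps[i], results[i]) for i in ids if i in results]
```

For example, with data = [(1422523399, '{"user": "a", "n": 1}'), (1422523400, '{"user": "b"}')], calling query_with_timestamps(data, "user", str.upper) yields [(1422523399, 'A'), (1422523400, 'B')]. The key is kept outside the JSON, so no string manipulation of the JSON itself is needed.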

Best,
Ayoub.

2015-01-29 10:12 GMT+01:00 Tobias Pfeiffer <tgp@preferred.jp>:

> Hi,
>
> I have data as RDD[(Long, String)], where the Long is a timestamp and the
> String is a JSON-encoded string. I want to infer the schema of the JSON and
> then do a SQL statement on the data (no aggregates, just column selection
> and UDF application), but still have the timestamp associated with each row
> of the result. I completely fail to see how that would be possible. Any
> suggestions?
>
> I can't even see how I would get an RDD[(Long, Row)] so that I *might* be
> able to add the timestamp to the row after schema inference. Is there *any*
> way other than string-manipulating the JSON string and adding the timestamp
> to it?
>
> Thanks
> Tobias
>




--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Re-SQL-query-over-Long-JSON-string-tuples-tp21419.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.