spark-user mailing list archives

From Nicholas Chammas <nicholas.cham...@gmail.com>
Subject Re: Using Spark to analyze complex JSON
Date Thu, 22 May 2014 03:43:13 GMT
That's a good idea. So you're saying create a SchemaRDD by applying a
function that deserializes the JSON and transforms it into a relational
structure, right?
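
A minimal sketch of what that relational structure might look like for the
sample object quoted further down. The case classes (Location, Person) and
the object name JsonTableSketch are hypothetical, and a plain Scala Seq
stands in for the RDD so the example runs without Spark or lift-json; it
only illustrates the shape of the data and of the three example queries,
not how Spark SQL would execute them.

```scala
// Hypothetical case classes mirroring the sample JSON object in this thread.
case class Location(x: Double, y: Double)
case class Person(name: String, location: Location, likes: Seq[String])

object JsonTableSketch {
  // Assumption: a plain Seq stands in for the deserialized RDD[Person].
  val jsonTable: Seq[Person] = Seq(
    Person("Nick", Location(241.6, -22.5), Seq("ice cream", "dogs", "Vanilla Ice"))
  )

  // SELECT DISTINCT name FROM json_table
  def distinctNames: Seq[String] = jsonTable.map(_.name).distinct

  // SELECT MAX(location.x) FROM json_table (throws on an empty table)
  def maxX: Double = jsonTable.map(_.location.x).max

  // SELECT likes[2] FROM json_table WHERE name = ...
  // lift(2) yields None instead of throwing when a row has fewer than three likes.
  def thirdLikeOf(name: String): Seq[String] =
    jsonTable.filter(_.name == name).flatMap(_.likes.lift(2))

  def main(args: Array[String]): Unit = {
    println(distinctNames)
    println(maxX)
    println(thirdLikeOf("Nick"))
  }
}
```

With a real RDD the same map/filter/flatMap calls would apply, but the
array-index query would still have to be written as code like this rather
than as SQL.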

The end goal for my team would be to expose some JDBC endpoint for analysts
to query from, so once Shark is updated to use Spark SQL that would become
possible without having to resort to using Hive at all.


On Wed, May 21, 2014 at 11:11 PM, Tobias Pfeiffer <tgp@preferred.jp> wrote:

> Hi,
>
> as far as I understand, if you create an RDD with a relational
> structure from your JSON, you should be able to do much of that
> already today. For example, take lift-json's deserializer and do
> something like
>
>   val json_table: RDD[MyCaseClass] =
>     json_data.flatMap(json => json.extractOpt[MyCaseClass])
>
> then I guess you can use Spark SQL on that. (Something like your
> likes[2] query won't work, though, I guess.)
>
> Regards
> Tobias
>
>
> On Thu, May 22, 2014 at 5:32 AM, Nicholas Chammas
> <nicholas.chammas@gmail.com> wrote:
> > Looking forward to that update!
> >
> > Given a table of JSON objects like this one:
> >
> > {
> >    "name": "Nick",
> >    "location": {
> >       "x": 241.6,
> >       "y": -22.5
> >    },
> >    "likes": ["ice cream", "dogs", "Vanilla Ice"]
> > }
> >
> > It would be SUPER COOL if we could query that table in a way that is as
> > natural as follows:
> >
> > SELECT DISTINCT name
> > FROM json_table;
> >
> > SELECT MAX(location.x)
> > FROM json_table;
> >
> > SELECT likes[2] -- Ice Ice Baby
> > FROM json_table
> > WHERE name = "Nick";
> >
> > Of course, this is just a hand-wavy suggestion of how I’d like to be able
> > to query JSON (particularly that last example) using SQL. I’m interested
> > in seeing what y’all come up with.
> >
> > A large part of what my team does is make it easy for analysts to explore
> > and query JSON data using SQL. We have a fairly complex home-grown process
> > to do that and are looking to replace it with something more out of the
> > box. So if you’d like more input on how users might use this feature, I’d
> > be glad to chime in.
> >
> > Nick
> >
> >
> >
> > On Wed, May 21, 2014 at 11:21 AM, Michael Armbrust
> > <michael@databricks.com> wrote:
> >>
> >> You can already extract fields from JSON data using Hive UDFs.  We have
> >> an intern working on better native support this summer.  We will be sure
> >> to post updates once there is a working prototype.
> >>
> >> Michael
> >>
> >>
> >> On Tue, May 20, 2014 at 6:46 PM, Nick Chammas
> >> <nicholas.chammas@gmail.com> wrote:
> >>>
> >>> The Apache Drill home page has an interesting heading: "Liberate Nested
> >>> Data".
> >>>
> >>> Is there any current or planned functionality in Spark SQL or Shark to
> >>> enable SQL-like querying of complex JSON?
> >>>
> >>> Nick
> >>>
> >>>
> >>
> >>
> >
>
