spark-user mailing list archives

From Tobias Pfeiffer <...@preferred.jp>
Subject Re: json parsing with json4s
Date Thu, 12 Jun 2014 07:22:15 GMT
Hi,

I usually use pattern matching for that, like
json \ "key" match { case JInt(i) => i.toInt; case _ => 0 /* default value */ }
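
Applied to the age/TypeID question below, the same idea might look like the following rough sketch. It assumes the json4s Jackson backend (org.json4s.jackson.JsonMethods) and uses a plain Seq of made-up sample lines in place of the RDD so it runs without a Spark context; the helper name intOrElse is hypothetical. The same function can be called inside rdd.map.

```scala
import org.json4s._
import org.json4s.jackson.JsonMethods._

object JIntMatch {
  // Hypothetical helper: unwrap a JInt to a plain Int, or fall back to a default.
  def intOrElse(value: JValue, default: Int): Int = value match {
    case JInt(i) => i.toInt  // JInt wraps a BigInt
    case _       => default  // JNothing (missing field), JString, etc.
  }

  // Made-up sample records shaped like the ones in the question.
  val lines = Seq(
    """{"person": {"age": 12}, "Action": {"Content": {"TypeID": 5}}}""",
    """{"person": {"age": 32}, "Action": {"Content": {"TypeID": 6}}}""",
    """{"person": {}, "Action": {"Content": {"TypeID": 7}}}"""
  )

  // Each record becomes a tuple of plain Ints instead of JValues.
  val pairs: Seq[(Int, Int)] = lines.map(line => parse(line)).map { json =>
    (intOrElse(json \ "person" \ "age", 0),
     intOrElse(json \ "Action" \ "Content" \ "TypeID", 0))
  }

  // Plain Ints support the usual comparisons, so filtering on age compiles.
  val over20: Seq[(Int, Int)] = pairs.filter(_._1 > 20)
}
```

Note that a record with no age falls back to 0 here, so it is dropped by the filter; pick the default with that in mind.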

Tobias

On Thu, Jun 12, 2014 at 7:39 AM, Michael Cutler <michael@tumra.com> wrote:
> Hello,
>
> You're absolutely right, the syntax you're using returns the json4s
> value objects, not native types like Int, Long etc. Fix that problem and
> everything else (filters) will work as you expect.  This is a short
> snippet of a larger example: [1]
>
>
>     val lines = sc.textFile("likes.json")
>     val user_interest = lines.map(line => {
>       // Parse the JSON, returns RDD[JValue]
>       parse(line)
>     }).map(json => {
>       // Extract the values we need to populate the UserInterest class
>       implicit lazy val formats = org.json4s.DefaultFormats
>       val name = (json \ "name").extract[String]
>       val location_x = (json \ "location" \ "x").extract[Double]
>       val location_y = (json \ "location" \ "y").extract[Double]
>       val likes = (json \ "likes").extract[Seq[String]].map(_.toLowerCase()).mkString(";")
>       ( UserInterest(name, location_x, location_y, likes) )
>     })
>
>
> The key parts are defining
> "implicit lazy val formats = org.json4s.DefaultFormats" before you touch
> the JSON, and using (json \ "location" \ "x").extract[Double] to extract
> the parts you need.
>
> One thing to be wary of: if your JSON is not consistent, i.e. fields are
> not always set, then the "extract[Double]" method will raise
> exceptions.  In that case you may wish to pull the values out as a
> String and process them yourself, e.g.
>
> val id = compact(render(json \ "facebook" \ "id"))
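
Another option for optional fields, not mentioned above, is json4s's extractOpt, which returns an Option instead of throwing. A minimal sketch, assuming the Jackson backend and DefaultFormats; the helper name locationX and the sample records are made up for illustration:

```scala
import org.json4s._
import org.json4s.jackson.JsonMethods._

object OptionalFields {
  // extract/extractOpt need an implicit Formats in scope.
  implicit val formats: Formats = DefaultFormats

  // Hypothetical helper: Some(value) when location.x is present, None otherwise.
  def locationX(line: String): Option[Double] =
    (parse(line) \ "location" \ "x").extractOpt[Double]

  // Made-up records: one with a location, one without.
  val present: Option[Double] = locationX("""{"name": "a", "location": {"x": 1.5, "y": 2.0}}""")
  val missing: Option[Double] = locationX("""{"name": "b"}""")
}
```

The caller can then decide per record what to do with a None, for example getOrElse(0.0) or dropping the record with flatMap.
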
>
> Good luck playing with JSON and Spark!  :o)
>
> Best,
>
> MC
>
>
> [1] UserInterestsExample.scala
> https://gist.github.com/cotdp/b471cfff183b59d65ae1
>
>
>
>
>
> On 11 June 2014 23:26, SK <skrishna.id@gmail.com> wrote:
>>
>> I have the following piece of code that parses a json file and extracts
>> the
>> age and TypeID
>>
>> val p = sc.textFile(log_file)
>>                    .map(line => { parse(line) })
>>                    .map(json =>
>>                       {  val v1 = json \ "person" \ "age"
>>                          val v2 = json \ "Action" \ "Content" \ "TypeID"
>>                          (v1, v2)
>>                       }
>>                     )
>>
>> p.foreach(r => println(r))
>>
>> The result is:
>>
>> (JInt(12),JInt(5))
>> (JInt(32),JInt(6))
>> (JInt(40),JInt(7))
>>
>> 1) How can I extract the values (i.e. without the JInt) ? I tried
>> returning
>> (v1.toInt, v2.toInt) from the map but got a compilation error stating that
>> toInt is not a valid operation.
>>
>> 2) I would also like to know how I can filter the above tuples based on
>> the age values. For example, I added the following after the second map
>> operation:
>>
>>   p.filter(tup => tup._1 > 20)
>>
>> I got a compilation error: value > is not a member of org.json4s.JValue
>>
>> Thanks for your help.
>>
>>
>>
>>
>>
>>
>> --
>> View this message in context:
>> http://apache-spark-user-list.1001560.n3.nabble.com/json-parsing-with-json4s-tp7430.html
>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
>
