spark-user mailing list archives

From "Wang, Daoyuan" <daoyuan.w...@intel.com>
Subject RE: MatchError in JsonRDD.toLong
Date Fri, 16 Jan 2015 09:14:15 GMT
The second parameter of jsonRDD is the sampling ratio used when inferring the schema.
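To illustrate why a sampling ratio below 1.0 can go wrong, here is a minimal pure-Scala sketch (this is a hypothetical model, not Spark's actual inference code): if the sampled rows all fit in a Long, the column is inferred as a Long type, and a value outside the sample that overflows Long only surfaces later, during the full scan.

```scala
// Hypothetical sketch of sample-based numeric type inference (not Spark's code):
// infer a Long type if every sampled value fits in a Long, otherwise a wider
// decimal type.
def inferType(sample: Seq[BigInt]): String =
  if (sample.forall(_.isValidLong)) "LongType" else "DecimalType"

// The two DisplayURL values from the example below; the second one
// exceeds Long.MaxValue (9223372036854775807).
val rows = Seq(BigInt("4401798909506983219"), BigInt("14452800566866169008"))

// Sampling ratio 1.0: both rows are seen, so the wider type is chosen.
println(inferType(rows))          // DecimalType

// A sample that misses the oversized value infers LongType; the later
// full pass then hits a row whose value does not fit in a Long.
println(inferType(rows.take(1)))  // LongType
```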

Thanks,
Daoyuan

From: Tobias Pfeiffer [mailto:tgp@preferred.jp]
Sent: Friday, January 16, 2015 5:11 PM
To: Wang, Daoyuan
Cc: user
Subject: Re: MatchError in JsonRDD.toLong

Hi,

On Fri, Jan 16, 2015 at 5:55 PM, Wang, Daoyuan <daoyuan.wang@intel.com> wrote:
Can you show how you create the JsonRDD?

This should be reproducible in the Spark shell:

---------------------------------------------------------
import org.apache.spark.sql._
val sqlc = new SQLContext(sc)
val rdd = sc.parallelize(
  """{"Click":"nonclicked", "Impression":1, "DisplayURL":4401798909506983219, "AdId":21215341}""" ::
  """{"Click":"nonclicked", "Impression":1, "DisplayURL":14452800566866169008, "AdId":10587781}""" :: Nil)

// works fine
val json = sqlc.jsonRDD(rdd)
json.registerTempTable("test")
sqlc.sql("SELECT * FROM test").collect

// -> MatchError
val json2 = sqlc.jsonRDD(rdd, 0.1)
json2.registerTempTable("test2")
sqlc.sql("SELECT * FROM test2").collect
---------------------------------------------------------

I guess the issue in the latter case is that, with sampling, the column is inferred as Long while some rows
actually hold values too big for a Long...
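For reference (this check is mine, not from the original message): the second DisplayURL value really does exceed Long.MaxValue, while the first one fits, which is consistent with the overflow guess:

```scala
// Long.MaxValue is 9223372036854775807 (about 9.22e18).
val displayUrl1 = BigInt("4401798909506983219")   // first row: fits in a Long
val displayUrl2 = BigInt("14452800566866169008")  // second row: too big for a Long

println(displayUrl1.isValidLong)              // true
println(displayUrl2.isValidLong)              // false
println(displayUrl2 > BigInt(Long.MaxValue))  // true
```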

Thanks
Tobias
