spark-user mailing list archives

From fanooos <dev.fano...@gmail.com>
Subject org.apache.hadoop.hive.serde2.SerDeException: org.codehaus.jackson.JsonParseException
Date Tue, 17 Mar 2015 14:25:47 GMT
I have a hadoop cluster and I need to query the data stored on the HDFS using
spark sql thrift server. 

Spark SQL thrift server is up and running. It is configured to read from a
Hive table. The Hive table is an external table that corresponds to a set of
files stored on HDFS. These files contain JSON data.

I am connecting to Spark SQL thrift server using beeline. When I try to
execute a simple query like *select * from mytable limit 3*, everything
works fine.


But when I try to execute other queries like *select count(*) from mytable*,
the following exception is thrown:

*org.apache.hadoop.hive.serde2.SerDeException:
org.codehaus.jackson.JsonParseException: Unrecognized character escape ' '
(code 32) at [Source: java.io.StringReader@34ef429a; line: 1, column: 351]*


What I understand from the exception is that some of the files contain
corrupted JSON.


question 1: Am I understanding this correctly?
question 2: How can I find the file(s) causing this problem, given that I have
about 3 thousand files and each file contains about 700 lines of JSON data?
question 3: If I am sure that the files on HDFS contain valid
JSON data, what should I do?
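For question 2, one approach I am considering (a rough sketch, not tested against the cluster): pull the files down locally with `hdfs dfs -get`, then check each line with a standalone JSON parser to find the exact file and line that fails. The directory paths here are hypothetical placeholders.

```python
import json
import os

def find_bad_json(paths):
    """Return a list of (path, line_number, error) for lines that fail to parse as JSON.

    Assumes one JSON document per line, matching how the Hive JSON SerDe
    reads the table's files. Blank lines are skipped.
    """
    bad = []
    for path in paths:
        with open(path, "r") as f:
            for lineno, line in enumerate(f, start=1):
                line = line.strip()
                if not line:
                    continue
                try:
                    json.loads(line)
                except ValueError as e:  # json.JSONDecodeError subclasses ValueError
                    bad.append((path, lineno, str(e)))
    return bad

if __name__ == "__main__":
    # Hypothetical local copy of the table's HDFS directory, e.g. after:
    #   hdfs dfs -get /user/hive/warehouse/mytable ./mytable_local
    local_dir = "./mytable_local"
    files = [os.path.join(local_dir, name) for name in os.listdir(local_dir)]
    for path, lineno, err in find_bad_json(files):
        print("%s:%d: %s" % (path, lineno, err))
```

With ~3000 files of ~700 lines each this is only ~2 million small parses, so a single local pass should finish quickly and report every offending file and line at once.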







--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/org-apache-hadoop-hive-serde2-SerDeException-org-codehaus-jackson-JsonParseException-tp22103.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org

