spark-user mailing list archives

From Aaron Davidson <ilike...@gmail.com>
Subject Re: jsonFile function in SQLContext does not work
Date Wed, 25 Jun 2014 19:08:15 GMT
Is it possible you have blank lines in your input? Not that this should be
an error condition, but it may be what's causing it.
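
A quick way to check, and to work around it if so, is to read the input as
plain text and drop blank lines before handing it to Spark SQL. A rough
sketch, assuming a hypothetical HDFS path and that your build has
SQLContext.jsonRDD alongside jsonFile:

    import org.apache.spark.sql.SQLContext

    val sqlContext = new SQLContext(sc)

    // jsonFile expects one JSON object per line, so read as plain text first.
    val lines = sc.textFile("hdfs:///path/to/data.json")  // hypothetical path

    // Count blank lines to confirm whether they are present at all.
    println("blank lines: " + lines.filter(_.trim.isEmpty).count())

    // Drop blank lines, then let Spark SQL infer the schema from the rest.
    val table = sqlContext.jsonRDD(lines.filter(_.trim.nonEmpty))
    table.printSchema()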


On Wed, Jun 25, 2014 at 11:57 AM, durin <mail@simon-schaefer.net> wrote:

> Hi Zongheng Yang,
>
> Thanks for your response. Reading your answer, I did some more tests and
> realized that analyzing very small parts of the dataset (which is ~130GB in
> ~4.3M lines) works fine.
> The error occurs when I analyze larger parts. With 5% of the whole data, the
> error is the same as posted before for certain TIDs; however, I still get the
> structure determined up to that point back as a result.
>
> The Spark WebUI shows the following:
>
> Job aborted due to stage failure: Task 6.0:11 failed 4 times, most recent
> failure: Exception failure in TID 108 on host foo.bar.com:
> com.fasterxml.jackson.databind.JsonMappingException: No content to map due
> to end-of-input at [Source: java.io.StringReader@3697781f; line: 1, column: 1]
> com.fasterxml.jackson.databind.JsonMappingException.from(JsonMappingException.java:164)
> com.fasterxml.jackson.databind.ObjectMapper._initForReading(ObjectMapper.java:3029)
> com.fasterxml.jackson.databind.ObjectMapper._readMapAndClose(ObjectMapper.java:2971)
> com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:2091)
> org.apache.spark.sql.json.JsonRDD$$anonfun$parseJson$1$$anonfun$apply$5.apply(JsonRDD.scala:261)
> org.apache.spark.sql.json.JsonRDD$$anonfun$parseJson$1$$anonfun$apply$5.apply(JsonRDD.scala:261)
> scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
> scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
> scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
> scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
> scala.collection.Iterator$class.foreach(Iterator.scala:727)
> scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
> scala.collection.TraversableOnce$class.reduceLeft(TraversableOnce.scala:172)
> scala.collection.AbstractIterator.reduceLeft(Iterator.scala:1157)
> org.apache.spark.rdd.RDD$$anonfun$17.apply(RDD.scala:823)
> org.apache.spark.rdd.RDD$$anonfun$17.apply(RDD.scala:821)
> org.apache.spark.SparkContext$$anonfun$24.apply(SparkContext.scala:1132)
> org.apache.spark.SparkContext$$anonfun$24.apply(SparkContext.scala:1132)
> org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:112)
> org.apache.spark.scheduler.Task.run(Task.scala:51)
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:187)
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> java.lang.Thread.run(Thread.java:662)
>
> Driver stacktrace:
>
> Is the only possible explanation that some of these 4.3 million JSON objects
> are not valid JSON, or could there be another cause?
> And if invalid JSON is the reason, is there some way to tell the function to
> just skip faulty lines?
>
>
> Thanks,
> Durin
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/jsonFile-function-in-SQLContext-does-not-work-tp8273p8278.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
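
As for the question quoted above about skipping faulty lines: I don't think
jsonFile exposes an option for that at the moment, but you can pre-filter
lines that don't parse before schema inference. A sketch along the same
lines as above, using the Jackson that ships with Spark SQL (again treating
the path and names as placeholders):

    import com.fasterxml.jackson.databind.ObjectMapper

    val lines = sc.textFile("hdfs:///path/to/data.json")  // hypothetical path

    // Keep only lines that Jackson can parse. Build one mapper per
    // partition, since ObjectMapper is not serializable.
    val valid = lines.mapPartitions { iter =>
      val mapper = new ObjectMapper()
      iter.filter { line =>
        line.trim.nonEmpty && (try { mapper.readTree(line); true }
                               catch { case _: Exception => false })
      }
    }

    val table = sqlContext.jsonRDD(valid)

Note this parses every line twice (once to validate, once for inference), so
it costs an extra pass over the data, but it will tell you how many lines are
being dropped if you compare lines.count() with valid.count().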
