spark-user mailing list archives

From Matei Zaharia <matei.zaha...@gmail.com>
Subject Re: spark 0.8
Date Thu, 17 Oct 2013 23:56:21 GMT
Koert, did you link your Spark job to the right version of HDFS as well? In Spark 0.8, you
have to add a Maven dependency on "hadoop-client" for your version of Hadoop. See http://spark.incubator.apache.org/docs/latest/quick-start.html#a-standalone-app-in-scala
for example.

Matei
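
For readers building with sbt rather than Maven, the dependency described above might look roughly like the sketch below. It follows the shape of the Spark 0.8.0 quick-start build file. The hadoop-client version string and the Cloudera repository are assumptions for a CDH 4.3.0 (MRv1) cluster and are not taken from this thread, so check them against your own Hadoop distribution.

```scala
// build.sbt: a minimal sketch, not a build file from this thread
name := "Simple Project"

version := "1.0"

scalaVersion := "2.9.3"

// Spark 0.8.0 core artifact
libraryDependencies += "org.apache.spark" %% "spark-core" % "0.8.0-incubating"

// hadoop-client must match the Hadoop/HDFS version running on the cluster.
// ASSUMPTION: "2.0.0-mr1-cdh4.3.0" is the right version for CDH 4.3.0 (MRv1); verify before use.
libraryDependencies += "org.apache.hadoop" % "hadoop-client" % "2.0.0-mr1-cdh4.3.0"

resolvers ++= Seq(
  "Akka Repository" at "http://repo.akka.io/releases/",
  // ASSUMPTION: CDH builds of hadoop-client are resolved from Cloudera's public repository
  "Cloudera Repository" at "https://repository.cloudera.com/artifactory/cloudera-repos/"
)
```

If the hadoop-client version on the job's classpath differs from the one Spark was built against, Hadoop classes may fail to deserialize on the executors. That would be consistent with the EOFException quoted below, though the thread does not confirm it as the cause.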

On Oct 17, 2013, at 4:38 PM, Koert Kuipers <koert@tresata.com> wrote:

> i got the job a little further along by also setting this:
> System.setProperty("spark.closure.serializer", "org.apache.spark.serializer.KryoSerializer")
> 
> not sure why i need to... but anyhow, now my workers start and then they blow up on this:
> 
> 13/10/17 19:22:57 ERROR Executor: Uncaught exception in thread Thread[pool-5-thread-1,5,main]
> java.lang.NullPointerException
>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:187)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
>     at java.lang.Thread.run(Thread.java:662)
> 
> 
> which is:
>  val metrics = attemptedTask.flatMap(t => t.metrics)
> 
> On Thu, Oct 17, 2013 at 7:30 PM, dachuan <hdc1112@gmail.com> wrote:
> thanks, Mark.
> 
> 
> On Thu, Oct 17, 2013 at 6:36 PM, Mark Hamstra <mark@clearstorydata.com> wrote:
> SNAPSHOTs are not fixed versions, but are floating names associated with whatever is
> the most recent code. So, Spark 0.8.0 is the current released version of Spark, which is
> exactly the same today as it was yesterday, and will be the same thing forever. Spark 0.8.1-SNAPSHOT
> is whatever is currently in branch-0.8. It changes every time new code is committed to that
> branch (which should be just bug fixes and the few additional features that we wanted to get
> into 0.8.0, but that didn't quite make it). Not too long from now there will be a release
> of Spark 0.8.1, at which time the SNAPSHOT will go to 0.8.2 and 0.8.1 will be forever frozen.
> Meanwhile, the wild new development is taking place on the master branch, and whatever is
> currently in that branch becomes 0.9.0-SNAPSHOT. This could be quite different from day to
> day, and there are no guarantees that things won't be broken in 0.9.0-SNAPSHOT. Several months
> from now there will be a release of Spark 0.9.0 (unless the decision is made to bump the version
> to 1.0.0), at which point the SNAPSHOT goes to 0.9.1 and the whole process advances to the
> next phase of development.
> 
> The short answer is that releases are stable, SNAPSHOTs are not, and SNAPSHOTs that aren't
> on maintenance branches can break things. You make your choice of which to use and pay the
> consequences.
> 
> 
> On Thu, Oct 17, 2013 at 3:18 PM, dachuan <hdc1112@gmail.com> wrote:
> yeah, I mean 0.9.0-SNAPSHOT. I used git clone and that's what I got. what's the difference
> between SNAPSHOT and non-SNAPSHOT?
> 
> 
> On Thu, Oct 17, 2013 at 6:15 PM, Mark Hamstra <mark@clearstorydata.com> wrote:
> Of course, you mean 0.9.0-SNAPSHOT. There is no Spark 0.9.0, and won't be for several
> months.
> 
> 
> 
> On Thu, Oct 17, 2013 at 3:11 PM, dachuan <hdc1112@gmail.com> wrote:
> I'm sorry if this doesn't answer your question directly, but I tried spark 0.9.0
> with hdfs 1.0.4 just now, and it works.
> 
> 
> On Thu, Oct 17, 2013 at 6:05 PM, Koert Kuipers <koert@tresata.com> wrote:
> after upgrading from spark 0.7 to spark 0.8 i can no longer access any files on HDFS.
> i see the error below. any ideas?
> 
> i am running spark standalone on a cluster that also has CDH4.3.0 and rebuilt spark accordingly.
> the jars in lib_managed look good to me.
> 
> i noticed similar errors in the mailing list but found no suggested solutions. 
> 
> thanks! koert
> 
> 
> 13/10/17 17:43:23 ERROR Executor: Exception in task ID 0
> java.io.EOFException
> 	at java.io.ObjectInputStream$BlockDataInputStream.readFully(ObjectInputStream.java:2703)
> 	at java.io.ObjectInputStream.readFully(ObjectInputStream.java:1008)
> 	at org.apache.hadoop.io.DataOutputBuffer$Buffer.write(DataOutputBuffer.java:68)
> 	at org.apache.hadoop.io.DataOutputBuffer.write(DataOutputBuffer.java:106)
> 	at org.apache.hadoop.io.UTF8.readChars(UTF8.java:258)
> 	at org.apache.hadoop.io.UTF8.readString(UTF8.java:250)
> 	at org.apache.hadoop.mapred.FileSplit.readFields(FileSplit.java:87)
> 	at org.apache.hadoop.io.ObjectWritable.readObject(ObjectWritable.java:280)
> 	at org.apache.hadoop.io.ObjectWritable.readFields(ObjectWritable.java:75)
> 	at org.apache.spark.SerializableWritable.readObject(SerializableWritable.scala:39)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:597)
> 	at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:969)
> 	at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1852)
> 	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1756)
> 	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1326)
> 	at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1950)
> 	at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1874)
> 	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1756)
> 	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1326)
> 	at java.io.ObjectInputStream.readObject(ObjectInputStream.java:348)
> 	at org.apache.spark.scheduler.ResultTask.readExternal(ResultTask.scala:135)
> 	at java.io.ObjectInputStream.readExternalData(ObjectInputStream.java:1795)
> 	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1754)
> 	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1326)
> 	at java.io.ObjectInputStream.readObject(ObjectInputStream.java:348)
> 	at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:39)
> 	at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:61)
> 	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:153)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
> 	at java.lang.Thread.run(Thread.java:662)
> 
> 
> 
> -- 
> Dachuan Huang
> Cellphone: 614-390-7234
> 2015 Neil Avenue
> Ohio State University
> Columbus, Ohio
> U.S.A.
> 43210
> 

