spark-user mailing list archives

From Jim Blomo <jim.bl...@gmail.com>
Subject Odd pyspark state
Date Fri, 13 Jun 2014 20:11:09 GMT
Hi all, I'm having much better luck stability-wise with the release,
congratulations!  However, I'm still running into problems with stages
hanging and I'm not sure where to look for more debugging info.
Advice appreciated.

I'm processing basically the same data set as previous emails, with
some additional data joined in.  Attached is a screenshot of the
application stages.  A couple of questions:

1) Is it bad or unusual for a completed stage not to have all tasks succeeded?
2) Conversely, I seem to have a stage that has more succeeded tasks than total tasks.
3) Task 42 is in both the Active and Completed sections.
4) Several tasks are listed twice in Completed Stages.
5) That stage (41) seems hung... neither of the Active stages has any
running tasks. What is the next step to debug?

I looked at one of the logs on a worker machine and found this in the
stderr log:

SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/hadoop/spark-1.0.0-bin-hadoop2/jars/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/hadoop/spark-1.0.0-bin-hadoop2/lib/spark-assembly-1.0.0-hadoop2.2.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
PySpark worker failed with exception:
Traceback (most recent call last):
  File "/home/hadoop/spark-1.0.0-bin-hadoop2/python/pyspark/worker.py", line 73, in main
    command = pickleSer._read_with_length(infile)
  File "/home/hadoop/spark-1.0.0-bin-hadoop2/python/pyspark/serializers.py", line 142, in _read_with_length
    length = read_int(stream)
  File "/home/hadoop/spark-1.0.0-bin-hadoop2/python/pyspark/serializers.py", line 337, in read_int
    raise EOFError
EOFError
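
My reading of the traceback above is that the Python worker hit end-of-stream while waiting for the length prefix of its next command, which would mean the other side of the connection closed first. A minimal sketch of that failure mode, assuming a 4-byte big-endian length prefix; the function names mirror the traceback, but this is an illustration and not the actual pyspark source:

```python
import io
import struct


def read_int(stream):
    # Read a 4-byte big-endian signed integer. A short or empty read
    # means the peer closed the stream before a full prefix arrived.
    data = stream.read(4)
    if len(data) < 4:
        raise EOFError
    return struct.unpack("!i", data)[0]


def read_with_length(stream):
    # Read one length-prefixed frame: a 4-byte length, then the payload.
    length = read_int(stream)
    payload = stream.read(length)
    if len(payload) < length:
        raise EOFError
    return payload


# A well-formed frame round-trips.
frame = struct.pack("!i", 5) + b"hello"
assert read_with_length(io.BytesIO(frame)) == b"hello"

# An already-exhausted stream raises EOFError, as in the worker log.
try:
    read_with_length(io.BytesIO(b""))
except EOFError:
    print("EOFError: stream ended before a length prefix arrived")
```

If that reading is right, the EOFError is a symptom of the connection going away rather than the root cause.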


The last thing in stdout is:

2014-06-13 19:16:13,163 ERROR [pool-2-thread-2]
network.SendingConnection (Logging.scala:logError(95)) - Exception
while reading SendingConnection to
ConnectionManagerId(ip-10....us-west-2.compute.internal,56070)
java.nio.channels.ClosedChannelException
        at sun.nio.ch.SocketChannelImpl.ensureReadOpen(SocketChannelImpl.java:252)
        at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:295)
        at org.apache.spark.network.SendingConnection.read(Connection.scala:397)
        at org.apache.spark.network.ConnectionManager$$anon$6.run(ConnectionManager.scala:175)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:744)
2014-06-13 19:16:13,164 INFO  [pool-2-thread-2]
network.ConnectionManager (Logging.scala:logInfo(58)) - Handling
connection error on connection to
ConnectionManagerId(ip-10-233-142-183.us-west-2.compute.internal,56070)
2014-06-13 19:16:13,164 INFO  [pool-2-thread-2]
network.ConnectionManager (Logging.scala:logInfo(58)) - Removing
SendingConnection to
ConnectionManagerId(ip-10-233-142-183.us-west-2.compute.internal,56070)
2014-06-13 19:16:13,164 INFO  [pool-2-thread-2]
network.ConnectionManager (Logging.scala:logInfo(58)) - Removing
SendingConnection to
ConnectionManagerId(ip-10-233-142-183.us-west-2.compute.internal,56070)
2014-06-13 19:18:18,362 INFO  [pool-2-thread-3]
network.ConnectionManager (Logging.scala:logInfo(58)) - Removing
SendingConnection to
ConnectionManagerId(ip-10-233-132-184.us-west-2.compute.internal,40506)
2014-06-13 19:18:18,362 INFO  [pool-2-thread-1]
network.ConnectionManager (Logging.scala:logInfo(58)) - Removing
ReceivingConnection to
ConnectionManagerId(ip-10-233-132-184.us-west-2.compute.internal,40506)
2014-06-13 19:18:18,363 ERROR [pool-2-thread-1]
network.ConnectionManager (Logging.scala:logError(74)) - Corresponding
SendingConnectionManagerId not found
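
One thing I could try next is tallying the ERROR records per remote ConnectionManagerId across the worker logs, to see whether the failures cluster on one host. A rough sketch, assuming the log layout shown above (timestamp, level, thread name in brackets) and records that wrap across several lines as they do here; `split_records`, `tally_errors` and both regexes are hypothetical helpers of mine, not part of Spark:

```python
import re
from collections import Counter

# A new record starts with "YYYY-MM-DD HH:MM:SS,mmm LEVEL [thread]".
RECORD_START = re.compile(
    r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3} (\w+)\s+\["
)
# Remote peer as printed by the connection manager: (host,port).
PEER = re.compile(r"ConnectionManagerId\(([^,]+),(\d+)\)")


def split_records(lines):
    # Glue continuation lines (wrapped text, stack frames) onto the
    # record that precedes them.
    records = []
    for line in lines:
        line = line.rstrip("\n")
        if RECORD_START.match(line) or not records:
            records.append(line)
        else:
            records[-1] += " " + line.strip()
    return records


def tally_errors(lines):
    # Count ERROR records by remote host; records that name no peer
    # are grouped under "<no peer>".
    counts = Counter()
    for record in split_records(lines):
        start = RECORD_START.match(record)
        if not start or start.group(1) != "ERROR":
            continue
        peer = PEER.search(record)
        counts[peer.group(1) if peer else "<no peer>"] += 1
    return counts


# Lines copied from the stdout excerpt above.
sample = """\
2014-06-13 19:16:13,163 ERROR [pool-2-thread-2]
network.SendingConnection (Logging.scala:logError(95)) - Exception
while reading SendingConnection to
ConnectionManagerId(ip-10....us-west-2.compute.internal,56070)
java.nio.channels.ClosedChannelException
2014-06-13 19:16:13,164 INFO  [pool-2-thread-2]
network.ConnectionManager (Logging.scala:logInfo(58)) - Handling
connection error on connection to
ConnectionManagerId(ip-10-233-142-183.us-west-2.compute.internal,56070)
2014-06-13 19:18:18,363 ERROR [pool-2-thread-1]
network.ConnectionManager (Logging.scala:logError(74)) - Corresponding
SendingConnectionManagerId not found
""".splitlines()

for host, count in tally_errors(sample).items():
    print(host, count)
```

Running this over each worker's stdout file (for example with `open(path)` passed as `lines`) would show whether a single peer accounts for most of the errors.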
