spark-user mailing list archives

From Eric Friedman <eric.d.fried...@gmail.com>
Subject Re: Lost executors
Date Thu, 24 Jul 2014 03:40:38 GMT
Hi Andrew,

Thanks for your note. Yes, I see a stack trace now. It seems to be an
issue with Python interpreting a function I want to apply to an RDD. The
stack trace is below. The function is a simple factorial:

def f(n):
  if n == 1: return 1
  return n * f(n-1)

and I'm trying to use it like this:

tf = sc.textFile(...)
tf.map(lambda line: line and len(line)).map(f).collect()

I get the following error, which does not occur if I use a built-in
function such as math.sqrt:

 TypeError: __import__() argument 1 must be string, not X#

Stack trace follows:



WARN TaskSetManager: Loss was due to org.apache.spark.api.python.PythonException
org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "/hadoop/d11/yarn/nm/usercache/eric_d_friedman/filecache/26/spark-assembly-1.0.1-hadoop2.2.0.jar/pyspark/worker.py", line 77, in main
    serializer.dump_stream(func(split_index, iterator), outfile)
  File "/hadoop/d11/yarn/nm/usercache/eric_d_friedman/filecache/26/spark-assembly-1.0.1-hadoop2.2.0.jar/pyspark/serializers.py", line 191, in dump_stream
    self.serializer.dump_stream(self._batched(iterator), stream)
  File "/hadoop/d11/yarn/nm/usercache/eric_d_friedman/filecache/26/spark-assembly-1.0.1-hadoop2.2.0.jar/pyspark/serializers.py", line 123, in dump_stream
    for obj in iterator:
  File "/hadoop/d11/yarn/nm/usercache/eric_d_friedman/filecache/26/spark-assembly-1.0.1-hadoop2.2.0.jar/pyspark/serializers.py", line 180, in _batched
    for item in iterator:
  File "<ipython-input-39-0f0dafaf1ed4>", line 2, in f
TypeError: __import__() argument 1 must be string, not X#

    at org.apache.spark.api.python.PythonRDD$$anon$1.read(PythonRDD.scala:115)
    at org.apache.spark.api.python.PythonRDD$$anon$1.<init>(PythonRDD.scala:145)
    at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:78)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
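Separately from the `__import__` error, `f` as written has two edge cases in this pipeline. `line and len(line)` passes an empty line through as `''`, so `f('')` would fail when it evaluates `'' - 1`. In addition, each call recurses once per unit of `n`, so lines of roughly 1,000 characters or more would exceed Python's default recursion limit. Below is a minimal sketch of a more defensive version. It assumes empty lines should simply be dropped and uses the standard library's `math.factorial`, which in CPython is implemented in C rather than as a recursive Python function. `line_factorial` is a hypothetical name, and whether this also avoids the serialization error above is not confirmed in this thread.

```python
import math

# Hypothetical per-line helper: factorial of the line's length.
# math.factorial does not recurse in Python, so long lines cannot hit
# the recursion limit; it raises ValueError for negative input.
def line_factorial(line):
    return math.factorial(len(line))

# Plain-Python stand-in for the contents of sc.textFile(...).
lines = ["abc", "", "hello"]

# Mirrors filtering out empty lines before mapping, so '' never reaches
# the factorial (3! = 6, 5! = 120).
result = [line_factorial(line) for line in lines if line]
print(result)  # [6, 120]
```

In the shell, the equivalent RDD pipeline would be along the lines of `tf.filter(lambda line: line).map(line_factorial).collect()`.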





On Wed, Jul 23, 2014 at 11:07 AM, Andrew Or <andrew@databricks.com> wrote:

> Hi Eric,
>
> Have you checked the executor logs? It is possible they died because of
> some exception, and the message you see is just a side effect.
>
> Andrew
>
>
> 2014-07-23 8:27 GMT-07:00 Eric Friedman <eric.d.friedman@gmail.com>:
>
>> I'm using Spark 1.0.1 on a quite large cluster, with gobs of memory, etc.
>> Cluster resources are available to me via YARN, and I am seeing these
>> errors quite often:
>>
>> ERROR YarnClientClusterScheduler: Lost executor 63 on <host>: remote Akka client disassociated
>>
>>
>> This is in an interactive shell session. I don't know a lot about YARN
>> plumbing and am wondering if there's some constraint in play: perhaps
>> executors can't be idle for too long or they get cleared out.
>>
>>
>> Any insights here?
>>
>
>
