spark-user mailing list archives

From Sameer Farooqui <same...@databricks.com>
Subject Re: pyspark is crashing in this case. why?
Date Sun, 14 Dec 2014 19:11:22 GMT
How much memory are you giving the executor JVM (--executor-memory)? What
about the driver JVM (--driver-memory)?

Also check the Windows Event Log for out-of-memory errors from either of
those two JVMs.
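
To get a rough feel for how much data the driver has to ship in each case, here is a minimal sketch (not from the original post) that rebuilds the two lists and measures their pickled size. It assumes plain `pickle` is a reasonable stand-in for PySpark's serializer, which may batch or encode the data differently. The helper names `build_rows` and `pickled_size_bytes` are hypothetical.

```python
import pickle
import sys


def build_rows(n):
    """Rebuild the list from the post: n references to one 9-element list."""
    a = [1, 2, 3, 4, 5, 6, 7, 8, 9]
    b = []
    for _ in range(n):
        b.append(a)
    return b


def pickled_size_bytes(rows):
    """Size in bytes of `rows` after pickling (assumed proxy for the payload)."""
    return len(pickle.dumps(rows, protocol=2))


# Compare the 100K case that works with the 1M case that crashes.
for n in (100000, 1000000):
    rows = build_rows(n)
    print("n=%d  pickled=%d bytes  list object=%d bytes"
          % (n, pickled_size_bytes(rows), sys.getsizeof(rows)))
```

Every element of `b` is a reference to the same list `a`, so pickle stores `a` once and emits short back-references after that. Independent rows of the same shape would serialize to considerably more bytes, so compare whatever number you get against the memory you have given each JVM.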
On Dec 14, 2014 6:04 AM, "genesis fatum" <genesis.fatum@gmail.com> wrote:

> Hi,
>
> My environment is: standalone Spark 1.1.1 on Windows 8.1 Pro.
>
> The following case works fine:
> >>> a = [1,2,3,4,5,6,7,8,9]
> >>> b = []
> >>> for x in range(100000):
> ...  b.append(a)
> ...
> >>> rdd1 = sc.parallelize(b)
> >>> rdd1.first()
> [1, 2, 3, 4, 5, 6, 7, 8, 9]
>
> The following case does not work. The only difference is the size of the
> array. Note the loop range: 100K vs. 1M.
> >>> a = [1,2,3,4,5,6,7,8,9]
> >>> b = []
> >>> for x in range(1000000):
> ...  b.append(a)
> ...
> >>> rdd1 = sc.parallelize(b)
> >>> rdd1.first()
> >>>
> 14/12/14 07:52:19 ERROR PythonRDD: Python worker exited unexpectedly (crashed)
> java.net.SocketException: Connection reset by peer: socket write error
>         at java.net.SocketOutputStream.socketWrite0(Native Method)
>         at java.net.SocketOutputStream.socketWrite(Unknown Source)
>         at java.net.SocketOutputStream.write(Unknown Source)
>         at java.io.BufferedOutputStream.flushBuffer(Unknown Source)
>         at java.io.BufferedOutputStream.write(Unknown Source)
>         at java.io.DataOutputStream.write(Unknown Source)
>         at java.io.FilterOutputStream.write(Unknown Source)
>         at org.apache.spark.api.python.PythonRDD$$anonfun$writeIteratorToStream$1.apply(PythonRDD.scala:341)
>         at org.apache.spark.api.python.PythonRDD$$anonfun$writeIteratorToStream$1.apply(PythonRDD.scala:339)
>         at scala.collection.Iterator$class.foreach(Iterator.scala:727)
>         at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
>         at org.apache.spark.api.python.PythonRDD$.writeIteratorToStream(PythonRDD.scala:339)
>         at org.apache.spark.api.python.PythonRDD$WriterThread$$anonfun$run$1.apply$mcV$sp(PythonRDD.scala:209)
>         at org.apache.spark.api.python.PythonRDD$WriterThread$$anonfun$run$1.apply(PythonRDD.scala:184)
>         at org.apache.spark.api.python.PythonRDD$WriterThread$$anonfun$run$1.apply(PythonRDD.scala:184)
>         at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1364)
>         at org.apache.spark.api.python.PythonRDD$WriterThread.run(PythonRDD.scala:183)
>
> What I have tried:
> 1. Replaced the 32-bit JRE with the 64-bit JRE
> 2. Tried multiple configurations when starting pyspark: --driver-memory,
> --executor-memory
> 3. Tried setting SparkConf with different settings
> 4. Tried Spark 1.1.0 as well
>
> Being new to Spark, I am sure that it is something simple that I am missing
> and would appreciate any thoughts.
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/pyspark-is-crashing-in-this-case-why-tp20675.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.