spark-user mailing list archives

From "Lalwani, Jayesh" <jlalw...@amazon.com.INVALID>
Subject Re: Multiple applications being spawned
Date Tue, 13 Oct 2020 17:10:49 GMT
Where are you running your Spark cluster? Can you post the command line that you are using
to run your application?

Spark is designed to process a lot of data by distributing work to a cluster of machines.
When you submit a job, it starts executor processes on the cluster. So what you are seeing
is somewhat expected (although 25 processes on a single node seem too high).
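For reference, executor counts are driven by configuration (and by dynamic allocation, if
your cluster manager has it enabled). A minimal PySpark sketch of pinning them down
explicitly; the app name and values here are illustrative, not a recommendation:

    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("my-app")  # hypothetical app name
        # Fix the executor count so the cluster manager cannot scale it up
        .config("spark.executor.instances", "4")
        # Alternatively, leave this on and cap spark.dynamicAllocation.maxExecutors
        .config("spark.dynamicAllocation.enabled", "false")
        .getOrCreate()
    )

If dynamic allocation is enabled, spark.dynamicAllocation.maxExecutors bounds how many
executors can be requested.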

From: Sachit Murarka <connectsachit@gmail.com>
Date: Tuesday, October 13, 2020 at 8:15 AM
To: spark users <user@spark.apache.org>
Subject: RE: [EXTERNAL] Multiple applications being spawned


Adding Logs.

When it launches the multiple applications, the following logs are generated on the terminal,
and it always retries the task:

20/10/13 12:04:30 WARN TaskSetManager: Lost task XX in stage XX (TID XX, executor 5): java.net.SocketException:
Broken pipe (Write failed)
        at java.net.SocketOutputStream.socketWrite0(Native Method)
        at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:111)
        at java.net.SocketOutputStream.write(SocketOutputStream.java:155)
        at java.io.BufferedOutputStream.write(BufferedOutputStream.java:122)
        at java.io.DataOutputStream.write(DataOutputStream.java:107)
        at java.io.FilterOutputStream.write(FilterOutputStream.java:97)
        at org.apache.spark.api.python.PythonRDD$.org$apache$spark$api$python$PythonRDD$$write$1(PythonRDD.scala:212)
        at org.apache.spark.api.python.PythonRDD$$anonfun$writeIteratorToStream$1.apply(PythonRDD.scala:224)
        at org.apache.spark.api.python.PythonRDD$$anonfun$writeIteratorToStream$1.apply(PythonRDD.scala:224)
        at scala.collection.Iterator$class.foreach(Iterator.scala:891)
        at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
        at org.apache.spark.api.python.PythonRDD$.writeIteratorToStream(PythonRDD.scala:224)
        at org.apache.spark.api.python.PythonRunner$$anon$2.writeIteratorToStream(PythonRunner.scala:561)
        at org.apache.spark.api.python.BasePythonRunner$WriterThread$$anonfun$run$1.apply(PythonRunner.scala:346)
        at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1945)
        at org.apache.spark.api.python.BasePythonRunner$WriterThread.run(PythonRunner.scala:195)

Kind Regards,
Sachit Murarka


On Tue, Oct 13, 2020 at 4:02 PM Sachit Murarka <connectsachit@gmail.com<mailto:connectsachit@gmail.com>>
wrote:
Hi Users,
When an action (I am using count and write) gets executed in my Spark job, it launches many
more application instances (around 25 more apps).

In my Spark code, I am running the transformations through DataFrames, then converting the
DataFrame to an RDD, applying zipWithIndex, converting it back to a DataFrame, and then
applying two actions (count & write).
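Roughly, the flow is (a minimal sketch; the real transformations, column names, and output
path are placeholders):

    from pyspark.sql import Row, SparkSession

    spark = SparkSession.builder.appName("zip-with-index-sketch").getOrCreate()

    # Placeholder input; the real job reads and transforms actual data
    df = spark.createDataFrame([("a",), ("b",), ("c",)], ["value"])

    # DataFrame -> RDD -> zipWithIndex -> DataFrame
    indexed = (
        df.rdd
        .zipWithIndex()
        .map(lambda pair: Row(**pair[0].asDict(), index=pair[1]))
        .toDF()
    )

    # The two actions
    indexed.count()
    indexed.write.mode("overwrite").parquet("/tmp/output")  # placeholder path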

Please note: this was working fine until last week; it started giving this issue yesterday.

Could you please tell me what could be the reason for this behavior?

Kind Regards,
Sachit Murarka