spark-user mailing list archives

From Jeff Evans <jeffrey.wayne.ev...@gmail.com>
Subject Why does this spark-shell invocation get suspended due to tty output?
Date Thu, 04 Apr 2019 16:21:11 GMT
Hi all,

I am trying to make our application check the Spark version before
attempting to submit a job, to ensure the user is on a new enough
version (in our case, 2.3.0 or later).  I realize that there is a
--version argument to spark-shell, but it prints the version next to
some ASCII art, so a bit of parsing would be needed.  So my initial
idea was to do something like the following:

echo 'System.out.println(sc.version)' | spark-shell 2>/dev/null | grep -A2 'System.out.println' | grep -v 'System.out.println'

This works fine.  If you run it in a shell (assuming spark-shell is on
your PATH), you get the version printed on the first line of stdout.
But I'm noticing some strange behavior when I try to invoke this from
either a Java or a Scala application (via their respective
ProcessBuilder APIs).  Specifically, if that Java/Scala application
is run as a background job, then when the spark-shell invocation
happens (after being forked by a ProcessBuilder), the job is suspended
due to tty output.  This happens even if the Java/Scala program has
had its stdout and stderr redirected to files.
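Setting the suspension aside, the version check itself reduces to comparing the printed string (2.4.0 in the session below) against the 2.3.0 minimum. A minimal sketch in Java follows, assuming the pipeline's first stdout line holds a plain major.minor.patch string; the class SparkVersionCheck and its methods are hypothetical, not part of Spark. It only looks for the first dotted triple, so text containing several version numbers would need a tighter pattern.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical helper: pulls a "major.minor.patch" version out of text such as
 * the "2.4.0" line printed by the pipeline, and checks it against a minimum.
 */
public final class SparkVersionCheck {

    // First dotted triple of digits, e.g. "2.4.0"; a suffix like "-SNAPSHOT" is ignored.
    private static final Pattern VERSION = Pattern.compile("(\\d+)\\.(\\d+)\\.(\\d+)");

    private SparkVersionCheck() {}

    /** Returns {major, minor, patch}, or null if no version-like token is found. */
    public static int[] parse(String text) {
        Matcher m = VERSION.matcher(text);
        if (!m.find()) {
            return null;
        }
        return new int[] {
            Integer.parseInt(m.group(1)),
            Integer.parseInt(m.group(2)),
            Integer.parseInt(m.group(3))
        };
    }

    /** True if the version found in text is >= major.minor.patch. */
    public static boolean isAtLeast(String text, int major, int minor, int patch) {
        int[] v = parse(text);
        if (v == null) {
            return false; // no version found: treat as "not new enough"
        }
        int[] min = {major, minor, patch};
        // Compare component by component; the first difference decides.
        for (int i = 0; i < 3; i++) {
            if (v[i] != min[i]) {
                return v[i] > min[i];
            }
        }
        return true; // exactly equal to the minimum
    }

    public static void main(String[] args) {
        System.out.println(isAtLeast("2.4.0", 2, 3, 0)); // true
        System.out.println(isAtLeast("2.2.3", 2, 3, 0)); // false
    }
}
```

A hand-written loop is used instead of `Arrays.compare` so the sketch also compiles on JDK 8, which is what the environment below runs.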

To demonstrate, compile this Java class, which is really just a simple
wrapper around ProcessBuilder.  It passes its main arguments directly
to ProcessBuilder, creates threads to consume the child's
stdout/stderr (printing them to its own stdout/stderr), and exits when
the forked process dies.

https://gist.github.com/jeff303/e5b44e220db20800752c932cbfbf7ed1
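For readers without access to the gist, here is a minimal sketch of a comparable wrapper. It is not the gist's code: the class name RunnerSketch and its Result type are hypothetical, lines are printed only after the child exits, and ProcessBuilder's default redirects (pipes for the child's stdin, stdout and stderr) are left in place.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical stand-in for ProcessBuilderRunner: run a command, drain both streams, wait. */
public final class RunnerSketch {

    /** Captured result of one child process. */
    public static final class Result {
        public final int exitValue;
        public final List<String> stdout;
        public final List<String> stderr;

        Result(int exitValue, List<String> stdout, List<String> stderr) {
            this.exitValue = exitValue;
            this.stdout = stdout;
            this.stderr = stderr;
        }
    }

    /** Starts a thread that reads every line of the stream into sink. */
    private static Thread drain(InputStream in, List<String> sink) {
        Thread t = new Thread(() -> {
            try (BufferedReader r = new BufferedReader(
                    new InputStreamReader(in, StandardCharsets.UTF_8))) {
                String line;
                while ((line = r.readLine()) != null) {
                    sink.add(line);
                }
            } catch (IOException e) {
                sink.add("<read error: " + e.getMessage() + ">");
            }
        });
        t.start();
        return t;
    }

    /** Runs the command as-is (no shell added) and collects its output. */
    public static Result run(List<String> command) throws IOException, InterruptedException {
        Process p = new ProcessBuilder(command).start();
        List<String> out = Collections.synchronizedList(new ArrayList<>());
        List<String> err = Collections.synchronizedList(new ArrayList<>());
        // Separate threads so a full stderr pipe cannot block the child while we read stdout.
        Thread tOut = drain(p.getInputStream(), out);
        Thread tErr = drain(p.getErrorStream(), err);
        int exit = p.waitFor();
        tOut.join(); // make sure both streams are fully consumed
        tErr.join();
        return new Result(exit, out, err);
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            System.err.println("usage: java RunnerSketch <command> [args...]");
            return;
        }
        Result r = run(Arrays.asList(args));
        System.out.println("exit value from process: " + r.exitValue);
        r.stdout.forEach(l -> System.out.println("stdout line: " + l));
        r.stderr.forEach(l -> System.err.println("stderr line: " + l));
    }
}
```

After compiling it with javac, `java RunnerSketch bash -c 'echo hello world'` should print the exit value followed by `stdout line: hello world`.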

My environment is OS X 10.14.3, Spark 2.4.0 (installed via Homebrew),
Scala stable 2.12.8 (also Homebrew), and Oracle HotSpot JDK 1.8.0_181.
The behavior outlined below happens in both zsh 5.6.2 and bash
4.4.23(1).  Consider the following terminal session:

# BEGIN SHELL SESSION
# compile the ProcessBuilderRunner class
javac ProcessBuilderRunner.java

# sanity check; just invoke an echo command
java ProcessBuilderRunner bash -c 'echo hello world'
About to run: bash -c echo hello world
stdout line: hello world
exit value from process: 0
stderr from process:
stdout from process: hello world

# try running the "version check" sequence outlined above in foreground
java ProcessBuilderRunner bash -c "echo 'System.out.println(sc.version)' | spark-shell 2>/dev/null | grep -A2 'System.out.println' | grep -v 'System.out.println'"
About to run: bash -c echo 'System.out.println(sc.version)' | spark-shell 2>/dev/null | grep -A2 'System.out.println' | grep -v 'System.out.println'
stdout line: 2.4.0
stdout line:
exit value from process: 0
stderr from process:
stdout from process: 2.4.0

# run the same thing, but in the background, redirecting outputs to files
java ProcessBuilderRunner bash -c "echo 'System.out.println(sc.version)' | spark-shell 2>/dev/null | grep -A2 'System.out.println' | grep -v 'System.out.println'" >/tmp/spark-check.out 2>/tmp/spark-check.err &

# after a few seconds, the job is suspended due to tty output
[1]  + 8964 suspended (tty output)  java ProcessBuilderRunner bash -c > /tmp/spark-check.out 2>

# foreground the job; it will complete shortly thereafter
fg

# confirm the stdout is correct
cat /tmp/spark-check.out
About to run: bash -c echo 'System.out.println(sc.version)' | spark-shell 2>/dev/null | grep -A2 'System.out.println' | grep -v 'System.out.println'
stdout line: 2.4.0
stdout line:
exit value from process: 0
stdout from process: 2.4.0
# END SHELL SESSION

Why is the backgrounded Java process getting suspended when it tries
to invoke spark-shell here?  In theory, every program involved here
should have a well-defined sink for its stdout, and in the foreground
everything works correctly.  Also of note: running exactly the same
thing from a Scala class* results in the same behavior.  I am not that
knowledgeable about the finer points of tty handling, so hopefully
someone can point me in the right direction.  Thanks.

* Scala version:
https://gist.github.com/jeff303/2c2c3daa49a9cb588a0de6f1a73255b2


