spark-issues mailing list archives

From "Matei Zaharia (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (SPARK-1134) ipython won't run standalone python script
Date Thu, 03 Apr 2014 22:49:15 GMT

     [ https://issues.apache.org/jira/browse/SPARK-1134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matei Zaharia updated SPARK-1134:
---------------------------------

    Affects Version/s: 0.9.1

> ipython won't run standalone python script
> ------------------------------------------
>
>                 Key: SPARK-1134
>                 URL: https://issues.apache.org/jira/browse/SPARK-1134
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 0.9.0, 0.9.1
>            Reporter: Diana Carroll
>            Assignee: Diana Carroll
>              Labels: pyspark
>
> Using Spark 0.9.0, Python 2.6.6, and IPython 1.1.0.
> The problem: if I want to run a Python script as a standalone app, the docs say I should execute the command "pyspark myscript.py". This works as long as IPYTHON=0, but not if IPYTHON=1.
> This problem arose for me because, to save myself some typing, I set IPYTHON=1 in my shell profile, which then left me unable to execute standalone pyspark scripts.
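> For concreteness, a minimal illustration of the two invocations (myscript.py stands for any standalone PySpark script):
> {code}
> IPYTHON=0 pyspark myscript.py   # runs the script as expected
> IPYTHON=1 pyspark myscript.py   # drops into the IPython shell; myscript.py is ignored
> {code}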
> My analysis:
> In the pyspark script, command-line arguments are simply ignored when IPython is used:
> {code}
> if [[ "$IPYTHON" = "1" ]]; then
>   exec ipython $IPYTHON_OPTS
> else
>   exec "$PYSPARK_PYTHON" "$@"
> fi
> {code}
> I thought I could get around this by changing the script to pass "$@" to ipython as well (sketched below). However, this doesn't work: it results in an error saying multiple SparkContexts can't be run at once.
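> A sketch of the change I tried (same variables as the snippet above):
> {code}
> if [[ "$IPYTHON" = "1" ]]; then
>   exec ipython $IPYTHON_OPTS "$@"   # attempted fix: forward script arguments to IPython
> else
>   exec "$PYSPARK_PYTHON" "$@"
> fi
> {code}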
> This is because of a feature (or bug?) of IPython related to the PYTHONSTARTUP environment variable. The pyspark script sets this variable to point to the python/shell.py script, which initializes the SparkContext. Regular Python runs the PYTHONSTARTUP script ONLY when invoked in interactive mode; when run with a script, it ignores the variable. IPython runs the PYTHONSTARTUP script every time, regardless, which means it always executes Spark's shell.py and initializes the SparkContext even when invoked with a script.
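> The PYTHONSTARTUP difference can be seen outside of Spark with two throwaway files (a sketch; the ipython line reflects the 1.x behavior described above):
> {code}
> echo 'print("startup ran")' > /tmp/startup.py
> echo 'print("script ran")'  > /tmp/myscript.py
> PYTHONSTARTUP=/tmp/startup.py python  /tmp/myscript.py   # prints only "script ran"
> PYTHONSTARTUP=/tmp/startup.py ipython /tmp/myscript.py   # runs startup.py first, then the script
> {code}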
> Proposed solution:
> Short term: add this information to the Spark docs regarding IPython, something like "Note: IPython can only be used interactively. Use regular Python to execute pyspark script files."
> Long term: change the pyspark script to detect whether arguments are passed in; if so, call python instead of ipython, or don't set the PYTHONSTARTUP variable. Or perhaps fix shell.py to detect that it's being invoked non-interactively and skip initializing the SparkContext. A sketch of the argument-detection option follows.
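> A minimal sketch of the argument-detection fix (an illustration of the idea above, not a tested patch):
> {code}
> if [[ "$IPYTHON" = "1" && $# -eq 0 ]]; then
>   exec ipython $IPYTHON_OPTS          # no script arguments: interactive IPython shell
> else
>   exec "$PYSPARK_PYTHON" "$@"         # a script was given: fall back to plain Python
> fi
> {code}
> Plain Python ignores PYTHONSTARTUP when given a script, so the else branch behaves exactly as it does today.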



--
This message was sent by Atlassian JIRA
(v6.2#6252)
