spark-user mailing list archives

From Ashish Dutt <ashish.du...@gmail.com>
Subject Re: PySpark without PySpark
Date Fri, 10 Jul 2015 03:26:21 GMT
Hello Sujit,
Many thanks for your response.
To answer your questions:
Q1) Do you have SPARK_HOME set up in your environment? - Yes, I do. It is
SPARK_HOME="C:/spark-1.3.0/bin"
Q2) Is there a python2 or python subdirectory under the root of your Spark
installation? - Yes, I have that too; it is called python. To fix the
problem, this is what I did:
I downloaded py4j-0.8.2.1-src from here <https://pypi.python.org/pypi/py4j>,
which was not there initially when I downloaded the Spark package from the
official repository, and put it in the lib directory,
C:\spark-1.3.0\python\lib. Note that I did not extract the zip file; I put it
in as is. (The pyspark folder sits in the python folder of the spark-1.3.0
root folder.) I then copied this file and put it on the PYTHONPATH, so my
PYTHONPATH now reads PYTHONPATH="C:/Python27/"
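
For reference, here is roughly what that setup amounts to in a Python
session (just a sketch; the paths are my local ones and the py4j zip name is
the one I downloaded, so adjust as needed):

import os
import sys

# assume Spark is unpacked at C:\spark-1.3.0 (my local path)
os.environ.setdefault("SPARK_HOME", "C:/spark-1.3.0")
SPARK_HOME = os.environ["SPARK_HOME"]

# Spark's Python bindings live under <SPARK_HOME>/python, and py4j can be
# used straight from the zip that sits under python/lib
sys.path.insert(0, os.path.join(SPARK_HOME, "python"))
sys.path.insert(0, os.path.join(SPARK_HOME, "python", "lib",
                                "py4j-0.8.2.1-src.zip"))

# if the paths are right, this import should now work
from pyspark import SparkContext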

I then rebooted the computer and said a silent prayer :-) Then I opened the
command prompt and invoked the pyspark command from the bin directory of
Spark and EUREKA, it worked :-) Attached is the screenshot of the same.
Now the problem is with the IPython notebook: I cannot get it to work with
PySpark.
I have a cluster with 4 nodes using CDH5.4

I was able to resolve that problem. The next challenge was to configure it
with IPython. I followed the steps as documented in the blog, but I get
errors; attached is the screenshot.
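
To check whether the notebook is actually picking up the spark profile, this
is the quick check I have been running in a notebook cell (just my own
sketch, not something from the blog):

# print the name of the IPython profile the notebook is running under;
# I expect "spark" here, so "default" would confirm the wrong profile
import IPython
ip = IPython.get_ipython()
print(ip.profile)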

@Julian, I tried your method too. Attached is the screenshot of the error
message (7.png).

Hope you can help me fix this problem.
Thank you for your time.

Sincerely,
Ashish Dutt
PhD Candidate
Department of Information Systems
University of Malaya, Lembah Pantai,
50603 Kuala Lumpur, Malaysia

On Fri, Jul 10, 2015 at 12:02 AM, Sujit Pal <sujitatgtalk@gmail.com> wrote:

> Hi Ashish,
>
> Your 00-pyspark-setup file looks very different from mine (and from the
> one described in the blog post). Questions:
>
> 1) Do you have SPARK_HOME set up in your environment? Because if not, your
> code sets it to None. You should provide the path to your Spark
> installation. In my case I have spark-1.3.1 installed under $HOME/Software,
> and the code block under "# Configure the environment" (or yellow highlight
> in the code below) reflects that.
> 2) Is there a python2 or python subdirectory under the root of your Spark
> installation? In my case it's "python", not "python2". This contains the
> Python bindings for Spark, so the block under "# Add the PySpark/py4j to
> the Python path" (or green highlight in the code below) adds it to the
> Python sys.path so that things like pyspark.SparkContext are accessible in
> your Python environment.
>
> import os
> import sys
>
> # Configure the environment
> if 'SPARK_HOME' not in os.environ:
>     os.environ['SPARK_HOME'] = "/Users/palsujit/Software/spark-1.3.1"
>
> # Create a variable for our root path
> SPARK_HOME = os.environ['SPARK_HOME']
>
> # Add the PySpark/py4j to the Python Path
> sys.path.insert(0, os.path.join(SPARK_HOME, "python", "build"))
> sys.path.insert(0, os.path.join(SPARK_HOME, "python"))
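>
> A quick sanity check after running this (my own addition, not part of the
> original setup file) is to import pyspark and see where it loads from:
>
> import pyspark
> print(pyspark.__file__)  # should point somewhere under SPARK_HOME/python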
>
> Hope this fixes things for you.
>
> -sujit
>
>
> On Wed, Jul 8, 2015 at 9:52 PM, Ashish Dutt <ashish.dutt8@gmail.com>
> wrote:
>
>> Hi Sujit,
>> Thanks for your response.
>>
>> So I opened a new notebook using the command "ipython notebook --profile
>> spark" and tried the sequence of commands, but I am getting errors. Attached
>> is the screenshot of the same.
>> I am also attaching the 00-pyspark-setup.py for your reference. It looks
>> like I have written something wrong there, but I cannot figure out what it
>> is.
>>
>> Thank you for your help
>>
>>
>> Sincerely,
>> Ashish Dutt
>>
>> On Thu, Jul 9, 2015 at 11:53 AM, Sujit Pal <sujitatgtalk@gmail.com>
>> wrote:
>>
>>> Hi Ashish,
>>>
>>> >> Nice post.
>>> Agreed, kudos to the author of the post, Benjamin Bengfort of District
>>> Data Labs.
>>>
>>> >> Following your post, I get this problem;
>>> Again, not my post.
>>>
>>> I did try setting up IPython with the Spark profile for the edX Intro to
>>> Spark course (because I didn't want to use the Vagrant container) and it
>>> worked flawlessly with the instructions provided (on OSX). I haven't used
>>> the IPython/PySpark environment beyond very basic tasks since then though,
>>> because my employer has a Databricks license which we were already using
>>> for other stuff and we ended up doing the labs on Databricks.
>>>
>>> Looking at your screenshot though, I don't see why you think it's picking
>>> up the default profile. One simple way of checking to see if things are
>>> working is to open a new notebook and try this sequence of commands:
>>>
>>> from pyspark import SparkContext
>>> sc = SparkContext("local", "pyspark")
>>> sc
>>>
>>> You should see something like this after a little while:
>>> <pyspark.context.SparkContext at 0x1093c9b10>
>>>
>>> While the context is being instantiated, you should also see lots of log
>>> lines scroll by on the terminal where you started the "ipython notebook
>>> --profile spark" command - these log lines are from Spark.
>>>
>>> Hope this helps,
>>> Sujit
>>>
>>>
>>> On Wed, Jul 8, 2015 at 6:04 PM, Ashish Dutt <ashish.dutt8@gmail.com>
>>> wrote:
>>>
>>>> Hi Sujit,
>>>> Nice post. Exactly what I had been looking for.
>>>> I am relatively new to Spark and real-time data processing.
>>>> We have a server running CDH 5.4 with 4 nodes; the Spark version on our
>>>> server is 1.3.0.
>>>> On my laptop I also have Spark 1.3.0, in a Windows 7
>>>> environment. As per point 5 of your post, I am able to invoke pyspark
>>>> locally in standalone mode.
>>>>
>>>> Following your post, I get this problem;
>>>>
>>>> 1. In the section "Using IPython notebook with Spark" I cannot understand
>>>> why it is picking up the default profile and not the pyspark profile. I am
>>>> sure it is because of the path variables. Attached is the screenshot. Can
>>>> you suggest how to solve this?
>>>>
>>>> Currently the path variables on my laptop are:
>>>> SPARK_HOME="C:\SPARK-1.3.0\BIN", JAVA_HOME="C:\PROGRAM
>>>> FILES\JAVA\JDK1.7.0_79", HADOOP_HOME="D:\WINUTILS", M2_HOME="D:\MAVEN\BIN",
>>>> MAVEN_HOME="D:\MAVEN\BIN", PYTHON_HOME="C:\PYTHON27\", SBT_HOME="C:\SBT\"
>>>>
>>>>
>>>> Sincerely,
>>>> Ashish Dutt
>>>> PhD Candidate
>>>> Department of Information Systems
>>>> University of Malaya, Lembah Pantai,
>>>> 50603 Kuala Lumpur, Malaysia
>>>>
>>>> On Thu, Jul 9, 2015 at 4:56 AM, Sujit Pal <sujitatgtalk@gmail.com>
>>>> wrote:
>>>>
>>>>> You are welcome, Davies. Just to clarify, I didn't write the post (not
>>>>> sure if my earlier post gave that impression, apologies if so), although
>>>>> I agree it's great :-).
>>>>>
>>>>> -sujit
>>>>>
>>>>>
>>>>> On Wed, Jul 8, 2015 at 10:36 AM, Davies Liu <davies@databricks.com>
>>>>> wrote:
>>>>>
>>>>>> Great post, thanks for sharing with us!
>>>>>>
>>>>>>
>>>>>> On Wed, Jul 8, 2015 at 9:59 AM, Sujit Pal <sujitatgtalk@gmail.com>
>>>>>> wrote:
>>>>>> > Hi Julian,
>>>>>> >
>>>>>> > I recently built a Python+Spark application to do search relevance
>>>>>> > analytics. I use spark-submit to submit PySpark jobs to a Spark cluster
>>>>>> > on EC2 (so I don't use the PySpark shell, hopefully that's what you are
>>>>>> > looking for). Can't share the code, but the basic approach is covered in
>>>>>> > this blog post - scroll down to the section "Writing a Spark Application".
>>>>>> >
>>>>>> >
>>>>>> https://districtdatalabs.silvrback.com/getting-started-with-spark-in-python
>>>>>> >
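>>>>>> > To give a flavor of the approach, a bare-bones standalone PySpark job
>>>>>> > (this is just a sketch, not my actual code; the paths and app name are
>>>>>> > made up) looks roughly like:
>>>>>> >
>>>>>> > # wordcount.py - minimal application run via spark-submit, no shell
>>>>>> > from pyspark import SparkConf, SparkContext
>>>>>> >
>>>>>> > if __name__ == "__main__":
>>>>>> >     conf = SparkConf().setAppName("wordcount")
>>>>>> >     sc = SparkContext(conf=conf)
>>>>>> >     counts = (sc.textFile("hdfs:///path/to/input.txt")
>>>>>> >               .flatMap(lambda line: line.split())
>>>>>> >               .map(lambda word: (word, 1))
>>>>>> >               .reduceByKey(lambda a, b: a + b))
>>>>>> >     counts.saveAsTextFile("hdfs:///path/to/output")
>>>>>> >     sc.stop()
>>>>>> >
>>>>>> > # submitted to the cluster with something like:
>>>>>> > #   spark-submit --master <master-url> wordcount.py
>>>>>> >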
>>>>>> > Hope this helps,
>>>>>> >
>>>>>> > -sujit
>>>>>> >
>>>>>> >
>>>>>> > On Wed, Jul 8, 2015 at 7:46 AM, Julian <Julian+Spark@magnetic.com>
>>>>>> wrote:
>>>>>> >>
>>>>>> >> Hey.
>>>>>> >>
>>>>>> >> Is there a resource that has written up what the necessary steps are
>>>>>> >> for running PySpark without using the PySpark shell?
>>>>>> >>
>>>>>> >> I can reverse engineer (by following the tracebacks and reading the
>>>>>> >> shell source) what the relevant Java imports needed are, but I would
>>>>>> >> assume someone has attempted this before and just published something
>>>>>> >> I can either follow or install? If not, I have something that pretty
>>>>>> >> much works and can publish it, but I'm not a heavy Spark user, so
>>>>>> >> there may be some things I've left out that I haven't hit because of
>>>>>> >> how little of pyspark I'm playing with.
>>>>>> >>
>>>>>> >> Thanks,
>>>>>> >> Julian
>>>>>> >>
>>>>>> >
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>
