systemml-dev mailing list archives

From dusenberr...@gmail.com
Subject Re: Install - Configure Jupyter Notebook
Date Wed, 05 Jul 2017 21:23:23 GMT
For a bit more context, this is the general way of starting Jupyter with PySpark support. 
In contrast, the usual `jupyter notebook` command will only launch Jupyter with a standard
Python kernel.
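For instance, stripped of the tuning flags shown further down the thread, the minimal form
of that command is just:

    PYSPARK_DRIVER_PYTHON=jupyter PYSPARK_DRIVER_PYTHON_OPTS="notebook" pyspark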

Additionally, all of the extra "conf" settings in that command refer to settings that could
be placed in the standard `conf/spark-defaults.conf` file of your Spark installation, with
spaces instead of the equals signs, in case you're already familiar with that.
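As a sketch, the `conf/spark-defaults.conf` equivalent of the flags in the command below
(same keys, spaces instead of equals signs) would be:

    spark.driver.memory        12g
    spark.driver.maxResultSize 0
    spark.akka.frameSize       128
    spark.default.parallelism  100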

- Mike

--

Mike Dusenberry
GitHub: github.com/dusenberrymw
LinkedIn: linkedin.com/in/mikedusenberry

Sent from my iPhone.


> On Jul 5, 2017, at 2:14 PM, Niketan Pansare <npansar@us.ibm.com> wrote:
> 
> Hi Gustavo,
> 
> You can paste that code into the command line:
> $ PYSPARK_DRIVER_PYTHON=jupyter PYSPARK_DRIVER_PYTHON_OPTS="notebook" pyspark --master local[*] --conf "spark.driver.memory=12g" --conf spark.driver.maxResultSize=0 --conf spark.akka.frameSize=128 --conf spark.default.parallelism=100
> 
> The above command tells "pyspark" that the python driver is jupyter. For more details, please see https://github.com/apache/spark/blob/master/bin/pyspark#L27
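> As a quick sanity check (my suggestion, not from the docs): a notebook launched this way
> gets a pre-created SparkContext named `sc` from the pyspark startup script, so you can
> confirm the settings from the first cell:
> 
> # `sc` already exists in a notebook started via the pyspark command above.
> print(sc.master)                                      # e.g. local[*]
> print(sc.getConf().get("spark.driver.memory"))        # 12g
> print(sc.getConf().get("spark.default.parallelism"))  # 100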
> 
> Alternatively, you can follow Arijit's suggestion.
> 
> Thanks,
> 
> Niketan Pansare
> IBM Almaden Research Center
> E-mail: npansar At us.ibm.com
> http://researcher.watson.ibm.com/researcher/view.php?person=us-npansar
> 
> From: arijit chakraborty <akc14@hotmail.com>
> To: "dev@systemml.apache.org" <dev@systemml.apache.org>
> Date: 07/02/2017 04:22 AM
> Subject: Re: Install - Configure Jupyter Notebook
> 
> 
> 
> 
> Hi Gustavo,
> 
> 
> You can put those pyspark details in the Jupyter console itself.
> 
> 
> import os
> import sys
> import pandas as pd
> import numpy as np
> 
> # Raw string so the backslash in the Windows path isn't treated as an escape.
> spark_path = r"C:\spark"
> os.environ['SPARK_HOME'] = spark_path
> os.environ['HADOOP_HOME'] = spark_path
> 
> # Make the PySpark libraries importable from a plain Python kernel.
> sys.path.append(spark_path + "/bin")
> sys.path.append(spark_path + "/python")
> sys.path.append(spark_path + "/python/pyspark/")
> sys.path.append(spark_path + "/python/lib")
> sys.path.append(spark_path + "/python/lib/pyspark.zip")
> sys.path.append(spark_path + "/python/lib/py4j-0.10.4-src.zip")
> 
> from pyspark import SparkContext
> from pyspark import SparkConf
> 
> # Start a local Spark context using all available cores.
> sc = SparkContext("local[*]", "test")
> 
> # SystemML specifications:
> from pyspark.sql import SQLContext
> import systemml as sml
> sqlCtx = SQLContext(sc)
> ml = sml.MLContext(sc)
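> A minimal sanity check with the MLContext created above (assuming the `dml` script helper
> from the systemml Python package):
> 
> from systemml import dml
> 
> # Execute a one-line DML script through the MLContext to verify the setup.
> script = dml("print('SystemML is working')")
> ml.execute(script)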
> 
> 
> But this is not a very good way of doing it; I did it because I'm using Windows and it's easier that way.
> 
> 
> Regards,
> 
> Arijit
> 
> ________________________________
> From: Gustavo Frederico <gustavo.frederico@thinkwrap.com>
> Sent: Sunday, July 2, 2017 10:16:03 AM
> To: dev@systemml.apache.org
> Subject: Install - Configure Jupyter Notebook
> 
> 
> A basic question: step 3 in https://systemml.apache.org/install-systemml.html for “Configure Jupyter Notebook” has
> # Start Jupyter Notebook Server
> PYSPARK_DRIVER_PYTHON=jupyter PYSPARK_DRIVER_PYTHON_OPTS="notebook" pyspark --master local[*] --conf "spark.driver.memory=12g" --conf spark.driver.maxResultSize=0 --conf spark.akka.frameSize=128 --conf spark.default.parallelism=100
> Where does that go? There are no details in this step…
> 
> Thanks
> 
> Gustavo
> 
> 
> 
