spark-user mailing list archives

From Jonathan Kelly
Subject Re: scikit learn on EMR PySpark
Date Wed, 02 Mar 2016 00:21:22 GMT
Hi, Myles,

We do not install scikit-learn or spark-sklearn on EMR clusters by default,
but you can install them yourself by running "sudo pip install
scikit-learn spark-sklearn", either manually after ssh'ing to the master
instance or as an EMR Step.
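
To confirm the install worked, something like the following could be run
with the same Python interpreter that PySpark uses. This is only a rough
sketch: missing_packages is a hypothetical helper name, and it assumes the
two packages import as "sklearn" and "spark_sklearn". It checks only the
node it runs on, so it says nothing about the other instances in the cluster.

```python
import importlib


def missing_packages(module_names):
    """Return the module names that cannot be imported by this interpreter."""
    missing = []
    for name in module_names:
        try:
            importlib.import_module(name)
        except ImportError:
            # The package is absent (or one of its own imports is).
            missing.append(name)
    return missing


# Assumed import names for scikit-learn and spark-sklearn.
not_found = missing_packages(["sklearn", "spark_sklearn"])
if not_found:
    print("Not importable on this node: " + ", ".join(not_found))
else:
    print("scikit-learn and spark-sklearn are importable on this node")
```

If either name is reported as not importable, rerunning the pip command
above on that node is the first thing to try.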

~ Jonathan

On Tue, Mar 1, 2016 at 3:20 PM Gartland, Myles wrote:

> New to Spark and MLlib. Coming from scikit-learn.
> I am launching my Spark 1.6 instance through AWS EMR and PySpark. All the
> examples using MLlib work fine.
> But I have seen a couple of examples where you can combine scikit-learn
> packages and syntax with MLlib.
> Like in this example-
> However, it does not seem that PySpark on AWS EMR comes with scikit-learn (or
> other standard PyData packages) loaded.
> Is this something you can/should load on PySpark, and how would you do it?
> Thanks for assisting.
> Myles
