spark-user mailing list archives

From Holden Karau <hol...@pigscanfly.ca>
Subject Re: Spark Beginner Question
Date Wed, 27 Jul 2016 05:11:47 GMT
You will need to convert your input DataFrame into one with a vector
column of features and a label column to train on. The Spark ML
documentation has examples:
http://spark.apache.org/docs/latest/ml-guide.html (the website seems to be
having some issues mid-update to Spark 2.0, so if you want to read it
right now, use
http://spark.apache.org/docs/1.6.2/ml-guide.html#example-pipeline instead).

As for why some algorithms are available in the RDD API and not yet in the
DataFrame API: simply development time. The DataFrame/Pipeline API
will be the actively developed API going forward.

Cheers,

Holden :)

On Tuesday, July 26, 2016, Shi Yu <shiyu.usa@gmail.com> wrote:

> Hello,
>
> *Question 1: *I am new to Spark. I am trying to train a classification
> model on a Spark DataFrame. I am using PySpark, and I have created
> a Spark DataFrame object in df:
>
> from pyspark.sql.types import *
>
> query = """select * from table"""
>
> df = sqlContext.sql(query)
>
> My question is how to extend this code to train models (e.g., a classification
> model) on the object df. I have checked many online resources and haven't seen
> any approach similar to the following:
>
> lr = LogisticRegression(maxIter=10, regParam=0.3, elasticNetParam=0.8)
> # Fit the model
> lrModel = lr.fit(df)
>
> Is it a feasible way to train the model? If yes, where could I find the reference code?
>
> *Question 2:  *Why is there no SVM model support in the MLlib DataFrame-based API,
> when the RDD-based API has an SVM model?
>
> Thanks a lot!
>
>
> Best,
>
>
> Shi
>
>
>

-- 
Cell : 425-233-8271
Twitter: https://twitter.com/holdenkarau
