spark-user mailing list archives

From Thodoris Zois <z...@ics.forth.gr>
Subject Re: ML Linear and Logistic Regression - Poor Performance
Date Sat, 28 Apr 2018 00:15:29 GMT
I am on CentOS 7 and I use Spark 2.3.0. I have posted my code below. Logistic regression took
85 minutes and linear regression took 127 seconds.

My dataset, as I said, is 128 MB and contains 1000 features and ~100 classes.
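
With ~100 classes this is a multinomial fit, so `coefficientMatrix` has one row per class and one column per feature. A rough size estimate, assuming exactly 100 classes and 1000 features (the message only gives ~100) and dense 8-byte doubles; the variable names are illustrative:

```python
# Assumed exact values; the message gives 1000 features and ~100 classes.
num_features = 1000
num_classes = 100
bytes_per_double = 8  # assumes dense double-precision storage

coefficients = num_classes * num_features  # one row of weights per class
intercepts = num_classes                   # one intercept per class
model_bytes = (coefficients + intercepts) * bytes_per_double

print(coefficients, intercepts, model_bytes)
```

Under these assumptions the fitted coefficients and intercepts together take less than 1 MB.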


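import time

from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import LogisticRegression

#SparkSession
ss = SparkSession.builder.getOrCreate()


start = time.time()

#Read data (file holds the path to the CSV dataset)
trainData = ss.read.format("csv").option("inferSchema","true").load(file)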

#Calculate Features
assembler = VectorAssembler(inputCols=trainData.columns[1:], outputCol="features")
trainData = assembler.transform(trainData)

#Drop columns
dropColumns = trainData.columns
dropColumns = [e for e in dropColumns if e not in ('_c0', 'features')]
trainData = trainData.drop(*dropColumns)

#Rename column from _c0 to label
trainData = trainData.withColumnRenamed("_c0", "label")

#Logistic regression
lr = LogisticRegression(maxIter=500, regParam=0.3, elasticNetParam=0.8)
lrModel = lr.fit(trainData)

#Output Coefficients
print("Coefficients: " + str(lrModel.coefficientMatrix))
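
The script above starts a timer but never reads it. A minimal sketch for timing each stage separately, using only the standard library; the `timed` helper, the `timings` dict, and the stand-in workloads are hypothetical, and in the real script the blocks would wrap the read, `assembler.transform`, and `lr.fit` calls:

```python
import time
from contextlib import contextmanager

timings = {}  # stage name -> elapsed wall-clock seconds


@contextmanager
def timed(stage):
    """Record the wall-clock seconds spent inside the block under `stage`."""
    begin = time.perf_counter()
    try:
        yield
    finally:
        timings[stage] = time.perf_counter() - begin


# Stand-in workloads (assumptions, not the Spark calls themselves).
with timed("read"):
    sum(range(100_000))
with timed("fit"):
    sorted(range(100_000), reverse=True)

for stage, seconds in timings.items():
    print(f"{stage}: {seconds:.4f} s")
```

Because Spark transformations are lazy, most of the cost is likely to be attributed to whichever stage triggers an action, which here would be `fit`.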



- Thodoris


> On 27 Apr 2018, at 22:50, Irving Duran <irving.duran@gmail.com> wrote:
> 
> Are you reformatting the data correctly for logistic regression (meaning 0s and 1s) before modeling? What OS and Spark version are you using?
> 
> Thank You,
> 
> Irving Duran
> 
> 
> On Fri, Apr 27, 2018 at 2:34 PM Thodoris Zois <zois@ics.forth.gr> wrote:
> Hello,
> 
> I am running an experiment to test logistic and linear regression on Spark using MLlib.
> 
> My dataset is only 128 MB and something weird happens. Linear regression takes about 127 seconds with either 1 or 500 iterations. Logistic regression, on the other hand, usually does not manage to finish even with 1 iteration; I usually get a heap memory error.
> 
> In both cases I use the default cores and memory for the driver, and I spawn 1 executor with 1 core and 2 GB of memory.
> 
> Apart from that, I get a warning about NativeBLAS. I searched the Internet and found that I have to install libgfortran. Even after I installed it, the warning remains.
> 
> Any ideas for the above?
> 
> Thank you,
> - Thodoris
> 

