spark-user mailing list archives

From Simone <simone.mirag...@gmail.com>
Subject Pyspark ML - Unable to finish cross validation
Date Mon, 26 Sep 2016 17:23:46 GMT
Hello,

I am using PySpark to train a Logistic Regression model with cross validation in Spark ML. For
testing purposes my dataset is very small: no more than 50 records for training.
On the other hand, my "feature" column is very wide, i.e., 1500+ columns.

I am running on YARN with 3 executors, each with 4 GB of memory and 4 cores. I am using cache() to
store the dataframes.
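
A minimal sketch of the kind of setup described above, not the original code. The executor settings come from the message; everything else is an assumption made for illustration: the column names "features" and "label", a binary label, the regularization grid, the fold count, and the hypothetical input path train.parquet.

```python
# Executor settings from the message, expressed as standard Spark properties.
EXECUTOR_CONF = {
    "spark.executor.instances": "3",
    "spark.executor.memory": "4g",
    "spark.executor.cores": "4",
}


def build_cross_validator(num_folds=3):
    """Build a CrossValidator around LogisticRegression.

    Needs an active SparkSession. The grid values and num_folds are
    assumptions; the original message does not state them.
    """
    from pyspark.ml.classification import LogisticRegression
    from pyspark.ml.evaluation import BinaryClassificationEvaluator
    from pyspark.ml.tuning import CrossValidator, ParamGridBuilder

    # Assumed column names; "features" must be a vector column.
    lr = LogisticRegression(featuresCol="features", labelCol="label", maxIter=10)
    grid = ParamGridBuilder().addGrid(lr.regParam, [0.01, 0.1]).build()
    return CrossValidator(
        estimator=lr,
        estimatorParamMaps=grid,
        evaluator=BinaryClassificationEvaluator(labelCol="label"),
        numFolds=num_folds,
    )


def main():
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.master("yarn").appName("cv-sketch")
    for key, value in EXECUTOR_CONF.items():
        builder = builder.config(key, value)
    spark = builder.getOrCreate()

    # Hypothetical input path; cached because each fold re-reads the data.
    train = spark.read.parquet("train.parquet").cache()

    cv_model = build_cross_validator().fit(train)
    print(cv_model.avgMetrics)  # one averaged metric per grid point
    spark.stop()
```

With this grid, a call to main() fits the model once per fold and grid point (3 x 2 = 6 fits) and then once more on the full training set, so even a 50-row dataset launches a fair number of Spark jobs.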

Unfortunately, my process does not finish: it hangs during cross validation.

Any clues? 

Thanks guys

Simone