spark-user mailing list archives

From Qian He <hq.ja...@gmail.com>
Subject Spark LogisticRegression got stuck on dataset with millions of columns
Date Tue, 23 Apr 2019 00:02:45 GMT
Hi all,

I'm using Spark's built-in LogisticRegression to fit a dataset. Each row of
the data has 1.7 million columns, but it is sparse, with only hundreds of
1s per row. The Spark UI reported high GC time while the model was being
trained, and then my Spark application hung without any response. I have
allocated 100 executors with 8g of memory each.
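For scale, here is a rough plain-Python calculation of the memory involved per row and per model vector. It is only a sketch. NNZ_PER_ROW = 300 is a hypothetical stand-in for "hundreds of 1s", and the byte sizes assume a SparseVector-style layout (int32 indices plus float64 values) while ignoring JVM object overhead.

```python
# Back-of-the-envelope memory estimate for the setup described above.
# Assumption: sparse rows are stored like Spark's SparseVector, i.e. an
# int32 index array plus a float64 value array; dense vectors are float64.
# JVM object headers and other overhead are ignored.

NUM_FEATURES = 1_700_000  # columns per row, from the message
NNZ_PER_ROW = 300         # hypothetical stand-in for "hundreds of 1s"

BYTES_PER_DOUBLE = 8
BYTES_PER_INT = 4


def dense_bytes(n: int) -> int:
    """Payload size of a dense float64 vector of length n."""
    return n * BYTES_PER_DOUBLE


def sparse_bytes(nnz: int) -> int:
    """Payload size of a sparse vector with nnz stored entries."""
    return nnz * (BYTES_PER_INT + BYTES_PER_DOUBLE)


if __name__ == "__main__":
    row_dense = dense_bytes(NUM_FEATURES)
    row_sparse = sparse_bytes(NNZ_PER_ROW)
    print(f"one row, dense:  {row_dense / 1e6:.1f} MB")
    print(f"one row, sparse: {row_sparse / 1e3:.1f} KB")
    # A coefficient or gradient vector has one entry per feature, so it is
    # dense no matter how sparse the input rows are.
    print(f"one dense model/gradient vector: {dense_bytes(NUM_FEATURES) / 1e6:.1f} MB")
```

Under these assumptions, a row stored densely costs about 13.6 MB instead of roughly 3.6 KB. Even when the rows stay sparse, any full-length coefficient or gradient vector is still about 13.6 MB per copy.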

Is there anything I should do to make the training succeed?
