spark-user mailing list archives

From Reza Zadeh <>
Subject Re: [mllib] GradientDescent requires huge memory for storing weight vector
Date Tue, 13 Jan 2015 01:01:01 GMT
I guess you're not actually using that many features (e.g. < 10m); hashing
the indices just makes it look that way. Is that correct?

If so, a simple dictionary that maps each feature index to its rank can be
broadcast and used everywhere, so you can pass MLlib just the feature's
rank as its index.
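A minimal sketch of that idea, assuming the data is an `RDD[LabeledPoint]` of sparse vectors (the function name `remapFeatures` and the variable names are illustrative, not part of MLlib):

```scala
import org.apache.spark.SparkContext
import org.apache.spark.mllib.linalg.{SparseVector, Vectors}
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.rdd.RDD

// Replace large hashed feature indices with dense ranks so the weight
// vector in GradientDescent is only as long as the number of distinct
// features actually present, not the size of the hash space.
def remapFeatures(sc: SparkContext, data: RDD[LabeledPoint]): RDD[LabeledPoint] = {
  // Collect the distinct hashed indices and assign each a rank.
  // Sorting first keeps the mapping order-preserving, so the remapped
  // sparse indices stay increasing, as Vectors.sparse expects.
  val indexToRank: Map[Int, Int] = data
    .flatMap(_.features.asInstanceOf[SparseVector].indices)
    .distinct()
    .collect()
    .sorted
    .zipWithIndex
    .toMap
  val numRanked = indexToRank.size
  val bcMap = sc.broadcast(indexToRank)

  data.map { lp =>
    val sv = lp.features.asInstanceOf[SparseVector]
    val ranked = sv.indices.map(bcMap.value) // hashed index -> rank
    LabeledPoint(lp.label, Vectors.sparse(numRanked, ranked, sv.values))
  }
}
```

This collects the dictionary to the driver, so it assumes the number of *distinct* features (not the hash space) fits in memory, which is exactly the situation above.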


On Mon, Jan 12, 2015 at 4:26 PM, Tianshuo Deng <> wrote:

> Hi,
> Currently in GradientDescent.scala, weights is constructed as a dense
> vector:
>     initialWeights = Vectors.dense(new Array[Double](numFeatures))
> And numFeatures is determined in loadLibSVMFile as the maximum feature
> index.
> But when a hash function is used to compute feature indices, this results
> in a huge dense vector being allocated, consuming a lot of memory.
> Any suggestions?
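For scale: if hashed indices span a 2^31-sized space (an illustrative assumption), sizing the dense weight vector by the maximum index means an `Array[Double]` of about 16 GiB:

```scala
// Rough size of the dense weight vector when numFeatures is the max
// hashed index rather than the true feature count.
val hashedSpace = 1L << 31          // illustrative hash space size
val bytes = hashedSpace * 8L        // 8 bytes per Double
println(bytes / (1L << 30))        // prints 16 (GiB)
```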
