spark-user mailing list archives

From Reza Zadeh <r...@databricks.com>
Subject Re: [mllib] GradientDescent requires huge memory for storing weight vector
Date Tue, 13 Jan 2015 01:01:01 GMT
I'm guessing you're not actually using that many features (e.g. < 10m), and
that hashing the index just makes it look that way. Is that correct?

If so, the simple dictionary that maps each hashed feature index -> rank can
be broadcast and used everywhere, so you can pass MLlib just the feature's
rank as its index.

Reza

On Mon, Jan 12, 2015 at 4:26 PM, Tianshuo Deng <tdeng@twitter.com.invalid>
wrote:

> Hi,
> Currently in GradientDescent.scala, weights is constructed as a dense
> vector:
>
>     initialWeights = Vectors.dense(new Array[Double](numFeatures))
>
> And numFeatures is determined in loadLibSVMFile as the maximum feature
> index.
>
> But when a hash function is used to compute feature indices, this results
> in a huge dense vector being allocated, taking up a lot of memory.
>
> Any suggestions?
