mahout-user mailing list archives

From Ted Dunning <>
Subject Re: MultiNormal distribution radius
Date Wed, 14 Nov 2012 17:38:56 GMT
On Wed, Nov 14, 2012 at 3:53 AM, Sean Owen <> wrote:

> The wikipedia article is a fine intro. If your covariances are 0, there's
> not much to know at all. The multivariate normal is just several univariate
> normals, independent in each dimension.

Not quite.  In general, the multivariate normal is a vector of independent
univariate normals multiplied by a matrix (and shifted by the mean).
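
A minimal sketch of that view, in case it helps: draw independent standard
normals z and return mean + L z, where L is any matrix with L L^T equal to
the desired covariance (for example a Cholesky factor).  The class and
method names here are made up for illustration and are not Mahout's API;
the factor L is assumed to be supplied by the caller.

```java
import java.util.Random;

// Hypothetical helper, not part of Mahout.
final class MultiNormalSketch {
    /**
     * Returns mean + L * z, where z is a vector of independent standard normals.
     * The result has covariance L * L^T. L is assumed to be square, with the
     * same dimension as mean.
     */
    static double[] sample(double[] mean, double[][] l, Random rng) {
        int d = mean.length;
        // Independent univariate standard normals, one per dimension.
        double[] z = new double[d];
        for (int i = 0; i < d; i++) {
            z[i] = rng.nextGaussian();
        }
        // Multiply by the matrix and shift by the mean.
        double[] x = new double[d];
        for (int i = 0; i < d; i++) {
            double sum = mean[i];
            for (int j = 0; j < d; j++) {
                sum += l[i][j] * z[j];
            }
            x[i] = sum;
        }
        return x;
    }
}
```

With a diagonal L this reduces to the independent case described in the
quoted text; off-diagonal entries are what introduce correlation.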

> If you want a uniform distribution over a unit sphere, that's different,
> but you're actually also on the right track. You don't need to sample and
> discard, just pick your point as above and normalize to a length randomly
> chosen in (0,radius]. 90% sure that's correct off the top of my head.

In high dimensions, this doesn't work because you get vastly too much mass
near the origin.  You need to sample the radius from a distribution biased
toward larger values.  The idea is that each shell of radius r and
thickness dr needs to have probability proportional to the volume of the
shell, which is proportional to r^(d-1) dr.  If every shell has equal mass,
then the inner shells are much too dense.  The desired cumulative
distribution is proportional to r^d, so you can sample r by using the
inverse CDF method

    u ~ Uniform(0,1)
    // 1.0 / d rather than 1/d: with an int d, 1/d is integer division in Java
    r = radius * Math.pow(u, 1.0 / d)
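
Putting the pieces together, here is a rough sketch of the whole procedure:
pick a direction by normalizing a vector of independent standard normals,
then scale it to a radius drawn with the inverse method above.  The class
and method names are hypothetical, not anything in Mahout, and the radius
is passed in as a parameter.

```java
import java.util.Random;

// Hypothetical helper, not part of Mahout.
final class BallSampler {
    /** Returns a point drawn uniformly from the d-dimensional ball of the given radius. */
    static double[] sampleInBall(int d, double radius, Random rng) {
        // Direction: a vector of independent normals is spherically symmetric.
        double[] x = new double[d];
        double normSquared = 0.0;
        for (int i = 0; i < d; i++) {
            x[i] = rng.nextGaussian();
            normSquared += x[i] * x[i];
        }
        double norm = Math.sqrt(normSquared);

        // Radius: the CDF is (r / radius)^d, so invert it with u^(1/d).
        // Note 1.0 / d, since 1 / d would be integer division.
        double u = rng.nextDouble();
        double r = radius * Math.pow(u, 1.0 / d);

        // Rescale the direction vector to length r.
        double scale = r / norm;
        for (int i = 0; i < d; i++) {
            x[i] *= scale;
        }
        return x;
    }
}
```

A quick sanity check is the mean distance from the origin, which should
come out near radius * d / (d + 1); picking the length uniformly in
(0,radius] would give radius / 2 instead, whatever the dimension.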

See for a picture
of 2-d sampling which is the first interesting case.  For higher
dimensional cases, to get a comparable picture, you need to take a slice of
data near a 2-d plane since simply projecting to the x-y plane will give
very biased results.
