# mahout-user mailing list archives

From Ted Dunning <ted.dunn...@gmail.com>
Date Wed, 14 Nov 2012 17:38:56 GMT
```
On Wed, Nov 14, 2012 at 3:53 AM, Sean Owen <srowen@gmail.com> wrote:

> The wikipedia article is a fine intro. If your covariances are 0, there's
> not much to know at all. The multivariate normal is just several univariate
> normals, independent in each dimension.
>

Not quite.  The multivariate normal is several independent univariate
normals multiplied by a matrix.

> If you want a uniform distribution over a unit sphere, that's different,
> but you're actually also on the right track. You don't need to sample and
> discard, just pick your point as above and normalize to a length randomly
> chosen in (0,radius]. 90% sure that's correct off the top of my head.
>

In high dimensions, this doesn't work because you get vastly too much mass
near the origin.  You need to sample the radius from a distribution biased
toward larger values.  The idea is that each shell of radius r and
thickness dr needs probability proportional to the volume of the shell,
which is proportional to r^(d-1) dr.  If every shell has equal mass, then
the inner shells are much too dense.  The desired cumulative distribution
is proportional to r^d (exactly r^d for the unit ball), so you can sample r
using the inverse-CDF method:

u ~ Uniform(0,1)
r = Math.pow(u, 1.0 / d)   // 1.0, not 1: with an int d, 1/d is integer division

See https://dl.dropbox.com/u/36863361/spherical-sampling.png for a picture
of 2-d sampling, which is the first interesting case.  For higher-dimensional
cases, to get a comparable picture you need to take a slice of the data near
a 2-d plane, since simply projecting onto the x-y plane gives very biased
results.

```
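The point that a multivariate normal is independent univariate normals multiplied by a matrix can be made concrete with a Cholesky factor: if z has independent N(0,1) entries and the covariance factors as Σ = L Lᵀ, then μ + L z is distributed N(μ, Σ). The sketch below is plain Java, not a Mahout API; the class and method names (`MultivariateNormalSketch`, `cholesky`, `sample`) are hypothetical, and Cholesky is just one convenient choice of matrix (any A with A Aᵀ = Σ would do).

```java
import java.util.Random;

/** Hypothetical helper: samples N(mean, cov) as mean + L z, where cov = L L^T. */
public class MultivariateNormalSketch {

    /** Cholesky factorization of a symmetric positive-definite matrix; returns lower-triangular L. */
    static double[][] cholesky(double[][] a) {
        int n = a.length;
        double[][] l = new double[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j <= i; j++) {
                double sum = a[i][j];
                for (int k = 0; k < j; k++) {
                    sum -= l[i][k] * l[j][k];
                }
                if (i == j) {
                    if (sum <= 0) {
                        throw new IllegalArgumentException("matrix is not positive definite");
                    }
                    l[i][i] = Math.sqrt(sum);
                } else {
                    l[i][j] = sum / l[j][j];
                }
            }
        }
        return l;
    }

    /** One draw: independent standard normals z, multiplied by lower-triangular L, shifted by the mean. */
    static double[] sample(double[] mean, double[][] l, Random rng) {
        int n = mean.length;
        double[] z = new double[n];
        for (int i = 0; i < n; i++) {
            z[i] = rng.nextGaussian();
        }
        double[] x = new double[n];
        for (int i = 0; i < n; i++) {
            double s = mean[i];
            for (int k = 0; k <= i; k++) {
                s += l[i][k] * z[k];
            }
            x[i] = s;
        }
        return x;
    }

    public static void main(String[] args) {
        double[] mean = {1.0, -2.0};
        double[][] cov = {{4.0, 1.2}, {1.2, 1.0}};
        double[][] l = cholesky(cov);
        Random rng = new Random(42);
        int n = 200_000;
        double[] sum = new double[2];
        double[][] sumSq = new double[2][2];
        for (int t = 0; t < n; t++) {
            double[] x = sample(mean, l, rng);
            for (int i = 0; i < 2; i++) {
                sum[i] += x[i];
                for (int j = 0; j < 2; j++) {
                    sumSq[i][j] += x[i] * x[j];
                }
            }
        }
        // The empirical covariance should come out close to cov.
        for (int i = 0; i < 2; i++) {
            for (int j = 0; j < 2; j++) {
                double c = sumSq[i][j] / n - (sum[i] / n) * (sum[j] / n);
                System.out.printf("cov[%d][%d] ~ %.3f%n", i, j, c);
            }
        }
    }
}
```

With a diagonal covariance, L is diagonal with the standard deviations on it, which reduces to the zero-covariance, independent-per-dimension case from the quoted message.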
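Putting the two halves of the thread together gives a sampler for points uniform inside a d-dimensional ball: normalize a standard normal vector to get a uniformly random direction, then scale it by a radius drawn as u^(1/d) instead of uniformly. This is a minimal sketch under those assumptions; `BallSamplerSketch` and `sampleBall` are hypothetical names, and the radius is multiplied by `radius` so the same code covers balls other than the unit ball.

```java
import java.util.Random;

/** Hypothetical helper: uniform samples from the interior of a d-dimensional ball. */
public class BallSamplerSketch {

    /** Draws one point uniformly from the ball of the given radius centred at the origin. */
    static double[] sampleBall(int d, double radius, Random rng) {
        // 1. Uniform direction: a standard normal vector is isotropic, so normalizing it
        //    gives a uniformly distributed point on the unit sphere.
        double[] x = new double[d];
        double norm2;
        do {
            norm2 = 0;
            for (int i = 0; i < d; i++) {
                x[i] = rng.nextGaussian();
                norm2 += x[i] * x[i];
            }
        } while (norm2 == 0); // guard against the (practically impossible) zero vector
        double norm = Math.sqrt(norm2);

        // 2. Radius by inverse CDF: P(R <= r) = (r / radius)^d, so r = radius * u^(1/d).
        //    1.0 / d avoids integer division, which would give 0 for any d > 1.
        double u = rng.nextDouble();
        double r = radius * Math.pow(u, 1.0 / d);

        for (int i = 0; i < d; i++) {
            x[i] *= r / norm;
        }
        return x;
    }

    public static void main(String[] args) {
        int d = 10;
        int n = 100_000;
        Random rng = new Random(7);
        int inner = 0;
        for (int t = 0; t < n; t++) {
            double[] p = sampleBall(d, 1.0, rng);
            double s = 0;
            for (double v : p) {
                s += v * v;
            }
            if (Math.sqrt(s) <= 0.5) {
                inner++;
            }
        }
        // The inner ball of radius 0.5 holds 0.5^d of the volume; a uniformly
        // chosen radius would instead put about half of the points there.
        System.out.printf("fraction with r <= 0.5: %.4f (volume fraction %.4f)%n",
                (double) inner / n, Math.pow(0.5, d));
    }
}
```

For d = 10 the printed fraction should sit near 0.5^10, roughly 0.001, which illustrates the excess mass near the origin that a uniformly chosen radius would produce.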