On Wed, Nov 14, 2012 at 3:53 AM, Sean Owen <srowen@gmail.com> wrote:
> The wikipedia article is a fine intro. If your covariances are 0, there's
> not much to know at all. The multivariate normal is just several univariate
> normals, independent in each dimension.
>
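Not quite, in the general case. The multivariate normal is a vector of
independent univariate normals multiplied by a matrix. With zero
covariances that matrix can be taken to be diagonal, which is the case
you describe.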
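
To make the "multiplied by a matrix" point concrete, here is a minimal Java sketch. It assumes a matrix `a` whose product with its own transpose is the covariance you want. The added `mean` vector, the class name, and the method name are illustrative assumptions, not anything from Mahout.

```java
import java.util.Random;

// Sketch only: draws x = mean + A z, where z holds independent standard
// normals. The covariance of x is then A * A^T. The mean term and all names
// here are assumptions made for illustration.
class MultivariateNormalSketch {
    static double[] sample(double[] mean, double[][] a, Random rng) {
        int n = mean.length;      // dimension of the result
        int k = a[0].length;      // number of independent normals used
        double[] z = new double[k];
        for (int j = 0; j < k; j++) {
            z[j] = rng.nextGaussian();   // independent univariate normals
        }
        double[] x = new double[n];
        for (int i = 0; i < n; i++) {
            double sum = mean[i];
            for (int j = 0; j < k; j++) {
                sum += a[i][j] * z[j];   // the matrix multiply
            }
            x[i] = sum;
        }
        return x;
    }

    public static void main(String[] args) {
        Random rng = new Random();
        double[] mean = {0.0, 0.0};
        // A non-diagonal matrix gives correlated coordinates.
        double[][] a = {{1.0, 0.0}, {0.8, 0.6}};
        double[] x = sample(mean, a, rng);
        System.out.println(x[0] + " " + x[1]);
    }
}
```

With a diagonal `a` the coordinates stay independent, which is the zero-covariance case.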
> If you want a uniform distribution over a unit sphere, that's different,
> but you're actually also on the right track. You don't need to sample and
> discard, just pick your point as above and normalize to a length randomly
> chosen in (0,radius]. 90% sure that's correct off the top of my head.
>
In high dimensions, this doesn't work because you get vastly too much mass
near the origin. You need to sample the radius from a distribution biased
toward larger values. The idea is that each shell of radius r and
thickness dr needs to have probability proportional to the volume of the
shell, which is proportional to r^(d-1). If every shell has equal mass, then
the inner shells are much too dense. The desired cumulative distribution is
proportional to r^d, so you can sample r by using the inverse CDF method:
    u ~ Uniform(0,1)
    r = Math.pow(u, 1.0 / d)    // 1.0 / d, since 1/d truncates to 0 for an int d > 1
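
Putting the direction and the radius together, a minimal sketch of the whole sampler might look like the following. The class and method names are made up for illustration, and the `radius` parameter is an assumption that generalizes the unit-sphere case by scaling.

```java
import java.util.Random;

// Sketch only: uniform sampling inside a d-dimensional ball.
// Direction comes from normalized independent normals; the radius comes
// from the inverse CDF method, r = radius * u^(1/d).
class BallSamplerSketch {
    static double[] sampleInBall(int d, double radius, Random rng) {
        double[] x = new double[d];
        double norm2;
        do {
            norm2 = 0.0;
            for (int i = 0; i < d; i++) {
                x[i] = rng.nextGaussian();
                norm2 += x[i] * x[i];
            }
        } while (norm2 == 0.0);   // guard against dividing by zero

        double norm = Math.sqrt(norm2);
        double u = rng.nextDouble();                  // u ~ Uniform[0,1)
        double r = radius * Math.pow(u, 1.0 / d);     // biased toward large r
        for (int i = 0; i < d; i++) {
            x[i] *= r / norm;                         // rescale to length r
        }
        return x;
    }

    public static void main(String[] args) {
        Random rng = new Random();
        double[] p = sampleInBall(10, 1.0, rng);
        double len2 = 0.0;
        for (double v : p) {
            len2 += v * v;
        }
        System.out.println("length = " + Math.sqrt(len2));
    }
}
```

In 2d, about a quarter of the points should land within half the radius, since the inner disk has a quarter of the area.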
See https://dl.dropbox.com/u/36863361/sphericalsampling.png for a picture
of 2d sampling, which is the first interesting case. For higher-dimensional
cases, to get a comparable picture, you need to take a slice of
data near a 2d plane, since simply projecting to the x-y plane will give
very biased results.
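
One way to take such a slice, sketched here under assumptions: keep only the points whose distance from the x-y plane is at most some small `eps`, then plot their first two coordinates. The class name, the method name, and the choice of Euclidean distance over the remaining coordinates are illustrative, not taken from the picture above.

```java
// Sketch only: selects points lying in a thin slice around the x-y plane.
// A point's distance from that plane is the Euclidean norm of every
// coordinate after the first two.
class SliceFilterSketch {
    static boolean nearXyPlane(double[] p, double eps) {
        double dist2 = 0.0;
        for (int i = 2; i < p.length; i++) {
            dist2 += p[i] * p[i];
        }
        return Math.sqrt(dist2) <= eps;
    }

    public static void main(String[] args) {
        double[] p = {0.3, -0.2, 0.01, 0.02};
        System.out.println(nearXyPlane(p, 0.05));
    }
}
```

In high dimensions only a small fraction of samples pass the filter, so expect to draw many points to fill the picture.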
