So, if I'm understanding you correctly, you're saying, simply put, that I should investigate using
L_1 as my distance measure when measuring vector distances within a cluster?
On 1 Mar 2013, at 16:24, Ted Dunning wrote:
> What Sean says is just right, except that I was (telegraphically) getting
> at a slightly different point with L_1:
>
> On Wed, Feb 27, 2013 at 7:23 AM, Chris Harrington <chris@heystaks.com>wrote:
>
>> Is L_1 regularization the same as manhattan distance?
>>
>
> L_1 metric is manhattan distance, yes.
>
> L_1 regularization of kmeans refers to something a little bit different.
>
> The idea with regularization is that you add some sort of penalty to the
> function you are optimizing. This penalty pushes the optimization toward a
> solution that you would prefer on some other grounds than just the
> optimization alone. Regularization often helps in solving underdetermined
> systems where there are an infinite number of solutions and we have to pick
> a preferred solution.
>
> There isn't anything that says that you have to be optimizing the same kind
> of function as the regularization. Thus kmeans, which is inherently
> optimizing squared error can quite reasonably be regularized with L_1 (sum
> of the absolute value of the centroids' coefficients).
>
> I haven't tried this at all seriously yet. L_1 regularization tends to
> help drive toward sparsity, but it is normally used in convex problems
> where we can guarantee a findable global optimum. The kmeans problem,
> however, is not convex so adding the regularization may screw things up in
> practice. For textlike data, I have a strong intuition that the idealized
> effect of L_1 should be very good, but the pragmatic effect may not be so
> useful.
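To make the distinction concrete, here is a minimal sketch of what an L_1-regularized k-means objective could look like, plus the soft-thresholding step that is the usual mechanism by which an L_1 penalty drives coefficients to zero. All function names here are mine for illustration, not Mahout APIs, and this is untested as a clustering strategy, per Ted's caveat about non-convexity.

```python
import numpy as np

def l1_regularized_kmeans_objective(X, centroids, labels, lam):
    """Ordinary k-means squared-error cost, plus an L_1 penalty:
    lam * sum of absolute values of the centroids' coefficients."""
    squared_error = sum(
        np.sum((X[labels == k] - c) ** 2) for k, c in enumerate(centroids)
    )
    l1_penalty = lam * np.abs(centroids).sum()
    return squared_error + l1_penalty

def soft_threshold(v, t):
    """Proximal operator of the L_1 penalty: shrink each coefficient
    toward zero by t, zeroing anything smaller than t in magnitude.
    In a Lloyd-style update, replacing each cluster mean with its
    soft-thresholded version is what produces sparse centroids."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)
```

Note the contrast with the L_1 *metric*: using Manhattan distance changes how points are assigned to clusters, whereas the penalty above leaves assignment alone and instead biases the centroids themselves toward sparsity, which is the effect that sounds attractive for text-like data.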
