mahout-user mailing list archives

From Benson Margulies <bimargul...@gmail.com>
Subject Re: Logistic Regression Tutorial
Date Thu, 28 Apr 2011 20:24:51 GMT
Chris,

I'm looking at a recently purchased copy of MIA.

The LR example is all about the donut file, which has features that
don't look anything like, even remotely, a full-up bag-of-words
vector.

I'm missing the connection between the vectorization process (which we
have some experience with here, from running canopy/kmeans) and the LR
example. It's probably some simple principle that I'm failing to
grasp.

--benson
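
One way to picture the missing connection: a bag-of-words vector is just
another feature vector, only much wider and sparser than the donut
features, and the same SGD update applies to it. The sketch below is
plain Python, not Mahout's API. The hashed feature size, the whitespace
tokenizer, the toy documents, and all function names are assumptions
made for illustration; a real pipeline would take its vectors from the
Lucene analysis step instead.

```python
import math
import zlib

NUM_FEATURES = 2 ** 16  # assumed size of the hashed feature space


def vectorize(text, num_features=NUM_FEATURES):
    """Turn a document into a sparse bag-of-words vector {index: count}.

    Tokens are split on whitespace (an assumption; a real pipeline would
    use a proper analyzer) and hashed into a fixed number of slots.
    """
    vec = {}
    for token in text.lower().split():
        idx = zlib.crc32(token.encode("utf-8")) % num_features
        vec[idx] = vec.get(idx, 0.0) + 1.0
    return vec


def predict(weights, bias, vec):
    """P(label = 1 | document) under the logistic model."""
    z = bias + sum(weights[i] * v for i, v in vec.items())
    return 1.0 / (1.0 + math.exp(-z))


def train(docs, labels, epochs=50, rate=0.1, num_features=NUM_FEATURES):
    """Fit weights with plain stochastic gradient descent on the log-loss."""
    weights = [0.0] * num_features
    bias = 0.0
    vectors = [vectorize(d, num_features) for d in docs]
    for _ in range(epochs):
        for vec, y in zip(vectors, labels):
            err = y - predict(weights, bias, vec)  # gradient of the log-likelihood
            bias += rate * err
            for i, v in vec.items():
                weights[i] += rate * err * v  # only touch non-zero features
    return weights, bias


# Toy corpus, invented purely for illustration.
docs = [
    "cheap pills buy now",
    "buy cheap watches now",
    "meeting agenda for monday",
    "notes from the monday meeting",
]
labels = [1, 1, 0, 0]

weights, bias = train(docs, labels)
p_spam = predict(weights, bias, vectorize("buy cheap pills"))
p_ham = predict(weights, bias, vectorize("monday meeting notes"))
print(p_spam, p_ham)
```

The only thing the learner needs from the vectorization step is a
mapping from feature index to value per document, plus a label. Whether
those indices come from hashing, as assumed here, or from a term
dictionary does not change the update rule.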


On Thu, Apr 28, 2011 at 4:02 PM, Chris Schilling <chris@cellixis.com> wrote:
> Benson,
>
> The latest chapters in Mahout in Action cover document classification using LR very well.
>
> Chris
>
>
> On Apr 28, 2011, at 12:55 PM, Benson Margulies wrote:
>
>> Mike,
>>
>> In the time available for the experiment I want to perform, all I can
>> imagine doing is turning each document into a bag-of-words feature
>> vector. So, I want to run the pipeline of lucene->vectors->... and
>> train a model. I confess that I don't have the time to try to absorb
>> the underlying math; indeed, I have some co-workers who can help me
>> with that. My problem is entirely plumbing at this point.
>>
>> --benson
>>
>>
>> On Thu, Apr 28, 2011 at 3:52 PM, Mike Nute <mike.nute@gmail.com> wrote:
>>> Benson,
>>>
>>> Lecture 3 in this one is a good intro to the logit model:
>>>
>>> http://see.stanford.edu/see/lecturelist.aspx?coll=348ca38a-3a6d-4052-937d-cb017338d7b1
>>>
>>> The lecture notes are pretty solid too so that might be faster.
>>>
>>> The short version: Logistic Regression is a GLM with the inverse link
>>> f^-1(x) = 1/(1+e^(-xB)) and a Binomial likelihood function.  You can fit
>>> it with either Batch or Stochastic Gradient Descent.
>>>
>>> I've never done document classification before though, so I'm not much help
>>> with more complicated things like choosing the feature vector.
>>>
>>> Good Luck,
>>> Mike Nute
>>>
>>> On Thu, Apr 28, 2011 at 3:35 PM, Benson Margulies <bimargulies@gmail.com> wrote:
>>>
>>>> Is there a logistic regression tutorial in the house? I've got a stack
>>>> of files (Arabic ones, no less) and I want to train and score a
>>>> classifier.
>>>>
>>>
>>>
>>>
>>> --
>>> Michael Nute
>>> Mike.Nute@gmail.com
>>>
>
>
