mahout-user mailing list archives

From Xiaobo Gu <guxiaobo1...@gmail.com>
Subject Re: Is any more detailed documentation about the sgd logistic regression example.
Date Mon, 02 May 2011 15:58:39 GMT
In our environment, data is prepared inside the relational data
warehouse and then exported as CSV files, which is why the trainlogistic
command line works well for us. However, we will have both numeric and
categorical predictor variables. Does SGD support categorical variables,
and are there examples of this? I ask because I think the results below
do not apply to categorical variables:

color ~ -0.157*Intercept Term + -0.678*x + -0.416*y
Intercept Term -0.15655
x -0.67841
y -0.41587
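
To make the question concrete, here is a minimal sketch in plain Python
(not Mahout code). It assumes the coefficients printed above form the
linear term of a standard logistic model, and it shows one common way a
categorical predictor could enter the feature vector: one 0/1 indicator
per level, with the first level dropped as the baseline. The "shape"
predictor, its levels, and the helper names are hypothetical and only
serve as illustration.

```python
import math

# Coefficients copied from the model output above.
INTERCEPT = -0.15655
WEIGHTS = {"x": -0.67841, "y": -0.41587}

# Hypothetical categorical predictor with three levels (an assumption,
# not part of the original example).
SHAPE_LEVELS = ["circle", "square", "triangle"]


def predict(x, y):
    """Logistic probability for the numeric-only model printed above."""
    z = INTERCEPT + WEIGHTS["x"] * x + WEIGHTS["y"] * y
    return 1.0 / (1.0 + math.exp(-z))


def one_hot(value, levels):
    """One 0/1 indicator per level; the first level is the baseline."""
    return [1.0 if value == level else 0.0 for level in levels[1:]]


def encode_row(x, y, shape):
    """Feature vector: intercept, numeric values, then shape indicators."""
    return [1.0, x, y] + one_hot(shape, SHAPE_LEVELS)


if __name__ == "__main__":
    print(predict(0.0, 0.0))
    print(encode_row(0.5, 0.2, "square"))
```

With such an encoding, a trained model would carry one coefficient per
indicator column rather than a single coefficient per variable, which
is why the three-line output above cannot describe a categorical
predictor as it stands.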

On Fri, Apr 22, 2011 at 6:16 AM, Ted Dunning <ted.dunning@gmail.com> wrote:
> The trainlogistic command is (as Stanley says) only a simple example.
>
> You will need to write a program something like TrainNewsGroups for your
> modelers to use.
>
> I agree that the API-oriented code in Mahout is not what those users need.
> It was, however, what my users needed.
>
> It would be great if you would like to contribute a good command line for
> the more advanced SGD classifier training
> API.
>
> On Tue, Apr 19, 2011 at 10:51 PM, Stanley Xu <wenhao.xu@gmail.com> wrote:
>
>> Hi Xiaobo,
>>
>> You could check chapters 13-16 of <Mahout in Action>, which provide all
>> the parameters that the 'mahout trainlogistic' command line tool can use.
>> But the trainlogistic command is still only a simple example. If you want
>> to use it in a production environment, you still have to write the feature
>> encoding code yourself. The code you need to write is pretty easy: just
>> parse the input, put it in a Vector, and let the LR train on the data.
>>
>> Best wishes,
>> Stanley Xu
>>
>>
>>
>>
>> On Tue, Apr 19, 2011 at 9:09 PM, XiaoboGu <guxiaobo1982@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> Thanks for your reply. After some reading of the wiki pages, I think what
>>> I want is a Logistic Regression command line. Since the target users of
>>> Mahout are data analysts, who can't write Java code, a command line is more
>>> convenient. Some specific questions are:
>>> 1. What format should we use when preparing data for logistic
>>> regression? Can we use CSV, and should we put the value of the target
>>> variable in the first column of every row of the CSV file?
>>> 2. What options can we pass to the command line, if there is one?
>>> 3. How can we interpret the results?
>>>
>>> Because Logistic Regression is the workhorse of credit scoring in
>>> industry, I think Mahout will win over more analysts if LR support
>>> is smooth.
>>>
>>> Regards,
>>>
>>> Xiaobo Gu
>>>
>>> From: Ted Dunning [mailto:ted.dunning@gmail.com]
>>> Sent: Wednesday, April 13, 2011 1:02 AM
>>> To: user@mahout.apache.org
>>> Cc: Xiaobo Gu
>>> Subject: Re: Is any more detailed documentation about the sgd logistic
>>> regression example.
>>>
>>> Can you be more specific about what you have and what you want?
>>>
>>> The book Mahout in Action provides quite a lot of details with sample code
>>> for a server farm.
>>>
>>> The TrainNewsGroups example provides code that you can copy.
>>>
>>> Do you have these resources?  Do you want more?  Did you want more theory?
>>>
>>> On Tue, Apr 12, 2011 at 9:11 AM, Xiaobo Gu <guxiaobo1982@gmail.com>
>>> wrote:
>>> Hi,
>>> Documents about sgd logistic regression itself are welcome too.
>>> Regards,
>>>
>>> Xiaobo Gu
>>>
>>>
>>>
>>
>
