mahout-user mailing list archives

From Xiaobo Gu <guxiaobo1...@gmail.com>
Subject Fwd: Is there any more detailed documentation about the sgd logistic regression example?
Date Thu, 05 May 2011 14:48:54 GMT
On Thu, May 5, 2011 at 10:40 PM, Stanley Xu <wenhao.xu@gmail.com> wrote:
> 1. You could use the command line to add shape as a categorical feature. It will
> hash categoryname=value as the feature and set its value to 1.0. This is the
> standard way to convert a categorical feature into multiple numeric
> features (i.e. 0/1 indicator features).
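Point 1 describes the hashing trick for categorical features: the string `name=value` picks one slot of a fixed-size vector, and that slot is set to 1.0. A minimal plain-Java sketch, assuming a dense `double[]` and `String.hashCode()` as the hash. Mahout's own encoders (for example `StaticWordValueEncoder`) use their own hashing and write into a sparse `Vector`; the class and method names below are hypothetical.

```java
import java.util.Arrays;

/** Hypothetical illustration of hashed 0/1 encoding of a categorical feature. */
class CategoryHashing {

    /** Map the string "name=value" to an index in [0, numFeatures). */
    static int slot(String name, String value, int numFeatures) {
        String key = name + "=" + value;
        // floorMod keeps the index non-negative even when hashCode() is negative
        return Math.floorMod(key.hashCode(), numFeatures);
    }

    /** Set the indicator slot for name=value to 1.0 in an existing feature vector. */
    static void addCategory(double[] features, String name, String value) {
        features[slot(name, value, features.length)] = 1.0;
    }

    public static void main(String[] args) {
        double[] v = new double[20];
        addCategory(v, "shape", "square");
        // Exactly one slot is 1.0; every other slot stays 0.0
        System.out.println(Arrays.toString(v));
    }
}
```

Two different `name=value` strings can land in the same slot (a hash collision); a larger vector makes that less likely at the cost of more dimensions.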

Can we just use "word" type for category predictor variables?

> 2. In production, don't use CSV; you will find that most of the time is spent
> parsing the CSV data and hashing it into features. You might instead encode the
> features into vectors once and serialize them to the file system with MapReduce,
> to reduce the cost of data parsing.
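Point 2 amounts to "parse and encode once, then train from a compact binary form". A minimal sketch using plain `java.io`, assuming a hypothetical record layout (count, then label, length and doubles per example). This is not Mahout's on-disk format; Mahout jobs typically store vectors in Hadoop SequenceFiles of `VectorWritable`, which this sketch does not use.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical binary store for already-encoded training examples. */
class VectorStore {

    static final class Example {
        final int label;
        final double[] features;
        Example(int label, double[] features) { this.label = label; this.features = features; }
    }

    /** Layout: int count, then per example: int label, int length, length doubles. */
    static void write(List<Example> examples, OutputStream out) {
        try {
            DataOutputStream data = new DataOutputStream(new BufferedOutputStream(out));
            data.writeInt(examples.size());
            for (Example e : examples) {
                data.writeInt(e.label);
                data.writeInt(e.features.length);
                for (double x : e.features) data.writeDouble(x);
            }
            data.flush(); // flush but do not close: the caller owns the stream
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    /** Read examples back without any text parsing or hashing. */
    static List<Example> read(InputStream in) {
        try {
            DataInputStream data = new DataInputStream(new BufferedInputStream(in));
            int n = data.readInt();
            List<Example> examples = new ArrayList<>(n);
            for (int i = 0; i < n; i++) {
                int label = data.readInt();
                double[] features = new double[data.readInt()];
                for (int j = 0; j < features.length; j++) features[j] = data.readDouble();
                examples.add(new Example(label, features));
            }
            return examples;
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    public static void main(String[] args) {
        List<Example> examples = List.of(new Example(1, new double[] {1.0, 0.5, -2.0}));
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        write(examples, bytes);
        List<Example> back = read(new ByteArrayInputStream(bytes.toByteArray()));
        System.out.println(back.get(0).label + " " + Arrays.toString(back.get(0).features));
    }
}
```

In practice the `OutputStream` would be a file on local disk or HDFS; the in-memory streams here just keep the example self-contained.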

Currently we are not familiar with Vectors. Is there a standard way
(a command line) to encode CSV files into Vectors and serialize them to the
file system?
And what do you mean by "file system": the local file system or HDFS,
since you mentioned MapReduce?



> Best wishes,
> Stanley Xu
>
>
>
> On Mon, May 2, 2011 at 11:58 PM, Xiaobo Gu <guxiaobo1982@gmail.com> wrote:
>>
>> In our environment, data will be prepared inside the relational data
>> warehouse and then exported as CSV files, so the trainlogistic
>> command line works well for us. But we will have both numeric and
>> categorical predictor variables. Does SGD support categorical variables, and
>> are there examples of this? I ask because I think the results below do
>> not apply to categorical variables:
>>
>> color ~ -0.157*Intercept Term + -0.678*x + -0.416*y
>> Intercept Term -0.15655
>> x -0.67841
>> y -0.41587
>>
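The printed model is a linear score passed through the logistic function, p = 1 / (1 + exp(-(b0 + b1*x + b2*y))). A minimal sketch that plugs in the coefficients shown above, assuming the printed weights apply directly to the raw x and y values and that p is the probability of whichever color value was encoded as the positive (1) target class. For an indicator-encoded categorical variable, each `name=value` level would get its own weight, added to the score only when that level is present. `ScoreModel` is a hypothetical name.

```java
/** Hypothetical scorer for the model "color ~ -0.157*Intercept Term + -0.678*x + -0.416*y". */
class ScoreModel {
    static final double INTERCEPT = -0.15655;
    static final double BETA_X = -0.67841;
    static final double BETA_Y = -0.41587;

    /** Logistic (sigmoid) function: maps a linear score to (0, 1). */
    static double sigmoid(double z) {
        return 1.0 / (1.0 + Math.exp(-z));
    }

    /** Probability of the positive class for a point (x, y). */
    static double probability(double x, double y) {
        // The intercept term's input is always 1.0
        double z = INTERCEPT * 1.0 + BETA_X * x + BETA_Y * y;
        return sigmoid(z);
    }

    public static void main(String[] args) {
        System.out.printf("p(x=0, y=0) = %.4f%n", probability(0.0, 0.0));
        // exp(beta) is the factor by which the odds change per unit increase of that predictor
        System.out.printf("odds ratio per unit of x = %.4f%n", Math.exp(BETA_X));
    }
}
```

A negative weight lowers the probability as its predictor grows: exp(-0.678) is about 0.51, so under these assumptions each unit increase in x roughly halves the odds of the positive class.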
>> On Fri, Apr 22, 2011 at 6:16 AM, Ted Dunning <ted.dunning@gmail.com>
>> wrote:
>> > The trainlogistic command is (as Stanley says) only a simple example.
>> >
>> > You will need to write a program something like TrainNewsGroups for your
>> > modelers to use.
>> >
>> > I agree that the API-oriented code in Mahout is not what those users
>> > need. It was, however, what my users needed.
>> >
>> > It would be great if you would like to contribute a good command line
>> > for the more advanced SGD classifier training API.
>> >
>> > On Tue, Apr 19, 2011 at 10:51 PM, Stanley Xu <wenhao.xu@gmail.com>
>> > wrote:
>> >
>> >> Hi Xiaobo,
>> >>
>> >> You could check chapters 13-16 of <Mahout in Action>; they describe all
>> >> the parameters that the 'mahout trainlogistic' command line tool can
>> >> use. But the trainlogistic command is still only a simple example. If you
>> >> want to use it in a production environment, you still have to write the
>> >> feature encoding code yourself. The code you need to write is pretty easy:
>> >> just parse the input, put it in a Vector, and let the LR train on the data.
>> >>
>> >> Best wishes,
>> >> Stanley Xu
>> >>
>> >>
>> >>
>> >>
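Stanley's "parse the input, put it in a Vector, and let the LR train" can be sketched without Mahout as plain SGD logistic regression. Assumptions: rows of the form `label,x1,x2,...` with a 0/1 label in the first column (a hypothetical layout for this sketch, not something trainlogistic requires), dense arrays instead of Mahout `Vector`s, a fixed learning rate and no regularization. Mahout's own SGD learners (e.g. `OnlineLogisticRegression`) add priors and learning-rate schedules that this sketch omits.

```java
import java.util.Arrays;

/** Hypothetical minimal SGD logistic regression trained from CSV rows. */
class SgdLogistic {
    final double[] weights;
    final double learningRate;

    SgdLogistic(int numInputs, double learningRate) {
        this.weights = new double[numInputs + 1]; // +1 for the intercept term
        this.learningRate = learningRate;
    }

    /** Parse "label,x1,x2,..." into {1.0, x1, x2, ...}; the leading 1.0 is the intercept. */
    static double[] features(String line) {
        String[] cols = line.split(",");
        double[] f = new double[cols.length];
        f[0] = 1.0;
        for (int i = 1; i < cols.length; i++) f[i] = Double.parseDouble(cols[i].trim());
        return f;
    }

    /** The first column holds the 0/1 target. */
    static int label(String line) {
        return Integer.parseInt(line.split(",")[0].trim());
    }

    double probability(double[] f) {
        double z = 0.0;
        for (int i = 0; i < f.length; i++) z += weights[i] * f[i];
        return 1.0 / (1.0 + Math.exp(-z));
    }

    /** One stochastic gradient step on the log-likelihood: w += rate * (y - p) * x. */
    void train(int y, double[] f) {
        double error = y - probability(f);
        for (int i = 0; i < f.length; i++) weights[i] += learningRate * error * f[i];
    }

    public static void main(String[] args) {
        String[] rows = {"1,2.0,1.0", "1,1.5,2.0", "0,-1.0,-2.0", "0,-2.0,-0.5"};
        SgdLogistic model = new SgdLogistic(2, 0.1);
        for (int pass = 0; pass < 100; pass++) {
            for (String row : rows) model.train(label(row), features(row));
        }
        System.out.println(Arrays.toString(model.weights));
        // Score a new point (x1=2, x2=2); the leading 1.0 is the intercept input
        System.out.println(model.probability(new double[] {1.0, 2.0, 2.0}));
    }
}
```

Making several passes over a small dataset stands in for the larger, single-pass streams SGD is usually applied to; the update rule is the same either way.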
>> >> On Tue, Apr 19, 2011 at 9:09 PM, XiaoboGu <guxiaobo1982@gmail.com>
>> >> wrote:
>> >>
>> >>> Hi,
>> >>>
>> >>> Thanks for your reply. After some reading of the wiki pages, I think
>> >>> what I want is a logistic regression command line: since the target
>> >>> users of Mahout are data analysts who can't write Java code, a command
>> >>> line is more convenient. Some specific questions are:
>> >>> 1. What format should we use when preparing data for logistic
>> >>> regression? Can we use CSV, and should we put the value of the target
>> >>> variable in the first column of every row of the CSV file?
>> >>> 2. What options does the command line support, if there is one?
>> >>> 3. How can we interpret the results?
>> >>>
>> >>> Because logistic regression is the workhorse of credit scoring in
>> >>> industry, I think smooth LR support will win Mahout more friends among
>> >>> analysts.
>> >>>
>> >>> Regards,
>> >>>
>> >>> Xiaobo Gu
>> >>>
>> >>> From: Ted Dunning [mailto:ted.dunning@gmail.com]
>> >>> Sent: Wednesday, April 13, 2011 1:02 AM
>> >>> To: user@mahout.apache.org
>> >>> Cc: Xiaobo Gu
>> >>> Subject: Re: Is there any more detailed documentation about the sgd logistic
>> >>> regression example?
>> >>>
>> >>> Can you be more specific about what you have and what you want?
>> >>>
>> >>> The book Mahout in Action provides quite a lot of detail, with sample
>> >>> code for a server farm.
>> >>>
>> >>> The TrainNewsGroups example provides code that you can copy.
>> >>>
>> >>> Do you have these resources? Do you want more? Did you want more theory?
>> >>>
>> >>> On Tue, Apr 12, 2011 at 9:11 AM, Xiaobo Gu <guxiaobo1982@gmail.com>
>> >>> wrote:
>> >>> Hi,
>> >>> Documents about sgd logistic regression itself are welcome too.
>> >>> Regards,
>> >>>
>> >>> Xiaobo Gu
>> >>>
>> >>>
>> >>>
>> >>
>> >
>
>
