mahout-user mailing list archives

From Ted Dunning <ted.dunn...@gmail.com>
Subject Re: Is any more detailed documentation about the sgd logistic regression example.
Date Thu, 05 May 2011 15:21:45 GMT
On Thu, May 5, 2011 at 7:48 AM, Xiaobo Gu <guxiaobo1982@gmail.com> wrote:

> On Thu, May 5, 2011 at 10:40 PM, Stanley Xu <wenhao.xu@gmail.com> wrote:
> > 1. You could use the command line to add shape as a category feature. It
> > will hash categoryname=value as the feature and set its value to 1.0; this
> > is the standard way to convert a categorical feature into multiple numeric
> > features (i.e., 0/1 features).
>
> Can we just use the "word" type for categorical predictor variables?
>

Yes.
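The category encoding described in point 1 (hash `categoryname=value` and set that feature to 1.0) can be illustrated with a minimal, self-contained sketch of the hashing trick. This is not Mahout code: `String.hashCode()` is an illustrative stand-in, and the class and method names are hypothetical. Mahout's own encoders (such as `StaticWordValueEncoder`) may differ in details like the hash function, the number of probes, and whether weights are added rather than set.

```java
/**
 * Hypothetical sketch of the "hashing trick" for a categorical predictor:
 * the string "name=value" is hashed to one slot of a fixed-size vector and
 * that slot is set to 1.0, giving a 0/1 indicator feature.
 * String.hashCode() is an illustrative stand-in, not Mahout's hash function.
 */
public class CategoryHashingSketch {
    // Return the vector slot for the categorical feature "name=value".
    static int slot(String name, String value, int numFeatures) {
        // floorMod keeps the index non-negative even for negative hash codes.
        return Math.floorMod((name + "=" + value).hashCode(), numFeatures);
    }

    // Mark the category as present by setting its hashed slot to 1.0.
    static void addCategory(double[] vector, String name, String value) {
        vector[slot(name, value, vector.length)] = 1.0;
    }

    public static void main(String[] args) {
        double[] v = new double[100];
        addCategory(v, "shape", "round");
        addCategory(v, "color", "red");
        // Print which slots became non-zero.
        for (int i = 0; i < v.length; i++) {
            if (v[i] != 0.0) {
                System.out.println("slot " + i + " = " + v[i]);
            }
        }
    }
}
```

Because every distinct value of `shape` hashes to its own slot (barring collisions), a single categorical column becomes many 0/1 features without building an explicit dictionary first.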


> > 2. In production mode, don't use CSV; you will find that most of the time
> > is spent parsing the CSV data and hashing it into features. You might
> > encode the features as vectors and serialize them to the file system with
> > MapReduce to reduce the cost of data parsing.
>
> Currently we are not familiar with Vectors. Is there a standard way
> (command line) to encode CSV files into Vectors and serialize them into
> the file system?
>

There isn't a good command line for this, largely because it is difficult to
describe how to convert each CSV field. There are some beginnings of efforts
on this, but the results are still limited.
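To illustrate both why a converter needs a description of every CSV field and what pre-encoding buys (point 2 in the quoted message), here is a minimal, self-contained sketch. It assumes a hypothetical per-column schema (`NUMERIC`, `WORD`, `IGNORE`), encodes each row once into a hashed vector, and serializes the vectors in a simple binary layout so later training passes can skip CSV parsing and hashing. The schema, the binary layout, and all names are illustrative assumptions, not a Mahout file format or tool.

```java
import java.io.*;
import java.util.*;

/**
 * Hypothetical sketch: encode CSV rows into fixed-size hashed vectors using a
 * per-column type description, then serialize them in a compact binary form.
 * The column types and layout are assumptions, not a Mahout format.
 */
public class CsvToVectorsSketch {
    enum ColumnType { NUMERIC, WORD, IGNORE }

    // Encode one CSV line according to the column schema.
    static double[] encode(String line, String[] names, ColumnType[] types, int numFeatures) {
        String[] fields = line.split(",", -1);
        double[] v = new double[numFeatures];
        for (int i = 0; i < types.length; i++) {
            if (types[i] == ColumnType.IGNORE) continue;
            boolean word = types[i] == ColumnType.WORD;
            // WORD columns hash "name=value" with weight 1.0;
            // NUMERIC columns hash the column name with the parsed value as weight.
            String token = word ? names[i] + "=" + fields[i].trim() : names[i];
            double weight = word ? 1.0 : Double.parseDouble(fields[i].trim());
            v[Math.floorMod(token.hashCode(), numFeatures)] += weight;
        }
        return v;
    }

    // Layout: vector count, then for each vector its length followed by raw doubles.
    static void write(List<double[]> vectors, DataOutputStream out) throws IOException {
        out.writeInt(vectors.size());
        for (double[] v : vectors) {
            out.writeInt(v.length);
            for (double x : v) out.writeDouble(x);
        }
        out.flush();
    }

    // Read vectors back without any text parsing.
    static List<double[]> read(DataInputStream in) throws IOException {
        int n = in.readInt();
        List<double[]> vectors = new ArrayList<>(n);
        for (int k = 0; k < n; k++) {
            double[] v = new double[in.readInt()];
            for (int j = 0; j < v.length; j++) v[j] = in.readDouble();
            vectors.add(v);
        }
        return vectors;
    }

    public static void main(String[] args) throws IOException {
        String[] names = {"x", "shape", "id"};
        ColumnType[] types = {ColumnType.NUMERIC, ColumnType.WORD, ColumnType.IGNORE};
        List<double[]> rows = new ArrayList<>();
        rows.add(encode("0.5,round,17", names, types, 32));
        rows.add(encode("1.5,square,18", names, types, 32));
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            write(rows, out);
        }
        List<double[]> back = read(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
        // Prints "2 vectors of length 32".
        System.out.println(back.size() + " vectors of length " + back.get(0).length);
    }
}
```

In a real deployment the encoding step would run as a MapReduce job and write to HDFS or the local file system; the point of the sketch is only that the schema must be spelled out per column, and that reading pre-encoded vectors avoids repeating the parse-and-hash work on every pass.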


> And what do you mean by "file system": the local file system or HDFS,
> since you mentioned MapReduce?
>

That shouldn't matter much.
