mahout-user mailing list archives

From qiaoresearcher <qiaoresearc...@gmail.com>
Subject Re: need help on mahout
Date Fri, 09 Nov 2012 17:00:25 GMT
You are right: I have labels for each user. I just need some example code
to run the job quickly.

The example code should have steps similar to what I described: read the
gzip file, construct the web page set, form the input vector for each user,
then call some classification/clustering algorithm.

Does Mahout have an example like this?
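
To make the steps concrete, here is a rough sketch of the preprocessing part in plain Java (no Mahout calls). It assumes a single gzip-compressed text stream with tab-separated lines of the form user, time, page; the real log layout may differ. The class name UserPageVectors and all method names are made up for illustration.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class UserPageVectors {

    // Step 1: read the gzip stream and collect the set of pages each user visited.
    // Assumed line format (hypothetical): user<TAB>time<TAB>page
    static Map<String, Set<String>> readVisits(InputStream gzipped) throws IOException {
        Map<String, Set<String>> visits = new LinkedHashMap<>();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                new GZIPInputStream(gzipped), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                String[] fields = line.split("\t");
                if (fields.length < 3) {
                    continue; // skip blank or malformed lines
                }
                visits.computeIfAbsent(fields[0].trim(), k -> new LinkedHashSet<>())
                      .add(fields[2].trim());
            }
        }
        return visits;
    }

    // Step 2: build the web page set, giving each distinct page a column index.
    static Map<String, Integer> buildPageIndex(Map<String, Set<String>> visits) {
        Map<String, Integer> index = new LinkedHashMap<>();
        for (Set<String> pages : visits.values()) {
            for (String page : pages) {
                index.putIfAbsent(page, index.size());
            }
        }
        return index;
    }

    // Step 3: form one 0/1 vector per user (1 = visited the page, 0 = did not).
    static Map<String, double[]> buildVectors(Map<String, Set<String>> visits,
                                              Map<String, Integer> pageIndex) {
        Map<String, double[]> vectors = new LinkedHashMap<>();
        for (Map.Entry<String, Set<String>> entry : visits.entrySet()) {
            double[] vector = new double[pageIndex.size()];
            for (String page : entry.getValue()) {
                vector[pageIndex.get(page)] = 1.0;
            }
            vectors.put(entry.getKey(), vector);
        }
        return vectors;
    }

    // Helper for the demo only: gzip a string in memory.
    static byte[] gzip(String text) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(bytes)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Made-up sample data in the assumed format.
        String sample = "user1\ttime11\tpageA\n"
                      + "user2\ttime21\tpageB\n"
                      + "user1\ttime12\tpageB\n"
                      + "user3\ttime31\tpageC\n";
        Map<String, Set<String>> visits =
                readVisits(new ByteArrayInputStream(gzip(sample)));
        Map<String, Integer> pageIndex = buildPageIndex(visits);
        Map<String, double[]> vectors = buildVectors(visits, pageIndex);
        for (Map.Entry<String, double[]> entry : vectors.entrySet()) {
            System.out.println(entry.getKey() + " " + Arrays.toString(entry.getValue()));
        }
    }
}
```

Step 4 (training and classification) is left out on purpose. Presumably each double[] would be wrapped in a Mahout vector (for example org.apache.mahout.math.DenseVector) and passed, together with the user's label, to whichever classifier is chosen. With many pages a sparse vector would likely be a better fit than a dense array.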

On Fri, Nov 9, 2012 at 10:49 AM, Dmitriy Lyubimov <dlieu.7@gmail.com> wrote:

> If it is supervised classification, your input should contain the groups.
> The idea is that you extend the knowledge hidden in a smaller, perhaps
> expert-labeled, dataset to the rest of the universe.
> On Nov 9, 2012 8:43 AM, "qiaoresearcher" <qiaoresearcher@gmail.com> wrote:
>
> > It is a supervised classification problem.
> >
> > For example, a very simple case:
> > say, overall we collect 4 pages from the data set:
> > { web_page 1, web_page 2, web_page 3, web_page 4 }
> > then users may have input vectors like:
> > user1 [1 1  0  0]
> > user2 [1 1  0  0]
> > user3 [0 0  1  1]
> > user4 [0 0  1  1]
> > user5 [0 0  1  1]
> >   ...       ....
> >
> > then whatever classification algorithm mahout has should return
> > classification results as
> > group 1 { user1, user2}
> > group 2 { user3, user4, user5 }
> >
> >
> >
> > On Fri, Nov 9, 2012 at 10:29 AM, Sean Owen <srowen@gmail.com> wrote:
> >
> > > First: what question are you trying to answer from this data? You are
> > > trying to classify users into what, for what purpose?
> > >
> > >
> > > On Fri, Nov 9, 2012 at 4:20 PM, qiaoresearcher <qiaoresearcher@gmail.com> wrote:
> > >
> > > > Hi All,
> > > >
> > > > Assume the data is stored in a gzip file which includes many text files.
> > > > Within each text file, each line represents an activity of a user, for
> > > > example, a click on a web page.
> > > > the text file will look like:
> > > >
> > > > ----------------------------------------------------------------------------------
> > > > user 1   time11  visiting_web_page11
> > > > user 2   time21  visiting_web_page21
> > > > user 1   time12  visiting_web_page12
> > > > user 1   time13  visiting_web_page13
> > > > user 2   time22  visiting_web_page22
> > > > user 3   time31  visiting_web_page31
> > > > user 1   time14  visiting_web_page14
> > > >  ...           ....                ..........
> > > >
> > > > I am thinking of first constructing a web page set like
> > > > { visiting_web_page11, visiting_web_page12, visiting_web_page31, ....... }
> > > >
> > > > then for each user, we form a vector [ 1  0 0  1 0  0  .....    ] where
> > > > '1' means the user visited that page and '0' means he did not,
> > > > then use Mahout to classify the users based on the vectors.
> > > >
> > > > Does Mahout have an example like this? If not, what kind of Java code do
> > > > we need to write to process this task?
> > > >
> > > > Thanks in advance for any suggestions!
> > > >
> > >
> >
>
