mahout-user mailing list archives

From qiaoresearcher <qiaoresearc...@gmail.com>
Subject need help on mahout
Date Fri, 09 Nov 2012 16:20:31 GMT
Hi All,

Assume the data is stored in a gzip-compressed archive that contains many
text files. Within each text file, each line represents one activity of a
user, for example, a click on a web page.
A text file looks like this:
----------------------------------------------------------------------------------
user 1   time11  visiting_web_page11
user 2   time21  visiting_web_page21
user 1   time12  visiting_web_page12
user 1   time13  visiting_web_page13
user 2   time22  visiting_web_page22
user 3   time31  visiting_web_page31
user 1   time14  visiting_web_page14
 ...           ....                ..........
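
A minimal sketch of reading these lines and grouping them by user, in plain Java with no Mahout dependency. It assumes each text file is gzip-compressed on its own (a .tar.gz archive would have to be unpacked first or read with a tar library), that fields are separated by whitespace, and that the page is the last field and the time the second-to-last, so that a user id like "user 1" stays intact. The class and method names (ClickLogParser, groupByUser, readGzipLines) are hypothetical.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.zip.GZIPInputStream;

public class ClickLogParser {

    // Groups click lines into userId -> set of distinct visited pages.
    // Assumption: page = last token, time = second-to-last token,
    // everything before that is the user id (e.g. "user 1").
    public static Map<String, Set<String>> groupByUser(Iterable<String> lines) {
        Map<String, Set<String>> visits = new LinkedHashMap<>();
        for (String line : lines) {
            String[] tokens = line.trim().split("\\s+");
            if (tokens.length < 3) {
                continue; // skip blank or malformed lines
            }
            String page = tokens[tokens.length - 1];
            String user = String.join(" ", Arrays.copyOfRange(tokens, 0, tokens.length - 2));
            visits.computeIfAbsent(user, k -> new LinkedHashSet<>()).add(page);
        }
        return visits;
    }

    // Reads all lines of one gzip-compressed text file.
    // Assumption: the file is a single gzip stream, not a tar archive.
    public static List<String> readGzipLines(String path) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                new GZIPInputStream(new FileInputStream(path)), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        // Sample lines copied from the example above.
        List<String> sample = List.of(
                "user 1   time11  visiting_web_page11",
                "user 2   time21  visiting_web_page21",
                "user 1   time12  visiting_web_page12",
                "user 1   time13  visiting_web_page13",
                "user 2   time22  visiting_web_page22",
                "user 3   time31  visiting_web_page31",
                "user 1   time14  visiting_web_page14");
        System.out.println(groupByUser(sample));
    }
}
```

For real data, the lines from readGzipLines(...) for each file could be passed to groupByUser; keeping the parsing separate from the file reading makes it easy to test on a few lines.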

I am thinking of first constructing a set of web pages, like
{ visiting_web_page11, visiting_web_page12, visiting_web_page31, ....... }

Then, for each user, we form a vector [ 1  0 0  1 0  0  .....    ], where
'1' means the user visited that page and '0' means he did not. Then we use
Mahout to classify the users based on these vectors.
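
The vector construction described above could be sketched like this, again in plain Java without Mahout. The page set is sorted so that each page gets a stable column index, and each user becomes a 0/1 array over that index. The names BinaryUserVectors, buildPageIndex and toBinaryVectors are hypothetical, and the small input map in main is made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class BinaryUserVectors {

    // Collects every visited page into one sorted list; a page's position
    // in this list is its column in the user vectors.
    public static List<String> buildPageIndex(Map<String, Set<String>> visitsByUser) {
        TreeSet<String> pages = new TreeSet<>();
        for (Set<String> visited : visitsByUser.values()) {
            pages.addAll(visited);
        }
        return new ArrayList<>(pages);
    }

    // Builds one 0/1 vector per user: 1.0 if the user visited the page
    // at that column, 0.0 otherwise. Pages missing from the index are ignored.
    public static Map<String, double[]> toBinaryVectors(Map<String, Set<String>> visitsByUser,
                                                        List<String> pageIndex) {
        Map<String, Integer> column = new HashMap<>();
        for (int i = 0; i < pageIndex.size(); i++) {
            column.put(pageIndex.get(i), i);
        }
        Map<String, double[]> vectors = new LinkedHashMap<>();
        for (Map.Entry<String, Set<String>> e : visitsByUser.entrySet()) {
            double[] v = new double[pageIndex.size()];
            for (String page : e.getValue()) {
                Integer col = column.get(page);
                if (col != null) {
                    v[col] = 1.0;
                }
            }
            vectors.put(e.getKey(), v);
        }
        return vectors;
    }

    public static void main(String[] args) {
        // Hypothetical input: user id -> pages that user visited.
        Map<String, Set<String>> visits = new LinkedHashMap<>();
        visits.put("user 1", Set.of("visiting_web_page11", "visiting_web_page12"));
        visits.put("user 2", Set.of("visiting_web_page21"));
        visits.put("user 3", Set.of("visiting_web_page11", "visiting_web_page31"));

        List<String> pageIndex = buildPageIndex(visits);
        Map<String, double[]> vectors = toBinaryVectors(visits, pageIndex);
        System.out.println("pages: " + pageIndex);
        for (Map.Entry<String, double[]> e : vectors.entrySet()) {
            System.out.println(e.getKey() + " -> " + Arrays.toString(e.getValue()));
        }
    }
}
```

With many distinct pages, most entries will be 0, so a sparse representation would be far more compact than these dense arrays. As far as I know, Mahout has sparse vector classes (e.g. RandomAccessSparseVector) for this purpose, but converting to them is not shown in this sketch.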

Does Mahout have an example like this? If not, what kind of Java code do we
need to write for this task?

Thanks in advance for any suggestions!
