mahout-user mailing list archives

From Sameer Tilak
Subject Parallel Frequent Pattern Mining input format
Date Tue, 19 Nov 2013 21:55:37 GMT
Hi everyone,
I am interested in using Mahout for data analysis, in particular frequent pattern mining
with Mahout's FPG algorithm. My data can be expressed as an M x N matrix. Each row represents
a user and each column represents an item (1 if the user has viewed that
item, 0 otherwise). We will have millions of rows and columns. I have the following two questions:
1. Can anyone please tell me the input file format for the FPG algorithm? The documentation
says: "Input files have to be in the following format. <optional document id>TAB<TOKEN1>SPACE<TOKEN2>SPACE…."
I looked at retail.dat and accident.dat, but I am not sure how the format in the documentation
maps onto them. Any thoughts on how to represent my data would be great.
2. Any thoughts on how well the FPG implementation scales to our problem size?
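
To make question 1 concrete, here is a minimal sketch (plain Python, not Mahout code) of how one row of the 0/1 matrix could be written out if the quoted format string is read literally. The function names `row_to_line` and `matrix_to_lines` are hypothetical, and using the column index as the token and a row label as the optional document id are assumptions. Whether the FPG implementation accepts this output as-is is exactly what is being asked.

```python
from typing import Iterable, Iterator, Optional, Sequence


def row_to_line(row: Sequence[int], row_id: Optional[str] = None) -> str:
    """Write one 0/1 row as '<id>TAB<tok1> <tok2> ...'; the id is optional.

    Assumption: each token is the index of a column whose value is 1.
    """
    # Keep only the columns the user has viewed, so the line stays sparse.
    tokens = " ".join(str(col) for col, seen in enumerate(row) if seen)
    if row_id is None:
        return tokens
    return f"{row_id}\t{tokens}"


def matrix_to_lines(
    rows: Iterable[Sequence[int]],
    ids: Optional[Iterable[str]] = None,
) -> Iterator[str]:
    """Yield one transaction line per matrix row, skipping all-zero rows."""
    if ids is None:
        for row in rows:
            if any(row):
                yield row_to_line(row)
    else:
        for row_id, row in zip(ids, rows):
            if any(row):
                yield row_to_line(row, row_id)


if __name__ == "__main__":
    matrix = [
        [1, 0, 1, 1],  # user u1 viewed items 0, 2, 3
        [0, 0, 0, 0],  # user u2 viewed nothing, so no line is written
        [0, 1, 0, 1],  # user u3 viewed items 1, 3
    ]
    for line in matrix_to_lines(matrix, ids=["u1", "u2", "u3"]):
        print(line)
```

Only the nonzero columns are written, so each line lists just the items a user viewed rather than all N columns.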