mahout-user mailing list archives

From Sameer Tilak <ssti...@live.com>
Subject RE: Mahout fpg
Date Wed, 20 Nov 2013 23:01:35 GMT
Dear Sebastian,

I tried using ItemSimilarityJob. Each line of my data has the format userid itemid (I also tried userid, itemcode). The item code is a string. However, I am getting the following error. Maybe my input format is incorrect.

  ./mahout org.apache.mahout.cf.taste.hadoop.similarity.item.ItemSimilarityJob --input testdata/similarityinput -o testdata/similarityoutput --similarityClassname SIMILARITY_COOCCURRENCE --maxSimilaritiesPerItem 10

13/11/20 14:46:39 WARN driver.MahoutDriver: No org.apache.mahout.cf.taste.hadoop.similarity.item.ItemSimilarityJob.props found on classpath, will use command-line arguments only
13/11/20 14:46:39 INFO common.AbstractJob: Command line arguments: {--booleanData=[false], --endPhase=[2147483647], --input=[testdata/similarityinput], --maxPrefs=[500], --maxSimilaritiesPerItem=[10], --minPrefsPerUser=[1], --output=[testdata/similarityoutput], --similarityClassname=[SIMILARITY_COOCCURRENCE], --startPhase=[0], --tempDir=[temp]}
13/11/20 14:46:39 INFO common.AbstractJob: Command line arguments: {--booleanData=[false], --endPhase=[2147483647], --input=[testdata/similarityinput], --minPrefsPerUser=[1], --output=[temp/prepareRatingMatrix], --ratingShift=[0.0], --startPhase=[0], --tempDir=[temp]}
13/11/20 14:46:41 INFO input.FileInputFormat: Total input paths to process : 1
13/11/20 14:46:41 INFO util.NativeCodeLoader: Loaded the native-hadoop library
13/11/20 14:46:41 WARN snappy.LoadSnappy: Snappy native library not loaded
13/11/20 14:46:41 INFO mapred.JobClient: Running job: job_201311111627_0115
13/11/20 14:46:42 INFO mapred.JobClient:  map 0% reduce 0%
13/11/20 14:47:00 INFO mapred.JobClient: Task Id : attempt_201311111627_0115_m_000000_0, Status : FAILED
java.lang.NumberFormatException: For input string: "A1234567"
	at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
	at java.lang.Long.parseLong(Long.java:441)
	at java.lang.Long.parseLong(Long.java:483)
	at org.apache.mahout.cf.taste.hadoop.item.ItemIDIndexMapper.map(ItemIDIndexMapper.java:50)
	at org.apache.mahout.cf.taste.hadoop.item.ItemIDIndexMapper.map(ItemIDIndexMapper.java:31)
	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
	at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
	at org.apache.hadoop.mapred.Child.main(Child.java:249)

13/11/20 14:47:11 INFO mapred.JobClient: Task Id : attempt_201311111627_0115_m_000000_1, Status : FAILED
java.lang.NumberFormatException: For input string: "A1234567"
	at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
	at java.lang.Long.parseLong(Long.java:441)
	at java.lang.Long.parseLong(Long.java:483)
	at org.apache.mahout.cf.taste.hadoop.item.ItemIDIndexMapper.map(ItemIDIndexMapper.java:50)
	at org.apache.mahout.cf.taste.hadoop.item.ItemIDIndexMapper.map(ItemIDIndexMapper.java:31)
	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
	at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
	at org.apache.hadoop.mapred.Child.main(Child.java:249)
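The stack trace shows ItemIDIndexMapper passing the item field to Long.parseLong, so the job appears to require numeric IDs, and a code such as "A1234567" cannot be parsed. One possible workaround is to translate the string codes into sequential numeric IDs before running the job and keep the dictionary so the results can be translated back. This is a minimal sketch under stated assumptions: it assumes one "user item" pair per line, separated by whitespace or a comma, and the IdDictionary class and its methods are hypothetical helpers, not part of Mahout.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical helper (not part of Mahout): assigns each distinct string code
 * a sequential numeric ID so the rows can be fed to a job that calls
 * Long.parseLong on its ID fields.
 */
class IdDictionary {
  private final Map<String, Long> toNumeric = new HashMap<>();
  private final List<String> toOriginal = new ArrayList<>();

  /** Returns the existing ID for code, or assigns the next free one (0, 1, 2, ...). */
  long idFor(String code) {
    Long id = toNumeric.get(code);
    if (id == null) {
      id = (long) toOriginal.size();
      toNumeric.put(code, id);
      toOriginal.add(code);
    }
    return id;
  }

  /** Translates a numeric ID from the job's output back to the original code. */
  String codeFor(long id) {
    return toOriginal.get((int) id);
  }

  /**
   * Converts one "user item" line (tab-, space- or comma-separated) into a
   * "userID,itemID" line with numeric IDs. Users and items get separate dictionaries.
   */
  static String convertLine(String line, IdDictionary users, IdDictionary items) {
    String[] fields = line.trim().split("[\\s,]+");
    if (fields.length < 2) {
      throw new IllegalArgumentException("Expected 'user item', got: " + line);
    }
    return users.idFor(fields[0]) + "," + items.idFor(fields[1]);
  }

  public static void main(String[] args) {
    IdDictionary users = new IdDictionary();
    IdDictionary items = new IdDictionary();
    // Made-up sample rows in the 'userid itemcode' shape described in the message.
    String[] input = {"u1\tA1234567", "u2\tA1234567", "u1\tB7654321"};
    for (String row : input) {
      System.out.println(convertLine(row, users, items)); // 0,0 then 1,0 then 0,1
    }
    System.out.println(items.codeFor(1)); // B7654321
  }
}
```

The converted lines would then go into the directory given to --input. Because the item similarities come back as numeric IDs, the items dictionary has to be kept (for example, saved next to the output) to map them back to codes such as A1234567. User IDs are renumbered as well, which should be harmless here since the job's output is keyed by item.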

> Date: Wed, 20 Nov 2013 08:22:07 +0100
> From: ssc.open@googlemail.com
> To: user@mahout.apache.org
> Subject: Re: Mahout fpg
> 
> You can use ItemSimilarityJob to find sets of items that co-occur
> together in your users' interactions.
> 
> --sebastian
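With SIMILARITY_COOCCURRENCE, ItemSimilarityJob scores item pairs roughly by how many users interacted with both items. As a single-machine illustration of that counting idea (not Mahout's distributed implementation), the sketch below counts, for every unordered item pair, the users who share it. The CooccurrenceCounter class, the "itemA|itemB" key format and the sample data are hypothetical. Note that, unlike FP-growth, this yields pairwise counts rather than frequent itemsets of arbitrary size.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical in-memory illustration of pairwise item co-occurrence counting. */
class CooccurrenceCounter {
  /**
   * Counts, for every unordered pair of items, how many users interacted with both.
   * Keys are "itemA|itemB" with itemA < itemB lexicographically (assumes item codes
   * do not themselves contain '|').
   */
  static Map<String, Integer> countPairs(Map<String, Set<String>> itemsByUser) {
    Map<String, Integer> counts = new TreeMap<>();
    for (Set<String> items : itemsByUser.values()) {
      // Sort each user's items so every pair is counted once, in canonical order.
      String[] sorted = new TreeSet<>(items).toArray(new String[0]);
      for (int i = 0; i < sorted.length; i++) {
        for (int j = i + 1; j < sorted.length; j++) {
          counts.merge(sorted[i] + "|" + sorted[j], 1, Integer::sum);
        }
      }
    }
    return counts;
  }

  public static void main(String[] args) {
    Map<String, Set<String>> itemsByUser = new HashMap<>();
    itemsByUser.put("u1", new TreeSet<>(Arrays.asList("A", "B", "C")));
    itemsByUser.put("u2", new TreeSet<>(Arrays.asList("A", "B")));
    itemsByUser.put("u3", new TreeSet<>(Arrays.asList("B", "C")));
    System.out.println(countPairs(itemsByUser)); // {A|B=2, A|C=1, B|C=2}
  }
}
```

The nested loop is quadratic in the number of items per user, which is one reason a distributed job caps work per user (compare the --maxPrefs=[500] value in the log above).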
> 
> 
> On 20.11.2013 08:11, Sameer Tilak wrote:
> > 
> > Hi Suneel,
> > Thanks for your reply. We can benefit a lot from the parallel frequent pattern mining
> > functionality. Will there be an alternative in future releases? I guess we can use an older
> > version of Mahout if we need it.
> > 
> >> Date: Tue, 19 Nov 2013 19:25:54 -0800
> >> From: suneel_marthi@yahoo.com
> >> Subject: Re: Mahout fpg
> >> To: user@mahout.apache.org
> >>
> >> Fpg has been removed from the codebase as it will not be supported.
> >>
> >> On Tuesday, November 19, 2013 8:56 PM, Sameer Tilak <sstilak@live.com> wrote:
> >>  
> >> Hi everyone,
> >> I downloaded the latest version of Mahout and ran mvn install. When I try to run fpg, I get
> >> the following errors. Do I need to download and compile FPG separately? It looks like it
> >> has not been included in the list of valid programs.
> >> 13/11/19 17:49:19 WARN driver.MahoutDriver: Unable to add class: fpg
> >> 13/11/19 17:49:19 WARN driver.MahoutDriver: No fpg.props found on classpath, will use command-line arguments only
> >> Unknown program 'fpg' chosen.
> >> Valid program names are:
> >>   arff.vector: : Generate Vectors from an ARFF file or directory
> >>   baumwelch: : Baum-Welch algorithm for unsupervised HMM training
> >>   canopy: : Canopy clustering
> >>   cat: : Print a file or resource as the logistic regression models would see it
> >>   cleansvd: : Cleanup and verification of SVD output
> >>   clusterdump: : Dump cluster output to text
> >>   clusterpp: : Groups Clustering Output In Clusters
> >>   cmdump: : Dump confusion matrix in HTML or text formats
> >>   concatmatrices: : Concatenates 2 matrices of same cardinality into a single matrix
> >>   cvb: : LDA via Collapsed Variation Bayes (0th deriv. approx)
> >>   cvb0_local: : LDA via Collapsed Variation Bayes, in memory locally.
> >>   evaluateFactorization: : compute RMSE and MAE of a rating matrix factorization against probes
> >>   fkmeans: : Fuzzy K-means clustering
> >>   hmmpredict: : Generate random sequence of observations by given HMM
> >>   itemsimilarity: : Compute the item-item-similarities for item-based collaborative filtering
> >>   kmeans: : K-means clustering
> >>   lucene.vector: : Generate Vectors from a Lucene index
> >>   lucene2seq: : Generate Text SequenceFiles from a Lucene index
> >>   matrixdump: : Dump matrix in CSV format
> >>   matrixmult: : Take the product of two matrices
> >>   parallelALS: : ALS-WR factorization of a rating matrix
> >>   qualcluster: : Runs clustering experiments and summarizes results in a CSV
> >>   recommendfactorized: : Compute recommendations using the factorization of a rating matrix
> >>   recommenditembased: : Compute recommendations using item-based collaborative filtering
> >>   regexconverter: : Convert text files on a per line basis based on regular expressions
> >>   resplit: : Splits a set of SequenceFiles into a number of equal splits
> >>   rowid: : Map SequenceFile<Text,VectorWritable> to {SequenceFile<IntWritable,VectorWritable>, SequenceFile<IntWritable,Text>}
> >>   rowsimilarity: : Compute the pairwise similarities of the rows of a matrix
> >>   runAdaptiveLogistic: : Score new production data using a probably trained and validated AdaptivelogisticRegression model
> >>   runlogistic: : Run a logistic regression model against CSV data
> >>   seq2encoded: : Encoded Sparse Vector generation from Text sequence files
> >>   seq2sparse: : Sparse Vector generation from Text sequence files
> >>   seqdirectory: : Generate sequence files (of Text) from a directory
> >>   seqdumper: : Generic Sequence File dumper
> >>   seqmailarchives: : Creates SequenceFile from a directory containing gzipped mail archives
> >>   seqwiki: : Wikipedia xml dump to sequence file
> >>   spectralkmeans: : Spectral k-means clustering
> >>   split: : Split Input data into test and train sets
> >>   splitDataset: : split a rating dataset into training and probe parts
> >>   ssvd: : Stochastic SVD
> >>   streamingkmeans: : Streaming k-means clustering
> >>   svd: : Lanczos Singular Value Decomposition
> >>   testnb: : Test the Vector-based Bayes classifier
> >>   trainAdaptiveLogistic: : Train an AdaptivelogisticRegression model
> >>   trainlogistic: : Train a logistic regression using stochastic gradient descent
> >>   trainnb: : Train the Vector-based Bayes classifier
> >>   transpose: : Take the transpose of a matrix
> >>   validateAdaptiveLogistic: : Validate an AdaptivelogisticRegression model against hold-out data set
> >>   vecdist: : Compute the distances between a set of Vectors (or Cluster or Canopy, they must fit in memory) and a list of Vectors
> >>   vectordump: : Dump vectors from a sequence file to text
> >>   viterbi: : Viterbi decoding of hidden states from given output states sequence