mahout-user mailing list archives

From Sameer Tilak <ssti...@live.com>
Subject RE: Mahout fpg
Date Thu, 21 Nov 2013 19:09:00 GMT
Sebastian, thanks for the clarification.

> Date: Thu, 21 Nov 2013 17:51:12 +0100
> From: ssc.open@googlemail.com
> To: user@mahout.apache.org
> Subject: Re: Mahout fpg
> 
> ItemSimilarityJob does not handle alphanumeric identifiers. You have to
> preprocess your data before running that job.
> 
> --sebastian
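A minimal sketch of the preprocessing Sebastian describes. It assumes input lines of the form `userid,itemcode` (comma- or whitespace-separated) whose user IDs are already numeric. Each distinct item code gets a sequential long ID, and the reverse mapping is kept so item IDs in the job's output can be translated back. The class `ItemCodeEncoder` and its methods are hypothetical helpers, not part of Mahout.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical preprocessing step: rewrites "userid,itemcode" lines so the
 * item code becomes a numeric long ID that Long.parseLong can handle.
 */
class ItemCodeEncoder {
    private final Map<String, Long> codeToId = new HashMap<>();
    private final List<String> idToCode = new ArrayList<>();

    /** Returns the ID for a code, assigning the next free ID on first sight. */
    long encode(String code) {
        Long id = codeToId.get(code);
        if (id == null) {
            id = (long) idToCode.size();
            codeToId.put(code, id);
            idToCode.add(code);
        }
        return id;
    }

    /** Translates an ID from the job's output back to the original item code. */
    String decode(long id) {
        return idToCode.get((int) id);
    }

    /** Rewrites one "userid,itemcode[,pref]" line into "userid,itemid[,pref]". */
    String rewriteLine(String line) {
        String[] fields = line.trim().split("[\\s,]+");
        if (fields.length < 2) {
            throw new IllegalArgumentException("Expected userid and itemcode: " + line);
        }
        // Assumption: user IDs are already numeric; parseLong rejects anything else.
        long userId = Long.parseLong(fields[0]);
        long itemId = encode(fields[1]);
        StringBuilder out = new StringBuilder().append(userId).append(',').append(itemId);
        // Keep an optional third field (e.g. a preference value) unchanged.
        if (fields.length > 2) {
            out.append(',').append(fields[2]);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        ItemCodeEncoder encoder = new ItemCodeEncoder();
        String[] input = {"17,A1234567", "17,B7654321", "42,A1234567"};
        for (String line : input) {
            System.out.println(encoder.rewriteLine(line)); // 17,0 then 17,1 then 42,0
        }
    }
}
```

Sequential IDs avoid the collisions a hash of the code could produce. The trade-off is that the code-to-ID mapping has to be saved next to the encoded input (in a real run, written to a file rather than held in memory) so the similarity output can be decoded afterwards.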
> 
> On 21.11.2013 00:28, Sameer Tilak wrote:
> > Yes, changing A1234567 to 1234567 resolves that issue trivially. However, in my input
> > (userid, itemcode), itemcode is alphanumeric and not just numeric. I am sure ItemSimilarityJob
> > will be able to handle that case; however, I need to know how to supply the input correctly.
> > I am currently using:
> > (userid, itemcode)(userid, itemcode)(userid, itemcode)(userid, itemcode)…
> > 
> >> Date: Wed, 20 Nov 2013 15:11:49 -0800
> >> From: suneel_marthi@yahoo.com
> >> Subject: Re: Mahout fpg
> >> To: user@mahout.apache.org
> >>
> >> From the stacktrace:
> >>
> >> FAILED
> >> java.lang.NumberFormatException: For input string: "A1234567"
> >>     at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
> >>
> >> Obviously, the input's incorrect.
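The trace shows the failure happens in `Long.parseLong`, called from `ItemIDIndexMapper`. A quick local check before submitting the job can catch such lines without waiting for a failed Hadoop run. The helper below (`InputIdCheck`) is hypothetical, and its tokenization (splitting on commas or whitespace) is an assumption, not a copy of the mapper's own parsing.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical pre-flight check: flags input lines whose user or item field
 * would make Long.parseLong throw NumberFormatException.
 */
class InputIdCheck {
    /** Returns true if both the user and item fields parse as long values. */
    static boolean hasNumericIds(String line) {
        String[] fields = line.trim().split("[\\s,]+");
        if (fields.length < 2) {
            return false;
        }
        try {
            Long.parseLong(fields[0]);
            Long.parseLong(fields[1]);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    /** Collects the lines that would make the job fail. */
    static List<String> badLines(List<String> lines) {
        List<String> bad = new ArrayList<>();
        for (String line : lines) {
            if (!hasNumericIds(line)) {
                bad.add(line);
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        List<String> lines = List.of("17,1234567", "17,A1234567", "42\t7654321");
        System.out.println(badLines(lines)); // [17,A1234567]
    }
}
```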
> >>
> >> On Wednesday, November 20, 2013 6:02 PM, Sameer Tilak <sstilak@live.com> wrote:
> >>  
> >> Dear Sebastian,
> >> I tried using ItemSimilarityJob. My data has the following format: each line contains
> >> userid    itemid (I also tried userid, itemcode). Itemcode is a string. However, I am getting
> >> the following error. Maybe my input format is incorrect.
> >>
> >>   ./mahout org.apache.mahout.cf.taste.hadoop.similarity.item.ItemSimilarityJob --input testdata/similarityinput -o testdata/similarityoutput --similarityClassname SIMILARITY_COOCCURRENCE --maxSimilaritiesPerItem 10
> >> 13/11/20 14:46:39 WARN driver.MahoutDriver: No org.apache.mahout.cf.taste.hadoop.similarity.item.ItemSimilarityJob.props found on classpath, will use command-line arguments only
> >> 13/11/20 14:46:39 INFO common.AbstractJob: Command line arguments: {--booleanData=[false], --endPhase=[2147483647], --input=[testdata/similarityinput], --maxPrefs=[500], --maxSimilaritiesPerItem=[10], --minPrefsPerUser=[1], --output=[testdata/similarityoutput], --similarityClassname=[SIMILARITY_COOCCURRENCE], --startPhase=[0], --tempDir=[temp]}
> >> 13/11/20 14:46:39 INFO common.AbstractJob: Command line arguments: {--booleanData=[false], --endPhase=[2147483647], --input=[testdata/similarityinput], --minPrefsPerUser=[1], --output=[temp/prepareRatingMatrix], --ratingShift=[0.0], --startPhase=[0], --tempDir=[temp]}
> >> 13/11/20 14:46:41 INFO input.FileInputFormat: Total input paths to process : 1
> >> 13/11/20 14:46:41 INFO util.NativeCodeLoader: Loaded the native-hadoop library
> >> 13/11/20 14:46:41 WARN snappy.LoadSnappy: Snappy native library not loaded
> >> 13/11/20 14:46:41 INFO mapred.JobClient: Running job: job_201311111627_0115
> >> 13/11/20 14:46:42 INFO mapred.JobClient:  map 0% reduce 0%
> >> 13/11/20 14:47:00 INFO mapred.JobClient: Task Id : attempt_201311111627_0115_m_000000_0, Status : FAILED
> >> java.lang.NumberFormatException: For input string: "A1234567"
> >>     at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
> >>     at java.lang.Long.parseLong(Long.java:441)
> >>     at java.lang.Long.parseLong(Long.java:483)
> >>     at org.apache.mahout.cf.taste.hadoop.item.ItemIDIndexMapper.map(ItemIDIndexMapper.java:50)
> >>     at org.apache.mahout.cf.taste.hadoop.item.ItemIDIndexMapper.map(ItemIDIndexMapper.java:31)
> >>     at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
> >>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
> >>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
> >>     at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
> >>     at java.security.AccessController.doPrivileged(Native Method)
> >>     at javax.security.auth.Subject.doAs(Subject.java:415)
> >>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
> >>     at org.apache.hadoop.mapred.Child.main(Child.java:249)
> >>
> >> 13/11/20 14:47:11 INFO mapred.JobClient: Task Id : attempt_201311111627_0115_m_000000_1, Status : FAILED
> >> java.lang.NumberFormatException: For input string: "A1234567"
> >>     at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
> >>     at java.lang.Long.parseLong(Long.java:441)
> >>     at java.lang.Long.parseLong(Long.java:483)
> >>     at org.apache.mahout.cf.taste.hadoop.item.ItemIDIndexMapper.map(ItemIDIndexMapper.java:50)
> >>     at org.apache.mahout.cf.taste.hadoop.item.ItemIDIndexMapper.map(ItemIDIndexMapper.java:31)
> >>     at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
> >>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
> >>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
> >>     at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
> >>     at java.security.AccessController.doPrivileged(Native Method)
> >>     at javax.security.auth.Subject.doAs(Subject.java:415)
> >>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
> >>     at org.apache.hadoop.mapred.Child.main(Child.java:249)
> >>
> >>> Date: Wed, 20 Nov 2013 08:22:07 +0100
> >>> From: ssc.open@googlemail.com
> >>> To: user@mahout.apache.org
> >>> Subject: Re: Mahout fpg
> >>>
> >>> You can use ItemSimilarityJob to find sets of items that co-occur
> >>> together in your users' interactions.
> >>>
> >>> --sebastian
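To illustrate what "items that co-occur" means, here is an in-memory sketch that counts, for each pair of items, how many users interacted with both. As I understand it, this count is what `SIMILARITY_COOCCURRENCE` is based on, but the code is a hypothetical single-machine illustration, not Mahout's MapReduce implementation. It also omits the per-item top-N pruning controlled by `--maxSimilaritiesPerItem`.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Hypothetical in-memory co-occurrence counter (not Mahout's API). */
class CooccurrenceSketch {
    /** Counts, per unordered item pair "a,b" (a < b), the users who have both items. */
    static Map<String, Integer> countPairs(Map<Long, Set<Long>> itemsByUser) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Set<Long> items : itemsByUser.values()) {
            // Sort so each unordered pair is emitted once, smaller ID first.
            List<Long> sorted = new ArrayList<>(items);
            Collections.sort(sorted);
            for (int i = 0; i < sorted.size(); i++) {
                for (int j = i + 1; j < sorted.size(); j++) {
                    String pair = sorted.get(i) + "," + sorted.get(j);
                    counts.merge(pair, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<Long, Set<Long>> itemsByUser = new HashMap<>();
        itemsByUser.put(17L, Set.of(1L, 2L, 3L));
        itemsByUser.put(42L, Set.of(1L, 2L));
        itemsByUser.put(99L, Set.of(2L, 3L));
        System.out.println(countPairs(itemsByUser)); // {1,2=2, 1,3=1, 2,3=2}
    }
}
```

The nested loop is quadratic in the number of items per user, which is one reason a distributed job caps how many preferences it considers per user.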
> >>>
> >>>
> >>> On 20.11.2013 08:11, Sameer Tilak wrote:
> >>>>
> >>>> Hi Suneel,
> >>>> Thanks for your reply. We could benefit a lot from the parallel frequent pattern mining
> >>>> functionality. Will there be an alternative in future releases? I guess we can use older
> >>>> versions of Mahout if we need it.
> >>>>
> >>>>> Date: Tue, 19 Nov 2013 19:25:54 -0800
> >>>>> From: suneel_marthi@yahoo.com
> >>>>> Subject: Re: Mahout fpg
> >>>>> To: user@mahout.apache.org
> >>>>>
> >>>>> FPG has been removed from the codebase, as it will no longer be supported.
> >>>>>
> >>>>> On Tuesday, November 19, 2013 8:56 PM, Sameer Tilak <sstilak@live.com> wrote:
> >>>>>  
> >>>>> Hi everyone,
> >>>>> I downloaded the latest version of Mahout and ran mvn install. When I try to run fpg, I get
> >>>>> the following errors. Do I need to download and compile FPG separately? It looks like it has
> >>>>> somehow not been included in the list of valid programs.
> >>>>> 13/11/19 17:49:19 WARN driver.MahoutDriver: Unable to add class: fpg
> >>>>> 13/11/19 17:49:19 WARN driver.MahoutDriver: No fpg.props found on classpath, will use command-line arguments only
> >>>>> Unknown program 'fpg' chosen.
> >>>>> Valid program names are:
> >>>>>   arff.vector: : Generate Vectors from an ARFF file or directory
> >>>>>   baumwelch: : Baum-Welch algorithm for unsupervised HMM training
> >>>>>   canopy: : Canopy clustering
> >>>>>   cat: : Print a file or resource as the logistic regression models would see it
> >>>>>   cleansvd: : Cleanup and verification of SVD output
> >>>>>   clusterdump: : Dump cluster output to text
> >>>>>   clusterpp: : Groups Clustering Output In Clusters
> >>>>>   cmdump: : Dump confusion matrix in HTML or text formats
> >>>>>   concatmatrices: : Concatenates 2 matrices of same cardinality into a single matrix
> >>>>>   cvb: : LDA via Collapsed Variation Bayes (0th deriv. approx)
> >>>>>   cvb0_local: : LDA via Collapsed Variation Bayes, in memory locally.
> >>>>>   evaluateFactorization: : compute RMSE and MAE of a rating matrix factorization against probes
> >>>>>   fkmeans: : Fuzzy K-means clustering
> >>>>>   hmmpredict: : Generate random sequence of observations by given HMM
> >>>>>   itemsimilarity: : Compute the item-item-similarities for item-based collaborative filtering
> >>>>>   kmeans: : K-means clustering
> >>>>>   lucene.vector: : Generate Vectors from a Lucene index
> >>>>>   lucene2seq: : Generate Text SequenceFiles from a Lucene index
> >>>>>   matrixdump: : Dump matrix in CSV format
> >>>>>   matrixmult: : Take the product of two matrices
> >>>>>   parallelALS: : ALS-WR factorization of a rating matrix
> >>>>>   qualcluster: : Runs clustering experiments and summarizes results in a CSV
> >>>>>   recommendfactorized: : Compute recommendations using the factorization of a rating matrix
> >>>>>   recommenditembased: : Compute recommendations using item-based collaborative filtering
> >>>>>   regexconverter: : Convert text files on a per line basis based on regular expressions
> >>>>>   resplit: : Splits a set of SequenceFiles into a number of equal splits
> >>>>>   rowid: : Map SequenceFile<Text,VectorWritable> to {SequenceFile<IntWritable,VectorWritable>, SequenceFile<IntWritable,Text>}
> >>>>>   rowsimilarity: : Compute the pairwise similarities of the rows of a matrix
> >>>>>   runAdaptiveLogistic: : Score new production data using a probably trained and validated AdaptivelogisticRegression model
> >>>>>   runlogistic: : Run a logistic regression model against CSV data
> >>>>>   seq2encoded: : Encoded Sparse Vector generation from Text sequence files
> >>>>>   seq2sparse: : Sparse Vector generation from Text sequence files
> >>>>>   seqdirectory: : Generate sequence files (of Text) from a directory
> >>>>>   seqdumper: : Generic Sequence File dumper
> >>>>>   seqmailarchives: : Creates SequenceFile from a directory containing gzipped mail archives
> >>>>>   seqwiki: : Wikipedia xml dump to sequence file
> >>>>>   spectralkmeans: : Spectral k-means clustering
> >>>>>   split: : Split Input data into test and train sets
> >>>>>   splitDataset: : split a rating dataset into training and probe parts
> >>>>>   ssvd: : Stochastic SVD
> >>>>>   streamingkmeans: : Streaming k-means clustering
> >>>>>   svd: : Lanczos Singular Value Decomposition
> >>>>>   testnb: : Test the Vector-based Bayes classifier
> >>>>>   trainAdaptiveLogistic: : Train an AdaptivelogisticRegression model
> >>>>>   trainlogistic: : Train a logistic regression using stochastic gradient descent
> >>>>>   trainnb: : Train the Vector-based Bayes classifier
> >>>>>   transpose: : Take the transpose of a matrix
> >>>>>   validateAdaptiveLogistic: : Validate an AdaptivelogisticRegression model against hold-out data set
> >>>>>   vecdist: : Compute the distances between a set of Vectors (or Cluster or Canopy, they must fit in memory) and a list of Vectors
> >>>>>   vectordump: : Dump vectors from a sequence file to text
> >>>>>   viterbi: : Viterbi decoding of hidden states from given output states sequence