mahout-user mailing list archives

From Sameer Tilak <ssti...@live.com>
Subject RE: Mahout fpg
Date Wed, 20 Nov 2013 23:28:18 GMT
Yes, changing A1234567 to 1234567 trivially resolves that issue. However, in my input
(userid, itemcode), the itemcode is alphanumeric, not purely numeric. I am sure
ItemSimilarityJob can handle that case, but I need to know how to supply the input
correctly. I am currently using:
(userid, itemcode)(userid, itemcode)(userid, itemcode)(userid, itemcode)….
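
One possible way to supply alphanumeric item codes is to translate each code to a numeric ID before running the job, and to keep the mapping so results can be translated back afterwards. The sketch below is only an illustration, not part of Mahout: the function names are hypothetical, and it assumes the input is one `userid,itemcode` pair per line and that the job accepts comma-separated lines with numeric IDs (the stack trace in this thread shows the item ID being passed to `Long.parseLong`). If the user IDs are also non-numeric, the same mapping step could be applied to them.

```python
import csv
import io


def parse_pairs(text):
    """Parse 'userid,itemcode' lines (one pair per line) into tuples."""
    return [(row[0].strip(), row[1].strip())
            for row in csv.reader(io.StringIO(text)) if len(row) >= 2]


def build_item_index(pairs):
    """Assign each distinct item code a numeric ID, in first-seen order."""
    index = {}
    for _user, code in pairs:
        if code not in index:
            index[code] = len(index) + 1  # hypothetical choice: IDs start at 1
    return index


def encode(pairs, index):
    """Return 'userid,itemid' lines in which item codes are replaced by numeric IDs."""
    return [f"{user},{index[code]}" for user, code in pairs]


def reverse_index(index):
    """Build the numeric-ID -> item-code lookup used to decode job output."""
    return {item_id: code for code, item_id in index.items()}


if __name__ == "__main__":
    sample = "1,A1234567\n2,B7654321\n1,B7654321\n"
    pairs = parse_pairs(sample)
    index = build_item_index(pairs)
    print("\n".join(encode(pairs, index)))
```

The encoded lines would be written to the job's input path, and the reverse lookup would be stored alongside them so that the numeric item IDs in the similarity output can be mapped back to the original codes.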

> Date: Wed, 20 Nov 2013 15:11:49 -0800
> From: suneel_marthi@yahoo.com
> Subject: Re: Mahout fpg
> To: user@mahout.apache.org
> 
> From the stacktrace:
> 
> FAILED
> java.lang.NumberFormatException: For input string: "A1234567"
>     at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
> 
> Obviously, the input's incorrect.
> 
> On Wednesday, November 20, 2013 6:02 PM, Sameer Tilak <sstilak@live.com> wrote:
>  
> Dear Sebastian,
> I tried using ItemSimilarityJob. Each line of my data has the format:
> userid    itemid  (I also tried userid, itemcode).
> Itemcode is a string. However, I am getting the following error. Maybe my input
> format is incorrect.
> 
>   ./mahout org.apache.mahout.cf.taste.hadoop.similarity.item.ItemSimilarityJob --input testdata/similarityinput -o testdata/similarityoutput --similarityClassname SIMILARITY_COOCCURRENCE --maxSimilaritiesPerItem 10
> 13/11/20 14:46:39 WARN driver.MahoutDriver: No org.apache.mahout.cf.taste.hadoop.similarity.item.ItemSimilarityJob.props found on classpath, will use command-line arguments only
> 13/11/20 14:46:39 INFO common.AbstractJob: Command line arguments: {--booleanData=[false], --endPhase=[2147483647], --input=[testdata/similarityinput], --maxPrefs=[500], --maxSimilaritiesPerItem=[10], --minPrefsPerUser=[1], --output=[testdata/similarityoutput], --similarityClassname=[SIMILARITY_COOCCURRENCE], --startPhase=[0], --tempDir=[temp]}
> 13/11/20 14:46:39 INFO common.AbstractJob: Command line arguments: {--booleanData=[false], --endPhase=[2147483647], --input=[testdata/similarityinput], --minPrefsPerUser=[1], --output=[temp/prepareRatingMatrix], --ratingShift=[0.0], --startPhase=[0], --tempDir=[temp]}
> 13/11/20 14:46:41 INFO input.FileInputFormat: Total input paths to process : 1
> 13/11/20 14:46:41 INFO util.NativeCodeLoader: Loaded the native-hadoop library
> 13/11/20 14:46:41 WARN snappy.LoadSnappy: Snappy native library not loaded
> 13/11/20 14:46:41 INFO mapred.JobClient: Running job: job_201311111627_0115
> 13/11/20 14:46:42 INFO mapred.JobClient:  map 0% reduce 0%
> 13/11/20 14:47:00 INFO mapred.JobClient: Task Id : attempt_201311111627_0115_m_000000_0, Status : FAILED
> java.lang.NumberFormatException: For input string: "A1234567"
>     at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
>     at java.lang.Long.parseLong(Long.java:441)
>     at java.lang.Long.parseLong(Long.java:483)
>     at org.apache.mahout.cf.taste.hadoop.item.ItemIDIndexMapper.map(ItemIDIndexMapper.java:50)
>     at org.apache.mahout.cf.taste.hadoop.item.ItemIDIndexMapper.map(ItemIDIndexMapper.java:31)
>     at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
>     at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:415)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
>     at org.apache.hadoop.mapred.Child.main(Child.java:249)
> 13/11/20 14:47:11 INFO mapred.JobClient: Task Id : attempt_201311111627_0115_m_000000_1, Status : FAILED
> java.lang.NumberFormatException: For input string: "A1234567"
>     at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
>     at java.lang.Long.parseLong(Long.java:441)
>     at java.lang.Long.parseLong(Long.java:483)
>     at org.apache.mahout.cf.taste.hadoop.item.ItemIDIndexMapper.map(ItemIDIndexMapper.java:50)
>     at org.apache.mahout.cf.taste.hadoop.item.ItemIDIndexMapper.map(ItemIDIndexMapper.java:31)
>     at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
>     at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:415)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
>     at org.apache.hadoop.mapred.Child.main(Child.java:249)
> 
> > Date: Wed, 20 Nov 2013 08:22:07 +0100
> > From: ssc.open@googlemail.com
> > To: user@mahout.apache.org
> > Subject: Re: Mahout fpg
> > 
> > You can use ItemSimilarityJob to find sets of items that co-occur
> > in your users' interactions.
> > 
> > --sebastian
> > 
> > 
> > On 20.11.2013 08:11, Sameer Tilak wrote:
> > > 
> > > Hi Suneel,
> > > Thanks for your reply. We can benefit a lot from the parallel frequent pattern
> > > matching functionality. Will there be any alternative in future releases? I guess we
> > > can use older versions of Mahout if we need that.
> > > 
> > >> Date: Tue, 19 Nov 2013 19:25:54 -0800
> > >> From: suneel_marthi@yahoo.com
> > >> Subject: Re: Mahout fpg
> > >> To: user@mahout.apache.org
> > >>
> > >> Fpg has been removed from the codebase as it will not be supported.
> > >>
> > >> On Tuesday, November 19, 2013 8:56 PM, Sameer Tilak <sstilak@live.com> wrote:
> > >>  
> > >> Hi everyone,
> > >> I downloaded the latest version of Mahout and did mvn install. When I try to run
> > >> fpg, I get the following errors. Do I need to download and compile FPG separately?
> > >> It looks like it has not been included in the list of valid programs.
> > >> 13/11/19 17:49:19 WARN driver.MahoutDriver: Unable to add class: fpg
> > >> 13/11/19 17:49:19 WARN driver.MahoutDriver: No fpg.props found on classpath, will use command-line arguments only
> > >> Unknown program 'fpg' chosen.
> > >> Valid program names are:
> > >>   arff.vector: : Generate Vectors from an ARFF file or directory
> > >>   baumwelch: : Baum-Welch algorithm for unsupervised HMM training
> > >>   canopy: : Canopy clustering
> > >>   cat: : Print a file or resource as the logistic regression models would see it
> > >>   cleansvd: : Cleanup and verification of SVD output
> > >>   clusterdump: : Dump cluster output to text
> > >>   clusterpp: : Groups Clustering Output In Clusters
> > >>   cmdump: : Dump confusion matrix in HTML or text formats
> > >>   concatmatrices: : Concatenates 2 matrices of same cardinality into a single matrix
> > >>   cvb: : LDA via Collapsed Variation Bayes (0th deriv. approx)
> > >>   cvb0_local: : LDA via Collapsed Variation Bayes, in memory locally.
> > >>   evaluateFactorization: : compute RMSE and MAE of a rating matrix factorization against probes
> > >>   fkmeans: : Fuzzy K-means clustering
> > >>   hmmpredict: : Generate random sequence of observations by given HMM
> > >>   itemsimilarity: : Compute the item-item-similarities for item-based collaborative filtering
> > >>   kmeans: : K-means clustering
> > >>   lucene.vector: : Generate Vectors from a Lucene index
> > >>   lucene2seq: : Generate Text SequenceFiles from a Lucene index
> > >>   matrixdump: : Dump matrix in CSV format
> > >>   matrixmult: : Take the product of two matrices
> > >>   parallelALS: : ALS-WR factorization of a rating matrix
> > >>   qualcluster: : Runs clustering experiments and summarizes results in a CSV
> > >>   recommendfactorized: : Compute recommendations using the factorization of a rating matrix
> > >>   recommenditembased: : Compute recommendations using item-based collaborative filtering
> > >>   regexconverter: : Convert text files on a per line basis based on regular expressions
> > >>   resplit: : Splits a set of SequenceFiles into a number of equal splits
> > >>   rowid: : Map SequenceFile<Text,VectorWritable> to {SequenceFile<IntWritable,VectorWritable>, SequenceFile<IntWritable,Text>}
> > >>   rowsimilarity: : Compute the pairwise similarities of the rows of a matrix
> > >>   runAdaptiveLogistic: : Score new production data using a probably trained and validated AdaptivelogisticRegression model
> > >>   runlogistic: : Run a logistic regression model against CSV data
> > >>   seq2encoded: : Encoded Sparse Vector generation from Text sequence files
> > >>   seq2sparse: : Sparse Vector generation from Text sequence files
> > >>   seqdirectory: : Generate sequence files (of Text) from a directory
> > >>   seqdumper: : Generic Sequence File dumper
> > >>   seqmailarchives: : Creates SequenceFile from a directory containing gzipped mail archives
> > >>   seqwiki: : Wikipedia xml dump to sequence file
> > >>   spectralkmeans: : Spectral k-means clustering
> > >>   split: : Split Input data into test and train sets
> > >>   splitDataset: : split a rating dataset into training and probe parts
> > >>   ssvd: : Stochastic SVD
> > >>   streamingkmeans: : Streaming k-means clustering
> > >>   svd: : Lanczos Singular Value Decomposition
> > >>   testnb: : Test the Vector-based Bayes classifier
> > >>   trainAdaptiveLogistic: : Train an AdaptivelogisticRegression model
> > >>   trainlogistic: : Train a logistic regression using stochastic gradient descent
> > >>   trainnb: : Train the Vector-based Bayes classifier
> > >>   transpose: : Take the transpose of a matrix
> > >>   validateAdaptiveLogistic: : Validate an AdaptivelogisticRegression model against hold-out data set
> > >>   vecdist: : Compute the distances between a set of Vectors (or Cluster or Canopy, they must fit in memory) and a list of Vectors
> > >>   vectordump: : Dump vectors from a sequence file to text
> > >>   viterbi: : Viterbi decoding of hidden states from given output states sequence