mahout-user mailing list archives

From Suneel Marthi <suneel_mar...@yahoo.com>
Subject Re: Mahout fpg
Date Wed, 20 Nov 2013 23:11:49 GMT
From the stacktrace:

FAILED
java.lang.NumberFormatException: For input string: "A1234567"
    at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)

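Obviously, the input is incorrect: the job parses item IDs with Long.parseLong (see ItemIDIndexMapper in the trace), so IDs must be numeric, and a string code such as "A1234567" is rejected.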
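To see why this input is rejected, here is a minimal Java sketch that parses a preference line and calls Long.parseLong on the IDs, which is the call the stack trace shows failing inside ItemIDIndexMapper. The class PrefLineParser and its methods are hypothetical, and the tab-or-comma separator is an assumption about the expected input format, not a quote of Mahout's source.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of Mahout.
public class PrefLineParser {
    // Assumption: each line is "userID<sep>itemID[<sep>preference]" with a tab or comma separator.
    private static final Pattern DELIMITER = Pattern.compile("[\t,]");

    /** Parses user and item IDs as longs; the job's mapper fails with the same exception on the item ID. */
    static long[] parseIds(String line) {
        String[] tokens = DELIMITER.split(line.trim());
        if (tokens.length < 2) {
            throw new IllegalArgumentException("Expected userID and itemID: " + line);
        }
        // Long.parseLong throws NumberFormatException for a code such as "A1234567".
        return new long[] {Long.parseLong(tokens[0]), Long.parseLong(tokens[1])};
    }

    /** Returns true if both IDs on the line parse as longs. */
    static boolean isValid(String line) {
        try {
            parseIds(line);
            return true;
        } catch (IllegalArgumentException e) { // also covers NumberFormatException (a subclass)
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("42\t1001"));     // true: numeric IDs
        System.out.println(isValid("42\tA1234567")); // false: string item code
    }
}
```

Running main prints true for the numeric line and false for the line containing "A1234567", which corresponds to the NumberFormatException in the task log.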





On Wednesday, November 20, 2013 6:02 PM, Sameer Tilak <sstilak@live.com> wrote:
 
Dear Sebastian,

I tried using ItemSimilarityJob. Each line of my data has the format: userid itemid (I also tried userid, itemcode).
The item code is a string. However, I am getting the following error. Maybe my input format is
incorrect.

  ./mahout org.apache.mahout.cf.taste.hadoop.similarity.item.ItemSimilarityJob --input testdata/similarityinput -o testdata/similarityoutput --similarityClassname SIMILARITY_COOCCURRENCE --maxSimilaritiesPerItem 10

13/11/20 14:46:39 WARN driver.MahoutDriver: No org.apache.mahout.cf.taste.hadoop.similarity.item.ItemSimilarityJob.props found on classpath, will use command-line arguments only
13/11/20 14:46:39 INFO common.AbstractJob: Command line arguments: {--booleanData=[false], --endPhase=[2147483647], --input=[testdata/similarityinput], --maxPrefs=[500], --maxSimilaritiesPerItem=[10], --minPrefsPerUser=[1], --output=[testdata/similarityoutput], --similarityClassname=[SIMILARITY_COOCCURRENCE], --startPhase=[0], --tempDir=[temp]}
13/11/20 14:46:39 INFO common.AbstractJob: Command line arguments: {--booleanData=[false], --endPhase=[2147483647], --input=[testdata/similarityinput], --minPrefsPerUser=[1], --output=[temp/prepareRatingMatrix], --ratingShift=[0.0], --startPhase=[0], --tempDir=[temp]}
13/11/20 14:46:41 INFO input.FileInputFormat: Total input paths to process : 1
13/11/20 14:46:41 INFO util.NativeCodeLoader: Loaded the native-hadoop library
13/11/20 14:46:41 WARN snappy.LoadSnappy: Snappy native library not loaded
13/11/20 14:46:41 INFO mapred.JobClient: Running job: job_201311111627_0115
13/11/20 14:46:42 INFO mapred.JobClient:  map 0% reduce 0%
13/11/20 14:47:00 INFO mapred.JobClient: Task Id : attempt_201311111627_0115_m_000000_0, Status : FAILED
java.lang.NumberFormatException: For input string: "A1234567"
    at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
    at java.lang.Long.parseLong(Long.java:441)
    at java.lang.Long.parseLong(Long.java:483)
    at org.apache.mahout.cf.taste.hadoop.item.ItemIDIndexMapper.map(ItemIDIndexMapper.java:50)
    at org.apache.mahout.cf.taste.hadoop.item.ItemIDIndexMapper.map(ItemIDIndexMapper.java:31)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
    at org.apache.hadoop.mapred.Child.main(Child.java:249)
13/11/20 14:47:11 INFO mapred.JobClient: Task Id : attempt_201311111627_0115_m_000000_1, Status : FAILED
java.lang.NumberFormatException: For input string: "A1234567"
    at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
    at java.lang.Long.parseLong(Long.java:441)
    at java.lang.Long.parseLong(Long.java:483)
    at org.apache.mahout.cf.taste.hadoop.item.ItemIDIndexMapper.map(ItemIDIndexMapper.java:50)
    at org.apache.mahout.cf.taste.hadoop.item.ItemIDIndexMapper.map(ItemIDIndexMapper.java:31)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
    at org.apache.hadoop.mapred.Child.main(Child.java:249)
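One possible workaround, sketched here as an assumption rather than an official Mahout tool, is to map each string item code to a numeric ID before running ItemSimilarityJob and keep the reverse mapping so the similarity output can be translated back. ItemCodeRemapper and its methods are hypothetical; the sketch also assumes user IDs are already numeric and that each line uses a tab or comma separator.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical pre-processing helper; not part of Mahout.
public class ItemCodeRemapper {
    // Assumption: "userid<sep>itemcode" with a tab or comma separator.
    private static final Pattern DELIMITER = Pattern.compile("[\t,]");

    // Forward map: string item code -> numeric ID handed to the job.
    private final Map<String, Long> codeToId = new HashMap<>();
    // Reverse map: numeric ID (list index) -> original item code.
    private final List<String> idToCode = new ArrayList<>();

    /** Returns the numeric ID for a code, assigning the next free ID the first time it is seen. */
    long idFor(String code) {
        Long id = codeToId.get(code);
        if (id == null) {
            id = (long) idToCode.size();
            codeToId.put(code, id);
            idToCode.add(code);
        }
        return id;
    }

    /** Translates a numeric item ID from the job output back to the original code. */
    String codeFor(long id) {
        return idToCode.get((int) id);
    }

    /** Rewrites "userid<sep>itemcode" as "userid,itemid" with a numeric item ID. */
    String remapLine(String line) {
        String[] tokens = DELIMITER.split(line.trim());
        long userId = Long.parseLong(tokens[0]); // user IDs assumed to be numeric already
        return userId + "," + idFor(tokens[1]);
    }

    public static void main(String[] args) {
        ItemCodeRemapper remapper = new ItemCodeRemapper();
        String[] input = {"1\tA1234567", "2\tB7654321", "2\tA1234567"};
        for (String line : input) {
            System.out.println(remapper.remapLine(line));
        }
        System.out.println(remapper.codeFor(1)); // B7654321
    }
}
```

For the three sample lines, main prints 1,0 then 2,1 then 2,0, followed by B7654321 for numeric ID 1. In a real pipeline the mapping would have to be persisted alongside the converted input so the job's numeric item IDs can be mapped back to item codes afterwards.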

> Date: Wed, 20 Nov 2013 08:22:07 +0100
> From: ssc.open@googlemail.com
> To: user@mahout.apache.org
> Subject: Re: Mahout fpg
> 
> You can use ItemSimilarityJob to find sets of items that co-occur
> in your users' interactions.
> 
> --sebastian
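As a rough illustration of "items that co-occur in users' interactions", the following in-memory Java sketch counts, for each pair of items, how many users interacted with both. Treating this count as what a cooccurrence measure such as SIMILARITY_COOCCURRENCE reports is an assumption; ItemSimilarityJob computes it as a distributed Hadoop job and also applies options such as --maxPrefs and --maxSimilaritiesPerItem, which this sketch ignores. CooccurrenceSketch is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical, in-memory illustration only; not Mahout's distributed implementation.
public class CooccurrenceSketch {

    /** Counts, for each unordered item pair, how many users interacted with both items. */
    static Map<String, Integer> countPairs(Map<Long, Set<Long>> itemsByUser) {
        Map<String, Integer> counts = new TreeMap<>(); // sorted keys give stable output
        for (Set<Long> items : itemsByUser.values()) {
            List<Long> sorted = new ArrayList<>(items);
            Collections.sort(sorted);
            // Visit each unordered pair once per user (i < j), keyed as "smaller-larger".
            for (int i = 0; i < sorted.size(); i++) {
                for (int j = i + 1; j < sorted.size(); j++) {
                    counts.merge(sorted.get(i) + "-" + sorted.get(j), 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<Long, Set<Long>> itemsByUser = new HashMap<>();
        itemsByUser.put(1L, new HashSet<>(Arrays.asList(10L, 20L, 30L)));
        itemsByUser.put(2L, new HashSet<>(Arrays.asList(10L, 20L)));
        itemsByUser.put(3L, new HashSet<>(Arrays.asList(20L, 30L)));
        System.out.println(countPairs(itemsByUser)); // {10-20=2, 10-30=1, 20-30=2}
    }
}
```

With these three example users, items 10 and 20 co-occur for two users, as do items 20 and 30, while items 10 and 30 co-occur for only one.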
> 
> 
> On 20.11.2013 08:11, Sameer Tilak wrote:
> > 
> > 
> > 
> > Hi Suneel,
> > Thanks for your reply. We could benefit a lot from the parallel frequent pattern mining
> > functionality. Will there be an alternative in future releases? I guess we can use older
> > versions of Mahout if we need it.
> > 
> >> Date: Tue, 19 Nov 2013 19:25:54 -0800
> >> From: suneel_marthi@yahoo.com
> >> Subject: Re: Mahout fpg
> >> To: user@mahout.apache.org
> >>
> >> FPG has been removed from the codebase because it will no longer be supported.
> >>
> >>
> >>
> >>
> >>
> >> On Tuesday, November 19, 2013 8:56 PM, Sameer Tilak <sstilak@live.com> wrote:
> >>  
> >> Hi everyone, I downloaded the latest version of Mahout and ran mvn install. When I try
> >> to run fpg, I get the following errors. Do I need to download and compile FPG separately?
> >> It looks like it has not been included in the list of valid programs.
> >> 13/11/19 17:49:19 WARN driver.MahoutDriver: Unable to add class: fpg
> >> 13/11/19 17:49:19 WARN driver.MahoutDriver: No fpg.props found on classpath, will use command-line arguments only
> >> Unknown program 'fpg' chosen.
> >> Valid program names are:
> >>   arff.vector: : Generate Vectors from an ARFF file or directory
> >>   baumwelch: : Baum-Welch algorithm for unsupervised HMM training
> >>   canopy: : Canopy clustering
> >>   cat: : Print a file or resource as the logistic regression models would see it
> >>   cleansvd: : Cleanup and verification of SVD output
> >>   clusterdump: : Dump cluster output to text
> >>   clusterpp: : Groups Clustering Output In Clusters
> >>   cmdump: : Dump confusion matrix in HTML or text formats
> >>   concatmatrices: : Concatenates 2 matrices of same cardinality into a single matrix
> >>   cvb: : LDA via Collapsed Variation Bayes (0th deriv. approx)
> >>   cvb0_local: : LDA via Collapsed Variation Bayes, in memory locally.
> >>   evaluateFactorization: : compute RMSE and MAE of a rating matrix factorization against probes
> >>   fkmeans: : Fuzzy K-means clustering
> >>   hmmpredict: : Generate random sequence of observations by given HMM
> >>   itemsimilarity: : Compute the item-item-similarities for item-based collaborative filtering
> >>   kmeans: : K-means clustering
> >>   lucene.vector: : Generate Vectors from a Lucene index
> >>   lucene2seq: : Generate Text SequenceFiles from a Lucene index
> >>   matrixdump: : Dump matrix in CSV format
> >>   matrixmult: : Take the product of two matrices
> >>   parallelALS: : ALS-WR factorization of a rating matrix
> >>   qualcluster: : Runs clustering experiments and summarizes results in a CSV
> >>   recommendfactorized: : Compute recommendations using the factorization of a rating matrix
> >>   recommenditembased: : Compute recommendations using item-based collaborative filtering
> >>   regexconverter: : Convert text files on a per line basis based on regular expressions
> >>   resplit: : Splits a set of SequenceFiles into a number of equal splits
> >>   rowid: : Map SequenceFile<Text,VectorWritable> to {SequenceFile<IntWritable,VectorWritable>, SequenceFile<IntWritable,Text>}
> >>   rowsimilarity: : Compute the pairwise similarities of the rows of a matrix
> >>   runAdaptiveLogistic: : Score new production data using a probably trained and validated AdaptivelogisticRegression model
> >>   runlogistic: : Run a logistic regression model against CSV data
> >>   seq2encoded: : Encoded Sparse Vector generation from Text sequence files
> >>   seq2sparse: : Sparse Vector generation from Text sequence files
> >>   seqdirectory: : Generate sequence files (of Text) from a directory
> >>   seqdumper: : Generic Sequence File dumper
> >>   seqmailarchives: : Creates SequenceFile from a directory containing gzipped mail archives
> >>   seqwiki: : Wikipedia xml dump to sequence file
> >>   spectralkmeans: : Spectral k-means clustering
> >>   split: : Split Input data into test and train sets
> >>   splitDataset: : split a rating dataset into training and probe parts
> >>   ssvd: : Stochastic SVD
> >>   streamingkmeans: : Streaming k-means clustering
> >>   svd: : Lanczos Singular Value Decomposition
> >>   testnb: : Test the Vector-based Bayes classifier
> >>   trainAdaptiveLogistic: : Train an AdaptivelogisticRegression model
> >>   trainlogistic: : Train a logistic regression using stochastic gradient descent
> >>   trainnb: : Train the Vector-based Bayes classifier
> >>   transpose: : Take the transpose of a matrix
> >>   validateAdaptiveLogistic: : Validate an AdaptivelogisticRegression model against hold-out data set
> >>   vecdist: : Compute the distances between a set of Vectors (or Cluster or Canopy, they must fit in memory) and a list of Vectors
> >>   vectordump: : Dump vectors from a sequence file to text
> >>   viterbi: : Viterbi decoding of hidden states from given output states sequence