From the stack trace:
FAILED
java.lang.NumberFormatException: For input string: "A1234567"
    at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
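Obviously, the input is incorrect: the job parses the item IDs with Long.parseLong, so they must be numeric (long) values, and a string item code such as "A1234567" cannot be read.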
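The frames at java.lang.Long.parseLong, called from ItemIDIndexMapper.map, show that the job reads each item ID as a long, so alphanumeric codes like "A1234567" have to be converted to numbers before the job runs. Below is a minimal sketch of such a preprocessing step. It assumes comma-separated "userID,itemID" output lines, and the class IdRemapper and its methods are hypothetical (not part of Mahout). It numbers each distinct code sequentially and keeps the reverse mapping so that item IDs in the similarity output can be translated back to the original codes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical preprocessing helper (not part of Mahout): gives every distinct
 * user and item code a sequential long ID, so that lines such as
 * "1001,A1234567" become purely numeric "userID,itemID" pairs.
 */
public class IdRemapper {

    private final Map<String, Long> userIds = new HashMap<>();
    private final Map<String, Long> itemIds = new HashMap<>();
    // Index i holds the original code of item ID i, for decoding the job output.
    private final List<String> itemCodes = new ArrayList<>();

    private long userId(String code) {
        // First occurrence of a code gets the next free number: 0, 1, 2, ...
        Long id = userIds.get(code);
        if (id == null) {
            id = (long) userIds.size();
            userIds.put(code, id);
        }
        return id;
    }

    private long itemId(String code) {
        Long id = itemIds.get(code);
        if (id == null) {
            id = (long) itemCodes.size();
            itemIds.put(code, id);
            itemCodes.add(code);
        }
        return id;
    }

    /** Converts "userCode,itemCode" (comma- or whitespace-separated) to "userID,itemID". */
    public String convert(String line) {
        String[] tokens = line.trim().split("[,\\s]+");
        if (tokens.length < 2) {
            throw new IllegalArgumentException("Expected a user and an item code: " + line);
        }
        return userId(tokens[0]) + "," + itemId(tokens[1]);
    }

    /** Translates a numeric item ID from the similarity output back to its original code. */
    public String itemCode(long id) {
        return itemCodes.get((int) id);
    }

    public static void main(String[] args) {
        // Reproduces the failure: an alphanumeric code is not a valid long.
        try {
            Long.parseLong("A1234567");
        } catch (NumberFormatException e) {
            System.out.println("parseLong failed: " + e.getMessage());
        }

        IdRemapper remapper = new IdRemapper();
        String[] rawLines = {"1001,A1234567", "1001,B7654321", "1002 A1234567"};
        for (String line : rawLines) {
            System.out.println(remapper.convert(line)); // prints 0,0 then 0,1 then 1,0
        }
        System.out.println(remapper.itemCode(1)); // prints B7654321
    }
}
```

Sequential numbering was chosen over hashing the strings to longs because it cannot produce ID collisions; the trade-off is that the mapping has to be kept (here in memory) to decode the results.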
On Wednesday, November 20, 2013 6:02 PM, Sameer Tilak <sstilak@live.com> wrote:
Dear Sebastian,
I tried using ItemSimilarityJob. Each line of my data has the format "userid itemid" (I also tried "userid, itemcode"). The item code is a string. However, I am getting the following error. Maybe my input format is incorrect.
./mahout org.apache.mahout.cf.taste.hadoop.similarity.item.ItemSimilarityJob --input testdata/similarityinput -o testdata/similarityoutput --similarityClassname SIMILARITY_COOCCURRENCE --maxSimilaritiesPerItem 10

13/11/20 14:46:39 WARN driver.MahoutDriver: No org.apache.mahout.cf.taste.hadoop.similarity.item.ItemSimilarityJob.props found on classpath, will use command-line arguments only
13/11/20 14:46:39 INFO common.AbstractJob: Command line arguments: {--booleanData=[false], --endPhase=[2147483647], --input=[testdata/similarityinput], --maxPrefs=[500], --maxSimilaritiesPerItem=[10], --minPrefsPerUser=[1], --output=[testdata/similarityoutput], --similarityClassname=[SIMILARITY_COOCCURRENCE], --startPhase=[0], --tempDir=[temp]}
13/11/20 14:46:39 INFO common.AbstractJob: Command line arguments: {--booleanData=[false], --endPhase=[2147483647], --input=[testdata/similarityinput], --minPrefsPerUser=[1], --output=[temp/prepareRatingMatrix], --ratingShift=[0.0], --startPhase=[0], --tempDir=[temp]}
13/11/20 14:46:41 INFO input.FileInputFormat: Total input paths to process : 1
13/11/20 14:46:41 INFO util.NativeCodeLoader: Loaded the native-hadoop library
13/11/20 14:46:41 WARN snappy.LoadSnappy: Snappy native library not loaded
13/11/20 14:46:41 INFO mapred.JobClient: Running job: job_201311111627_0115
13/11/20 14:46:42 INFO mapred.JobClient: map 0% reduce 0%
13/11/20 14:47:00 INFO mapred.JobClient: Task Id : attempt_201311111627_0115_m_000000_0, Status : FAILED
java.lang.NumberFormatException: For input string: "A1234567"
    at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
    at java.lang.Long.parseLong(Long.java:441)
    at java.lang.Long.parseLong(Long.java:483)
    at org.apache.mahout.cf.taste.hadoop.item.ItemIDIndexMapper.map(ItemIDIndexMapper.java:50)
    at org.apache.mahout.cf.taste.hadoop.item.ItemIDIndexMapper.map(ItemIDIndexMapper.java:31)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
    at org.apache.hadoop.mapred.Child.main(Child.java:249)

13/11/20 14:47:11 INFO mapred.JobClient: Task Id : attempt_201311111627_0115_m_000000_1, Status : FAILED
java.lang.NumberFormatException: For input string: "A1234567"
    at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
    at java.lang.Long.parseLong(Long.java:441)
    at java.lang.Long.parseLong(Long.java:483)
    at org.apache.mahout.cf.taste.hadoop.item.ItemIDIndexMapper.map(ItemIDIndexMapper.java:50)
    at org.apache.mahout.cf.taste.hadoop.item.ItemIDIndexMapper.map(ItemIDIndexMapper.java:31)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
    at org.apache.hadoop.mapred.Child.main(Child.java:249)
> Date: Wed, 20 Nov 2013 08:22:07 +0100
> From: ssc.open@googlemail.com
> To: user@mahout.apache.org
> Subject: Re: Mahout fpg
>
> You can use ItemSimilarityJob to find sets of items that co-occur
> in your users' interactions.
>
> --sebastian
>
>
> On 20.11.2013 08:11, Sameer Tilak wrote:
> >
> > Hi Suneel,
> > Thanks for your reply. We can benefit a lot from the parallel frequent pattern mining
> > functionality. Will there be any alternative in future releases? I guess we can use older
> > versions of Mahout if we need it.
> >
> >> Date: Tue, 19 Nov 2013 19:25:54 -0800
> >> From: suneel_marthi@yahoo.com
> >> Subject: Re: Mahout fpg
> >> To: user@mahout.apache.org
> >>
> >> Fpg has been removed from the codebase as it will not be supported.
> >>
> >> On Tuesday, November 19, 2013 8:56 PM, Sameer Tilak <sstilak@live.com> wrote:
> >>
> >> Hi everyone,
> >> I downloaded the latest version of Mahout and ran mvn install. When I try to run fpg,
> >> I get the following errors. Do I need to download and compile FPG separately? It looks
> >> like it has not been included in the list of valid programs.
> >> 13/11/19 17:49:19 WARN driver.MahoutDriver: Unable to add class: fpg
> >> 13/11/19 17:49:19 WARN driver.MahoutDriver: No fpg.props found on classpath, will use command-line arguments only
> >> Unknown program 'fpg' chosen.
> >> Valid program names are:
> >>   arff.vector: : Generate Vectors from an ARFF file or directory
> >>   baumwelch: : Baum-Welch algorithm for unsupervised HMM training
> >>   canopy: : Canopy clustering
> >>   cat: : Print a file or resource as the logistic regression models would see it
> >>   cleansvd: : Cleanup and verification of SVD output
> >>   clusterdump: : Dump cluster output to text
> >>   clusterpp: : Groups Clustering Output In Clusters
> >>   cmdump: : Dump confusion matrix in HTML or text formats
> >>   concatmatrices: : Concatenates 2 matrices of same cardinality into a single matrix
> >>   cvb: : LDA via Collapsed Variation Bayes (0th deriv. approx)
> >>   cvb0_local: : LDA via Collapsed Variation Bayes, in memory locally.
> >>   evaluateFactorization: : compute RMSE and MAE of a rating matrix factorization against probes
> >>   fkmeans: : Fuzzy K-means clustering
> >>   hmmpredict: : Generate random sequence of observations by given HMM
> >>   itemsimilarity: : Compute the item-item-similarities for item-based collaborative filtering
> >>   kmeans: : K-means clustering
> >>   lucene.vector: : Generate Vectors from a Lucene index
> >>   lucene2seq: : Generate Text SequenceFiles from a Lucene index
> >>   matrixdump: : Dump matrix in CSV format
> >>   matrixmult: : Take the product of two matrices
> >>   parallelALS: : ALS-WR factorization of a rating matrix
> >>   qualcluster: : Runs clustering experiments and summarizes results in a CSV
> >>   recommendfactorized: : Compute recommendations using the factorization of a rating matrix
> >>   recommenditembased: : Compute recommendations using item-based collaborative filtering
> >>   regexconverter: : Convert text files on a per line basis based on regular expressions
> >>   resplit: : Splits a set of SequenceFiles into a number of equal splits
> >>   rowid: : Map SequenceFile<Text,VectorWritable> to {SequenceFile<IntWritable,VectorWritable>, SequenceFile<IntWritable,Text>}
> >>   rowsimilarity: : Compute the pairwise similarities of the rows of a matrix
> >>   runAdaptiveLogistic: : Score new production data using a probably trained and validated AdaptivelogisticRegression model
> >>   runlogistic: : Run a logistic regression model against CSV data
> >>   seq2encoded: : Encoded Sparse Vector generation from Text sequence files
> >>   seq2sparse: : Sparse Vector generation from Text sequence files
> >>   seqdirectory: : Generate sequence files (of Text) from a directory
> >>   seqdumper: : Generic Sequence File dumper
> >>   seqmailarchives: : Creates SequenceFile from a directory containing gzipped mail archives
> >>   seqwiki: : Wikipedia xml dump to sequence file
> >>   spectralkmeans: : Spectral k-means clustering
> >>   split: : Split Input data into test and train sets
> >>   splitDataset: : split a rating dataset into training and probe parts
> >>   ssvd: : Stochastic SVD
> >>   streamingkmeans: : Streaming k-means clustering
> >>   svd: : Lanczos Singular Value Decomposition
> >>   testnb: : Test the Vector-based Bayes classifier
> >>   trainAdaptiveLogistic: : Train an AdaptivelogisticRegression model
> >>   trainlogistic: : Train a logistic regression using stochastic gradient descent
> >>   trainnb: : Train the Vector-based Bayes classifier
> >>   transpose: : Take the transpose of a matrix
> >>   validateAdaptiveLogistic: : Validate an AdaptivelogisticRegression model against hold-out data set
> >>   vecdist: : Compute the distances between a set of Vectors (or Cluster or Canopy, they must fit in memory) and a list of Vectors
> >>   vectordump: : Dump vectors from a sequence file to text
> >>   viterbi: : Viterbi decoding of hidden states from given output states sequence