mahout-user mailing list archives

From Jason Lee <wua...@gmail.com>
Subject Re: Mahout fpg
Date Fri, 22 Nov 2013 09:55:13 GMT
I noticed that many algorithm implementations were deprecated in Mahout 0.8
and removed in 0.9, but no reasons or comments were given. May I ask why?

Btw, the Mahout API is somewhat lacking in javadoc comments. Every Mahout
contributor should take responsibility for adding javadoc comments to the
Java files they create.


On Fri, Nov 22, 2013 at 3:09 AM, Sameer Tilak <sstilak@live.com> wrote:

> Sebastian, thanks for the clarification.
>
> > Date: Thu, 21 Nov 2013 17:51:12 +0100
> > From: ssc.open@googlemail.com
> > To: user@mahout.apache.org
> > Subject: Re: Mahout fpg
> >
> > ItemSimilarityJob does not handle alphanumeric identifiers. You have to
> > preprocess your data before running that job.
> >
> > --sebastian
> >
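Sebastian's point about preprocessing can be made concrete. The stack trace further down this thread shows ItemIDIndexMapper handing the raw token to Long.parseLong, which rejects "A1234567". A minimal sketch, assuming comma- or tab-separated `userid,itemcode` lines, is to dictionary-encode both columns into dense numeric IDs and keep the mapping so the job's numeric output can be translated back. `IdDictionary` and `convert` are hypothetical names, not Mahout API. Mahout's Taste package also has an `IDMigrator` interface that may cover this case; check the javadoc for your version.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Mahout: gives each distinct string ID a
// dense numeric ID so the data can be fed to jobs that parse IDs as longs.
public class IdDictionary {
    private final Map<String, Long> ids = new HashMap<>();
    private final List<String> codes = new ArrayList<>();

    // Returns the numeric ID for code, assigning the next free one if unseen.
    public long encode(String code) {
        Long id = ids.get(code);
        if (id == null) {
            id = (long) codes.size();
            ids.put(code, id);
            codes.add(code);
        }
        return id;
    }

    // Maps a numeric ID from the job's output back to the original code.
    public String decode(long id) {
        return codes.get((int) id);
    }

    // Rewrites "userid,itemcode" (or tab-separated) lines as "userID,itemID".
    public static List<String> convert(List<String> lines,
                                       IdDictionary users, IdDictionary items) {
        List<String> out = new ArrayList<>();
        for (String line : lines) {
            String[] parts = line.split("[,\t]");
            long user = users.encode(parts[0].trim());
            long item = items.encode(parts[1].trim());
            out.add(user + "," + item);
        }
        return out;
    }

    public static void main(String[] args) {
        IdDictionary users = new IdDictionary();
        IdDictionary items = new IdDictionary();
        List<String> input = List.of("A1234567,X99", "A1234567,Y42", "B7654321,X99");
        // Prints 0,0 then 0,1 then 1,0
        convert(input, users, items).forEach(System.out::println);
    }
}
```

Dense IDs are used instead of hashing so that decoding is a plain list lookup and collisions cannot occur; the sketch keeps the dictionaries in memory only, so they would need to be saved alongside the converted file to map the similarity output back to item codes.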
> > On 21.11.2013 00:28, Sameer Tilak wrote:
> > > Yes, changing A1234567 to 1234567 resolves that issue trivially.
> > > However, the itemcode in my input (userid, itemcode) is alphanumeric,
> > > not purely numeric. I am sure ItemSimilarityJob can handle that case,
> > > but I need to know how to supply the input correctly. I am currently using:
> > > (userid, itemcode)(userid, itemcode)(userid, itemcode)(userid, itemcode)…
> > >
> > >> Date: Wed, 20 Nov 2013 15:11:49 -0800
> > >> From: suneel_marthi@yahoo.com
> > >> Subject: Re: Mahout fpg
> > >> To: user@mahout.apache.org
> > >>
> > >> From the stacktrace:
> > >>
> > >> FAILED
> > >> java.lang.NumberFormatException: For input string: "A1234567"
> > >>     at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
> > >>
> > >> Obviously, the input's incorrect.
> > >>
> > >>
> > >>
> > >>
> > >>
> > >> On Wednesday, November 20, 2013 6:02 PM, Sameer Tilak <sstilak@live.com> wrote:
> > >>
> > >> Dear Sebastian, I tried using ItemSimilarityJob. Each line of my data
> > >> has the format: userid    itemid (I also tried userid, itemcode).
> > >> Itemcode is a string. However, I am getting the following error. Maybe
> > >> my input format is incorrect.
> > >>
> > >>   ./mahout org.apache.mahout.cf.taste.hadoop.similarity.item.ItemSimilarityJob --input testdata/similarityinput -o testdata/similarityoutput --similarityClassname SIMILARITY_COOCCURRENCE --maxSimilaritiesPerItem 10
> > >>
> > >> 13/11/20 14:46:39 WARN driver.MahoutDriver: No org.apache.mahout.cf.taste.hadoop.similarity.item.ItemSimilarityJob.props found on classpath, will use command-line arguments only
> > >> 13/11/20 14:46:39 INFO common.AbstractJob: Command line arguments: {--booleanData=[false], --endPhase=[2147483647], --input=[testdata/similarityinput], --maxPrefs=[500], --maxSimilaritiesPerItem=[10], --minPrefsPerUser=[1], --output=[testdata/similarityoutput], --similarityClassname=[SIMILARITY_COOCCURRENCE], --startPhase=[0], --tempDir=[temp]}
> > >> 13/11/20 14:46:39 INFO common.AbstractJob: Command line arguments: {--booleanData=[false], --endPhase=[2147483647], --input=[testdata/similarityinput], --minPrefsPerUser=[1], --output=[temp/prepareRatingMatrix], --ratingShift=[0.0], --startPhase=[0], --tempDir=[temp]}
> > >> 13/11/20 14:46:41 INFO input.FileInputFormat: Total input paths to process : 1
> > >> 13/11/20 14:46:41 INFO util.NativeCodeLoader: Loaded the native-hadoop library
> > >> 13/11/20 14:46:41 WARN snappy.LoadSnappy: Snappy native library not loaded
> > >> 13/11/20 14:46:41 INFO mapred.JobClient: Running job: job_201311111627_0115
> > >> 13/11/20 14:46:42 INFO mapred.JobClient:  map 0% reduce 0%
> > >> 13/11/20 14:47:00 INFO mapred.JobClient: Task Id : attempt_201311111627_0115_m_000000_0, Status : FAILED
> > >> java.lang.NumberFormatException: For input string: "A1234567"
> > >>     at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
> > >>     at java.lang.Long.parseLong(Long.java:441)
> > >>     at java.lang.Long.parseLong(Long.java:483)
> > >>     at org.apache.mahout.cf.taste.hadoop.item.ItemIDIndexMapper.map(ItemIDIndexMapper.java:50)
> > >>     at org.apache.mahout.cf.taste.hadoop.item.ItemIDIndexMapper.map(ItemIDIndexMapper.java:31)
> > >>     at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
> > >>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
> > >>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
> > >>     at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
> > >>     at java.security.AccessController.doPrivileged(Native Method)
> > >>     at javax.security.auth.Subject.doAs(Subject.java:415)
> > >>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
> > >>     at org.apache.hadoop.mapred.Child.main(Child.java:249)
> > >>
> > >> 13/11/20 14:47:11 INFO mapred.JobClient: Task Id : attempt_201311111627_0115_m_000000_1, Status : FAILED
> > >> java.lang.NumberFormatException: For input string: "A1234567"
> > >>     at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
> > >>     at java.lang.Long.parseLong(Long.java:441)
> > >>     at java.lang.Long.parseLong(Long.java:483)
> > >>     at org.apache.mahout.cf.taste.hadoop.item.ItemIDIndexMapper.map(ItemIDIndexMapper.java:50)
> > >>     at org.apache.mahout.cf.taste.hadoop.item.ItemIDIndexMapper.map(ItemIDIndexMapper.java:31)
> > >>     at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
> > >>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
> > >>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
> > >>     at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
> > >>     at java.security.AccessController.doPrivileged(Native Method)
> > >>     at javax.security.auth.Subject.doAs(Subject.java:415)
> > >>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
> > >>     at org.apache.hadoop.mapred.Child.main(Child.java:249)
> > >>
> > >>> Date: Wed, 20 Nov 2013 08:22:07 +0100
> > >>> From: ssc.open@googlemail.com
> > >>> To: user@mahout.apache.org
> > >>> Subject: Re: Mahout fpg
> > >>>
> > >>> You can use ItemSimilarityJob to find sets of items that co-occur
> > >>> in your users' interactions.
> > >>>
> > >>> --sebastian
> > >>>
> > >>>
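Sebastian's suggestion rests on plain co-occurrence: two items are related in proportion to how many users interacted with both. The in-memory sketch below illustrates that counting idea only; it is not Mahout's implementation, which runs as a chain of Hadoop jobs and additionally samples and prunes (for example via the --maxPrefs and --maxSimilaritiesPerItem options visible in the log above). `CooccurrenceSketch` and `countPairs` are hypothetical names, and the input is assumed to already use numeric IDs.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// In-memory sketch (not Mahout code) of plain item co-occurrence counting.
public class CooccurrenceSketch {

    // For every pair of items, counts how many users interacted with both.
    // Each row of userItem is {userID, itemID}; keys are "itemA,itemB", itemA < itemB.
    public static Map<String, Integer> countPairs(long[][] userItem) {
        // Group items per user; TreeSet drops repeated interactions and sorts.
        Map<Long, TreeSet<Long>> itemsByUser = new HashMap<>();
        for (long[] pref : userItem) {
            itemsByUser.computeIfAbsent(pref[0], k -> new TreeSet<>()).add(pref[1]);
        }
        // Every unordered pair inside one user's item set co-occurs once.
        Map<String, Integer> counts = new TreeMap<>();
        for (TreeSet<Long> items : itemsByUser.values()) {
            List<Long> list = new ArrayList<>(items);
            for (int a = 0; a < list.size(); a++) {
                for (int b = a + 1; b < list.size(); b++) {
                    counts.merge(list.get(a) + "," + list.get(b), 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        long[][] prefs = { {1, 10}, {1, 20}, {2, 10}, {2, 20}, {2, 30}, {3, 20}, {3, 30} };
        // Prints {10,20=2, 10,30=1, 20,30=2}
        System.out.println(countPairs(prefs));
    }
}
```

The nested pair loop is quadratic in each user's item count, which is one reason a distributed job caps the number of preferences considered per user.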
> > >>> On 20.11.2013 08:11, Sameer Tilak wrote:
> > >>>>
> > >>>>
> > >>>>
> > >>>> Hi Suneel,
> > >>>> Thanks for your reply. We could benefit a lot from the parallel
> > >>>> frequent pattern mining functionality. Will there be any alternative in
> > >>>> future releases? I guess we can use older versions of Mahout if we need it.
> > >>>>
> > >>>>> Date: Tue, 19 Nov 2013 19:25:54 -0800
> > >>>>> From: suneel_marthi@yahoo.com
> > >>>>> Subject: Re: Mahout fpg
> > >>>>> To: user@mahout.apache.org
> > >>>>>
> > >>>>> Fpg has been removed from the codebase as it will not be supported.
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>> On Tuesday, November 19, 2013 8:56 PM, Sameer Tilak <sstilak@live.com> wrote:
> > >>>>>
> > >>>>> Hi everyone, I downloaded the latest version of Mahout and ran mvn
> > >>>>> install. When I try to run fpg, I get the following errors. Do I need to
> > >>>>> download and compile FPG separately? It looks like it has somehow not
> > >>>>> been included in the list of valid programs.
> > >>>>> 13/11/19 17:49:19 WARN driver.MahoutDriver: Unable to add class: fpg
> > >>>>> 13/11/19 17:49:19 WARN driver.MahoutDriver: No fpg.props found on classpath, will use command-line arguments only
> > >>>>> Unknown program 'fpg' chosen.
> > >>>>> Valid program names are:
> > >>>>>   arff.vector: : Generate Vectors from an ARFF file or directory
> > >>>>>   baumwelch: : Baum-Welch algorithm for unsupervised HMM training
> > >>>>>   canopy: : Canopy clustering
> > >>>>>   cat: : Print a file or resource as the logistic regression models would see it
> > >>>>>   cleansvd: : Cleanup and verification of SVD output
> > >>>>>   clusterdump: : Dump cluster output to text
> > >>>>>   clusterpp: : Groups Clustering Output In Clusters
> > >>>>>   cmdump: : Dump confusion matrix in HTML or text formats
> > >>>>>   concatmatrices: : Concatenates 2 matrices of same cardinality into a single matrix
> > >>>>>   cvb: : LDA via Collapsed Variation Bayes (0th deriv. approx)
> > >>>>>   cvb0_local: : LDA via Collapsed Variation Bayes, in memory locally.
> > >>>>>   evaluateFactorization: : compute RMSE and MAE of a rating matrix factorization against probes
> > >>>>>   fkmeans: : Fuzzy K-means clustering
> > >>>>>   hmmpredict: : Generate random sequence of observations by given HMM
> > >>>>>   itemsimilarity: : Compute the item-item-similarities for item-based collaborative filtering
> > >>>>>   kmeans: : K-means clustering
> > >>>>>   lucene.vector: : Generate Vectors from a Lucene index
> > >>>>>   lucene2seq: : Generate Text SequenceFiles from a Lucene index
> > >>>>>   matrixdump: : Dump matrix in CSV format
> > >>>>>   matrixmult: : Take the product of two matrices
> > >>>>>   parallelALS: : ALS-WR factorization of a rating matrix
> > >>>>>   qualcluster: : Runs clustering experiments and summarizes results in a CSV
> > >>>>>   recommendfactorized: : Compute recommendations using the factorization of a rating matrix
> > >>>>>   recommenditembased: : Compute recommendations using item-based collaborative filtering
> > >>>>>   regexconverter: : Convert text files on a per line basis based on regular expressions
> > >>>>>   resplit: : Splits a set of SequenceFiles into a number of equal splits
> > >>>>>   rowid: : Map SequenceFile<Text,VectorWritable> to {SequenceFile<IntWritable,VectorWritable>, SequenceFile<IntWritable,Text>}
> > >>>>>   rowsimilarity: : Compute the pairwise similarities of the rows of a matrix
> > >>>>>   runAdaptiveLogistic: : Score new production data using a probably trained and validated AdaptivelogisticRegression model
> > >>>>>   runlogistic: : Run a logistic regression model against CSV data
> > >>>>>   seq2encoded: : Encoded Sparse Vector generation from Text sequence files
> > >>>>>   seq2sparse: : Sparse Vector generation from Text sequence files
> > >>>>>   seqdirectory: : Generate sequence files (of Text) from a directory
> > >>>>>   seqdumper: : Generic Sequence File dumper
> > >>>>>   seqmailarchives: : Creates SequenceFile from a directory containing gzipped mail archives
> > >>>>>   seqwiki: : Wikipedia xml dump to sequence file
> > >>>>>   spectralkmeans: : Spectral k-means clustering
> > >>>>>   split: : Split Input data into test and train sets
> > >>>>>   splitDataset: : split a rating dataset into training and probe parts
> > >>>>>   ssvd: : Stochastic SVD
> > >>>>>   streamingkmeans: : Streaming k-means clustering
> > >>>>>   svd: : Lanczos Singular Value Decomposition
> > >>>>>   testnb: : Test the Vector-based Bayes classifier
> > >>>>>   trainAdaptiveLogistic: : Train an AdaptivelogisticRegression model
> > >>>>>   trainlogistic: : Train a logistic regression using stochastic gradient descent
> > >>>>>   trainnb: : Train the Vector-based Bayes classifier
> > >>>>>   transpose: : Take the transpose of a matrix
> > >>>>>   validateAdaptiveLogistic: : Validate an AdaptivelogisticRegression model against hold-out data set
> > >>>>>   vecdist: : Compute the distances between a set of Vectors (or Cluster or Canopy, they must fit in memory) and a list of Vectors
> > >>>>>   vectordump: : Dump vectors from a sequence file to text
> > >>>>>   viterbi: : Viterbi decoding of hidden states from given output states sequence
> > >>>>
> > >>>>
> > >>>>
> > >>>
> > >
> > >
> >
>
>
