mahout-user mailing list archives

From Nikolaos Romanos Katsipoulakis <popa...@gmail.com>
Subject Web Service Interface for triggering a Hadoop Job
Date Tue, 05 Jun 2012 07:50:21 GMT
Hello everybody.
I want to trigger the execution of an ItemSimilarityJob (Mahout 0.7
snapshot) from a web service interface. To do that, I want to implement
a class that holds an ItemSimilarityJob object and invokes that object's
run method whenever a WS request arrives. Is this possible, and if so,
how is it done?
The code I have written so far is below:

public class Main {

     public static void main(String[] args) throws IOException {
         Configuration jobConf = new Configuration();
         jobConf.addResource(new Path("/etc/hadoop/conf/core-site.xml"));
         jobConf.addResource(new Path("/etc/hadoop/conf/hdfs-site.xml"));
         jobConf.addResource(new Path("/etc/hadoop/conf/mapred-site.xml"));
         ItemSimilarityJob myJob = new ItemSimilarityJob();
         String[] args1 = { "-Dmapred.input.dir=input/input.txt",
                 "-Dmapred.output.dir=output", "--similarityClassname",
                 "SIMILARITY_COOCCURRENCE" };
         try {
             myJob.main(args1);
         }catch(Exception e) {
             System.err.println(e.getMessage());
         }
     }

}

The output I get is:

Jun 5, 2012 9:14:46 AM org.apache.mahout.common.AbstractJob parseArguments
SEVERE: Unexpected mapred.output.dir=output while processing Job-Specific Options:
usage: <command> [Generic Options] [Job-Specific Options]
Generic Options:
  -archives <paths>              comma separated archives to be unarchived
                                 on the compute machines.
  -conf <configuration file>     specify an application configuration file
  -D <property=value>            use value for given property
  -files <paths>                 comma separated files to be copied to the
                                 map reduce cluster
  -fs <local|namenode:port>      specify a namenode
  -jt <local|jobtracker:port>    specify a job tracker
  -libjars <paths>               comma separated jar files to include in
                                 the classpath.
  -tokenCacheFile <tokensFile>   name of the file with the tokens
Unexpected mapred.output.dir=output while processing Job-Specific Options:
Usage:
  [--input <input> --output <output> --similarityClassname <similarityClassname>
--maxSimilaritiesPerItem <maxSimilaritiesPerItem> --maxPrefsPerUser
<maxPrefsPerUser> --minPrefsPerUser <minPrefsPerUser> --booleanData
<booleanData> --threshold <threshold> --help --tempDir <tempDir> --startPhase
<startPhase> --endPhase <endPhase>]
Job-Specific Options:
   --input (-i) input                                      Path to job input
                                                           directory.
   --output (-o) output                                    The directory
                                                           pathname for output.
   --similarityClassname (-s) similarityClassname          Name of distributed
                                                           similarity measures
                                                           class to instantiate,
                                                           alternatively use one
                                                           of the predefined
                                                           similarities
                                                           ([SIMILARITY_COOCCURRE
                                                           NCE,
                                                           SIMILARITY_LOGLIKELIHO
                                                           OD,
                                                           SIMILARITY_TANIMOTO_CO
                                                           EFFICIENT,
                                                           SIMILARITY_CITY_BLOCK,
                                                           SIMILARITY_COSINE,
                                                           SIMILARITY_PEARSON_COR
                                                           RELATION,
                                                           SIMILARITY_EUCLIDEAN_D
                                                           ISTANCE])
   --maxSimilaritiesPerItem (-m) maxSimilaritiesPerItem    try to cap the number
                                                           of similar items per
                                                           item to this number
                                                           (default: 100)
   --maxPrefsPerUser (-mppu) maxPrefsPerUser               max number of
                                                           preferences to
                                                           consider per user,
                                                           users with more
                                                           preferences will be
                                                           sampled down
                                                           (default: 1000)
   --minPrefsPerUser (-mp) minPrefsPerUser                 ignore users with
                                                           less preferences than
                                                           this (default: 1)
   --booleanData (-b) booleanData                          Treat input as
                                                           without pref values
   --threshold (-tr) threshold                             discard item pairs
                                                           with a similarity
                                                           value below this
   --help (-h)                                             Print out help
   --tempDir tempDir                                       Intermediate output
                                                           directory
   --startPhase startPhase                                 First phase to run
   --endPhase endPhase                                     Last phase to run

Why do I get the above output?
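
For reference, the usage text above lists --input and --output as job-specific options, so one variation that may be worth trying is passing the paths that way instead of as -D properties. Below is a minimal sketch of building such an argument array for each WS request. It uses only the option names printed in the usage text. The class and method names (ItemSimilarityArgs, build) are hypothetical, and handing the resulting array to ToolRunner.run(jobConf, new ItemSimilarityJob(), args) is an assumption that has not been verified against Mahout 0.7.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of Mahout): builds the job-specific
// arguments using the option names shown in the usage text above.
final class ItemSimilarityArgs {

    private ItemSimilarityArgs() {
    }

    // Returns { "--input", input, "--output", output,
    //           "--similarityClassname", similarity }.
    static String[] build(String input, String output, String similarity) {
        requireNonEmpty(input, "input");
        requireNonEmpty(output, "output");
        requireNonEmpty(similarity, "similarity");

        List<String> args = new ArrayList<>();
        args.add("--input");
        args.add(input);
        args.add("--output");
        args.add(output);
        args.add("--similarityClassname");
        args.add(similarity);
        return args.toArray(new String[0]);
    }

    private static void requireNonEmpty(String value, String name) {
        if (value == null || value.isEmpty()) {
            throw new IllegalArgumentException(name + " must not be empty");
        }
    }

    public static void main(String[] unused) {
        String[] jobArgs = build("input/input.txt", "output", "SIMILARITY_COOCCURRENCE");
        // Assumption: in the web service, this array would be handed to
        // ToolRunner.run(jobConf, new ItemSimilarityJob(), jobArgs)
        // so that the Configuration built from the *-site.xml files is used.
        System.out.println(String.join(" ", jobArgs));
    }
}
```

Building the array per request keeps the request handling separate from the job invocation, so the same helper could be reused if more of the options from the usage text are needed later.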

Thank you in advance.

Nick K.
