mahout-user mailing list archives

From Kamal Ali <k...@grokker.com>
Subject factorize-movielens-1M.sh privilegedActionException: reports dir doesn't exist when it does exist
Date Fri, 18 Jan 2013 23:20:47 GMT
I'm a newbie trying to get some Mahout command-line examples to work.

I tried executing factorize-movielens-1M.sh but get the error "Input path
does not exist: /tmp/mahout-work-kali/movielens/ratings.csv",
even after I manually created /tmp/mahout-work-kali/ and all its descendant
directories and chmod'd them to 777.

The error persists even after I modified factorize-movielens-1M.sh to run
"ls -l" on ratings.csv, which shows that
/tmp/mahout-work-kali/movielens/ratings.csv exists.

[The input file u1.base already uses "::" instead of \t as the delimiter.]
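
For reference, the step the script announces as "Converting ratings..." can be approximated as follows. This is only a sketch: it assumes the step replaces "::" with "," and keeps the first three fields, which is consistent with the "1,1,5" lines in the head output further down. The exact command inside the script is not shown in this message, convert_ratings is a hypothetical helper name, and the timestamps in the sample input are made up.

```shell
#!/bin/sh
# Sketch of a "::"-delimited to CSV conversion (assumed, not copied from the script).
# Reads ratings on stdin and writes user,item,rating on stdout.
convert_ratings() {
  # Replace every "::" with a comma, then keep only the first three fields.
  sed -e 's/::/,/g' | cut -d, -f1,2,3
}

# Illustrative input only; the fourth field stands in for a timestamp.
printf '1::1::5::111\n1::2::3::222\n' | convert_ratings
```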

I'm wondering whether the error is actually something else that is being
misreported: perhaps some intermediate script or program just gets a non-zero
return status and falls back on a stock error message.
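
One way to narrow this down is to check where the path is being looked up. The output further down says "Running on hadoop" and shows a staging area under hdfs://localhost:9000, so the job may be resolving /tmp/mahout-work-kali/... against HDFS rather than the local disk that "ls -l" inspects. A minimal sketch of that check follows; it assumes the hadoop command is on PATH and that "hadoop fs -ls" exits non-zero for a missing path. check_path is a hypothetical helper, not part of Hadoop or Mahout.

```shell
#!/bin/sh
# Report whether a path exists on the local disk and, when the hadoop CLI is
# available, on the filesystem that "hadoop fs" talks to.
check_path() {
  path="$1"

  # Local filesystem check (what "ls -l" sees).
  if [ -e "$path" ]; then
    echo "local: present"
  else
    echo "local: missing"
  fi

  # Hadoop filesystem check, skipped when hadoop is not installed.
  if command -v hadoop >/dev/null 2>&1; then
    # Assumption: "hadoop fs -ls" returns non-zero when the path is absent.
    if hadoop fs -ls "$path" >/dev/null 2>&1; then
      echo "hadoop fs: present"
    else
      echo "hadoop fs: missing"
    fi
  else
    echo "hadoop fs: not checked (hadoop not on PATH)"
  fi
}

check_path /tmp/mahout-work-kali/movielens/ratings.csv
```

If the file shows up locally but not under "hadoop fs", copying it in with "hadoop fs -put" (after creating the parent directory with "hadoop fs -mkdir") would be the next thing to try.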

I am on a 64-bit Mac with JDK 1.7. My ssh keys were generated as user "kali".

Has anyone had success running factorize-movielens-1M.sh?

Does this factorize*.sh script only run in Mahout local mode?
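
The output further down prints "MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath", which suggests the launcher picks its mode from that variable. The sketch below mirrors that decision under the assumption that any non-empty value of MAHOUT_LOCAL selects local mode; mahout_mode is a hypothetical helper written for illustration, not code taken from the Mahout launcher.

```shell
#!/bin/sh
# Print which mode the launcher would presumably pick, based only on
# whether MAHOUT_LOCAL is set to a non-empty value (an assumption).
mahout_mode() {
  if [ -n "${MAHOUT_LOCAL:-}" ]; then
    echo "local"
  else
    echo "hadoop"
  fi
}

echo "mode for this shell: $(mahout_mode)"
```

Under that assumption, running "export MAHOUT_LOCAL=true" before invoking the script should keep every path on the local disk, which would be one way to test whether the script works outside the cluster.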

Is factorize-movielens-1M.sh old and unmaintained, and should some other
approach be used instead?

I'm primarily interested in getting ALS methods to work.
If someone knows where in the Mahout distribution one can find the
latest or most tested ALS implementation (and the maven command to run it),
please let me know.

THANK YOU!
kamal.

My hadoop-env.sh is at the end of this email.
================================================
./factorize-movielens-1M.sh     $grouplens/ml-100k/u1.base   # grouplens points to a directory containing the file u1.base
creating work directory at /tmp/mahout-work-kali
kamal: doing ls -l on movie lens dir:
total 1544
drwxrwxrwx  3 kali  wheel     102 Jan 18 12:20 dataset
-rwxrwxrwx  1 kali  wheel  786544 Jan 18 13:46 ratings.csv
kamal: doing wc -l on ratings.csv
   80000 /tmp/mahout-work-kali/movielens/ratings.csv
Converting ratings...
after sed
-rwxrwxrwx  1 kali  wheel  786544 Jan 18 13:47 /tmp/mahout-work-kali/movielens/ratings.csv
kamal: doing head on ratings.csv
1,1,5
1,2,3
MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
Warning: $HADOOP_HOME is deprecated.

Running on hadoop, using /Users/kali/hadoop/hadoop-1.0.4/bin/hadoop and HADOOP_CONF_DIR=/Users/kali/hadoop/hadoop-1.0.4/conf
MAHOUT-JOB: /users/kali/mahout/mahout0.7/examples/target/mahout-examples-0.7-job.jar
Warning: $HADOOP_HOME is deprecated.

13/01/18 13:47:24 INFO common.AbstractJob: Command line arguments: {--endPhase=[2147483647], --input=[/tmp/mahout-work-kali/movielens/ratings.csv], --output=[/tmp/mahout-work-kali/dataset], --probePercentage=[0.1], --startPhase=[0], --tempDir=[/tmp/mahout-work-kali/dataset/tmp], --trainingPercentage=[0.9]}
2013-01-18 13:47:24.918 java[53562:1703] Unable to load realm info from SCDynamicStore
13/01/18 13:47:25 INFO mapred.JobClient: Cleaning up the staging area hdfs://localhost:9000/tmp/hadoop-kali/mapred/staging/kali/.staging/job_201301151900_0035
13/01/18 13:47:25 ERROR security.UserGroupInformation: PriviledgedActionException as:kali cause:org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: /tmp/mahout-work-kali/movielens/ratings.csv
Exception in thread "main" org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: /tmp/mahout-work-kali/movielens/ratings.csv
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:235)
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:252)
    at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:962)
    at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:979)
    at org.apache.hadoop.mapred.JobClient.access$600(JobClient.java:174)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:897)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:850)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:850)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:500)
    at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:530)
    at org.apache.mahout.cf.taste.hadoop.als.DatasetSplitter.run(DatasetSplitter.java:90)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
    at org.apache.mahout.cf.taste.hadoop.als.DatasetSplitter.main(DatasetSplitter.java:64)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:601)
    at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
    at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
    at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:601)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
after splitDataset
-rwxrwxrwx  1 kali  wheel  786544 Jan 18 13:47 /tmp/mahout-work-kali/movielens/ratings.csv
MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
Warning: $HADOOP_HOME is deprecated.

Running on hadoop, using /Users/kali/hadoop/hadoop-1.0.4/bin/hadoop and HADOOP_CONF_DIR=/Users/kali/hadoop/hadoop-1.0.4/conf
MAHOUT-JOB: /users/kali/mahout/mahout0.7/examples/target/mahout-examples-0.7-job.jar
Warning: $HADOOP_HOME is deprecated.

13/01/18 13:47:31 INFO common.AbstractJob: Command line arguments: {--alpha=[40], --endPhase=[2147483647], --implicitFeedback=[false], --input=[/tmp/mahout-work-kali/dataset/trainingSet/], --lambda=[0.065], --numFeatures=[20], --numIterations=[10], --output=[/tmp/mahout-work-kali/als/out], --startPhase=[0], --tempDir=[/tmp/mahout-work-kali/als/tmp]}
2013-01-18 13:47:31.259 java[53605:1703] Unable to load realm info from SCDynamicStore
13/01/18 13:47:32 INFO mapred.JobClient: Cleaning up the staging area hdfs://localhost:9000/tmp/hadoop-kali/mapred/staging/kali/.staging/job_201301151900_0036
13/01/18 13:47:32 ERROR security.UserGroupInformation: PriviledgedActionException as:kali cause:org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: /tmp/mahout-work-kali/dataset/trainingSet
Exception in thread "main" org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: /tmp/mahout-work-kali/dataset/trainingSet
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:235)
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:252)
    at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:962)
    at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:979)
    at org.apache.hadoop.mapred.JobClient.access$600(JobClient.java:174)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:897)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:850)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:850)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:500)
    at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:530)
    at org.apache.mahout.cf.taste.hadoop.als.ParallelALSFactorizationJob.run(ParallelALSFactorizationJob.java:137)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
    at org.apache.mahout.cf.taste.hadoop.als.ParallelALSFactorizationJob.main(ParallelALSFactorizationJob.java:98)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:601)
    at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
    at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
    at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:601)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
Warning: $HADOOP_HOME is deprecated.

Running on hadoop, using /Users/kali/hadoop/hadoop-1.0.4/bin/hadoop and HADOOP_CONF_DIR=/Users/kali/hadoop/hadoop-1.0.4/conf
MAHOUT-JOB: /users/kali/mahout/mahout0.7/examples/target/mahout-examples-0.7-job.jar
Warning: $HADOOP_HOME is deprecated.

13/01/18 13:47:38 INFO common.AbstractJob: Command line arguments: {--endPhase=[2147483647], --input=[/tmp/mahout-work-kali/dataset/probeSet/], --itemFeatures=[/tmp/mahout-work-kali/als/out/M/], --output=[/tmp/mahout-work-kali/als/rmse/], --startPhase=[0], --tempDir=[/tmp/mahout-work-kali/als/tmp], --userFeatures=[/tmp/mahout-work-kali/als/out/U/]}
2013-01-18 13:47:38.142 java[53645:1703] Unable to load realm info from SCDynamicStore
13/01/18 13:47:38 INFO mapred.JobClient: Cleaning up the staging area hdfs://localhost:9000/tmp/hadoop-kali/mapred/staging/kali/.staging/job_201301151900_0037
13/01/18 13:47:38 ERROR security.UserGroupInformation: PriviledgedActionException as:kali cause:org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: /tmp/mahout-work-kali/dataset/probeSet
Exception in thread "main" org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: /tmp/mahout-work-kali/dataset/probeSet
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:235)
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:252)
    at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:962)
    at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:979)
    at org.apache.hadoop.mapred.JobClient.access$600(JobClient.java:174)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:897)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:850)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:850)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:500)
    at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:530)
    at org.apache.mahout.cf.taste.hadoop.als.FactorizationEvaluator.run(FactorizationEvaluator.java:91)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
    at org.apache.mahout.cf.taste.hadoop.als.FactorizationEvaluator.main(FactorizationEvaluator.java:68)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:601)
    at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
    at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
    at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:601)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
Warning: $HADOOP_HOME is deprecated.

Running on hadoop, using /Users/kali/hadoop/hadoop-1.0.4/bin/hadoop and HADOOP_CONF_DIR=/Users/kali/hadoop/hadoop-1.0.4/conf
MAHOUT-JOB: /users/kali/mahout/mahout0.7/examples/target/mahout-examples-0.7-job.jar
Warning: $HADOOP_HOME is deprecated.

13/01/18 13:47:44 INFO common.AbstractJob: Command line arguments: {--endPhase=[2147483647], --input=[/tmp/mahout-work-kali/als/out/userRatings/], --itemFeatures=[/tmp/mahout-work-kali/als/out/M/], --maxRating=[5], --numRecommendations=[6], --output=[/tmp/mahout-work-kali/recommendations/], --startPhase=[0], --tempDir=[temp], --userFeatures=[/tmp/mahout-work-kali/als/out/U/]}
2013-01-18 13:47:44.859 java[53687:1703] Unable to load realm info from SCDynamicStore
13/01/18 13:47:45 INFO mapred.JobClient: Cleaning up the staging area hdfs://localhost:9000/tmp/hadoop-kali/mapred/staging/kali/.staging/job_201301151900_0038
13/01/18 13:47:45 ERROR security.UserGroupInformation: PriviledgedActionException as:kali cause:org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: /tmp/mahout-work-kali/als/out/userRatings
Exception in thread "main" org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: /tmp/mahout-work-kali/als/out/userRatings
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:235)
    at org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat.listStatus(SequenceFileInputFormat.java:55)
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:252)
    at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:962)
    at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:979)
    at org.apache.hadoop.mapred.JobClient.access$600(JobClient.java:174)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:897)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:850)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:850)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:500)
    at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:530)
    at org.apache.mahout.cf.taste.hadoop.als.RecommenderJob.run(RecommenderJob.java:95)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
    at org.apache.mahout.cf.taste.hadoop.als.RecommenderJob.main(RecommenderJob.java:69)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:601)
    at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
    at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
    at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:601)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:156)

RMSE is:

cat: /tmp/mahout-work-kali/als/rmse/rmse.txt: No such file or directory



Sample recommendations:

cat: /tmp/mahout-work-kali/recommendations/part-m-00000: No such file or directory


==================================================
# Set Hadoop-specific environment variables here.

# The only required environment variable is JAVA_HOME.  All others are
# optional.  When running a distributed configuration it is best to
# set JAVA_HOME in this file, so that it is correctly defined on
# remote nodes.

# The java implementation to use.  Required.
export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.7.0_10.jdk/Contents/Home/jre

# Extra Java CLASSPATH elements.  Optional.
# export HADOOP_CLASSPATH=

# The maximum amount of heap to use, in MB. Default is 1000.
# export HADOOP_HEAPSIZE=2000

# Extra Java runtime options.  Empty by default.
# export HADOOP_OPTS=-server

# Command specific options appended to HADOOP_OPTS when specified
export HADOOP_NAMENODE_OPTS="-Dcom.sun.management.jmxremote $HADOOP_NAMENODE_OPTS"
export HADOOP_SECONDARYNAMENODE_OPTS="-Dcom.sun.management.jmxremote $HADOOP_SECONDARYNAMENODE_OPTS"
export HADOOP_DATANODE_OPTS="-Dcom.sun.management.jmxremote $HADOOP_DATANODE_OPTS"
export HADOOP_BALANCER_OPTS="-Dcom.sun.management.jmxremote $HADOOP_BALANCER_OPTS"
export HADOOP_JOBTRACKER_OPTS="-Dcom.sun.management.jmxremote $HADOOP_JOBTRACKER_OPTS"
# export HADOOP_TASKTRACKER_OPTS=
# The following applies to multiple commands (fs, dfs, fsck, distcp etc)
# export HADOOP_CLIENT_OPTS

# Extra ssh options.  Empty by default.
# export HADOOP_SSH_OPTS="-o ConnectTimeout=1 -o SendEnv=HADOOP_CONF_DIR"

# Where log files are stored.  $HADOOP_HOME/logs by default.
# export HADOOP_LOG_DIR=${HADOOP_HOME}/logs

# File naming remote slave hosts.  $HADOOP_HOME/conf/slaves by default.
# export HADOOP_SLAVES=${HADOOP_HOME}/conf/slaves

# host:path where hadoop code should be rsync'd from.  Unset by default.
# export HADOOP_MASTER=master:/home/$USER/src/hadoop

# Seconds to sleep between slave commands.  Unset by default.  This
# can be useful in large clusters, where, e.g., slave rsyncs can
# otherwise arrive faster than the master can service them.
# export HADOOP_SLAVE_SLEEP=0.1

# The directory where pid files are stored. /tmp by default.
# export HADOOP_PID_DIR=/var/hadoop/pids

# A string representing this instance of hadoop. $USER by default.
# export HADOOP_IDENT_STRING=$USER

# The scheduling priority for daemon processes.  See 'man nice'.
# export HADOOP_NICENESS=10
