mahout-user mailing list archives

From Jeff Eastman <j...@windwardsolutions.com>
Subject Re: how to use a custom distance measure with kmeans?
Date Tue, 12 Feb 2013 18:12:36 GMT
You also need to specify a fully-qualified class name for -dm, not the path to the .class file.
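
As a rough illustration of what "fully-qualified class name" means here, the sketch below derives a class name from the .class path used in the quoted command. It assumes that target/classes is the root of the compiled output and that the directory layout mirrors the Java package, so the resulting name (clustering.AbacDistanceMeasure) is an inference from the path, not something confirmed in the thread. The helper class_name_from_path is hypothetical and is not part of Mahout.

```shell
# Hypothetical helper (not part of Mahout): turn a compiled-class path
# into a fully-qualified class name.
# $1 = classes root directory (assumed to mirror the package layout)
# $2 = path to the .class file under that root
class_name_from_path() {
  rel="${2#"$1"/}"      # strip the classes root and the slash after it
  rel="${rel%.class}"   # strip the .class suffix
  printf '%s\n' "$rel" | tr '/' '.'   # directory separators become dots
}

class_name_from_path \
  /home/rhadoop/projects/workspace/mahout_abac/target/classes \
  /home/rhadoop/projects/workspace/mahout_abac/target/classes/clustering/AbacDistanceMeasure.class
```

Under those assumptions the value passed to -dm would be the dotted name rather than the file path, and the class would still have to be on the classpath as described in the quoted reply.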

On 2/12/13 11:48 AM, Dan Filimon wrote:
> You need to add the JAR containing the distance measure you want to
> the classpath.
> The CLASSPATH is set at line 120 of the mahout script (the script
> itself is in the bin/ folder of your Mahout installation).
>
> Sadly, I don't think the script lets you extend the classpath out of
> the box, but it should be a simple addition.
> You can either:
> a. add the path to your JAR/class folder manually at line 120
> b. add a new variable called something like MAHOUT_EXTRA_CLASSPATH
> to line 120, which you can then set to whatever you need.
>
> Option b is a bit cleaner, but either way you need to modify the script.
>
> Alternatively, if you'd rather not modify the script, have a closer
> look at it: running 'mahout classpath' prints the classpath it
> builds. You can then run the hadoop script directly, as line 252 of
> the script does, and set HADOOP_CLASSPATH yourself (see
> http://stackoverflow.com/questions/3799679/how-to-run-a-hadoop-program).
>
> This should really be better documented. Sorry you're having trouble!
>
> Good luck! :)
>
> On Tue, Feb 12, 2013 at 6:30 PM, Mihai Josan
> <Mihai.Josan@iquestgroup.com> wrote:
>> This is the error I receive:
>>
>> mahout kmeans -i /user/rhadoop/in/sequence/ \
>>         -c  /user/rhadoop/out/canopy-centroids/clusters-0 \
>>         -o  /user/rhadoop/out/clusters-out/ \
>>         -x 10 \
>>         -dm /home/rhadoop/projects/workspace/mahout_abac/target/classes/clustering/AbacDistanceMeasure.class
>> MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
>> Running on hadoop, using /usr/lib/hadoop/bin/hadoop and HADOOP_CONF_DIR=/etc/hadoop/conf
>> MAHOUT-JOB: /usr/lib/mahout/mahout-examples-0.7-cdh4.1.2-job.jar
>> 13/02/12 17:05:57 INFO common.AbstractJob: Command line arguments: {--clusters=[/user/rhadoop/out/canopy-centroids/clusters-0], --convergenceDelta=[0.5], --distanceMeasure=[/home/rhadoop/projects/workspace/mahout_abac/target/classes/clustering/AbacDistanceMeasure.class], --endPhase=[2147483647], --input=[/user/rhadoop/in/sequence/], --maxIter=[10], --method=[mapreduce], --output=[/user/rhadoop/out/clusters-out2/], --startPhase=[0], --tempDir=[temp]}
>> Exception in thread "main" java.lang.IllegalStateException: java.lang.ClassNotFoundException: /home/rhadoop/projects/workspace/mahout_abac/target/classes/clustering/AbacDistanceMeasure.class
>>          at org.apache.mahout.common.ClassUtils.instantiateAs(ClassUtils.java:30)
>>          at org.apache.mahout.clustering.kmeans.KMeansDriver.run(KMeansDriver.java:92)
>>          at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>>          at org.apache.mahout.clustering.kmeans.KMeansDriver.main(KMeansDriver.java:49)
>>          at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>          at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>          at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>          at java.lang.reflect.Method.invoke(Method.java:597)
>>          at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:72)
>>          at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:144)
>>          at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)
>>          at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>          at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>          at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>          at java.lang.reflect.Method.invoke(Method.java:597)
>>          at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
>> Caused by: java.lang.ClassNotFoundException: /home/rhadoop/projects/besmart/workspace/mahout_abac/target/classes/clustering/AbacDistanceMeasure.class
>>          at java.lang.Class.forName0(Native Method)
>>          at java.lang.Class.forName(Class.java:169)
>>          at org.apache.mahout.common.ClassUtils.instantiateAs(ClassUtils.java:28)
>>          ... 15 more
>>
>>
>> Is this the proper way to use the custom distance measure? Or should I
>> package the class, and if so, how?
>>
>> Thank you in advance,
>> Mihai Josan
>>
>>> Are you getting any errors?
>>> Can you specify the fully qualified class name of your distance measure
>>> (like com.xxx.MyDistanceMeasure) and check?
>>>
>>> Best,
>>> Mahesh Balija,
>>> Calsoft Labs.
>>>
>>>
>>> On Tue, Feb 12, 2013 at 2:28 PM, Mihai Josan <Mihai.Josan@iquestgroup.com> wrote:
>>>
>>>> Hello,
>>>>
>>>> Can you please tell me how I can use a custom distance measure with
>>>> Mahout on the command line?
>>>> I am trying to run a clustering using this distance measure, like:
>>>>
>>>> mahout kmeans -i in/sequence/ \
>>>>         -c  out/centroids/clusters-0 \
>>>>         -o  out/clusters-out/ \
>>>>         -x 10 \
>>>>         -dm MyDistanceMeasure \
>>>>         -ow
>>>>
>>>> Thank you in advance,
>>>> Mihai
>>>>
>
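
Option b from Dan's reply (an extra classpath variable read by the mahout script) could look roughly like the sketch below. It is only an illustration under stated assumptions: MAHOUT_EXTRA_CLASSPATH is the hypothetical variable name proposed in the thread and is not read by the stock script, the initial CLASSPATH value is a stand-in for whatever the script has built by line 120, and the classes directory is taken from the path in the quoted command.

```shell
# Stand-in for the classpath the mahout script has built so far (assumption).
CLASSPATH="/usr/lib/mahout/conf"

# Hypothetical variable proposed in the thread; set it to your JAR or classes folder.
MAHOUT_EXTRA_CLASSPATH="/home/rhadoop/projects/workspace/mahout_abac/target/classes"

# Append the extra entries only when the variable is set and non-empty,
# so an unset variable does not leave a trailing ":" on the classpath.
CLASSPATH="${CLASSPATH}${MAHOUT_EXTRA_CLASSPATH:+:${MAHOUT_EXTRA_CLASSPATH}}"

echo "$CLASSPATH"
```

The `${VAR:+...}` expansion keeps the script's behavior unchanged for anyone who never sets the variable, which is the main reason to prefer this over hard-coding a path at line 120.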

