mahout-user mailing list archives

From Dan Filimon <dangeorge.fili...@gmail.com>
Subject Re: how to use a custom distance measure with kmeans?
Date Wed, 13 Feb 2013 11:24:39 GMT
Sure, that sounds like an even better solution!
I didn't read the entire script. :)

On Wed, Feb 13, 2013 at 6:40 AM, Mahesh Balija
<balijamahesh.mca@gmail.com> wrote:
> Hi Dan,
>
>               If we copy the jar containing the custom classes to the
> MAHOUT_HOME/lib folder, won't that work fine?
>               Line 147 of the mahout script reads all jars under the lib
> folder and puts them on the classpath.
>
>               If that doesn't work, there should probably be a better way
> to add custom classes to the classpath than having users modify the
> script file.
>
> Thanks,
> Mahesh Balija,
> Calsoft Labs.
>
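Mahesh's suggestion relies on the mahout script putting every jar found under MAHOUT_HOME/lib on the classpath. The script itself does this in bash; the Java sketch below only illustrates the same idea (collect every *.jar directly inside a directory and join the paths with the platform path separator) so you can see which entries a given lib folder would contribute. The class name LibClasspath and the default "lib" argument are hypothetical, not part of Mahout.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration (not the mahout script's code): builds a classpath
// string from every *.jar file directly inside a lib directory.
class LibClasspath {
    static String jarsOnClasspath(Path libDir) throws IOException {
        List<String> entries = new ArrayList<>();
        // The glob only matches file names ending in .jar; other files are ignored
        try (DirectoryStream<Path> jars = Files.newDirectoryStream(libDir, "*.jar")) {
            for (Path jar : jars) {
                entries.add(jar.toAbsolutePath().toString());
            }
        }
        // Directory listing order is unspecified, so sort for a stable result
        Collections.sort(entries);
        return String.join(File.pathSeparator, entries);
    }

    public static void main(String[] args) throws IOException {
        // Placeholder default: pass your own MAHOUT_HOME/lib as the first argument
        Path lib = Paths.get(args.length > 0 ? args[0] : "lib");
        System.out.println(jarsOnClasspath(lib));
    }
}
```

If the jar holding the custom distance measure shows up in this list, it is in the place Mahesh describes; whether the script picks it up still depends on the script version you have installed.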
> On Tue, Feb 12, 2013 at 10:18 PM, Dan Filimon
> <dangeorge.filimon@gmail.com> wrote:
>
>> You need to add the JAR containing the distance measure you want to
>> the classpath.
>> By default, the CLASSPATH is set on line 120 of the mahout script (the
>> script itself is in the bin/ folder of your Mahout installation).
>>
>> Sadly, I don't think the script lets you extend the classpath out of the
>> box, but it should be a simple addition.
>> You can either:
>> a. add the path to your JAR/class folder manually at line 120
>> b. (the cleaner way) add a new variable called something like
>> MAHOUT_EXTRA_CLASSPATH to line 120 which you can set to whatever you
>> need.
>>
>> b. is a bit cleaner, but you need to modify the script anyway.
>>
>> Alternatively, if you'd rather not fiddle with the script, you can take a
>> closer look at it and see that running 'mahout classpath' shows you
>> the classpath it builds. Then you can run the hadoop script directly,
>> as on line 252 of the script, and edit the HADOOP_CLASSPATH (see
>> http://stackoverflow.com/questions/3799679/how-to-run-a-hadoop-program).
>>
>> This should really be better documented. Sorry you're having trouble!
>>
>> Good luck! :)
>>
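Before editing the script or HADOOP_CLASSPATH, it can help to confirm that the entries you plan to add actually contain the class you will name on the command line. The following is a minimal sketch, assuming a plain JVM with no Hadoop involved: it builds a URLClassLoader over a list of jars or class-folder roots and asks whether a class name resolves. ClasspathCheck and isResolvable are hypothetical names, not part of Mahout.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Mahout: reports whether a class name resolves
// against a given list of jars and/or class-folder roots.
class ClasspathCheck {
    static boolean isResolvable(String className, List<File> entries) {
        List<URL> urls = new ArrayList<>();
        for (File entry : entries) {
            try {
                // File.toURI() appends '/' for existing directories, which is how
                // URLClassLoader tells a class folder apart from a jar
                urls.add(entry.toURI().toURL());
            } catch (MalformedURLException e) {
                throw new IllegalArgumentException("Bad classpath entry: " + entry, e);
            }
        }
        try (URLClassLoader loader =
                new URLClassLoader(urls.toArray(new URL[0]), ClasspathCheck.class.getClassLoader())) {
            // initialize=false: look the class up without running its static initializers
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        } catch (IOException e) {
            // Only thrown when closing the loader
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        if (args.length < 1) {
            System.err.println("usage: java ClasspathCheck <fully.qualified.ClassName> <jar-or-dir>...");
            return;
        }
        List<File> entries = new ArrayList<>();
        for (String arg : Arrays.asList(args).subList(1, args.length)) {
            entries.add(new File(arg));
        }
        System.out.println(isResolvable(args[0], entries));
    }
}
```

For example, `java ClasspathCheck clustering.AbacDistanceMeasure /home/rhadoop/projects/workspace/mahout_abac/target/classes` should print true if the class was compiled in package clustering, as its path suggests. Loading a class also loads its supertypes, so the Mahout jars need to be among the entries too; otherwise the lookup fails with a NoClassDefFoundError rather than returning false.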
>> On Tue, Feb 12, 2013 at 6:30 PM, Mihai Josan
>> <Mihai.Josan@iquestgroup.com> wrote:
>> > This is the error I receive:
>> >
>> > mahout kmeans -i /user/rhadoop/in/sequence/ \
>> >        -c  /user/rhadoop/out/canopy-centroids/clusters-0 \
>> >        -o  /user/rhadoop/out/clusters-out/ \
>> >        -x 10 \
>> >        -dm /home/rhadoop/projects/workspace/mahout_abac/target/classes/clustering/AbacDistanceMeasure.class
>> >
>> > MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
>> > Running on hadoop, using /usr/lib/hadoop/bin/hadoop and HADOOP_CONF_DIR=/etc/hadoop/conf
>> > MAHOUT-JOB: /usr/lib/mahout/mahout-examples-0.7-cdh4.1.2-job.jar
>> > 13/02/12 17:05:57 INFO common.AbstractJob: Command line arguments:
>> > {--clusters=[/user/rhadoop/out/canopy-centroids/clusters-0],
>> > --convergenceDelta=[0.5],
>> > --distanceMeasure=[/home/rhadoop/projects/workspace/mahout_abac/target/classes/clustering/AbacDistanceMeasure.class],
>> > --endPhase=[2147483647], --input=[/user/rhadoop/in/sequence/],
>> > --maxIter=[10], --method=[mapreduce],
>> > --output=[/user/rhadoop/out/clusters-out2/], --startPhase=[0],
>> > --tempDir=[temp]}
>> > Exception in thread "main" java.lang.IllegalStateException: java.lang.ClassNotFoundException: /home/rhadoop/projects/workspace/mahout_abac/target/classes/clustering/AbacDistanceMeasure.class
>> >         at org.apache.mahout.common.ClassUtils.instantiateAs(ClassUtils.java:30)
>> >         at org.apache.mahout.clustering.kmeans.KMeansDriver.run(KMeansDriver.java:92)
>> >         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>> >         at org.apache.mahout.clustering.kmeans.KMeansDriver.main(KMeansDriver.java:49)
>> >         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> >         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>> >         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>> >         at java.lang.reflect.Method.invoke(Method.java:597)
>> >         at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:72)
>> >         at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:144)
>> >         at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)
>> >         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> >         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>> >         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>> >         at java.lang.reflect.Method.invoke(Method.java:597)
>> >         at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
>> > Caused by: java.lang.ClassNotFoundException: /home/rhadoop/projects/besmart/workspace/mahout_abac/target/classes/clustering/AbacDistanceMeasure.class
>> >         at java.lang.Class.forName0(Native Method)
>> >         at java.lang.Class.forName(Class.java:169)
>> >         at org.apache.mahout.common.ClassUtils.instantiateAs(ClassUtils.java:28)
>> >         ... 15 more
>> >
>> >
>> > Is this the proper way to use a custom distance measure, or should I
>> > package the class? And if so, how?
>> >
>> > Thank you in advance,
>> > Mihai Josan
>> >
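On the question of how to refer to a compiled class: -dm takes a class name, not a path to a .class file. The fully qualified name is the file's path relative to the root of the compiled-classes tree, with directory separators turned into dots and the .class suffix dropped. Below is a small sketch of that mapping, assuming target/classes is the root (as the path in the error suggests) and that the class declares the matching package; ClassNameFromPath is a hypothetical helper.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.StringJoiner;

// Hypothetical helper: maps a compiled .class file under a classes root
// (e.g. target/classes) to the fully qualified class name.
class ClassNameFromPath {
    static String toClassName(Path classesRoot, Path classFile) {
        Path rel = classesRoot.toAbsolutePath().normalize()
                .relativize(classFile.toAbsolutePath().normalize());
        if (rel.startsWith("..") || !rel.toString().endsWith(".class")) {
            throw new IllegalArgumentException(classFile + " is not a .class file under " + classesRoot);
        }
        // Each directory level below the root is one package segment
        StringJoiner name = new StringJoiner(".");
        for (Path segment : rel) {
            name.add(segment.toString());
        }
        String dotted = name.toString();
        // Drop the trailing ".class"; nested classes keep their '$' (binary name)
        return dotted.substring(0, dotted.length() - ".class".length());
    }

    public static void main(String[] args) {
        System.out.println(toClassName(
                Paths.get("/home/rhadoop/projects/workspace/mahout_abac/target/classes"),
                Paths.get("/home/rhadoop/projects/workspace/mahout_abac/target/classes/clustering/AbacDistanceMeasure.class")));
        // prints clustering.AbacDistanceMeasure
    }
}
```

With the path from the error above this yields clustering.AbacDistanceMeasure, the kind of value -dm expects; the classes directory, or a jar built from it, must still be on the classpath. To package it, the JDK's jar tool can archive the same layout, e.g. `jar cf abac-measures.jar -C target/classes .` (the jar file name here is only an example).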
>> >> Are you getting any errors?
>> >> Can you specify the fully qualified class name of your distance measure
>> >> (like com.xxx.MyDistanceMeasure) and check?
>> >>
>> >> Best,
>> >> Mahesh Balija,
>> >> Calsoft Labs.
>> >>
>> >>
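Mahesh's fully-qualified-name suggestion matches what the stack trace shows: ClassUtils.instantiateAs hands the -dm value to Class.forName, which only understands binary class names, so a file-system path fails with a ClassNotFoundException wrapped in an IllegalStateException. The sketch below reconstructs that pattern from the trace for illustration only; it is not Mahout's source, and ReflectiveFactory is a hypothetical name.

```java
import java.util.List;

// Hypothetical reconstruction of a reflective factory with the same failure shape
// as the ClassUtils.instantiateAs frames in the stack trace. Not Mahout's code.
class ReflectiveFactory {
    static <T> T instantiateAs(String className, Class<T> type) {
        try {
            // Class.forName expects a binary name such as "clustering.AbacDistanceMeasure",
            // resolved through the caller's class loader; a file path never matches
            return Class.forName(className).asSubclass(type)
                    .getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            // Same shape as the reported error: IllegalStateException caused by
            // ClassNotFoundException when the name cannot be resolved
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        List<?> ok = instantiateAs("java.util.ArrayList", List.class);
        System.out.println(ok.getClass().getName()); // java.util.ArrayList
        try {
            instantiateAs("/tmp/clustering/AbacDistanceMeasure.class", Object.class);
        } catch (IllegalStateException e) {
            System.out.println("failed: " + e.getCause());
        }
    }
}
```

The same lookup also explains why the class has to be on the driver's classpath: Class.forName can only find names that the calling class loader can see.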
>> >> On Tue, Feb 12, 2013 at 2:28 PM, Mihai Josan
>> >> <Mihai.Josan@iquestgroup.com> wrote:
>> >>
>> >> > Hello,
>> >> >
>> >> > Can you please tell me how I can use a custom-made distance measure
>> >> > with Mahout from the command line?
>> >> > I am trying to run a clustering using this distance measure, like:
>> >> >
>> >> > mahout kmeans -i in/sequence/ \
>> >> >        -c  out/centroids/clusters-0 \
>> >> >        -o  out/clusters-out/ \
>> >> >        -x 10 \
>> >> >        -dm MyDistanceMeasure \
>> >> >        -ow
>> >> >
>> >> > Thank you in advance,
>> >> > Mihai
>> >> >
>>
