mahout-user mailing list archives

From Ted Dunning <ted.dunn...@gmail.com>
Subject Re: Mahalanobis users out there?
Date Tue, 01 Mar 2011 17:26:34 GMT
Vasil,

If you are suggesting a change in Mahout, can you go to
https://issues.apache.org/jira/browse/MAHOUT
and file an issue with a patch?

In case the terminology is new to you, an issue is a bug report or
enhancement request, and a patch is the output of svn diff or
git format-patch.

You can get more information about this process here:
https://cwiki.apache.org/confluence/display/MAHOUT/How+To+Contribute

On Tue, Mar 1, 2011 at 1:11 AM, Vasil Vasilev <vavasilev@gmail.com> wrote:

> Hi Lance,
>
> I did a small test with the Mahalanobis Distance Measure and Dirichlet
> clustering. Unfortunately, it did not work at first, because the measure's
> "configure" method was never called.
> I made some changes to the Mahout code so that it would run, and used the
> following code in the
> org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job class:
>
> /**
>  * Run the job using supplied arguments, deleting the output directory if it
>  * exists beforehand
>  *
>  * @param input
>  *          the directory pathname for input points
>  * @param output
>  *          the directory pathname for output points
>  * @param modelDistribution
>  *          the ModelDistribution
>  * @param numModels
>  *          the number of Models
>  * @param maxIterations
>  *          the maximum number of iterations
>  * @param alpha0
>  *          the alpha0 value for the DirichletDistribution
>  * @param emitMostLikely
>  *          if true, emit only the most likely cluster for each point
>  * @param threshold
>  *          if emitMostLikely is false, emit every cluster whose pdf exceeds this value
>  */
> public void run(Path input,
>                 Path output,
>                 ModelDistribution<VectorWritable> modelDistribution,
>                 int numModels,
>                 int maxIterations,
>                 double alpha0,
>                 boolean emitMostLikely,
>                 double threshold)
>   throws IOException, ClassNotFoundException, InstantiationException,
>          IllegalAccessException, SecurityException, InterruptedException {
>   Configuration conf = new Configuration();
>
>   if (modelDistribution instanceof DistanceMeasureClusterDistribution) {
>     DistanceMeasure measure =
>         ((DistanceMeasureClusterDistribution) modelDistribution).getMeasure();
>     if (measure instanceof MahalanobisDistanceMeasure) {
>       MahalanobisDistanceMeasure mahalanobis = (MahalanobisDistanceMeasure) measure;
>
>       // Hard-coded 3-dimensional mean vector and identity covariance for this test
>       Vector meanVector = new DenseVector(new double[] {0.0, 22.0, 25.0});
>       mahalanobis.setMeanVector(meanVector);
>       Matrix m = new DenseMatrix(new double[][] {{1.0, 0.0, 0.0},
>                                                  {0.0, 1.0, 0.0},
>                                                  {0.0, 0.0, 1.0}});
>       mahalanobis.setCovarianceMatrix(m);
>
>       // Write the inverse covariance matrix to a file and record its path in the job configuration
>       Path inverseCovarianceFile =
>           new Path("output/MahalanobisDistanceMeasureInverseCovarianceFile");
>       conf.set("MahalanobisDistanceMeasure.inverseCovarianceFile",
>                "output/MahalanobisDistanceMeasureInverseCovarianceFile");
>       FileSystem fs = FileSystem.get(inverseCovarianceFile.toUri(), conf);
>       MatrixWritable inverseCovarianceMatrix =
>           new MatrixWritable(mahalanobis.getInverseCovarianceMatrix());
>       DataOutputStream out = fs.create(inverseCovarianceFile);
>       try {
>         inverseCovarianceMatrix.write(out);
>       } finally {
>         out.close();
>       }
>
>       // Write the mean vector the same way
>       Path meanVectorFile = new Path("output/MahalanobisDistanceMeasureMeanVectorFile");
>       conf.set("MahalanobisDistanceMeasure.meanVectorFile",
>                "output/MahalanobisDistanceMeasureMeanVectorFile");
>       fs = FileSystem.get(meanVectorFile.toUri(), conf);
>       VectorWritable meanVectorWritable = new VectorWritable(meanVector);
>       out = fs.create(meanVectorFile);
>       try {
>         meanVectorWritable.write(out);
>       } finally {
>         out.close();
>       }
>
>       // Record which Writable classes the two files were serialized with
>       conf.set("MahalanobisDistanceMeasure.maxtrixClass", MatrixWritable.class.getName());
>       conf.set("MahalanobisDistanceMeasure.vectorClass", VectorWritable.class.getName());
>     }
>   }
>
>   Path directoryContainingConvertedInput =
>       new Path(output, DIRECTORY_CONTAINING_CONVERTED_INPUT);
>   SynthInputDriver.runJob(input, directoryContainingConvertedInput,
>       "org.apache.mahout.math.RandomAccessSparseVector");
>   //InputDriver.runJob(input, directoryContainingConvertedInput,
>   //    "org.apache.mahout.math.RandomAccessSparseVector");
>   DirichletDriver.run(conf, directoryContainingConvertedInput,
>                       output,
>                       modelDistribution,
>                       numModels,
>                       maxIterations,
>                       alpha0,
>                       true,
>                       emitMostLikely,
>                       threshold,
>                       true);
>
>   try {
>     ClusteredPointsConverter.convertClusteredPoints(directoryContainingConvertedInput,
>         new Path(output, "clusteredPoints"),
>         new Path(output, "convertedClusteredPoints"),
>         "org.apache.mahout.math.RandomAccessSparseVector");
>   } catch (InvocationTargetException e) {
>     // TODO Auto-generated catch block
>     e.printStackTrace();
>   }
>
>   // run ClusterDumper
>   ClusterDumper clusterDumper =
>       new ClusterDumper(new Path(output, "clusters-" + maxIterations),
>                         new Path(output, "convertedClusteredPoints"));
>   clusterDumper.printClusters(null);
>  }
>
> On Tue, Mar 1, 2011 at 10:12 AM, Lance Norskog <goksron@gmail.com> wrote:
>
> > Does anybody use the Mahalanobis distance measure class? If so, what for?
> > And how do you prepare the input matrices?
> >
> > Lance
> >
>
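
For reference, the Mahalanobis distance of a point x from a mean vector mu with covariance matrix S is d(x) = sqrt((x - mu)^T * S^-1 * (x - mu)). The test above uses the identity covariance, in which case the distance reduces to the plain Euclidean distance from the mean vector, so it does not yet exercise anything Mahalanobis-specific. The following is a minimal plain-Java sketch of the formula, assuming the inverse covariance is already known; the class and method names are hypothetical and this is not Mahout's MahalanobisDistanceMeasure implementation.

```java
// Hypothetical sketch (not Mahout code): Mahalanobis distance
// d(x) = sqrt((x - mu)^T * S^-1 * (x - mu)) using plain arrays.
public final class MahalanobisSketch {

    // Assumes x, mean and inverseCovariance all have matching dimension n.
    static double distance(double[] x, double[] mean, double[][] inverseCovariance) {
        int n = x.length;
        double[] diff = new double[n];
        for (int i = 0; i < n; i++) {
            diff[i] = x[i] - mean[i];
        }
        // Quadratic form diff^T * S^-1 * diff
        double sum = 0.0;
        for (int i = 0; i < n; i++) {
            double row = 0.0;
            for (int j = 0; j < n; j++) {
                row += inverseCovariance[i][j] * diff[j];
            }
            sum += diff[i] * row;
        }
        return Math.sqrt(sum);
    }

    public static void main(String[] args) {
        double[] mean = {0.0, 22.0, 25.0};
        double[] x = {3.0, 26.0, 25.0};

        // Identity covariance: same as Euclidean distance to the mean
        double[][] identity = {{1.0, 0.0, 0.0}, {0.0, 1.0, 0.0}, {0.0, 0.0, 1.0}};
        System.out.println(distance(x, mean, identity)); // prints 5.0

        // Variance 4 along the first axis (inverse 0.25): deviations there count half as much
        double[][] scaledInverse = {{0.25, 0.0, 0.0}, {0.0, 1.0, 0.0}, {0.0, 0.0, 1.0}};
        System.out.println(distance(x, mean, scaledInverse));
    }
}
```

With the identity matrix the result matches the Euclidean distance; only a non-identity covariance changes how much each dimension contributes.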

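On Lance's question of how to prepare the input matrices: one common approach is to estimate them from the data, taking the per-dimension average as the mean vector and the sample covariance as the covariance matrix. Vasil's code suggests that setCovarianceMatrix computes the inverse internally (he reads it back with getInverseCovarianceMatrix), so the invert step below only illustrates what that involves. This is a minimal plain-Java sketch, assuming small dense data that fits in memory and a non-singular covariance; the class and method names are hypothetical, not Mahout API. The resulting arrays could then be wrapped in DenseVector and DenseMatrix as in the code above.

```java
// Hypothetical sketch (not Mahout code): estimate mean and sample covariance
// from points, then invert the covariance with Gauss-Jordan elimination.
public final class CovarianceSketch {

    static double[] mean(double[][] points) {
        int d = points[0].length;
        double[] mu = new double[d];
        for (double[] p : points) {
            for (int j = 0; j < d; j++) {
                mu[j] += p[j];
            }
        }
        for (int j = 0; j < d; j++) {
            mu[j] /= points.length;
        }
        return mu;
    }

    // Unbiased sample covariance (divides by n - 1); needs at least 2 points.
    static double[][] covariance(double[][] points, double[] mu) {
        int d = mu.length;
        double[][] cov = new double[d][d];
        for (double[] p : points) {
            for (int i = 0; i < d; i++) {
                for (int j = 0; j < d; j++) {
                    cov[i][j] += (p[i] - mu[i]) * (p[j] - mu[j]);
                }
            }
        }
        for (int i = 0; i < d; i++) {
            for (int j = 0; j < d; j++) {
                cov[i][j] /= points.length - 1;
            }
        }
        return cov;
    }

    // Gauss-Jordan elimination with partial pivoting on the augmented matrix [A | I].
    static double[][] invert(double[][] a) {
        int n = a.length;
        double[][] m = new double[n][2 * n];
        for (int i = 0; i < n; i++) {
            System.arraycopy(a[i], 0, m[i], 0, n);
            m[i][n + i] = 1.0;
        }
        for (int col = 0; col < n; col++) {
            int pivot = col;
            for (int r = col + 1; r < n; r++) {
                if (Math.abs(m[r][col]) > Math.abs(m[pivot][col])) {
                    pivot = r;
                }
            }
            if (Math.abs(m[pivot][col]) < 1e-12) {
                throw new IllegalArgumentException("covariance matrix is (nearly) singular");
            }
            double[] tmp = m[col];
            m[col] = m[pivot];
            m[pivot] = tmp;
            double p = m[col][col];
            for (int j = 0; j < 2 * n; j++) {
                m[col][j] /= p;
            }
            for (int r = 0; r < n; r++) {
                if (r == col) {
                    continue;
                }
                double f = m[r][col];
                for (int j = 0; j < 2 * n; j++) {
                    m[r][j] -= f * m[col][j];
                }
            }
        }
        double[][] inv = new double[n][n];
        for (int i = 0; i < n; i++) {
            System.arraycopy(m[i], n, inv[i], 0, n);
        }
        return inv;
    }

    public static void main(String[] args) {
        double[][] points = {{1.0, 2.0}, {3.0, 2.0}, {1.0, 4.0}, {3.0, 4.0}};
        double[] mu = mean(points);
        double[][] cov = covariance(points, mu);
        System.out.println(java.util.Arrays.toString(mu));
        System.out.println(java.util.Arrays.deepToString(cov));
        System.out.println(java.util.Arrays.deepToString(invert(cov)));
    }
}
```

If the covariance is singular (for example, one dimension is constant or two dimensions are perfectly correlated), the inverse does not exist; in that case the dimensions need to be reduced or the matrix regularized before computing Mahalanobis distances.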