mahout-user mailing list archives

From Aleksander Sadecki <aleksander.sade...@pi.esisar.grenoble-inp.fr>
Subject Re: How to list all vectors from a cluster
Date Fri, 23 May 2014 12:56:10 GMT
Hi,

To be honest, I have no idea what I am doing wrong...

I posted this issue on Stack Overflow. If you have any idea what is wrong, I would be happy
to see your answer.

My output is still empty.

Here you can see my post:

http://stackoverflow.com/q/23829740/1021970



Hello,

I'm using Mahout 0.8.

import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.mahout.clustering.Cluster;
import org.apache.mahout.clustering.canopy.CanopyDriver;
import org.apache.mahout.clustering.classify.WeightedVectorWritable;
import org.apache.mahout.clustering.kmeans.KMeansDriver;
import org.apache.mahout.common.Pair;
import org.apache.mahout.common.distance.DistanceMeasure;
import org.apache.mahout.common.distance.ManhattanDistanceMeasure;
import org.apache.mahout.common.distance.TanimotoDistanceMeasure;
import org.apache.mahout.common.iterator.sequencefile.PathFilters;
import org.apache.mahout.common.iterator.sequencefile.PathType;
import org.apache.mahout.common.iterator.sequencefile.SequenceFileDirIterable;
import org.apache.mahout.math.NamedVector;
import org.apache.mahout.math.Vector;
import org.apache.mahout.math.VectorWritable;
import org.apache.mahout.utils.vectors.VectorHelper;
import org.apache.mahout.utils.vectors.lucene.Driver;




On 23/05/14 04:36, Aleksander Sadecki wrote:
> Hi,
>
> Thank you.
>
> Which version of Apache Mahout are you using? Could you paste your imports here? Thanks
>
> ==================================
> Industrial Project PI16 – SICAP
> ==================================
> Team: Deschamps Mathias
>       Razafindramaka Rado
>       Sadecki Aleksander
>
> Supervised by: Brun Emmanuel
>
> Room C104
> ==================================
> ESISAR
> 50 rue Barthelemy de Laffemas
> BP 54
> 26902 Valence cedex 9
> ==================================
> tel: 04 56 52 99 16
> fax: 04 75 75 94 44
> ==================================
>
>
> ----- Original message -----
> From: "Angel Luis Scull" <ascullp@facinf.uho.edu.cu>
> To: user@mahout.apache.org
> Sent: Thursday, 22 May 2014 19:40:24
> Subject: Re: How to list all vectors from a cluster
>
> Hi
>
> This works for me:
>    ...
> Path path = new Path(workPath + kmeansClustersPath
>         + "/clusteredPoints/part-m-0");
> for (Pair<IntWritable, WeightedVectorWritable> record
>         : new SequenceFileDirIterable<IntWritable, WeightedVectorWritable>(
>                 path, PathType.GLOB, PathFilters.logsCRCFilter(), conf)) {
>     // The cast assumes the clustered input vectors are NamedVectors
>     NamedVector vec = (NamedVector) record.getSecond().getVector();
>     System.out.println(record.getFirst().get() + "  " + vec.getName());
> }
> ...
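The loop above prints one line per clustered point. To list all vectors belonging to each cluster, those (cluster id, name) pairs can be collected into a map keyed by cluster id, the same Map<Integer, List<...>> shape that ClusterDumper.getClusterIdToPoints() returns in the code quoted below. A minimal, Hadoop-free sketch; ClusterGrouping, Point and group are hypothetical stand-ins for the IntWritable key and the NamedVector name read from clusteredPoints.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: collects (cluster id, vector name) pairs, like those
// printed by the clusteredPoints loop, into one list of names per cluster.
class ClusterGrouping {

    // Hypothetical stand-in for one clusteredPoints record: the IntWritable
    // key (cluster id) and the name of the NamedVector stored as the value.
    static final class Point {
        final int clusterId;
        final String name;

        Point(int clusterId, String name) {
            this.clusterId = clusterId;
            this.name = name;
        }
    }

    // A TreeMap keeps cluster ids sorted; each list keeps the order points were read in.
    static Map<Integer, List<String>> group(List<Point> points) {
        Map<Integer, List<String>> byCluster = new TreeMap<>();
        for (Point p : points) {
            byCluster.computeIfAbsent(p.clusterId, id -> new ArrayList<>()).add(p.name);
        }
        return byCluster;
    }

    public static void main(String[] args) {
        List<Point> points = Arrays.asList(
                new Point(7, "row-0"), new Point(3, "row-1"), new Point(7, "row-2"));
        // Prints "cluster 3: [row-1]" then "cluster 7: [row-0, row-2]"
        for (Map.Entry<Integer, List<String>> entry : group(points).entrySet()) {
            System.out.println("cluster " + entry.getKey() + ": " + entry.getValue());
        }
    }
}
```

In real code, each Point would be built inside the SequenceFileDirIterable loop from record.getFirst().get() and vec.getName().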
>
>
> On 22/05/14 11:22, Aleksander Sadecki wrote:
>> Hi,
>>
>> Thank you very much!
>>
>> I am trying to implement a Java function with this class.
>>
>> I wrote this piece of code:
>>
>> 		ClusterDumper dumper = new ClusterDumper(new Path(partMDir), new Path(
>> 				seqDir));
>>
>> 		Map<Integer, List<WeightedPropertyVectorWritable>> dumped = dumper
>> 				.getClusterIdToPoints();
>>
>> 		for (Integer numberOfList : dumped.keySet()) {
>> 			List<WeightedPropertyVectorWritable> listWithVectors = dumped
>> 					.get(numberOfList);
>>
>> 			for (WeightedPropertyVectorWritable vec : listWithVectors) {
>> 				System.out.println(vec.getVector().toString());
>> 			}
>> 		}
>>
>> When I run it, I get an exception.
>>
>> The constructor takes 2 parameters:
>>
>> ClusterDumper(seqFileDir, pointsDir), and I do not know which files I should pass here...
>>
>> I have 9 files:
>>
>> 		String s1 = root + "synthetic_control.data";
>> 		String s2 = root + "synthetic_control.seq";
>> 		String s3 = root + ".synthetic_control.seq.crc";
>> 		String s4 = outputDir + "\\clusteredPoints\\part-m-0";
>> 		String s5 = outputDir + "\\clusteredPoints\\.part-m-0.crc";
>> 		String s6 = outputDir + "\\clusters-0-final\\_policy";
>> 		String s7 = outputDir + "\\clusters-0-final\\part-r-00000";
>> 		String s8 = outputDir + "\\clusters-0-final\\._policy.crc";
>> 		String s9 = outputDir + "\\clusters-0-final\\.part-r-00000.crc";
>>
>> 		Path p1 = new Path(s1);
>> 		Path p2 = new Path(s2);
>> 		Path p3 = new Path(s3);
>> 		Path p4 = new Path(s4);
>> 		Path p5 = new Path(s5);
>> 		Path p6 = new Path(s6);
>> 		Path p7 = new Path(s7);
>> 		Path p8 = new Path(s8);
>> 		Path p9 = new Path(s9);
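Judging from the constructor's parameter names and from how the clusterdump command-line tool is normally run (-i pointing at a clusters-N-final directory, -p at clusteredPoints), ClusterDumper most likely expects the two output directories rather than any of the individual files listed above: seqFileDir would be clusters-0-final and pointsDir would be clusteredPoints. A minimal sketch under that assumption, using only java.nio.file to locate both directories; ClusterOutputDirs, its methods and the default "output" path are hypothetical, and the results would still need to be converted to org.apache.hadoop.fs.Path before being passed to ClusterDumper.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: finds the two directories assumed to correspond to
// ClusterDumper's (seqFileDir, pointsDir) constructor parameters.
class ClusterOutputDirs {

    // A "clusters-*-final" subdirectory of outputDir, or null if none exists.
    static Path findFinalClustersDir(Path outputDir) throws IOException {
        if (!Files.isDirectory(outputDir)) {
            return null;
        }
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(outputDir, "clusters-*-final")) {
            for (Path entry : entries) {
                if (Files.isDirectory(entry)) {
                    return entry;
                }
            }
        }
        return null;
    }

    // outputDir/clusteredPoints if it exists as a directory, or null otherwise.
    static Path findClusteredPointsDir(Path outputDir) {
        Path dir = outputDir.resolve("clusteredPoints");
        return Files.isDirectory(dir) ? dir : null;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical default; pass the k-means output directory as the first argument.
        Path outputDir = Paths.get(args.length > 0 ? args[0] : "output");
        // These directories (not the part-*, _policy or .crc files inside them) are
        // what the sketch assumes ClusterDumper wants, wrapped as Hadoop Paths.
        System.out.println("seqFileDir = " + findFinalClustersDir(outputDir));
        System.out.println("pointsDir  = " + findClusteredPointsDir(outputDir));
    }
}
```

Resolving names against the output directory also avoids hard-coding the Windows "\\" separators used in the paths above.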
>>
>> I tried to figure out which 2 I should use, but nothing works.
>>
>> Some of them give me:
>>
>> synthetic_control.data not a SequenceFile
>>
>> Another one gives:
>>
>> org.apache.hadoop.io.LongWritable cannot be cast to org.apache.hadoop.io.IntWritable
>>
>> And sometimes there is no exception, but the output is empty.
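These symptoms are consistent with passing files of the wrong kind: synthetic_control.data is the raw text input, and the cast error means a file's key class differs from the IntWritable the reader expects. One way to see what each of the nine paths actually contains is to read only its SequenceFile header. The sketch below does that without Hadoop on the classpath; it assumes the version-6 header layout written by Hadoop 1.x/2.x (the bytes SEQ, a version byte of 6, then the key and value class names as Hadoop Text strings), and SeqFileHeader is a hypothetical name. With Hadoop available, SequenceFile.Reader's getKeyClassName() and getValueClassName() report the same two names.

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical diagnostic: reads only the header of a Hadoop SequenceFile
// ("SEQ", a version byte, then the key and value class names) without Hadoop.
class SeqFileHeader {

    // Returns {keyClassName, valueClassName}, or null if the file does not
    // start with the SequenceFile magic bytes (e.g. a plain text input file).
    static String[] readKeyValueClasses(Path file) throws IOException {
        try (DataInputStream in = new DataInputStream(Files.newInputStream(file))) {
            byte[] magic = new byte[4];
            try {
                in.readFully(magic);
            } catch (EOFException tooShort) {
                return null;
            }
            if (magic[0] != 'S' || magic[1] != 'E' || magic[2] != 'Q') {
                return null;
            }
            // Assumption: only the version-6 header layout is decoded here.
            if (magic[3] != 6) {
                throw new IOException("unsupported SequenceFile version " + magic[3]);
            }
            String keyClass = readText(in);
            String valueClass = readText(in);
            return new String[] {keyClass, valueClass};
        }
    }

    // Same encoding as Hadoop's Text.writeString: vint byte length, then UTF-8 bytes.
    private static String readText(DataInputStream in) throws IOException {
        int length = (int) readVLong(in);
        byte[] bytes = new byte[length];
        in.readFully(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Decodes Hadoop's WritableUtils variable-length long encoding.
    private static long readVLong(DataInputStream in) throws IOException {
        byte first = in.readByte();
        if (first >= -112) {
            return first; // values in [-112, 127] are stored in this single byte
        }
        boolean negative = first < -120;
        int extraBytes = negative ? -(first + 120) : -(first + 112);
        long value = 0;
        for (int i = 0; i < extraBytes; i++) {
            value = (value << 8) | (in.readByte() & 0xFF);
        }
        return negative ? ~value : value;
    }

    public static void main(String[] args) throws IOException {
        for (String arg : args) {
            String[] kv = readKeyValueClasses(Paths.get(arg));
            System.out.println(kv == null
                    ? arg + ": not a SequenceFile"
                    : arg + ": key=" + kv[0] + ", value=" + kv[1]);
        }
    }
}
```

Running it over the nine paths shows which files are SequenceFiles and which key class each one uses, so the reader's generic types can be matched to the file.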
>>
>> Could you help me?
>>
>> Thank you in advance

