mahout-user mailing list archives

From WangRamon <ramon_w...@hotmail.com>
Subject RE: How to find which point belongs which cluster after running KMeansClusterer
Date Fri, 04 Nov 2011 14:40:46 GMT


 > Subject: Re: How to find which point belongs which cluster after running KMeansClusterer
> From: gsingers@apache.org
> Date: Fri, 4 Nov 2011 06:49:49 -0400
> To: user@mahout.apache.org
> 
> 
> On Nov 4, 2011, at 3:28 AM, WangRamon wrote:
> 
> > 
> > Thanks, that's what I need. I have another question: is there a recommended
> > value for the iteration count and convergenceDelta in K-Means? Thanks a lot.
> > Cheers, Ramon
> 
> 
> It's usually determined by testing (what are the minimum values that give you
> good results?), but also by how long it takes for your system to run and what
> your business requirements are. Both of those values are really meant to be
> safeguards against a runaway process, since k-means isn't guaranteed to converge.

What do you mean by "k-means isn't guaranteed to converge"?
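
For readers wondering how the two parameters act as safeguards, the following is a minimal plain-Java sketch, not Mahout's implementation. It assumes a one-dimensional data set, and the class name KMeansStopSketch, the method run, and the sample values are all hypothetical. The loop stops either when no centroid moves by more than convergenceDelta or when maxIterations is reached, whichever comes first.

```java
import java.util.Arrays;

/** Hypothetical 1-D k-means sketch showing the two stopping conditions; not Mahout code. */
class KMeansStopSketch {

    /**
     * Runs k-means on 1-D points, updating centroids in place.
     * Returns the number of iterations actually executed.
     */
    static int run(double[] points, double[] centroids, int maxIterations, double convergenceDelta) {
        int iterations = 0;
        while (iterations < maxIterations) {          // safeguard 1: hard cap on iterations
            iterations++;
            double[] sum = new double[centroids.length];
            int[] count = new int[centroids.length];

            // Assignment step: attach each point to its nearest centroid.
            for (double p : points) {
                int nearest = 0;
                for (int c = 1; c < centroids.length; c++) {
                    if (Math.abs(p - centroids[c]) < Math.abs(p - centroids[nearest])) {
                        nearest = c;
                    }
                }
                sum[nearest] += p;
                count[nearest]++;
            }

            // Update step: move each centroid to the mean of its points.
            double maxShift = 0.0;
            for (int c = 0; c < centroids.length; c++) {
                if (count[c] == 0) {
                    continue;                         // empty cluster keeps its old centroid
                }
                double updated = sum[c] / count[c];
                maxShift = Math.max(maxShift, Math.abs(updated - centroids[c]));
                centroids[c] = updated;
            }

            // Safeguard 2: stop once no centroid moved more than the delta.
            if (maxShift <= convergenceDelta) {
                break;
            }
        }
        return iterations;
    }

    public static void main(String[] args) {
        double[] points = {1.0, 2.0, 3.0, 10.0, 11.0, 12.0};
        double[] centroids = {0.0, 5.0};
        int used = run(points, centroids, 20, 0.001);
        System.out.println(used + " iterations, centroids = " + Arrays.toString(centroids));
    }
}
```

With a small maxIterations the loop can stop before the centroids settle, which is the trade-off between run time and result quality mentioned in the reply above.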
> 
> 
> >> Date: Fri, 4 Nov 2011 08:07:01 +0530
> >> From: pranjan@xebia.com
> >> To: user@mahout.apache.org
> >> Subject: Re: How to find which point belongs which cluster after running KMeansClusterer
> >> 
> >> Transform your vector into a NamedVector.
> >> 
> >> On 04-11-2011 08:02, WangRamon wrote:
> >>> OK, me again. I checked the KMeansDriver code that outputs the point
> >>> information; the code is as follows:
> >>>
> >>>    Map<Text, Text> props = new HashMap<Text, Text>();
> >>>    props.put(new Text("distance"), new Text(String.valueOf(nearestDistance)));
> >>>    context.write(new IntWritable(nearestCluster.getId()), new WeightedPropertyVectorWritable(1, vector, props));
> >>>
> >>> It's good to output the point (the vector) and the distance, but in real
> >>> business use we usually need something like a name to identify the point
> >>> (name <--> vector/point), and this information is not written out. It would
> >>> be much better if we could add it.
> >>>
> >>> Cheers, Ramon
> >>>> Subject: Re: How to find which point belongs which cluster after running KMeansClusterer
> >>>> From: gsingers@apache.org
> >>>> Date: Thu, 3 Nov 2011 08:28:19 -0400
> >>>> To: user@mahout.apache.org
> >>>> 
> >>>> There is code for this, it's in two places (on trunk, at least):
> >>>> 
> >>>> 1. ClusterDumper:
> >>>> public static Map<Integer, List<WeightedVectorWritable>> readPoints(Path pointsPathDir, Configuration conf) {
> >>>>    Map<Integer, List<WeightedVectorWritable>> result = new TreeMap<Integer, List<WeightedVectorWritable>>();
> >>>>    for (Pair<IntWritable, WeightedVectorWritable> record :
> >>>>            new SequenceFileDirIterable<IntWritable, WeightedVectorWritable>(
> >>>>                    pointsPathDir, PathType.LIST, PathFilters.logsCRCFilter(), conf)) {
> >>>>      // key is the cluster id as an int, value is the weighted vector
> >>>>      // assigned to that cluster
> >>>>      int keyValue = record.getFirst().get();
> >>>>      List<WeightedVectorWritable> pointList = result.get(keyValue);
> >>>>      if (pointList == null) {
> >>>>        // first point seen for this cluster id
> >>>>        pointList = Lists.newArrayList();
> >>>>        result.put(keyValue, pointList);
> >>>>      }
> >>>>      pointList.add(record.getSecond());
> >>>>    }
> >>>>    return result;
> >>>>  }
> >>>> 
> >>>> 2. ClusterDumperWriter:
> >>>> List<WeightedVectorWritable> points = clusterIdToPoints.get(value.getId()); // look up the points by cluster id
> >>>>    if (points != null) {
> >>>>      writer.write("\tWeight : [props - optional]:  Point:\n\t");
> >>>>      for (Iterator<WeightedVectorWritable> iterator = points.iterator(); iterator.hasNext(); ) {
> >>>>        WeightedVectorWritable point = iterator.next();
> >>>>        writer.write(String.valueOf(point.getWeight()));
> >>>> 
> >>>> On Nov 3, 2011, at 5:48 AM, WangRamon wrote:
> >>>> 
> >>>>> Yes, Paritosh, it's a bit misleading for new users. I will start
> >>>>> checking KMeansDriver. Thanks for your quick reply.
> >>>>>> Date: Thu, 3 Nov 2011 15:02:28 +0530
> >>>>>> From: pranjan@xebia.com
> >>>>>> To: user@mahout.apache.org
> >>>>>> Subject: Re: How to find which point belongs which cluster after running KMeansClusterer
> >>>>>> 
> >>>>>> I also thought in the beginning that using KMeansClusterer and
> >>>>>> ClusterDumper will help in getting all vectors belonging to a cluster,
> >>>>>> but it did not help me a lot.
> >>>>>> 
> >>>>>> I used KMeansDriver which I think is easy enough to use.
> >>>>>> 
> >>>>>> After execution the records are written in the form
> >>>>>> <cluster id><vector>
> >>>>>> 
> >>>>>> "context.write(new Text(cluster.getIdentifier()), cluster);"
> >>>>>> 
> >>>>>> So, what helped me was to process this into a map with the cluster id as
> >>>>>> the key and a list of vectors as the value. I read the clustered points
> >>>>>> and put all the data into the map in that form. In the end, the list
> >>>>>> stored against each cluster id was what I needed.
> >>>>>> 
> >>>>>> Hope this helps.
> >>>>>> 
> >>>>>> Regards,
> >>>>>> Paritosh
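
The map approach described above can be sketched in plain Java as follows. This is only an illustration under stated assumptions: it does not read Mahout's clusteredPoints output, and the class ClusterGroupingSketch, the record type Assignment, and the point names are all hypothetical stand-ins for (cluster id, named point) records.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch: group (cluster id, point name) records by cluster id. */
class ClusterGroupingSketch {

    /** Stand-in for one clustered-point record; not a Mahout type. */
    static final class Assignment {
        final int clusterId;
        final String pointName;

        Assignment(int clusterId, String pointName) {
            this.clusterId = clusterId;
            this.pointName = pointName;
        }
    }

    /** Builds a map from cluster id to the names of the points assigned to it. */
    static Map<Integer, List<String>> groupByCluster(List<Assignment> records) {
        // TreeMap keeps the cluster ids sorted, which makes the output stable.
        Map<Integer, List<String>> result = new TreeMap<>();
        for (Assignment record : records) {
            // Create the list the first time a cluster id is seen, then append.
            result.computeIfAbsent(record.clusterId, id -> new ArrayList<>())
                  .add(record.pointName);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Assignment> records = Arrays.asList(
                new Assignment(1, "b"),
                new Assignment(0, "a"),
                new Assignment(0, "c"));
        System.out.println(groupByCluster(records));
    }
}
```

Carrying a name in each record is what makes the grouped output usable for identifying points, which is the concern raised elsewhere in this thread.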
> >>>>>> 
> >>>>>> On 03-11-2011 14:23, WangRamon wrote:
> >>>>>>> 
> >>>>>>> 
> >>>>>>> Hi All,
> >>>>>>>
> >>>>>>> I'm using KMeansClusterer. I will use KMeansDriver on a Hadoop
> >>>>>>> environment later, but I think it will be easier to understand things by
> >>>>>>> starting with KMeansClusterer. My question is: I cannot find a way to
> >>>>>>> determine which cluster a point belongs to after running KMeansClusterer.
> >>>>>>> I expected some API on the Cluster interface that returns all
> >>>>>>> points/vectors belonging to that cluster, but... did I miss something?
> >>>>>>> Thanks a lot.
> >>>>>>>
> >>>>>>> Cheers, Ramon
> >>>>>>> 
> >>>>>>> 
> >>>>> 		 	   		  
> >>>> --------------------------------------------
> >>>> Grant Ingersoll
> >>>> http://www.lucidimagination.com
> >>>> 
> >>>> 
> >>>> 
> >>> 		 	   		  
> >>> 
> >>> 
> >> 
> > 		 	   		  
> 
> --------------------------
> Grant Ingersoll
> http://www.lucidimagination.com