mahout-user mailing list archives

From Bob Morris <morris....@gmail.com>
Subject structure of part-r-00000 and SequenceFile.Reader NullPointerException
Date Wed, 09 Jul 2014 01:53:46 GMT
I'm running Canopy clustering with CanopyDriver on a sequence file of
NamedVectors and seem to get the expected set of map and reduce
directories. But when I try to read the part-r-00000 file with a
SequenceFile.Reader, the first attempt to iterate over the reader
immediately throws a NullPointerException, apparently in either the key or
the value; I don't know which. Here's a pretty minimal exhibit of the code,
which does little but attempt to count the clusters. (In practice I need to
do more than that, and ultimately don't care about anything but generating
the names of the vectors in each cluster.) Any suggestions are welcome.
I'm using Mahout 0.9 and hadoop-core 1.2.1.

Thanks
Bob

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.mahout.math.VectorWritable;

public class DupDumper2 {
    private String datasetDir = null;

    public DupDumper2(String datasetDir) {
        this.datasetDir = datasetDir;
    }

    public int dumpCandidates() throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        int outCount = 0;
        // test with constant
        String dumperInputFile = "/tmp/bbg/clusters/clusters-0-final/part-r-00000";
        SequenceFile.Reader clusterReader = new SequenceFile.Reader(fs,
                new Path(dumperInputFile), conf);
        IntWritable key = new IntWritable();
        VectorWritable value = new VectorWritable();
        // key = clusterid, value = list of vectorid ?
        while (clusterReader.next(key, value)) {
            /* System.out.println(key.toString() + " " +
               value.get().asFormatString()); */
            outCount++;
        }
        clusterReader.close();
        return outCount;
    }

    public static void main(String[] args) throws Exception {
        DupDumper2 dumper = new DupDumper2(args[0]);
        int howMany = dumper.dumpCandidates();
        System.out.println(howMany);
    }
}
The exception trace is below. DupDumper2.java:29 is the "while" line.

Exception in thread "main" java.lang.NullPointerException
    at org.apache.mahout.math.VectorWritable.toString(VectorWritable.java:232)
    at java.lang.String.valueOf(String.java:2854)
    at java.lang.StringBuilder.append(StringBuilder.java:128)
    at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1936)
    at org.filteredpush.duplicates.DupDumper2.dumpCandidates(DupDumper2.java:29)
    at org.filteredpush.duplicates.DupDumper2.main(DupDumper2.java:41)
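
One reading of the trace, offered as a guess rather than a diagnosis: the NullPointerException comes from VectorWritable.toString() being called through StringBuilder.append inside SequenceFile.Reader.next, which is what would happen if next() were building a message string that includes the freshly constructed (still empty) value. The stand-alone sketch below models only that pattern. FakeVectorValue and FakeReader are hypothetical stand-ins invented for illustration; they are not Hadoop or Mahout classes, and the message text is an assumption.

```java
import java.io.IOException;

final class MaskedErrorDemo {

    /** Hypothetical stand-in for a value object whose payload is unset until it is read. */
    static class FakeVectorValue {
        private double[] vector; // stays null until something populates it

        @Override
        public String toString() {
            // Dereferences the payload, so this throws NullPointerException while vector is null.
            return "len=" + vector.length;
        }
    }

    /** Hypothetical stand-in for a reader that checks the caller's value class. */
    static class FakeReader {
        private final Class<?> storedValueClass;

        FakeReader(Class<?> storedValueClass) {
            this.storedValueClass = storedValueClass;
        }

        boolean next(Object value) throws IOException {
            if (value.getClass() != storedValueClass) {
                // The concatenation calls value.toString(), which fails before
                // the IOException is ever constructed.
                throw new IOException("wrong value class: " + value
                        + " is not " + storedValueClass);
            }
            return false; // nothing to read in this model
        }
    }

    /** Returns the simple name of the exception that actually escapes next(). */
    static String failureType() {
        try {
            // The "file" stores String values, but the caller passes a FakeVectorValue.
            new FakeReader(String.class).next(new FakeVectorValue());
            return "none";
        } catch (Exception e) {
            return e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        System.out.println(failureType()); // prints "NullPointerException"
    }
}
```

If the real reader behaves like this model, the underlying problem would be a mismatch between the classes passed to next() and the classes stored in the file. SequenceFile.Reader exposes getKeyClassName() and getValueClassName(), so printing those before the loop should show what part-r-00000 actually contains.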


-- 
Robert A. Morris

Emeritus Professor  of Computer Science
UMASS-Boston
100 Morrissey Blvd
Boston, MA 02125-3390


Filtered Push Project
Harvard University Herbaria
Harvard University

email: morris.bob@gmail.com
web: http://efg.cs.umb.edu/
web: http://wiki.filteredpush.org
http://www.cs.umb.edu/~ram
===
The content of this communication is made entirely on my
own behalf and in no way should be deemed to express
official positions of The University of Massachusetts at Boston or Harvard
University.
