mahout-user mailing list archives

From Jake Mannix <jake.man...@gmail.com>
Subject Re: java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot be cast to org.apache.hadoop.io.IntWritable
Date Thu, 10 Jun 2010 16:23:52 GMT
Yeah, you simply can't cast between IntWritable and LongWritable, sadly.
You need to convert your Long document ids to Integer.  Since you're pulling
documents from Solr, the docIds should be sequential and start small,
in which case they're all well under Integer.MAX_VALUE, and so a trivial
MapReduce (well, Map, no Reduce) job with a Mapper like this should work:

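import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapreduce.Mapper;

public class M extends Mapper<LongWritable, Writable, IntWritable, Writable>
{
  // Reused for every record so no new key object is allocated per call.
  private final IntWritable i = new IntWritable(0);
  public void map(LongWritable key, Writable value, Context c)
      throws IOException, InterruptedException
  {
     i.set((int) key.get());  // narrow the long doc id to an int
     c.write(i, value);       // the new-API Context has write(), not collect()
  }
}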

Run that over your input first, and you should be set.
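
A caveat on the (int) cast in the Mapper above: it is only safe while every doc id really is below Integer.MAX_VALUE, because a plain narrowing cast wraps around silently instead of failing. If you would rather have the job stop on an unexpected id, something like the following plain-Java sketch could be used. It is an illustration only: DocIdNarrowing and toIntId are hypothetical names, not part of Mahout or Hadoop, and it assumes Java 8 or later for Math.toIntExact.

```java
// Plain-Java sketch; DocIdNarrowing and toIntId are hypothetical names,
// not Mahout or Hadoop API.
final class DocIdNarrowing {

    // Convert a long document id to an int, failing loudly on overflow.
    // Math.toIntExact (Java 8+) throws ArithmeticException when the value
    // does not fit in an int, where a plain (int) cast would wrap around.
    static int toIntId(long docId) {
        return Math.toIntExact(docId);
    }

    public static void main(String[] args) {
        long tooBig = Integer.MAX_VALUE + 1L;

        // The plain cast wraps silently to a negative number.
        System.out.println((int) tooBig);   // -2147483648

        // A small id passes through unchanged.
        System.out.println(toIntId(42L));   // 42

        // An id that does not fit is rejected instead of being corrupted.
        try {
            toIntId(tooBig);
        } catch (ArithmeticException e) {
            System.out.println("rejected: " + tooBig);
        }
    }
}
```

Inside the Mapper, the cast line would then read i.set(DocIdNarrowing.toIntId(key.get())), so an oversized id fails the task rather than producing a wrong row index.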

  -jake

On Thu, Jun 10, 2010 at 7:20 AM, Kris Jack <mrkrisjack@gmail.com> wrote:

> Got a little further by making some more class changes...
>
> //
> public class GenSimMatrixJob extends AbstractJob {
>
>    public GenSimMatrixJob() {
>
>    }
>
>    @Override
>    public int run(String[] strings) throws Exception {
>        addOption("numDocs", "nd", "Number of documents in the input");
>        addOption("numTerms", "nt", "Number of terms in the input");
>
>        Map<String,String> parsedArgs = parseArguments(strings);
>        if (parsedArgs == null) {
>          // No usable arguments: return a non-zero exit code.
>          return -1;
>        }
>
>        Configuration originalConf = getConf();
>        String inputPathString = originalConf.get("mapred.input.dir");
>        String outputTmpPathString = parsedArgs.get("--tempDir");
>        int numDocs = Integer.parseInt(parsedArgs.get("--numDocs"));
>        int numTerms = Integer.parseInt(parsedArgs.get("--numTerms"));
>
>        DistributedRowMatrix text = new DistributedRowMatrix(inputPathString,
>                outputTmpPathString, numDocs, numTerms);
>
>        text.configure(new JobConf(getConf()));
>
>        DistributedRowMatrix transpose = text.transpose();
>
>        DistributedRowMatrix similarity = transpose.times(transpose);
>
>        System.out.println("Similarity matrix lives: " + similarity.getRowPath());
>
>        // By convention Tool.run() returns 0 on success.
>        return 0;
>    }
>
>    public static void main(String[] args) throws Exception {
>        ToolRunner.run(new GenSimMatrixJob(), args);
>    }
>
> }
> //
>
> Giving the error...
>
> SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
> SLF4J: Defaulting to no-operation (NOP) logger implementation
> SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further
> details.
> 10-Jun-2010 15:16:28 org.apache.hadoop.metrics.jvm.JvmMetrics init
> INFO: Initializing JVM Metrics with processName=JobTracker, sessionId=
> 10-Jun-2010 15:16:28 org.apache.hadoop.mapred.JobClient
> configureCommandLineOptions
> WARNING: Use GenericOptionsParser for parsing the arguments. Applications
> should implement Tool for the same.
> 10-Jun-2010 15:16:28 org.apache.hadoop.mapred.JobClient
> configureCommandLineOptions
> WARNING: No job jar file set.  User classes may not be found. See
> JobConf(Class) or JobConf#setJar(String).
> 10-Jun-2010 15:16:28 org.apache.hadoop.mapred.FileInputFormat listStatus
> INFO: Total input paths to process : 1
> 10-Jun-2010 15:16:28 org.apache.hadoop.mapred.JobClient monitorAndPrintJob
> INFO: Running job: job_local_0001
> 10-Jun-2010 15:16:28 org.apache.hadoop.mapred.FileInputFormat listStatus
> INFO: Total input paths to process : 1
> 10-Jun-2010 15:16:28 org.apache.hadoop.util.NativeCodeLoader <clinit>
> WARNING: Unable to load native-hadoop library for your platform... using
> builtin-java classes where applicable
> 10-Jun-2010 15:16:28 org.apache.hadoop.io.compress.CodecPool
> getDecompressor
> INFO: Got brand-new decompressor
> 10-Jun-2010 15:16:28 org.apache.hadoop.mapred.MapTask runOldMapper
> INFO: numReduceTasks: 1
> 10-Jun-2010 15:16:28 org.apache.hadoop.mapred.MapTask$MapOutputBuffer
> <init>
> INFO: io.sort.mb = 100
> 10-Jun-2010 15:16:29 org.apache.hadoop.mapred.MapTask$MapOutputBuffer
> <init>
> INFO: data buffer = 79691776/99614720
> 10-Jun-2010 15:16:29 org.apache.hadoop.mapred.MapTask$MapOutputBuffer
> <init>
> INFO: record buffer = 262144/327680
> 10-Jun-2010 15:16:29 org.apache.hadoop.mapred.LocalJobRunner$Job run
> WARNING: job_local_0001
> java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot be
> cast to org.apache.hadoop.io.IntWritable
>    at org.apache.mahout.math.hadoop.TransposeJob$TransposeMapper.map(TransposeJob.java:1)
>    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
>    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
>    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
>    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
> 10-Jun-2010 15:16:29 org.apache.hadoop.mapred.JobClient monitorAndPrintJob
> INFO:  map 0% reduce 0%
> 10-Jun-2010 15:16:29 org.apache.hadoop.mapred.JobClient monitorAndPrintJob
> INFO: Job complete: job_local_0001
> 10-Jun-2010 15:16:29 org.apache.hadoop.mapred.Counters log
> INFO: Counters: 0
>
>
>
> 2010/6/10 Kris Jack <mrkrisjack@gmail.com>
>
> > In the attempt to create a document-document similarity matrix, I am
> > getting the following error:
> >
> > SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
> > SLF4J: Defaulting to no-operation (NOP) logger implementation
> > SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further
> > details.
> > 10-Jun-2010 13:25:04 org.apache.hadoop.metrics.jvm.JvmMetrics init
> > INFO: Initializing JVM Metrics with processName=JobTracker, sessionId=
> > 10-Jun-2010 13:25:04 org.apache.hadoop.mapred.JobClient
> > configureCommandLineOptions
> > WARNING: Use GenericOptionsParser for parsing the arguments. Applications
> > should implement Tool for the same.
> > 10-Jun-2010 13:25:04 org.apache.hadoop.mapred.JobClient
> > configureCommandLineOptions
> > WARNING: No job jar file set.  User classes may not be found. See
> > JobConf(Class) or JobConf#setJar(String).
> > 10-Jun-2010 13:25:04 org.apache.hadoop.mapred.FileInputFormat listStatus
> > INFO: Total input paths to process : 1
> > 10-Jun-2010 13:25:05 org.apache.hadoop.mapred.JobClient monitorAndPrintJob
> > INFO: Running job: job_local_0001
> > 10-Jun-2010 13:25:05 org.apache.hadoop.mapred.FileInputFormat listStatus
> > INFO: Total input paths to process : 1
> > 10-Jun-2010 13:25:05 org.apache.hadoop.util.NativeCodeLoader <clinit>
> > WARNING: Unable to load native-hadoop library for your platform... using
> > builtin-java classes where applicable
> > 10-Jun-2010 13:25:05 org.apache.hadoop.io.compress.CodecPool
> > getDecompressor
> > INFO: Got brand-new decompressor
> > 10-Jun-2010 13:25:05 org.apache.hadoop.mapred.MapTask runOldMapper
> > INFO: numReduceTasks: 1
> > 10-Jun-2010 13:25:05 org.apache.hadoop.mapred.MapTask$MapOutputBuffer
> > <init>
> > INFO: io.sort.mb = 100
> > 10-Jun-2010 13:25:05 org.apache.hadoop.mapred.MapTask$MapOutputBuffer
> > <init>
> > INFO: data buffer = 79691776/99614720
> > 10-Jun-2010 13:25:05 org.apache.hadoop.mapred.MapTask$MapOutputBuffer
> > <init>
> > INFO: record buffer = 262144/327680
> > 10-Jun-2010 13:25:05 org.apache.hadoop.mapred.LocalJobRunner$Job run
> > WARNING: job_local_0001
> > java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot be
> > cast to org.apache.hadoop.io.IntWritable
> >     at org.apache.mahout.math.hadoop.TransposeJob$TransposeMapper.map(TransposeJob.java:1)
> >     at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
> >     at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
> >     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
> >     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
> > 10-Jun-2010 13:25:06 org.apache.hadoop.mapred.JobClient monitorAndPrintJob
> > INFO:  map 0% reduce 0%
> > 10-Jun-2010 13:25:06 org.apache.hadoop.mapred.JobClient monitorAndPrintJob
> > INFO: Job complete: job_local_0001
> > 10-Jun-2010 13:25:06 org.apache.hadoop.mapred.Counters log
> > INFO: Counters: 0
> > Exception in thread "main" java.lang.RuntimeException: java.io.IOException: Job failed!
> >     at org.apache.mahout.math.hadoop.DistributedRowMatrix.transpose(DistributedRowMatrix.java:163)
> >     at org.apache.mahout.math.hadoop.GenSimMatrixLocal.generateMatrix(GenSimMatrixLocal.java:24)
> >     at org.apache.mahout.math.hadoop.GenSimMatrixLocal.main(GenSimMatrixLocal.java:34)
> > Caused by: java.io.IOException: Job failed!
> >     at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1252)
> >     at org.apache.mahout.math.hadoop.DistributedRowMatrix.transpose(DistributedRowMatrix.java:158)
> >     ... 2 more
> >
> >
> > I created a test Solr index with 3 documents and generated a sparse feature
> > matrix out of it using Mahout's
> > org.apache.mahout.utils.vectors.lucene.Driver.
> >
> > I then ran the following code using the sparse feature matrix as input
> > (mahoutIndexTFIDF.vec).
> >
> > public class GenSimMatrixLocal {
> >     private void generateMatrix() {
> >         String inputPath = "/home/kris/data/mahoutIndexTFIDF.vec";
> >         String tmpPath = "/tmp/matrixMultiplySpace";
> >         int numDocuments = 3;
> >         int numTerms = 4;
> >
> >         DistributedRowMatrix text = new DistributedRowMatrix(inputPath,
> >           tmpPath, numDocuments, numTerms);
> >
> >         JobConf conf = new JobConf("similarity job");
> >         text.configure(conf);
> >
> >         DistributedRowMatrix transpose = text.transpose();
> >
> >         DistributedRowMatrix similarity = transpose.times(transpose);
> >
> >         System.out.println("Similarity matrix lives: " + similarity.getRowPath());
> >     }
> >
> >     public static void main (String [] args) {
> >         GenSimMatrixLocal similarity = new GenSimMatrixLocal();
> >
> >         similarity.generateMatrix();
> >     }
> > }
> >
> > Anyone see why there is a problem between LongWritable and IntWritable
> > casting?  Does it need to be configured differently?
> >
> > Thanks,
> > Kris
> >
> >
> >
> >
>
>
> --
> Dr Kris Jack,
> http://www.mendeley.com/profiles/kris-jack/
>
