mahout-user mailing list archives

From Colum Foley <columfo...@gmail.com>
Subject Elephant-Bird SequenceFileStorage VectorWritable Producing Empty Vectors
Date Fri, 01 Mar 2013 11:39:23 GMT
Hello,


I am trying to store Mahout RandomAccessSparseVector values using
elephant-bird and Pig. The data is of the form
key(text),value(RandomAccessSparseVector). When I run Pig's describe on
it, I get the following:

pair: {key: int,val: (cardinality: int,entries: {entry: (index:
int,value: double)})}

My problem is that when I try to store tuples using elephant-bird's
SequenceFileStorage as follows:

store clusteredOut into 'logsvectors.dat' using
com.twitter.elephantbird.pig.store.SequenceFileStorage (
   '-c com.twitter.elephantbird.pig.util.TextConverter',
   '-c com.twitter.elephantbird.pig.mahout.VectorWritableConverter  -- -sparse'
);

It runs successfully, but when I examine the resulting SequenceFile all
the vectors are empty.

On the other hand, if I run the following instead:

store clusteredOut into 'logsvectors.dat' using
com.twitter.elephantbird.pig.store.SequenceFileStorage ();

i.e. without specifying the key or value types, the vectors are
non-empty but are of type Text, and this causes my clustering algorithm
to fail (it expects VectorWritable).

So my problem is that I need to output in VectorFileFormat, but when I
do, the resulting vectors are empty.

Does anyone else have experience with this issue?

Many thanks,
Colum
