mahout-user mailing list archives

From Sameer Tilak <ssti...@live.com>
Subject Elephant-Bird, Pig, and Mahout
Date Thu, 05 Dec 2013 17:35:56 GMT
Hi All,
I have a question about using Elephant-Bird's VectorWritableConverter in my Pig script for data vectorization.
I am generating the tuples with a UDF; for simplicity, however, the code
below loads the data from a file instead. My UDF returns tuples of the
form (1,0,1,1,...).

My map.dat file has the following format:

1,0,1,1
0,1,1,1,
0,0,1,1,
1,1,0,0,
.......
.......
........

I register the necessary jar files. 

%declare SEQFILE_LOADER 'com.twitter.elephantbird.pig.load.SequenceFileLoader';
%declare TEXT_CONVERTER 'com.twitter.elephantbird.pig.util.TextConverter';
%declare LONG_CONVERTER 'com.twitter.elephantbird.pig.util.LongWritableConverter';
%declare VECTOR_CONVERTER 'com.twitter.elephantbird.pig.mahout.VectorWritableConverter';

/* Loading from a file instead of UDF for simplicity */

A = LOAD 'map.dat';

/*
 I am not sure how to use VectorWritableConverter to convert each tuple
 in relation A to a vector. */
B = FOREACH A GENERATE $VECTOR_CONVERTER();

DUMP B;
 		 	   		  