mahout-user mailing list archives

From Frank Scholten <>
Subject Annotation based vectorizer
Date Mon, 03 Feb 2014 21:52:37 GMT
Hi all,

I put together a utility which vectorizes plain old Java objects annotated
with @Feature and @Target via Mahout's vector encoders.
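For readers new to Mahout's encoders: they produce hashed feature vectors, meaning each token is hashed into one of a fixed number of vector slots. Below is a minimal plain-Java sketch of that hashing trick. It is a simplification, not Mahout's code: it uses `String.hashCode()` and a single slot per word, while Mahout's encoders use their own hash function and can spread a word over several probes. `HashingSketch` and `encodeText` are hypothetical names.

```java
import java.util.Locale;

// Hypothetical, simplified hashing trick: every word is hashed to one of
// numFeatures slots and that slot's count is incremented.
final class HashingSketch {
  static double[] encodeText(String text, int numFeatures) {
    double[] vector = new double[numFeatures];
    for (String word : text.toLowerCase(Locale.ROOT).split("\\W+")) {
      if (word.isEmpty()) {
        continue; // split() yields a leading empty token if text starts with a non-word char
      }
      // floorMod keeps the slot non-negative even when hashCode() is negative
      int slot = Math.floorMod(word.hashCode(), numFeatures);
      vector[slot] += 1.0;
    }
    return vector;
  }
}
```

Because the vector size is fixed up front, unseen words never grow the vector; the price is that distinct words may collide in the same slot.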

See my GitHub branch:

and the unit test:

Use it like this:

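import org.apache.mahout.vectorizer.encoders.TextValueEncoder;
// @Feature and @Target come from the branch linked above.

class NewsgroupPost {

  @Target
  private String newsgroup;

  @Feature(encoder = TextValueEncoder.class)
  private String text; // hypothetical field name for the post body

  // Getters & setters
}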
AnnotationBasedVectorizer<NewsgroupPost> vectorizer = new

Here the vectorizer scans the annotations on NewsgroupPost. Then you can do:

NewsgroupPost post = ...

Vector vector = vectorizer.vectorize(post);
int target = vectorizer.getTarget(post);
int numFeatures = vectorizer.getNumberOfFeatures();
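One way the annotation scanning could work is plain reflection over the class's declared fields. This is a sketch under assumptions, not the branch's implementation: `Feature`, `Target`, `Post` and `AnnotationScan` below are hypothetical stand-ins, and the annotations carry no encoder parameter. The key detail is `RetentionPolicy.RUNTIME`, without which reflection cannot see the annotations.

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for the branch's annotations; RUNTIME retention is required.
@Retention(RetentionPolicy.RUNTIME)
@interface Feature {}

@Retention(RetentionPolicy.RUNTIME)
@interface Target {}

// Hypothetical example class with one target and one feature field.
class Post {
  @Target private String newsgroup = "comp.graphics";
  @Feature private String text = "new rendering engine";
}

// Scans a class once and remembers which fields are features and which is the target.
class AnnotationScan {
  private final List<Field> featureFields = new ArrayList<>();
  private Field targetField;

  AnnotationScan(Class<?> type) {
    for (Field field : type.getDeclaredFields()) {
      if (field.isAnnotationPresent(Feature.class)) {
        field.setAccessible(true); // fields are private
        featureFields.add(field);
      } else if (field.isAnnotationPresent(Target.class)) {
        field.setAccessible(true);
        targetField = field;
      }
    }
    if (targetField == null) {
      throw new IllegalArgumentException("no @Target field on " + type.getName());
    }
  }

  Object targetValue(Object instance) {
    return read(targetField, instance);
  }

  List<Object> featureValues(Object instance) {
    List<Object> values = new ArrayList<>();
    for (Field field : featureFields) {
      values.add(read(field, instance));
    }
    return values;
  }

  private static Object read(Field field, Object instance) {
    try {
      return field.get(instance);
    } catch (IllegalAccessException e) {
      throw new IllegalStateException(e);
    }
  }
}
```

Scanning once in the constructor means each later call only reads field values, which is cheap compared with re-inspecting annotations per object.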

Note that the vectorize() and getTarget() methods are generically typed, and
thanks to the type token passed to the constructor, only NewsgroupPost
instances are accepted.
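A common form of type token in Java is a `Class<T>` object, which survives erasure. The sketch below assumes that form; the branch may use a different token type, and `TypedVectorizer` and `accept` are hypothetical names. The generic parameter gives compile-time checking, while `Class.cast` also rejects wrong objects that arrive through raw or unchecked calls at runtime.

```java
import java.util.Objects;

// Hypothetical sketch of a class that takes a Class<T> type token.
class TypedVectorizer<T> {
  private final Class<T> type;

  TypedVectorizer(Class<T> type) {
    // Unlike T, the Class object is available at runtime, e.g. for annotation scanning.
    this.type = Objects.requireNonNull(type);
  }

  Class<T> type() {
    return type;
  }

  // The compiler already rejects non-T arguments; cast() additionally throws
  // ClassCastException for objects smuggled in through a raw reference.
  T accept(T instance) {
    return type.cast(instance);
  }
}
```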

The vectorizer uses a Dictionary for encoding the target.
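Encoding the target with a dictionary amounts to giving each distinct label the next free integer id, the same idea as Mahout's `Dictionary`. The plain-Java sketch below is illustrative only; `LabelDictionary` and its methods are hypothetical names, not the branch's or Mahout's API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical label dictionary: the first label seen gets 0, the next new one 1, and so on.
class LabelDictionary {
  private final Map<String, Integer> ids = new HashMap<>();
  private final List<String> labels = new ArrayList<>();

  // Returns the existing id for a known label, or assigns a new one.
  int intern(String label) {
    Integer id = ids.get(label);
    if (id == null) {
      id = labels.size();
      ids.put(label, id);
      labels.add(label);
    }
    return id;
  }

  // Reverse lookup, e.g. to turn a predicted class id back into a newsgroup name.
  String label(int id) {
    return labels.get(id);
  }

  int size() {
    return labels.size();
  }
}
```

Ids depend on the order in which labels are first seen, so the same dictionary instance has to be reused for training and prediction.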



