mahout-user mailing list archives

From: Frank Scholten <fr...@frankscholten.nl>
Subject: Vectorizing arbitrary value types with seq2sparse
Date: Fri, 06 May 2011 20:02:02 GMT
Hi everyone,

At the moment, seq2sparse can only generate vectors from sequence file
values of type Text. More specifically, SequenceFileTokenizerMapper
only handles Text values.

Would it be useful if seq2sparse could be configured to vectorize other
value types, such as a blog article with several textual fields like
title, content, tags and so on?

Or would it be easier to write a separate job for this, or to use Pig
or a similar tool?
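A minimal sketch of what either option has to handle, assuming a hypothetical `BlogArticle` value type with title, content and tags fields. Its `write`/`readFields` methods mirror the shape of Hadoop's `Writable` interface, but the class deliberately does not implement it, so the example compiles without Hadoop on the classpath. `toDocumentText` (also hypothetical) shows the flattening step that a configurable seq2sparse, a separate preprocessing job, or a Pig script would need in order to turn such a value into the single Text document that SequenceFileTokenizerMapper currently expects.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical multi-field value type. write/readFields follow the Writable
 * pattern (fixed field order), but this is NOT Hadoop's interface.
 */
public class BlogArticle {
    private String title = "";
    private String content = "";
    private List<String> tags = new ArrayList<>();

    public BlogArticle() {}

    public BlogArticle(String title, String content, List<String> tags) {
        this.title = title;
        this.content = content;
        this.tags = new ArrayList<>(tags);
    }

    // Length-prefixed UTF-8, so long article bodies are not capped at
    // the 64 KB limit of DataOutput.writeUTF.
    private static void writeString(DataOutput out, String s) throws IOException {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        out.writeInt(bytes.length);
        out.write(bytes);
    }

    private static String readString(DataInput in) throws IOException {
        byte[] bytes = new byte[in.readInt()];
        in.readFully(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Serialize every field in a fixed order.
    public void write(DataOutput out) throws IOException {
        writeString(out, title);
        writeString(out, content);
        out.writeInt(tags.size());
        for (String tag : tags) {
            writeString(out, tag);
        }
    }

    // Read the fields back in exactly the order write() used.
    public void readFields(DataInput in) throws IOException {
        title = readString(in);
        content = readString(in);
        int n = in.readInt();
        tags = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            tags.add(readString(in));
        }
    }

    /**
     * Flatten all textual fields into one string: title, newline, content,
     * then each tag separated by a space. Field identity is lost here.
     */
    public String toDocumentText() {
        StringBuilder sb = new StringBuilder(title);
        sb.append('\n').append(content);
        for (String tag : tags) {
            sb.append(' ').append(tag);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        BlogArticle original = new BlogArticle("Mahout vectors",
                "seq2sparse reads Text values.", Arrays.asList("mahout", "nlp"));
        // Round-trip through bytes, as a sequence file record would be.
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        original.write(new DataOutputStream(bytes));
        BlogArticle copy = new BlogArticle();
        copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
        System.out.println(copy.toDocumentText());
    }
}
```

Plain concatenation is the simplest choice but treats a word in the title the same as one in the body. If per-field weighting matters, one alternative would be to prefix each token with its field name (for example `title:mahout`) before tokenization, at the cost of a larger dictionary.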

Frank
