mahout-user mailing list archives

From Ted Dunning <ted.dunn...@gmail.com>
Subject Re: Create vector from text
Date Thu, 11 Oct 2012 06:59:10 GMT
You have to tokenize your text and then use some form of vector encoding.

If you have a known dictionary of all interesting words, you can simply
make a vector as long as the number of words in your dictionary and put a 1
in the right place.
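
The dictionary approach above can be sketched roughly as follows. This is plain Java rather than Mahout code: the class name DictionaryEncoder, the naive tokenizer, and the example dictionary are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not part of Mahout: one vector slot per dictionary word.
class DictionaryEncoder {
    private final Map<String, Integer> index = new HashMap<>();

    DictionaryEncoder(List<String> dictionary) {
        // Assign each distinct word the next free slot, in dictionary order.
        for (String word : dictionary) {
            index.putIfAbsent(word, index.size());
        }
    }

    // Naive tokenizer (an assumption): lowercase, split on anything
    // that is not a letter or a digit.
    static String[] tokenize(String text) {
        return text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+");
    }

    // Returns a vector as long as the dictionary, with a 1 in the slot of
    // every dictionary word that occurs in the text. Unknown words are ignored.
    double[] encode(String text) {
        double[] vector = new double[index.size()];
        for (String token : tokenize(text)) {
            Integer slot = index.get(token);
            if (slot != null) {
                vector[slot] = 1.0;
            }
        }
        return vector;
    }
}
```

With a dictionary of "cat", "dog", "fish", encoding "The dog chased the CAT." sets the first two slots and leaves the third at zero. To hand the result to a Mahout classifier, the array could presumably be wrapped in a DenseVector.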

If you don't want to do that, either because you don't know all the words in
advance or because the number of words is too large, you can use
a TextValueEncoder, which hashes words into a fixed-size vector, to do the
deed.  There is sample code for this in the Mahout in Action code, and
Chapter 14 of Mahout in Action talks about the code.  You can get the code
from http://github.com/tdunning/MiA
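
To show the idea behind hashed encoding without a dictionary, here is a minimal plain-Java sketch. It is not Mahout's TextValueEncoder and does not reproduce its hashing scheme: the class name HashedTextEncoder, the use of String.hashCode, the single slot per word, and the tokenizer are all assumptions chosen to keep the example short.

```java
import java.util.Locale;

// Hypothetical illustration of the hashing trick, not Mahout's implementation.
class HashedTextEncoder {
    private final int size;

    HashedTextEncoder(int size) {
        if (size <= 0) {
            throw new IllegalArgumentException("size must be positive");
        }
        this.size = size;
    }

    // Map a token to a slot in [0, size). floorMod keeps the result
    // non-negative even when hashCode() is negative.
    int slot(String token) {
        return Math.floorMod(token.hashCode(), size);
    }

    // Count tokens into a fixed-size vector. No dictionary is needed, and
    // unseen words still land somewhere; distinct words may collide.
    double[] encode(String text) {
        double[] vector = new double[size];
        // Naive tokenizer (an assumption): lowercase, split on non-letter/digit runs.
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                vector[slot(token)] += 1.0;
            }
        }
        return vector;
    }
}
```

The vector length is fixed up front, so the same encoder works for text that arrives at runtime. The trade-off is that collisions become more likely as the vector gets smaller.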

On Wed, Oct 10, 2012 at 11:44 PM, JAGANADH G <jaganadhg@gmail.com> wrote:

> Hi All
>
> As of Mahout 0.7, a classifier takes a vector for classification.
> Can anybody guide me on how to create a vector from text? I am not looking to
> create a vector from a file stored in HDFS or the local file system.
> At runtime my system will be receiving text input to perform
> classification.
>
> Best regards
>
> --
> **********************************
> JAGANADH G
> http://jaganadhg.in
> *ILUGCBE*
> http://ilugcbe.org.in
>
