mahout-user mailing list archives

From Sebastian Schelter <...@apache.org>
Subject Re: Fwd: Mahout Naive Bayes CSV Classification
Date Sun, 04 May 2014 11:25:51 GMT
Hi Jossef,

You have to vectorize and normalize your data first. The input for Naive
Bayes is a SequenceFile whose key is a Text object (your label) and whose
value is a VectorWritable holding the feature vector for that row.

Instructions to run NaiveBayes can be found here:

https://mahout.apache.org/users/classification/bayesian.html
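
As a rough illustration of the vectorize-and-normalize step, here is a minimal
Java sketch that uses only the standard library. It is not Mahout's own
pipeline: the class name CsvVectorizer and its helper methods are hypothetical,
and it assumes that every feature column is numeric, that the last column is
the class label, that there is no header row, and that L2 scaling is an
acceptable way to normalize. Writing the resulting pairs into a SequenceFile
is left as a comment, since that part needs the Hadoop and Mahout jars.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CsvVectorizer {

    /** One CSV row split into its class label and numeric features. */
    static final class LabeledRow {
        final String label;
        final double[] features;

        LabeledRow(String label, double[] features) {
            this.label = label;
            this.features = features;
        }
    }

    /** Parses "f1,f2,...,fn,label"; assumes numeric features and no quoted commas. */
    static LabeledRow parseRow(String line) {
        String[] cells = line.trim().split(",");
        if (cells.length < 2) {
            throw new IllegalArgumentException("Need at least one feature and a label: " + line);
        }
        double[] features = new double[cells.length - 1];
        for (int i = 0; i < features.length; i++) {
            features[i] = Double.parseDouble(cells[i].trim());
        }
        return new LabeledRow(cells[cells.length - 1].trim(), features);
    }

    /** Scales a vector to unit L2 length; an all-zero vector is returned unchanged. */
    static double[] normalize(double[] v) {
        double sumOfSquares = 0.0;
        for (double x : v) {
            sumOfSquares += x * x;
        }
        double norm = Math.sqrt(sumOfSquares);
        double[] out = new double[v.length];
        for (int i = 0; i < v.length; i++) {
            out[i] = (norm == 0.0) ? v[i] : v[i] / norm;
        }
        return out;
    }

    /** Parses and normalizes every non-blank line. */
    static List<LabeledRow> vectorize(List<String> lines) {
        List<LabeledRow> rows = new ArrayList<>();
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                continue;
            }
            LabeledRow parsed = parseRow(line);
            rows.add(new LabeledRow(parsed.label, normalize(parsed.features)));
        }
        return rows;
    }

    public static void main(String[] args) {
        // Made-up sample rows, only to show the shape of the data.
        List<String> sample = Arrays.asList("3,4,spam", "0,5,ham");
        for (LabeledRow row : vectorize(sample)) {
            // In a real job, each pair would instead be appended to a Hadoop
            // SequenceFile as (Text label, VectorWritable wrapping the features).
            System.out.println(row.label + " -> " + Arrays.toString(row.features));
        }
    }
}
```

Each LabeledRow corresponds to one key/value pair of the SequenceFile
described above: the label becomes the Text key and the feature array becomes
the vector inside the VectorWritable.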

--sebastian


On 05/03/2014 07:40 PM, Jossef Harush wrote:
> I have these 2 CSV files:
>
>     1. train-set.csv
>     2. test-set.csv
>
> Both have the same structure (with different content), similar to this
> example (http://i.stack.imgur.com/jsckr.png):
>
> Each column is a feature, and the last column, class, is the name of the
> class to predict.
>
> *Can anyone please provide sample code for:*
>
>     1. Initializing Naive Bayes with a CSV file (model creation, training,
>     required pre-processing, etc.)
>     2. Predicting a class for a given CSV row
>
> Thanks!
>
> BTW -
>
> I'm using Mahout 0.9 and Hadoop 2.4, and I've already tried to follow these
> links:
>
> http://web.archiveorange.com/archive/v/y0uRZw9Q4iHdjrm4Rfsu
> http://chimpler.wordpress.com/2013/03/13/using-the-mahout-naive-bayes-classifier-to-automatically-classify-twitter-messages/
>

