mahout-user mailing list archives

From vaibhav srivastava <>
Subject Re: Naive Bayes Classifier Sentiment Analysis
Date Tue, 29 Jul 2014 13:24:27 GMT
If you want to classify a new, unlabeled set and do not need to measure
accuracy, you can create an instance of the classifier, load your trained
model into it, and then pick the label with the best score for each vector.
Look at the Naive Bayes test code (the testnb driver) to see how this is done.
Hope this helps. Thanks.
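A minimal sketch of the "load a model, score every label, keep the best one" idea follows. It is plain Java, not Mahout's API: the model is a hypothetical toy (made-up log priors and per-term log likelihoods over a three-term vocabulary) and the scoring is the textbook multinomial Naive Bayes sum. In Mahout itself the test driver does this with its own model and classifier classes (assumption: in 0.9 these are `NaiveBayesModel` and `StandardNaiveBayesClassifier` with `classifyFull`; check the TestNaiveBayesDriver source for your version). The point is that no label is needed in the input: each vector is scored against every label.

```java
/**
 * Hypothetical stand-in for a trained Naive Bayes model (not Mahout's API).
 * Scores an unlabeled term-weight vector against every label and keeps the best.
 */
public class BestLabelSketch {

    // Hypothetical label names, in model index order.
    static final String[] LABELS = {"positive", "negative", "neutral"};

    /** Multinomial NB score: log prior + sum of (term weight * log P(term | label)). */
    static double score(double logPrior, double[] logLikelihoods, double[] instance) {
        double s = logPrior;
        for (int i = 0; i < instance.length; i++) {
            s += instance[i] * logLikelihoods[i];
        }
        return s;
    }

    /** Returns the index of the label with the highest score. */
    static int bestLabel(double[] logPriors, double[][] logLikelihoods, double[] instance) {
        int best = 0;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (int label = 0; label < logPriors.length; label++) {
            double s = score(logPriors[label], logLikelihoods[label], instance);
            if (s > bestScore) {
                bestScore = s;
                best = label;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Toy model over a 3-term vocabulary; all probabilities are made up.
        double[] logPriors = {Math.log(1.0 / 3), Math.log(1.0 / 3), Math.log(1.0 / 3)};
        double[][] logLikelihoods = {
            {Math.log(0.7), Math.log(0.1), Math.log(0.2)}, // positive
            {Math.log(0.1), Math.log(0.7), Math.log(0.2)}, // negative
            {Math.log(0.3), Math.log(0.3), Math.log(0.4)}, // neutral
        };
        // An unlabeled tweet as a term-weight vector (e.g. tf-idf); no label required.
        double[] tweet = {2.0, 0.0, 1.0};
        System.out.println(LABELS[bestLabel(logPriors, logLikelihoods, tweet)]); // prints "positive"
    }
}
```

Working in log space keeps the sum of many small probabilities from underflowing; the argmax is the same as with raw products.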
On 29 Jul 2014 12:53, "Luca Filipponi" <> wrote:

> Hi, I am trying to do sentiment analysis on Italian tweets from
> Twitter using the Naive Bayes classifier, but I'm having some trouble.
> My idea was to label a lot of tweets as positive, negative or neutral,
> and use them as the training set for the classifier. To do that I wrote a
> sequence file in the format <Text,Text>, where the key is
> /label/tweetID and the value is the text; then the text of the whole
> dataset is converted into tf-idf vectors using the Mahout utilities.
> Then I'm using the commands
> ./mahout trainnb and ./mahout testnb to check the classifier, and the
> score is right (I get nearly 100% because the test set is the same as
> the training set).
> My question is: if I want to use a test set that is unlabeled, how should it
> be created? If the key isn't in the format
> /label/..., the classifier can't find the label and I get an
> exception,
> but a new dataset will obviously be unlabeled because I need to
> classify it, so I don't know what to put in the key of the sequence file.
> I've searched online for examples, but the only ones I've found
> use the split command on the original dataset and then test on part of
> it, which isn't my case.
> Any ideas for developing a better sentiment analysis are welcome. Thanks
> in advance for the help.
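To see why an unlabeled key breaks testnb, here is a hypothetical helper that reads the label out of a key in the /label/tweetID layout by splitting on "/". Assumption: testnb takes the label from the segment between the first two slashes in roughly this way; the helper name and error handling are illustrative, not Mahout code. A key such as "12345" has no such segment, so any label lookup fails, which is consistent with the exception described above. When you score vectors yourself (as in the reply), the key is never parsed for a label and can be just the tweet ID.

```java
import java.util.regex.Pattern;

/** Hypothetical illustration of extracting a label from a "/label/tweetID" key. */
public class LabelFromKeySketch {

    private static final Pattern SLASH = Pattern.compile("/");

    /** Returns the label from a key like "/positive/12345". */
    static String labelOf(String key) {
        // "/positive/12345" splits into ["", "positive", "12345"].
        String[] parts = SLASH.split(key);
        if (parts.length < 2 || parts[1].isEmpty()) {
            throw new IllegalArgumentException("Key has no /label/ prefix: " + key);
        }
        return parts[1];
    }

    public static void main(String[] args) {
        System.out.println(labelOf("/positive/12345")); // prints "positive"
        try {
            labelOf("12345"); // an unlabeled key: no label segment to read
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```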
