mahout-user mailing list archives

From Bhaskar Ghosh <bjgin...@yahoo.co.in>
Subject Re: unknown test data twenty-newsgroups example
Date Fri, 01 Oct 2010 10:33:13 GMT
Hi Robin/Neil,

I was also trying the 20 Newsgroups example and have been following your conversation.
I am now confused by the use of the word 'instance'.
In particular, I could not follow the meaning of these lines:

> extra file or extra line, duplicated instances (to decrease the weights) or
> duplicate feature in the same instance to increase the weights (classic
> tf-idf)

Let me list what I understood. Please confirm whether I got it right.

If I want to increase the weight of word1 and word2, so that text containing
those words has a higher chance of being classified as <class-name1>, I should
add the same line many times to an extra file, in the format required by the
Bayes classifier:

<class-name1><tab><word1> <word2>
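
To make the question concrete, here is a minimal Python sketch of what I have in mind. The label comp.graphics, the two words, the file name and the copy count are all made up for illustration. I am only assuming that repeating a line raises the weight of its words for that label; that is exactly the point I would like confirmed.

```python
import os
import tempfile


def make_boost_lines(label, words, copies):
    """Return `copies` identical lines of the form: label<TAB>word1 word2 ..."""
    line = f"{label}\t{' '.join(words)}"
    return [line] * copies


def append_boost_file(path, label, words, copies):
    """Append the duplicated lines to a plain-text file, one instance per line."""
    with open(path, "a", encoding="utf-8") as fh:
        for line in make_boost_lines(label, words, copies):
            fh.write(line + "\n")


# Hypothetical example: repeat one line 5 times in an extra file.
extra_dir = tempfile.mkdtemp()
extra_file = os.path.join(extra_dir, "extra-instances.txt")
append_boost_file(extra_file, "comp.graphics", ["opengl", "shader"], copies=5)

with open(extra_file, encoding="utf-8") as fh:
    for row in fh:
        print(repr(row))
```

Each printed row is one "instance" in my understanding: a label, a tab, then the space-separated words.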
Thanks
Bhaskar Ghosh
Hyderabad, India

http://www.google.com/profiles/bjgindia

"Ignorance is Bliss... Knowledge never brings Peace!!!"




________________________________
From: Robin Anil <robin.anil@gmail.com>
To: neil.ghosh@gmail.com
Cc: user@mahout.apache.org
Sent: Thu, 30 September, 2010 9:59:47 PM
Subject: Re: unknown test data twenty-newsgroups example

On Thu, Sep 30, 2010 at 9:45 PM, Neil Ghosh <neil.ghosh@gmail.com> wrote:
>
> Do you mean I should first create the model with the correct data in the
> correct folder (label)?
>
Now you throw an instance at it and you will get the correct label, well
most of the time.


