mahout-user mailing list archives

From Bhaskar Ghosh <>
Subject Re: unknown test data twenty-newsgroups example
Date Fri, 01 Oct 2010 10:33:13 GMT
Hi Robin/Neil,

I was also trying the 20Newsgroups example and have been following your
conversation. I am now confused by the use of the word 'instance'.
In particular, I could not work out the meaning of these lines:

>extra file or extra line, duplicated instances (to decrease the weights) or
>duplicate feature in the same instance to increase the weights (classic
Let me list what I understood. Please confirm whether I have got it right.

If I want to increase the weight of word1 and word2, so that text containing
those words has a higher chance of being classified as <class-name1>, I should
add the same extra line many times to an extra file (conforming to the format
required by the Bayes Classifier), where each line has the format
<class-name1><tab><word1> <word2>
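
To make the format above concrete, here is a minimal Python sketch that builds
the repeated lines and writes them to an extra file. The function names, the
file name and the number of copies are assumptions for illustration only; they
are not part of Mahout, and the exact input format should be checked against
the Bayes Classifier's own documentation.

```python
def make_boost_lines(label, words, copies):
    """Return `copies` identical lines of the form <label><tab><word> <word> ..."""
    line = label + "\t" + " ".join(words)
    return [line] * copies


def write_extra_file(path, lines):
    """Write one training line per row to a plain-text file (hypothetical helper)."""
    with open(path, "w", encoding="utf-8") as f:
        for line in lines:
            f.write(line + "\n")


# Placeholder label and words taken from the message; 3 copies is arbitrary.
lines = make_boost_lines("class-name1", ["word1", "word2"], 3)

# Example use (the file name is an assumption):
# write_extra_file("extra-training.txt", lines)
```

Each extra copy adds one more occurrence of word1 and word2 to the counts for
<class-name1>, which is the intended effect of the duplication.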
Bhaskar Ghosh
Hyderabad, India

"Ignorance is Bliss... Knowledge never brings Peace!!!"

From: Robin Anil <>
Sent: Thu, 30 September, 2010 9:59:47 PM
Subject: Re: unknown test data twenty-newsgroups example

On Thu, Sep 30, 2010 at 9:45 PM, Neil Ghosh <> wrote:
> Do you mean , I should 1st create the model with correct data in correct
> folder (Label).
Now you throw an instance at it and you will get the correct label, well
most of the time.
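
The train-then-classify flow described here can be sketched with a tiny
multinomial naive Bayes classifier in plain Python. This is not Mahout's
implementation, whose weighting may differ; it is only an illustration that
assumes add-one smoothing and training lines of the form <label><tab><words>.
The labels and words in the sample data are made up for the example.

```python
import math
from collections import Counter, defaultdict


def train(lines):
    """Count words per label and documents per label from '<label>\\t<text>' lines."""
    word_counts = defaultdict(Counter)
    doc_counts = Counter()
    for line in lines:
        label, text = line.split("\t", 1)
        doc_counts[label] += 1
        word_counts[label].update(text.split())
    return word_counts, doc_counts


def classify(model, text):
    """Return the label with the highest log-probability under add-one smoothing."""
    word_counts, doc_counts = model
    vocab = {w for counts in word_counts.values() for w in counts}
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(doc_counts):
        total_words = sum(word_counts[label].values())
        score = math.log(doc_counts[label] / total_docs)
        for w in text.split():
            score += math.log((word_counts[label][w] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Made-up training data, one labelled instance per line.
training = [
    "sci.space\torbit launch rocket",
    "rec.autos\tengine car wheel",
]
model = train(training)
label = classify(model, "rocket launch")
```

In this sketch an "instance" is one labelled line of text at training time,
and one unlabelled piece of text at classification time.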
