mahout-user mailing list archives

From Alok Tanna <tannaa...@gmail.com>
Subject Mahout : 20-newsgroups Classification Example : Split command
Date Thu, 14 Jan 2016 19:31:58 GMT
Hi,

This request is in reference to the 20-newsgroups Classification Example at
the link below:
https://mahout.apache.org/users/classification/twenty-newsgroups.html

I am able to run the example and get the results shown at the link,
but when I run the example without the split command the results are
not the same. Also, when I run other test data against the same model,
the results are not accurate.

Can this example be run without the split command?

Basically, this is what I am trying to do:

I took two separate datasets, one for training and one for testing.

I ran the commands below on both sets:
1. seqdirectory
2. seq2sparse

Now I have vectors generated for both datasets.
- Run the trainnb command on the first dataset's vector output. So instead
of training a model on 80% of the data, I am using the whole dataset.
- Run the testnb command on the second dataset's vector output. This is not
the 20% of the data; it is a completely new dataset, used solely for testing.

So instead of using mahout split, I have specified a separate dataset for
testing the model.
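
For concreteness, the steps above could be sketched as the script below.
This is only an illustration under stated assumptions, not the exact
commands that were run: the directory names (train-raw, test-raw, and so
on) are hypothetical placeholders, and the flags are the ones used for the
corresponding steps in the linked 20-newsgroups example. The `run` helper
and the DRY_RUN switch are also made up for this sketch; by default the
script only prints the commands.

```shell
#!/bin/sh
# Sketch: vectorize two datasets separately, train on the first and test on
# the second, without `mahout split`.
# All paths below are hypothetical placeholders.
WORK_DIR="${WORK_DIR:-/tmp/mahout-work}"
TRAIN_RAW="$WORK_DIR/train-raw"   # assumed location of the raw training documents
TEST_RAW="$WORK_DIR/test-raw"     # assumed location of the raw test documents
DRY_RUN="${DRY_RUN:-1}"           # 1 = print commands only, 0 = execute them

COMMANDS=""
# Record and print a command; execute it only when DRY_RUN is not 1.
run() {
  COMMANDS="$COMMANDS$*
"
  echo "+ $*"
  if [ "$DRY_RUN" != "1" ]; then "$@"; fi
}

# Steps 1 and 2 on the training set: raw text -> sequence files -> TF-IDF vectors.
run mahout seqdirectory -i "$TRAIN_RAW" -o "$WORK_DIR/train-seq" -ow
run mahout seq2sparse -i "$WORK_DIR/train-seq" -o "$WORK_DIR/train-vectors" -lnorm -nv -wt tfidf

# The same two steps on the test set, as an independent run.
run mahout seqdirectory -i "$TEST_RAW" -o "$WORK_DIR/test-seq" -ow
run mahout seq2sparse -i "$WORK_DIR/test-seq" -o "$WORK_DIR/test-vectors" -lnorm -nv -wt tfidf

# Train on all vectors of the first dataset.
run mahout trainnb -i "$WORK_DIR/train-vectors/tfidf-vectors" \
  -o "$WORK_DIR/model" -li "$WORK_DIR/labelindex" -ow

# Test on all vectors of the second dataset.
run mahout testnb -i "$WORK_DIR/test-vectors/tfidf-vectors" \
  -m "$WORK_DIR/model" -l "$WORK_DIR/labelindex" -ow -o "$WORK_DIR/test-results"
```

Note that in this sketch each seq2sparse run writes its own output
directory, so the two sets of vectors come from two independent
vectorization runs.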

The results of this exercise are totally different from what I get when I
use the split command to split the data.


Thanks & Regards,

Alok R. Tanna
