mahout-user mailing list archives

From Andrew Palumbo <ap....@outlook.com>
Subject RE: Using existing model to train again
Date Mon, 26 May 2014 16:25:17 GMT
Hi Subbu,  

There is currently no way to update an already trained Naive Bayes model.  You would have to
retrain on the full 2 million records.

If you anticipate needing this in the future, you could probably hack TrainNaiveBayesJob.java [1]
to meet your needs.  The new data would have to be vectorized in exactly the same manner as the
original data for the model to be updated correctly, though.  That would limit you to pure term
frequencies (no IDF transformation) and would rule out options such as maxDFPercent.
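
To illustrate why pure term frequencies are the case that can be updated, here is a minimal
sketch in plain Python.  It is not Mahout code and does not use Mahout's API: the helpers
train_counts and merge_counts and the toy batches are hypothetical, and the "model" is assumed to
be nothing more than per-label sums of raw term counts.

```python
from collections import Counter, defaultdict


def train_counts(docs):
    """Sum raw term frequencies per label (hypothetical helper, not Mahout's API)."""
    counts = defaultdict(Counter)
    for label, tokens in docs:
        counts[label].update(tokens)
    return dict(counts)


def merge_counts(old, new):
    """Combine two count models by adding their per-label term counts."""
    merged = defaultdict(Counter)
    for model in (old, new):
        for label, term_counts in model.items():
            merged[label].update(term_counts)
    return dict(merged)


# Toy data standing in for the original and the new training records.
batch_1 = [("spam", ["buy", "now", "buy"]), ("ham", ["meeting", "now"])]
batch_2 = [("spam", ["buy", "cheap"]), ("ham", ["meeting", "notes"])]

# Updating the old counts with the new batch...
incremental = merge_counts(train_counts(batch_1), train_counts(batch_2))
# ...gives the same sums as training on everything at once.
full = train_counts(batch_1 + batch_2)
```

Raw counts simply add up, so the merged sums match a full retrain.  An IDF weight depends on
document frequencies over the whole corpus, which change when new records arrive, so sums that
were weighted with the old IDF values would no longer match what a full retrain produces.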

Andy

[1] https://github.com/apache/mahout/blob/master/mrlegacy/src/main/java/org/apache/mahout/classifier/naivebayes/training/TrainNaiveBayesJob.java


> Hi team,
> I have trained a model in naive Bayes using training data of 1 million
> records. Now I have another 1 million records. Can I add this new training
> data to the existing model and train it again to get a new model, instead of
> passing all the 2 million records at once to get a model?
>
> Thanks,
> Subbu
>
 		 	   		  