mahout-user mailing list archives

From Svetlomir Kasabov <>
Subject CsvRecordFactory usage recommendation
Date Sat, 11 Jun 2011 21:13:51 GMT

I have a question:

I have seen that some of the Mahout examples use the CsvRecordFactory 
class for parsing training and test examples. Would you recommend this 
class for production use as well? That would mean creating a CSV file 
from my real data (which currently lives in a relational database) and 
then using that file to train my online logistic regression model. The 
advantage of this approach is that the extracted data is available as 
CSV and can be used for quick re-training without DB access.
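
To make the intermediate step concrete, here is a rough sketch of the export I have in mind, in plain Java with no Mahout calls. The class name CsvExportSketch, the column names and the sample rows are all made up for illustration. In practice the rows would come from a JDBC ResultSet rather than an in-memory list. I have also not checked whether CsvRecordFactory accepts quoted fields, so the quoting here is an assumption about what the reader tolerates.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: turns already-fetched rows into CSV lines.
class CsvExportSketch {

    // Quote a field only if it contains a comma, a double quote, or a line break;
    // embedded double quotes are doubled (common CSV convention).
    static String escape(String field) {
        if (field == null) {
            return "";
        }
        boolean needsQuotes = field.contains(",") || field.contains("\"")
                || field.contains("\n") || field.contains("\r");
        String escaped = field.replace("\"", "\"\"");
        return needsQuotes ? "\"" + escaped + "\"" : escaped;
    }

    // Join the fields of one row into a single CSV line.
    static String toCsvLine(List<String> fields) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < fields.size(); i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(escape(fields.get(i)));
        }
        return sb.toString();
    }

    // Header line first, then one line per data row.
    static List<String> toCsv(List<String> header, List<List<String>> rows) {
        List<String> lines = new ArrayList<>();
        lines.add(toCsvLine(header));
        for (List<String> row : rows) {
            lines.add(toCsvLine(row));
        }
        return lines;
    }

    public static void main(String[] args) {
        // Made-up columns and values; real rows would be read from the database.
        List<String> header = Arrays.asList("age", "city", "target");
        List<List<String>> rows = Arrays.asList(
                Arrays.asList("42", "Sofia", "1"),
                Arrays.asList("37", "Frankfurt, Main", "0"));
        for (String line : toCsv(header, rows)) {
            System.out.println(line);
        }
    }
}
```

The resulting lines could be written to a file once and reused for every re-training run, which is the advantage I mentioned above.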

Or should I skip the intermediate CSV file and train my online logistic 
regression model directly on the data from the relational database? 
Which of the two approaches would be better?

Thank you!
