mahout-user mailing list archives

From Ken Krugler <kkrugler_li...@transpac.com>
Subject Part 2 blog post on extracting text features
Date Mon, 22 Jul 2013 02:57:34 GMT
Hi Mahouters,

I just posted part 2 of a series on extracting text features for machine learning…

http://www.scaleunlimited.com/2013/07/21/text-feature-selection-for-machine-learning-part-2/

The top five terms (by LLR score) in emails written by Ted are now u_k, v_k, sgd, regress,
and categori, which is way better than the very first results (see previous blog post):
v3, 3, v2, q, and 0.00000.
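
For anyone who hasn't run into LLR (log-likelihood ratio) scoring before, here is a minimal sketch of how a term could be scored from a 2x2 table of counts. It uses the common entropy-based formulation of Dunning's G^2 statistic. The function names and the count layout (k11 = Ted's emails containing the term, k12 = other emails containing it, k21 = Ted's emails without it, k22 = other emails without it) are my assumptions for illustration, not necessarily what the blog post's code does.

```python
import math


def x_log_x(x):
    # x * ln(x), with the usual convention that 0 * ln(0) = 0.
    return 0.0 if x == 0 else x * math.log(x)


def entropy(*counts):
    # Unnormalized Shannon entropy of a set of counts.
    return x_log_x(sum(counts)) - sum(x_log_x(c) for c in counts)


def llr(k11, k12, k21, k22):
    # Log-likelihood ratio for a 2x2 contingency table (assumed layout):
    #   k11: target docs containing the term
    #   k12: other docs containing the term
    #   k21: target docs not containing the term
    #   k22: other docs not containing the term
    row_entropy = entropy(k11 + k12, k21 + k22)
    col_entropy = entropy(k11 + k21, k12 + k22)
    matrix_entropy = entropy(k11, k12, k21, k22)
    # Clamp at zero to absorb small negative values from rounding.
    return max(0.0, 2.0 * (row_entropy + col_entropy - matrix_entropy))
```

A term spread evenly across both sets of emails scores roughly zero, and the score rises as the term concentrates in one set. Note that the score alone doesn't say which set the term favors, so a ranking would typically also check that the term is over-represented in the target emails.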

Regards,

-- Ken

--------------------------
Ken Krugler
+1 530-210-6378
http://www.scaleunlimited.com
custom big data solutions & training
Hadoop, Cascading, Cassandra & Solr





