mahout-user mailing list archives

From: David Kaplan
Subject: Mahout Clustering Help Please
Date: Wed, 12 Aug 2015 12:49:29 GMT

Hi all,

I'm very new to Mahout and hope someone can point me in the right direction.
Here's my scenario:

I have written a system that uses Scrapy to collect classifieds items
(phones, cars, antiques and many more) from multiple websites. All the items
are then ingested into Solr, roughly 3 million entries, which serves as the
backend for my search engine.

I want to extract meaningful information from this data, for example to
calculate realistic average prices. I need guidance, and perhaps examples, on
accurate outlier detection, categorization and so on. I am an extreme beginner
in machine learning, so I also need to know whether that is what I should be
using.

Part of my challenge is the broad range of items and categories, with
different levels of skew in the data. For example, it is hard to find
outliers in the "iphone" results when many of those results are cheap iPhone
accessories.
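
To make the problem concrete, here is a minimal Python sketch of one common
robust approach, the modified z-score based on the median absolute deviation
(MAD). It is only an illustration under assumptions: the prices are made up,
`mad_outliers` is a hypothetical helper rather than anything from Mahout or
Solr, and the 3.5 threshold is a commonly used default, not a tuned value.

```python
from statistics import median

def mad_outliers(prices, threshold=3.5):
    """Return the prices whose modified z-score exceeds the threshold."""
    med = median(prices)
    mad = median(abs(p - med) for p in prices)
    if mad == 0:
        # At least half the prices equal the median: no spread to compare against.
        return []
    # 0.6745 rescales the MAD so the score is comparable to a standard z-score.
    return [p for p in prices if 0.6745 * abs(p - med) / mad > threshold]

# Made-up prices for an "iphone" search (assumption, not real data).
accessories = [5, 6, 8, 9, 10, 12, 15]
phones = [450, 500, 520, 550, 9000]
mixed = accessories + phones

print(mad_outliers(mixed))        # every handset is flagged, because accessories dominate
print(mad_outliers(phones))       # only the implausible 9000 listing is flagged
print(mad_outliers(accessories))  # nothing is flagged
```

With the mixed list, the cheap accessories set the median, so every real
phone looks like an outlier. Once the items are split by category first, only
the single implausible price stands out, which is why categorization seems to
have to come before outlier detection.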

Basically, it seems I need to cluster or classify the entries, but I'm not
sure exactly how to go about it, given that I already have the categories for
500K of the entries, for example the category
"Cell Phones & Accessories - Accessories".
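
Since part of the data is already labelled, one option is supervised
classification: train on the labelled entries and predict a category for the
rest. The following is a minimal sketch of a multinomial Naive Bayes
classifier with add-one smoothing in plain Python, not Mahout's
implementation. The titles, the `train` and `classify` helpers and the
"Cell Phones & Accessories - Cell Phones" category are assumptions made up
for illustration.

```python
import math
from collections import Counter, defaultdict

def train(labelled):
    """Collect word and document counts from (title, category) pairs."""
    word_counts = defaultdict(Counter)  # category -> word -> count
    doc_counts = Counter()              # category -> number of titles
    vocab = set()
    for title, category in labelled:
        words = title.lower().split()
        word_counts[category].update(words)
        doc_counts[category] += 1
        vocab.update(words)
    return word_counts, doc_counts, vocab

def classify(title, model):
    """Return the category with the highest log-probability for the title."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_category, best_score = None, -math.inf
    for category, n_docs in doc_counts.items():
        total_words = sum(word_counts[category].values())
        score = math.log(n_docs / total_docs)  # log prior
        for word in title.lower().split():
            # Add-one (Laplace) smoothing so unseen words do not zero the score.
            count = word_counts[category][word]
            score += math.log((count + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_category, best_score = category, score
    return best_category

PHONES = "Cell Phones & Accessories - Cell Phones"       # hypothetical category
ACCESSORIES = "Cell Phones & Accessories - Accessories"  # category from my data

# Made-up labelled titles standing in for the 500K categorized entries.
model = train([
    ("iphone 6 16gb unlocked", PHONES),
    ("samsung galaxy s5 unlocked", PHONES),
    ("iphone 6 leather case", ACCESSORIES),
    ("iphone charger cable", ACCESSORIES),
])

print(classify("iphone 5 case", model))
print(classify("galaxy s5 unlocked", model))
```

Even though "iphone" appears in both categories, words such as "case" pull
the first title towards accessories, which is the distinction a price
analysis would need.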

And then there is the question of actually connecting Mahout to Solr...

Many thanks!
