mahout-user mailing list archives

From Sean Owen <sro...@gmail.com>
Subject Re: Needs clue to create a Proof of Concept recommender
Date Tue, 09 Aug 2011 06:54:27 GMT
You don't need ARFF, no. You can write some Java code to write a
SequenceFile directly, one entry at a time. It would take a little study of
the code to understand how it works but it's probably just 10 lines.
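
As a rough illustration of the "one entry at a time" idea, here is a minimal sketch of the step that comes before the write: turning (tag, resource) pairs into one sparse vector per tag. It uses only plain Java collections; the class name TagVectors, the build method, and the sample pairs are hypothetical. The actual append to a SequenceFile is left as a comment, because the exact writer calls depend on the Hadoop and Mahout versions in use.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

public class TagVectors {

    /**
     * Builds one sparse vector per tag from (tag, resource) pairs.
     * Result: tag -> (resource index -> number of times the tag was applied).
     */
    static Map<String, Map<Integer, Double>> build(String[][] tagResourcePairs) {
        // Assigns each distinct resource a stable integer index (the vector dimension).
        Map<String, Integer> resourceIndex = new LinkedHashMap<>();
        Map<String, Map<Integer, Double>> vectors = new TreeMap<>();

        for (String[] pair : tagResourcePairs) {
            String tag = pair[0];
            String resource = pair[1];

            Integer idx = resourceIndex.get(resource);
            if (idx == null) {
                idx = resourceIndex.size();
                resourceIndex.put(resource, idx);
            }

            // Only non-zero entries are stored, so each vector stays sparse.
            vectors.computeIfAbsent(tag, t -> new TreeMap<>())
                   .merge(idx, 1.0, Double::sum);
        }
        return vectors;
    }

    public static void main(String[] args) {
        // Hypothetical sample data; in practice these rows would come from the database.
        String[][] pairs = {
            {"sunset", "photo1"},
            {"beach", "photo1"},
            {"sunset", "photo2"},
        };

        for (Map.Entry<String, Map<Integer, Double>> entry : build(pairs).entrySet()) {
            // This is the point where each (tag, vector) entry would be
            // appended to the SequenceFile, one at a time.
            System.out.println(entry.getKey() + " -> " + entry.getValue());
        }
    }
}
```

The loop in main is where the key (the tag) and the value (its vector) would be handed to the SequenceFile writer in place of the println.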

What is the "OME" error?

Results can live wherever you want; HDFS is the most natural choice for a
SequenceFile.

You say you're making a recommender, but it sounds like your task right now
is clustering?

On Tue, Aug 9, 2011 at 7:27 AM, Jeffrey <mycyberpet@yahoo.com> wrote:

> Hi,
>
> I am trying to implement a recommender system for my postgraduate project.
> I currently have all my data (collected using the Flickr API) stored in a
> MySQL database in RDF form using Redland <http://librdf.org> (lol, PHP is
> my main language, hence Redland).
>
> The recommender system is designed along the lines of the paper
> published by Jonathan Gemmell et al. (reference listed below), where tag
> clusters are also generated to compute a similarity measure between
> clusters and items/users (hence it was really frustrating when I failed to dump
> the points for the fuzzy k-means clusters). I am currently reading some articles
> on implementing Taste (the recommender framework) with Mahout, but the use cases
> described in the articles are quite different from what I am about to
> implement.
>
> I am still trying to build the tag clusters properly. Each tag is
> represented as a vector of resources (each equivalent to a row in the item-tag
> matrix). I currently generate the vectors by converting a pre-generated
> ARFF file, following this tutorial <
> https://cwiki.apache.org/confluence/display/MAHOUT/Creating+Vectors+from+Weka%27s+ARFF+Format>.
> Is there another way of doing this (is it possible to generate the vectors
> without first generating ARFF)? I have also read this <
> https://cwiki.apache.org/confluence/display/MAHOUT/Creating+Vectors+from+Text>
> but can't seem to relate it to my use case right now.
>
> Since I can't dump the points for the clusters using the cluster dumper (I keep
> getting OME), I will probably calculate the degree of membership manually.
> Where should I store the result (MySQL via JDBC? Hadoop Bigtable?
> Cassandra?) so that I can reuse it later for further calculation (e.g.
> similarity of an item with a cluster)?
>
> Reference:
> Shepitsen, Andriy; Gemmell, Jonathan; Mobasher, Bamshad; Burke,
> Robin. Personalized Recommendation in Folksonomies. Proceedings of the 2nd
> International Conference on Recommender Systems. Lausanne, Switzerland.
> October 23, 2008.
>
> p/s: I probably really should find a copy of "Mahout in Action" since I
> keep seeing it being recommended.
>
> best wishes,
> Jeffrey04
>
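
For the "calculate the degree of membership manually" step mentioned in the quoted message, a minimal sketch follows. It assumes the standard fuzzy c-means membership formula, u_j = 1 / sum_c (d_j / d_c)^(2/(m-1)), where d_j is the distance from a point to cluster centre j and m > 1 is the fuzziness parameter. The class and method names are hypothetical, and the distances are assumed to be computed already with whatever distance measure the clustering used.

```java
public class FuzzyMembership {

    /**
     * Degrees of membership of one point in each cluster, given its distance
     * to every cluster centre. The returned values sum to 1.
     *
     * @param distances distance from the point to each cluster centre
     * @param m         fuzziness parameter, must be greater than 1
     */
    static double[] memberships(double[] distances, double m) {
        if (m <= 1.0) {
            throw new IllegalArgumentException("fuzziness m must be > 1");
        }
        int k = distances.length;
        double[] u = new double[k];

        // If the point sits exactly on one or more centres, split the
        // membership equally among those centres to avoid dividing by zero.
        int zeros = 0;
        for (double d : distances) {
            if (d == 0.0) {
                zeros++;
            }
        }
        if (zeros > 0) {
            for (int j = 0; j < k; j++) {
                u[j] = distances[j] == 0.0 ? 1.0 / zeros : 0.0;
            }
            return u;
        }

        double exponent = 2.0 / (m - 1.0);
        for (int j = 0; j < k; j++) {
            double sum = 0.0;
            for (int c = 0; c < k; c++) {
                sum += Math.pow(distances[j] / distances[c], exponent);
            }
            u[j] = 1.0 / sum;
        }
        return u;
    }

    public static void main(String[] args) {
        // Hypothetical distances from one tag vector to two cluster centres.
        double[] u = memberships(new double[] {1.0, 3.0}, 2.0);
        System.out.println(u[0] + " " + u[1]);
    }
}
```

Each point yields one small array of memberships, so the results can be stored as (point id, membership vector) pairs in the same kind of file as the input vectors.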
