mahout-user mailing list archives

From Sean Owen <sro...@gmail.com>
Subject Re: Needs clue to create a Proof of Concept recommender
Date Tue, 09 Aug 2011 07:19:04 GMT
You need some glue code here -- what you need to create in Java is a
SequenceFile.Writer, then append VectorWritable values to it; VectorWritable
knows how to write vectors in the right format. It's straightforward but needs
some coding. There's no magic that ingests SQL and outputs this.
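
Roughly, the glue code could look like the sketch below (the JDBC URL, the
tag_resource table and its columns, and the vector cardinality are just
placeholders for your schema; adapt as needed). It reads tag/resource rows
from SQL, builds one sparse vector per tag, and appends <Text, VectorWritable>
pairs to a SequenceFile:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;
import org.apache.mahout.math.NamedVector;
import org.apache.mahout.math.RandomAccessSparseVector;
import org.apache.mahout.math.Vector;
import org.apache.mahout.math.VectorWritable;

public class TagVectorWriter {

  public static void main(String[] args) throws Exception {
    // Hypothetical schema: tag_resource(tag_name, resource_id, weight),
    // where resource_id is a 0-based integer index into the item space.
    Connection conn = DriverManager.getConnection(
        "jdbc:mysql://localhost/flickr", "user", "password");
    Statement stmt = conn.createStatement();
    ResultSet rs = stmt.executeQuery(
        "SELECT tag_name, resource_id, weight FROM tag_resource");

    int numResources = 100000; // cardinality of the item space (placeholder)

    // Accumulate one sparse vector per tag: index = resource id, value = weight.
    Map<String, Vector> tagVectors = new HashMap<String, Vector>();
    while (rs.next()) {
      String tag = rs.getString(1);
      int resourceId = rs.getInt(2);
      double weight = rs.getDouble(3);
      Vector v = tagVectors.get(tag);
      if (v == null) {
        v = new RandomAccessSparseVector(numResources);
        tagVectors.put(tag, v);
      }
      v.set(resourceId, weight);
    }
    rs.close();
    stmt.close();
    conn.close();

    // Write the vectors as <Text, VectorWritable> pairs -- the input format
    // Mahout's clustering jobs expect.
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    Path output = new Path("tag-vectors/part-00000");
    SequenceFile.Writer writer = SequenceFile.createWriter(
        fs, conf, output, Text.class, VectorWritable.class);
    try {
      VectorWritable vw = new VectorWritable();
      for (Map.Entry<String, Vector> e : tagVectors.entrySet()) {
        vw.set(new NamedVector(e.getValue(), e.getKey()));
        writer.append(new Text(e.getKey()), vw);
      }
    } finally {
      writer.close();
    }
  }
}

Then point the clustering job at the directory containing that file. The Text
key / NamedVector name is just there so you can recognize the tag again when
you look at the output.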

Yes, but where is the memory error? Then we can say what setting to change.
Is it a Hadoop worker?

OK, so we're on clustering, good to clarify. So the question is just how to
get the input in the right place and format and how to avoid that error?

On Tue, Aug 9, 2011 at 8:15 AM, Jeffrey <mycyberpet@yahoo.com> wrote:

> Hi Sean,
>
> Thanks for the help, I am currently reading <
> http://wiki.apache.org/hadoop/SequenceFile> for more information (please
> let me know if I am not reading the right document). So in short, by using
> the API, I can produce a SequenceFile by feeding the SQL result containing
> image and tag data into it?
>
> OME - Out of Memory Error lol (for more information on my attempt to
> cluster my test data, please refer to <
> http://mahout.markmail.org/search/?q=#query:+page:30+mid:nseo36uopmgat5iv+state:results>,
> let me know if the link is broken)
>
> Yeah, I am making a recommender, but I can't implement the whole thing at
> once and I have no idea how to implement the other parts right now (yeah, I
> have a habit of breaking a project into small parts). My current task is
> to implement the tag clustering component, as mentioned in the previous mail.
>
> @Jeffrey04
>
> ------------------------------
> *From:* Sean Owen <srowen@gmail.com>
> *To:* user@mahout.apache.org; Jeffrey <mycyberpet@yahoo.com>
> *Sent:* Tuesday, August 9, 2011 2:54 PM
> *Subject:* Re: Needs clue to create a Proof of Concept recommender
>
> You don't need ARFF, no. You can write some Java code to write a
> SequenceFile directly, one entry at a time. It would take a little study of
> the code to understand how it works but it's probably just 10 lines.
>
> What is the "OME" error?
>
> Results can live wherever you want; HDFS is the most natural choice for a
> SequenceFile.
>
> You say you're making a recommender, but it sounds like your task now is
> clustering?
>
> On Tue, Aug 9, 2011 at 7:27 AM, Jeffrey <mycyberpet@yahoo.com> wrote:
>
> Hi,
>
> I am trying to implement a recommender system for my postgraduate project.
> I currently have all my data (collected using the Flickr API) stored in a
> MySQL database in RDF form using Redland <http://librdf.org> (lol, PHP is
> my main language, hence Redland).
>
> The recommender system is designed similarly to the one in the paper
> published by Jonathan Gemmell et al. (reference listed below), where tag
> clusters are also generated to compute the similarity between clusters and
> items/users (hence it was really frustrating when I failed to dump the
> points for the fuzzy k-means clusters). I am currently reading some articles
> on implementing Taste (the recommender framework) with Mahout, but the use
> cases described in those articles are quite different from what I am about
> to implement.
>
> I am still trying to build the tag clusters properly. Each tag is now
> represented as a vector of resources (each equivalent to a row in the
> item-tag matrix). I currently generate the vectors by converting a
> pre-generated ARFF file, following this tutorial <
> https://cwiki.apache.org/confluence/display/MAHOUT/Creating+Vectors+from+Weka%27s+ARFF+Format>.
> Is there another way of doing this (is it possible to generate the vectors
> without first generating an ARFF file)? I have also read this <
> https://cwiki.apache.org/confluence/display/MAHOUT/Creating+Vectors+from+Text>
> but I can't seem to relate it to my use case right now.
>
> Since I can't dump the points for the clusters using the cluster dumper (I
> keep getting an OME), I would probably calculate the degree of membership
> manually. Where should I store the result (MySQL via JDBC? Hadoop Bigtable?
> Cassandra?) so that I can reuse it later for further calculations (e.g. the
> similarity of an item with a cluster)?
>
> Reference:
> Shepitsen, Andriy; Gemmell, Jonathan; Mobasher, Bamshad; Burke, Robin.
> Personalized Recommendation in Folksonomies. Proceedings of the 2nd
> International Conference on Recommender Systems. Lausanne, Switzerland.
> October 23, 2008.
>
> p/s: I should probably find a copy of "Mahout in Action" since I keep
> seeing it being recommended.
>
> best wishes,
> Jeffrey04
>
