mahout-user mailing list archives

From Yash Patel <yashpatel1...@gmail.com>
Subject Re: User based recommender
Date Wed, 03 Dec 2014 14:22:41 GMT
I figured out how to parse CSV files, use a map of user IDs to item IDs, and
build a normal recommender, which gives a user recommendations for some items.
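A minimal sketch of the two-column step described above, assuming each CSV line starts with a user ID followed by an item ID and that fields contain no quoted commas. The class and method names are hypothetical; the input is just the file's lines.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper: groups "userId,itemId,..." CSV lines into user ID -> item IDs. */
final class UserItemCsv {
    static Map<String, Set<String>> group(List<String> lines) {
        Map<String, Set<String>> itemsByUser = new LinkedHashMap<>();
        for (String line : lines) {
            String[] cols = line.split(",", -1);            // naive split: assumes no quoted commas
            if (cols.length < 2 || cols[0].trim().isEmpty()) {
                continue;                                   // skip blank or malformed rows
            }
            itemsByUser.computeIfAbsent(cols[0].trim(), k -> new LinkedHashSet<>())
                       .add(cols[1].trim());                // only the first two columns are used
        }
        return itemsByUser;
    }
}
```

Usage would be something like `UserItemCsv.group(Files.readAllLines(Paths.get("data.csv")))`, where the file name is a placeholder. Every column after the second is ignored, which is exactly the limitation raised next.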

However, this method can't use all my data, since it only uses two
columns.

I have several other columns, such as category, shipping location, item
price, online user, etc.

How can I use all these columns to improve recommendation quality
(i.e. calculate more precise similarity between users using location and item
price)?

Best Regards,
Yash Patel



On Sat, Nov 29, 2014 at 10:47 PM, Yash Patel <yashpatel1230@gmail.com>
wrote:

> Thank you for the guidance.
>
> I will try building something rough and ask questions if I run into any
> errors.
>
>
>
>
> On Sat, Nov 29, 2014 at 10:38 PM, Pat Ferrel <pat@occamsmachete.com>
> wrote:
>
>> The Mahout site is a good starting point for using any of the
>> recommenders.
>>
>> http://mahout.apache.org/users/recommender/intro-itembased-hadoop.html
>>
>> On Nov 29, 2014, at 1:33 PM, Yash Patel <yashpatel1230@gmail.com> wrote:
>>
>> Can you give me some more details on the Hadoop mapreduce item-based
>> cooccurrence recommender?
>>
>>
>> Best Regards,
>> Yash Patel
>>
>> On Fri, Nov 28, 2014 at 7:21 PM, Pat Ferrel <pat@occamsmachete.com>
>> wrote:
>>
>> > I built this app with it: https://guide.finderbots.com
>> >
>> > The app uses MongoDB, Ruby on Rails, and Solr 4.3. Once the model comes
>> > out of the job it is CSV text, and therefore language and architecture
>> > neutral. I load the data from spark-itemsimilarity into MongoDB using
>> > Java. Solr is set up for full-text indexing and queries using data from
>> > MongoDB. The queries are made to Solr through REST from Ruby UX code. You
>> > can replace any component in this stack with whatever you wish and use
>> > whatever language you are comfortable with.
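Because the model is plain text, the loader can be written in any language. A minimal Java sketch of parsing one line of indicator output before writing it to a DB, assuming the text layout the Mahout docs show by default: an item ID, a tab, then space-separated `itemID:strength` pairs. Check the delimiters your run actually produced; the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical parser for one indicator line: "item<TAB>other1:s1 other2:s2 ...". */
final class IndicatorLine {
    final String itemId;
    final List<String> similarItems = new ArrayList<>();
    final List<Double> strengths = new ArrayList<>();

    private IndicatorLine(String itemId) {
        this.itemId = itemId;
    }

    static IndicatorLine parse(String line) {
        String[] keyAndRest = line.split("\t", 2);          // row key, then the indicator list
        IndicatorLine parsed = new IndicatorLine(keyAndRest[0].trim());
        if (keyAndRest.length < 2 || keyAndRest[1].trim().isEmpty()) {
            return parsed;                                  // item with no indicators
        }
        for (String pair : keyAndRest[1].trim().split(" +")) {
            int colon = pair.lastIndexOf(':');              // strength follows the last ':'
            if (colon < 0) {                                // no strength present: keep the ID only
                parsed.similarItems.add(pair);
                parsed.strengths.add(Double.NaN);
            } else {
                parsed.similarItems.add(pair.substring(0, colon));
                parsed.strengths.add(Double.parseDouble(pair.substring(colon + 1)));
            }
        }
        return parsed;
    }
}
```

Each parsed line maps naturally to one document (item ID plus a list of indicator IDs), which is the shape a DB or search engine index needs.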
>> >
>> > Alternatively, you could modify the UI of Solr or Elasticsearch; both are
>> > written in Java.
>> >
>> > If you use any of the other Mahout recommenders, they create all recs for
>> > all known users, so you'll still need to build a way to serve those
>> > results. People often use DBs for this and integrate with their web app
>> > framework.
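Serving batch-computed recommendations mostly comes down to a keyed lookup. A minimal sketch, assuming the job's output has already been loaded into a map from numeric user ID to an ordered list of numeric item IDs (a `HashMap` stands in for the DB; the class and method names are hypothetical). It also shows the kind of simple filter a serving layer can apply, here skipping items the user already has.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical serving layer over precomputed recs; the HashMap stands in for a DB table. */
final class PrecomputedRecs {
    private final Map<Long, List<Long>> recsByUser = new HashMap<>();

    /** Stores the ordered list of item IDs the batch job produced for one user. */
    void put(long userId, List<Long> orderedItemIds) {
        recsByUser.put(userId, new ArrayList<>(orderedItemIds));
    }

    /** Returns up to howMany recs in stored order, skipping items in the exclude set. */
    List<Long> recommend(long userId, int howMany, Set<Long> exclude) {
        List<Long> stored = recsByUser.getOrDefault(userId, Collections.emptyList());
        List<Long> result = new ArrayList<>();
        for (Long item : stored) {
            if (result.size() >= howMany) {
                break;
            }
            if (!exclude.contains(item)) {
                result.add(item);
            }
        }
        return result;                                      // unknown users get an empty list
    }
}
```

Note that users unknown to the batch job get nothing back, which is the cold-start gap the search-engine approach is meant to narrow.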
>> >
>> > On Nov 28, 2014, at 10:03 AM, Yash Patel <yashpatel1230@gmail.com>
>> > wrote:
>> >
>> > I looked up spark-rowsimilarity, but I am not sure it will suit my needs,
>> > as I want to build my recommender as a Java application, possibly with an
>> > interface.
>> >
>> >
>> > On Fri, Nov 28, 2014 at 5:43 PM, Pat Ferrel <pat@occamsmachete.com>
>> > wrote:
>> >
>> >> Some references:
>> >>
>> >> Small free book, which talks about the general idea:
>> >> https://www.mapr.com/practical-machine-learning
>> >> Preso, which talks about mixing actions or other indicators:
>> >> http://occamsmachete.com/ml/2014/10/07/creating-a-unified-recommender-with-mahout-and-a-search-engine/
>> >> Two blog posts:
>> >> http://occamsmachete.com/ml/2014/08/11/mahout-on-spark-whats-new-in-recommenders/
>> >> http://occamsmachete.com/ml/2014/09/09/mahout-on-spark-whats-new-in-recommenders-part-2/
>> >> Mahout docs:
>> >> http://mahout.apache.org/users/recommender/intro-cooccurrence-spark.html
>> >>
>> >> Build Mahout from source (https://github.com/apache/mahout). It will
>> >> run stand-alone on a dev machine; then, if your data is too big for a
>> >> single machine, you can run it on a Spark + Hadoop cluster. The data this
>> >> creates can be put into a DB or indexed directly by a search engine (Solr
>> >> or Elasticsearch). Choose the search engine you want; queries made with a
>> >> user's item ID history will go there, and the results will be an ordered
>> >> list of item IDs to recommend.
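To see why a search engine can serve these recs, here is a minimal in-memory sketch: each item "document" carries its indicator IDs as a field, the query is the user's item history, and documents are ranked by how many history items appear in that field. This plain overlap count is a simplified stand-in for Solr or Elasticsearch relevance scoring, which also weights terms; all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class IndicatorQuery {
    /**
     * indicatorsByItem: item ID -> indicator IDs stored in that item's document.
     * history: the user's item IDs, used as query terms.
     * Returns item IDs ordered by overlap count (ties by ID), excluding history items.
     */
    static List<String> rank(Map<String, Set<String>> indicatorsByItem,
                             Collection<String> history, int howMany) {
        Map<String, Integer> score = new HashMap<>();
        for (Map.Entry<String, Set<String>> doc : indicatorsByItem.entrySet()) {
            if (history.contains(doc.getKey())) {
                continue;                                   // don't recommend what the user has
            }
            int hits = 0;
            for (String term : history) {
                if (doc.getValue().contains(term)) {
                    hits++;
                }
            }
            if (hits > 0) {
                score.put(doc.getKey(), hits);
            }
        }
        List<String> ranked = new ArrayList<>(score.keySet());
        ranked.sort((a, b) -> {
            int byScore = Integer.compare(score.get(b), score.get(a));  // more overlap first
            return byScore != 0 ? byScore : a.compareTo(b);
        });
        return new ArrayList<>(ranked.subList(0, Math.min(howMany, ranked.size())));
    }
}
```

In a real deployment the engine does this matching and ranking itself; the application only sends the history as a query and reads back the ordered IDs.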
>> >>
>> >> The core piece is the command line job "mahout spark-itemsimilarity",
>> >> which can parse CSV data. Its options specify which columns hold the IDs.
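For a Java application that reads the file itself, the same idea can be sketched directly: pick the user and item columns by zero-based index from a wider CSV row and emit `user,item` pairs. The column indices and class name are hypothetical and are not the job's own option names; fields are assumed to contain no quoted commas.

```java
import java.util.ArrayList;
import java.util.List;

final class ColumnSelector {
    /** Returns "user,item" pairs taken from the given zero-based columns; short rows are skipped. */
    static List<String> select(List<String> csvLines, int userColumn, int itemColumn,
                               boolean skipHeader) {
        List<String> pairs = new ArrayList<>();
        for (int i = skipHeader ? 1 : 0; i < csvLines.size(); i++) {
            String[] cols = csvLines.get(i).split(",", -1);  // naive split: no quoted commas
            if (cols.length <= Math.max(userColumn, itemColumn)) {
                continue;                                    // row lacks one of the columns
            }
            pairs.add(cols[userColumn].trim() + "," + cols[itemColumn].trim());
        }
        return pairs;
    }
}
```

Selecting a different column (say, category instead of item) with the same helper is one way to produce the input for a second action later.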
>> >>
>> >> Start out simple by looking only at user and item IDs. Then you can add
>> >> other cross-cooccurrence indicators for multiple actions later pretty
>> >> easily.
>> >>
>> >>
>> >> On Nov 28, 2014, at 12:14 AM, Yash Patel <yashpatel1230@gmail.com>
>> >> wrote:
>> >>
>> >> The Mahout + search engine recommender seems like it would be best for
>> >> the data I have.
>> >>
>> >> Kindly get back to me at your earliest convenience.
>> >>
>> >>
>> >>
>> >> Best Regards,
>> >> Yash Patel
>> >>
>> >> On Thu, Nov 27, 2014 at 9:58 PM, Pat Ferrel <pat@occamsmachete.com>
>> >> wrote:
>> >>
>> >>> Mahout has several recommenders, so there is no need to create one from
>> >>> components. They all make use of the similarity of preferences between
>> >>> users; that's why they are in the category of collaborative filtering.
>> >>>
>> >>> Primary Mahout Recommenders:
>> >>> 1) Hadoop mapreduce item-based cooccurrence recommender. Creates all recs
>> >>> for all users. Uses "Mahout IDs".
>> >>> 2) ALS-WR Hadoop mapreduce, uses matrix factorization to reduce noise in
>> >>> the data. Sometimes better for small data sets than #1. Uses "Mahout
>> >>> IDs".
>> >>> 3) Mahout + search engine: cooccurrence type. Extremely flexible, works
>> >>> with multiple actions (multi-modal), works for new users that have some
>> >>> history, has a scalable server (from the search engine), but is more
>> >>> difficult to integrate than #1 or #2. Uses your own IDs and reads CSV
>> >>> files.
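The cooccurrence idea behind #1 and #3 can be illustrated on a single machine: count, for each pair of items, how many users interacted with both, then score the pair with a log-likelihood ratio (LLR) so that pairs which co-occur only because both items are popular score low; spark-itemsimilarity uses LLR for this scoring. This is a minimal sketch of the idea, not Mahout's distributed implementation. The `llr` method reimplements the standard entropy-based formula, and the class name is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

final class Cooccurrence {
    /** Scores every co-occurring item pair; pair key is "a|b" with a < b (IDs must not contain '|'). */
    static Map<String, Double> llrScores(Map<String, Set<String>> itemsByUser) {
        Map<String, Long> itemCount = new HashMap<>();
        Map<String, Long> pairCount = new HashMap<>();
        for (Set<String> items : itemsByUser.values()) {
            for (String a : items) {
                itemCount.merge(a, 1L, Long::sum);          // users who have item a
                for (String b : items) {
                    if (a.compareTo(b) < 0) {
                        pairCount.merge(a + "|" + b, 1L, Long::sum);  // users who have both
                    }
                }
            }
        }
        long users = itemsByUser.size();
        Map<String, Double> scores = new HashMap<>();
        for (Map.Entry<String, Long> e : pairCount.entrySet()) {
            String[] ab = e.getKey().split("\\|");
            long k11 = e.getValue();                        // both items
            long k12 = itemCount.get(ab[0]) - k11;          // first item only
            long k21 = itemCount.get(ab[1]) - k11;          // second item only
            long k22 = users - k11 - k12 - k21;             // neither item
            scores.put(e.getKey(), llr(k11, k12, k21, k22));
        }
        return scores;
    }

    static double llr(long k11, long k12, long k21, long k22) {
        double rowEntropy = entropy(k11 + k12, k21 + k22);
        double colEntropy = entropy(k11 + k21, k12 + k22);
        double matEntropy = entropy(k11, k12, k21, k22);
        if (rowEntropy + colEntropy < matEntropy) {
            return 0.0;                                     // guard against rounding error
        }
        return 2.0 * (rowEntropy + colEntropy - matEntropy);
    }

    private static double xLogX(long x) {
        return x == 0 ? 0.0 : x * Math.log(x);
    }

    /** Unnormalized Shannon entropy of a set of counts. */
    private static double entropy(long... counts) {
        long sum = 0;
        double result = 0.0;
        for (long c : counts) {
            result += xLogX(c);
            sum += c;
        }
        return xLogX(sum) - result;
    }
}
```

For items that co-occur exactly as often as independence predicts, the score is 0; keeping only the highest-scoring pairs per item gives the indicator lists.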
>> >>>
>> >>> The rest of the data seems to apply either to the user or the item, and so
>> >>> would be used in different ways. #1 and #2 can only use user ID and item
>> >>> ID, but some post-recommendation weighting or filtering can be applied.
>> >>> #3 can use multiple attributes in different ways. For instance, if
>> >>> category is an item attribute, you can create two actions:
>> >>> user-pref-for-an-item and user-pref-for-a-category. Assuming you want to
>> >>> recommend an item (not a category), you can create a cross-cooccurrence
>> >>> indicator for the second action and use the data to make your item recs
>> >>> better. #3 is the only method that supports this.
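A minimal sketch of the counting behind a cross-cooccurrence indicator, assuming purchases (the primary action) and category preferences (the second action) are keyed by the same user IDs. For every (item, category) pair it counts the users who did both; presumably these counts would then be scored (for example with LLR) before being indexed as a second indicator field on each item. All names are hypothetical.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

final class CrossCooccurrence {
    /** Returns item -> (category -> number of users who bought the item and preferred the category). */
    static Map<String, Map<String, Integer>> count(Map<String, Set<String>> itemsByUser,
                                                   Map<String, Set<String>> categoriesByUser) {
        Map<String, Map<String, Integer>> counts = new TreeMap<>();
        for (Map.Entry<String, Set<String>> user : itemsByUser.entrySet()) {
            Set<String> categories = categoriesByUser.get(user.getKey());
            if (categories == null) {
                continue;                                   // user has no second-action data
            }
            for (String item : user.getValue()) {
                Map<String, Integer> row = counts.computeIfAbsent(item, k -> new TreeMap<>());
                for (String category : categories) {
                    row.merge(category, 1, Integer::sum);   // one more user did both
                }
            }
        }
        return counts;
    }
}
```

The rows are keyed by item because the goal is still to recommend items; the category action only adds evidence about which items a user is likely to want.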
>> >>>
>> >>> Pick a recommender and we can help more with data prep.
>> >>>
>> >>>
>> >>> On Nov 26, 2014, at 1:34 PM, Yash Patel <yashpatel1230@gmail.com>
>> >>> wrote:
>> >>>
>> >>> Hello everyone,
>> >>>
>> >>> Wow, I am quite happy to see so much input from people.
>> >>>
>> >>> I apologize for not providing more details.
>> >>>
>> >>> Although this is not my complete dataset, the fields I have chosen to use
>> >>> are:
>> >>>
>> >>> customer id - numeric
>> >>> item id - text
>> >>> postal code - text
>> >>> item category - text
>> >>> potential growth - text
>> >>> territory - text
>> >>>
>> >>>
>> >>> Basically, I was thinking of finding similar users and recommending them
>> >>> items that users like them have bought but they haven't.
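The idea described above is classic user-based collaborative filtering. A minimal pure-Java sketch for purchase (bought or not) data, assuming Jaccard similarity between users' item sets and a fixed neighborhood size `k`. It illustrates the technique only and is not Mahout's own implementation; the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class UserBasedSketch {
    /** Jaccard similarity: size of intersection / size of union, 0 when both sets are empty. */
    static double jaccard(Set<String> a, Set<String> b) {
        Set<String> inter = new HashSet<>(a);
        inter.retainAll(b);
        int union = a.size() + b.size() - inter.size();
        return union == 0 ? 0.0 : (double) inter.size() / union;
    }

    /** Recommends items owned by the k most similar users but not by the target user. */
    static List<String> recommend(Map<String, Set<String>> itemsByUser, String userId,
                                  int k, int howMany) {
        Set<String> own = itemsByUser.getOrDefault(userId, new HashSet<>());
        // Rank all other users by similarity to the target user.
        List<String> others = new ArrayList<>(itemsByUser.keySet());
        others.remove(userId);
        Map<String, Double> sim = new HashMap<>();
        for (String other : others) {
            sim.put(other, jaccard(own, itemsByUser.get(other)));
        }
        others.sort((x, y) -> Double.compare(sim.get(y), sim.get(x)));
        // Score each unowned item by summing the similarity of the neighbors who have it.
        Map<String, Double> score = new HashMap<>();
        for (String neighbor : others.subList(0, Math.min(k, others.size()))) {
            double s = sim.get(neighbor);
            if (s <= 0) {
                continue;                                   // nothing in common with the target
            }
            for (String item : itemsByUser.get(neighbor)) {
                if (!own.contains(item)) {
                    score.merge(item, s, Double::sum);
                }
            }
        }
        List<String> ranked = new ArrayList<>(score.keySet());
        ranked.sort((x, y) -> {
            int c = Double.compare(score.get(y), score.get(x));  // higher score first
            return c != 0 ? c : x.compareTo(y);
        });
        return new ArrayList<>(ranked.subList(0, Math.min(howMany, ranked.size())));
    }
}
```

Jaccard fits purchase data because there are no rating values to correlate; with explicit ratings, a correlation-based similarity would be the more usual choice.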
>> >>>
>> >>> That said, I would very much like to hear your opinions, as I am not very
>> >>> familiar with clustering, classifiers, etc.
>> >>>
>> >>> I found that Mahout takes sequence files converted into vectors, but I
>> >>> couldn't understand how I would do that with my data specifically and,
>> >>> more importantly, build a recommender system out of it.
>> >>>
>> >>> Also, I am wondering how to factor in the importance of a specific
>> >>> customer through the potential growth attribute.
>> >>>
>> >>>
>> >>>
>> >>>
>> >>>
>> >>>
>> >>> Best Regards,
>> >>> Yash Patel
>> >>>
>> >>> On Wed, Nov 26, 2014 at 9:03 PM, Pat Ferrel <pat@occamsmachete.com>
>> >>> wrote:
>> >>>
>> >>>> All very good points, but note that spark-itemsimilarity may take the
>> >>>> input directly, since you specify column numbers for
>> >>>> <UID><ITEMID><PREF_VALUE>.
>> >>>>
>> >>>> On Nov 26, 2014, at 11:43 AM, parnab kumar <parnab.2007@gmail.com>
>> >>>> wrote:
>> >>>>
>> >>>> Kindly elaborate on your requirements, your dataset fields, and what
>> >>>> you want to recommend to a user. Usually a set of items is recommended
>> >>>> to a user. In your case, what are your items?
>> >>>>
>> >>>> The standard input is <UID><ITEMID><PREF_VALUE>. Clearly your data is
>> >>>> not in this format, which is what would let you use the algorithms in
>> >>>> Mahout directly.
>> >>>>
>> >>>> A little more info from your side will help us give you the right
>> >>>> pointers.
>> >>>>
>> >>>> On Wed, Nov 26, 2014 at 7:16 PM, Yash Patel <yashpatel1230@gmail.com>
>> >>>> wrote:
>> >>>>
>> >>>>> Dear Mahout Team,
>> >>>>>
>> >>>>> I am a student new to machine learning and I am trying to build a
>> >>>>> user-based recommender using Mahout.
>> >>>>>
>> >>>>> My input dataset is a CSV file, but many of its fields are text, and I
>> >>>>> understand Mahout needs numeric values.
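One common way to handle text IDs is a two-way dictionary that assigns each distinct string a `long`, feeds those numbers to the recommender, and translates the results back afterwards. A minimal sketch, assuming sequential IDs starting at 0 are acceptable and that separate dictionaries are kept for users and for items; the class name is hypothetical. This only covers ID columns; descriptive text fields such as category are better treated as extra actions or attributes, as discussed in the replies.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical two-way dictionary between text IDs and numeric (long) IDs. */
final class IdDictionary {
    private final Map<String, Long> toLong = new HashMap<>();
    private final List<String> toText = new ArrayList<>();  // list index == assigned ID

    /** Returns the existing ID for this text, or assigns the next sequential one. */
    long idFor(String text) {
        Long existing = toLong.get(text);
        if (existing != null) {
            return existing;
        }
        long assigned = toText.size();
        toLong.put(text, assigned);
        toText.add(text);
        return assigned;
    }

    /** Translates a numeric ID from recommender output back to the original text. */
    String textFor(long id) {
        return toText.get((int) id);
    }
}
```

The dictionary must be persisted alongside the model so that the same text always maps to the same number between runs.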
>> >>>>>
>> >>>>> Can you give me a head start on where I should begin and what kind of
>> >>>>> tools I need to parse the text columns?
>> >>>>>
>> >>>>> Also, an idea of which classifiers or clustering methods I should use
>> >>>>> would be highly appreciated.
>> >>>>>
>> >>>>>
>> >>>>> Best Regards,
>> >>>>> Yash Patel
>> >>>>>
>> >>>>
>> >>>>
>> >>>
>> >>>
>> >>
>> >>
>> >
>> >
>>
>>
>
