mahout-user mailing list archives

From Ted Dunning <ted.dunn...@gmail.com>
Subject Re: [slightly off topic] Determining Importance
Date Wed, 19 Jan 2011 16:33:22 GMT
Yes.  The logistic regression (aka SGD) stuff is ideal for this kind of
model.  Since all those models are Writables, implementing any storage
scheme is as easy as spilling the model to a byte array and writing that
where you like.  It might be useful to sparsify the matrices internal to the
model before doing so to save space but the basic idea will work and the
default regularizer is pretty aggressive about forcing coefficients to zero.
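
As a rough sketch of the "spill to a byte array" idea, the snippet below
uses a local stand-in interface, SimpleWritable, with the same two-method
shape as Hadoop's Writable (write(DataOutput) and readFields(DataInput)).
ToyModel, toBytes and fromBytes are hypothetical names for illustration
and are not Mahout classes. The toy model writes only its non-zero
coefficients, which is one simple way to sparsify before storing.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.Arrays;

public class ModelSpill {
    // Assumption: local stand-in mirroring the two-method Writable contract.
    interface SimpleWritable {
        void write(DataOutput out) throws IOException;
        void readFields(DataInput in) throws IOException;
    }

    // Hypothetical toy model holding one dense coefficient vector.
    static class ToyModel implements SimpleWritable {
        double[] beta = new double[0];

        @Override
        public void write(DataOutput out) throws IOException {
            int nonZero = 0;
            for (double b : beta) {
                if (b != 0.0) nonZero++;
            }
            out.writeInt(beta.length); // full dimension
            out.writeInt(nonZero);     // number of (index, value) pairs
            for (int i = 0; i < beta.length; i++) {
                if (beta[i] != 0.0) {  // sparse encoding: skip zeros
                    out.writeInt(i);
                    out.writeDouble(beta[i]);
                }
            }
        }

        @Override
        public void readFields(DataInput in) throws IOException {
            beta = new double[in.readInt()]; // zeros by default
            int nonZero = in.readInt();
            for (int k = 0; k < nonZero; k++) {
                int index = in.readInt();
                beta[index] = in.readDouble();
            }
        }
    }

    // Serialize any SimpleWritable into a byte array.
    static byte[] toBytes(SimpleWritable w) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            w.write(out);
        }
        return buffer.toByteArray();
    }

    // Restore state into an existing instance from a byte array.
    static void fromBytes(SimpleWritable w, byte[] bytes) throws IOException {
        try (DataInputStream in =
                 new DataInputStream(new ByteArrayInputStream(bytes))) {
            w.readFields(in);
        }
    }

    public static void main(String[] args) throws IOException {
        ToyModel model = new ToyModel();
        model.beta = new double[] {0.0, 1.5, 0.0, 0.0, -2.25, 0.0};
        byte[] blob = toBytes(model); // store this wherever you like
        ToyModel restored = new ToyModel();
        fromBytes(restored, blob);
        System.out.println(blob.length + " bytes, round trip ok: "
            + Arrays.equals(model.beta, restored.beta));
    }
}
```

The resulting byte array can go into whatever store is convenient (a
file, a database column, a key-value cell). With a real Writable the
toBytes/fromBytes helpers would presumably look much the same, but that
is an assumption rather than something tested against Mahout here.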

In general, the SGD package pushes towards fast handling of small inputs,
easy integration and easy deployment at the expense of ultimate
scalability.  Other models have different trade-offs.

On Wed, Jan 19, 2011 at 5:49 AM, Grant Ingersoll <gsingers@apache.org> wrote:

> Finally got around to reading this, thanks for the link.  They put forth
> basically what Robin said as a model: global x user = importance.
>
> It strikes me that Mahout has some/many of the core pieces of this puzzle
> with the addition of logistic regression stuff (especially when you factor
> in Ted's Shop It To Me use case in Mahout in Action).  We don't have the
> BigTable/HBase/? storage options integrated, but that can't be all that hard
> either
>
>
> On Jan 6, 2011, at 11:38 AM, Sebastian Schelter wrote:
>
> > Stumbled upon a paper that might fit into this discussion:
> >
> > "The Learning Behind Gmail Priority Inbox" http://goo.gl/DXjga
> >
> > --sebastian
> >
> > On 05.01.2011 19:17, Ted Dunning wrote:
> >> I wonder if the right notion is that importance is some notional
> >> aggregate of relevance over all users, queries and times.  The
> >> aggregate might be maximum or something similar.
> >>
> >> That would make importance be a measure of whether a resource is
> >> likely to ever be relevant.
> >>
> >> On Wed, Jan 5, 2011 at 6:43 AM, Niall Riddell <niall.riddell@xspca.com> wrote:
> >>
> >>> I think that notion of Importance implies the need for some form of
> >>> action based on that Information which distinguishes it from relevant
> >>> or merely interesting information.
> >>>
> >>> Niall
> >>>
> >>> On 4 January 2011 05:36, Robin Anil <robin.anil@gmail.com> wrote:
> >>>
> >>>> Relevance is a personal choice. Global importance + Personalization
> >>>> and the ratio of the blend == Better (No one knows what's best yet :)
> >>>>
> >>>> Robin
> >>>>
> >>>>
> >>>> On Tue, Jan 4, 2011 at 9:53 AM, Lance Norskog <goksron@gmail.com> wrote:
> >>>>
> >>>>> Yup- the one-word story would be 'interesting' rather than
> >>>>> 'relevant'. Context matters: anything from the searcher to
> >>>>> moment-to-moment differences. Intertwined with this is attention.
> >>>>>
> >>>>> In econ-speak, the user has a resource called 'attention'.  You are
> >>>>> talking about optimizing the utils received when the user spends this
> >>>>> resource. ('util' is a unitless measure of 'what you got when you
> >>>>> spent'.)
> >>>>>
> >>>>> Lance
> >>>>>
> >>>>> On Mon, Jan 3, 2011 at 2:30 PM, Grant Ingersoll <gsingers@apache.org> wrote:
> >>>>>>
> >>>>>> On Jan 3, 2011, at 3:32 PM, Dinesh B Vadhia wrote:
> >>>>>>
> >>>>>>> We could end-up in a hair-splitting hole.  Sounds like you want to
> >>>>>>> be able to identify things (items) that are relevant and important.
> >>>>>>> You could also say, items that are relevant and of value.
> >>>>>>
> >>>>>> Yes, I would agree.
> >>>>>>
> >>>>>>>
> >>>>>>> Describing the use-case might help?
> >>>>>>
> >>>>>> The use case is I am writing on the topic (well, a bunch of topics)
> >>>>>> and the thought occurred to me that an organizing principle of this
> >>>>>> particular section is best summed up by the word Importance, namely
> >>>>>> "Identifying Important Content and People".  What I would like to be
> >>>>>> able to do is point a user at the most relevant/important research
> >>>>>> in the area as well as some open source implementations that help
> >>>>>> solve the problem and also provide the basic theory behind it.  When
> >>>>>> I first outlined the section, I was mainly going to focus on graph
> >>>>>> algorithms like PageRank, but it occurred to me recently that it was
> >>>>>> broader than that.   Hence the question being aimed more at the
> >>>>>> academic side of the equation and not so much at the implementation
> >>>>>> side (besides, I would agree with most others here that the actual
> >>>>>> implementations focus on either categorization or graph approaches.)
> >>>>>>
> >>>>>> From Twitter, there were other suggestions of things to look into:
> >>>>>> significance, novelty, surprisal, information gain.
> >>>>>>
> >>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> From: Grant Ingersoll
> >>>>>>> Sent: Monday, January 03, 2011 11:41 AM
> >>>>>>> To: user@mahout.apache.org
> >>>>>>> Subject: Re: [slightly off topic] Determining Importance
> >>>>>>>
> >>>>>>>
> >>>>>>> I guess Relevance is a useful word to describe it, but I don't
> >>>>>>> think it resonates as well  (that is, Joe on the street is much
> >>>>>>> more likely to say "That is important to me" than to say "That is
> >>>>>>> relevant to me".)
> >>>>>>>
> >>>>>>> If we split hairs, Wikipedia defines relevance as "... how
> >>>>>>> pertinent, connected, or applicable something is to a given
> >>>>>>> matter."  Webster has important as "marked by or indicative of
> >>>>>>> significant worth or consequence : valuable in content or
> >>>>>>> relationship" -- I think importance has a stronger connotation
> >>>>>>> than relevance.  Under these definitions, I think something can be
> >>>>>>> relevant but still not be important.  Certainly everything that is
> >>>>>>> important is also relevant.  And certainly all the studies around
> >>>>>>> relevance are important (!) to the discussion, but what I'm getting
> >>>>>>> at is a bit deeper (I think, but I can be dissuaded).
> >>>>>>>
> >>>>>>> I would also agree with Ted here in that I don't think PageRank is
> >>>>>>> necessarily a measure of relevance (the page, after all, is on the
> >>>>>>> given matter or not based on its keywords, but it is Important
> >>>>>>> because of the fact that everyone else has said so).  I also wonder
> >>>>>>> if we aren't clouded by the use of relevance in search terms,
> >>>>>>> particularly in keyword-based approaches.  Importance to me factors
> >>>>>>> in many other things (including personalization).  Again, maybe I'm
> >>>>>>> splitting hairs.
> >>>>>>>
> >>>>>>> -Grant
> >>>>>>>
> >>>>>>> On Jan 3, 2011, at 2:19 PM, Ted Dunning wrote:
> >>>>>>>
> >>>>>>>> That is close, but I think that there is something else going on
> >>>>>>>> with this as well.
> >>>>>>>>
> >>>>>>>> Is page rank a measure of relevance?  Not really (to my mind)
> >>>>>>>>
> >>>>>>>> Relevance has a strong notion of context.  What is relevant to me
> >>>>>>>> in one moment may not be relevant the next moment.
> >>>>>>>>
> >>>>>>>> On Mon, Jan 3, 2011 at 11:13 AM, Dinesh B Vadhia
> >>>>>>>> <dineshbvadhia@hotmail.com> wrote:
> >>>>>>>>
> >>>>>>>>> Yep, what I'd call it too - relevance.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> From: Jake Mannix
> >>>>>>>>> Sent: Monday, January 03, 2011 10:48 AM
> >>>>>>>>> To: user@mahout.apache.org
> >>>>>>>>> Subject: Re: [slightly off topic] Determining Importance
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> I've got one word for you, Grant:
> >>>>>>>>>
> >>>>>>>>> Relevance.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>
> >>>>>>> --------------------------
> >>>>>>> Grant Ingersoll
> >>>>>>> http://www.lucidimagination.com
> >>>>>>>
> >>>>>>
> >>>>>> --------------------------
> >>>>>> Grant Ingersoll
> >>>>>> http://www.lucidimagination.com
> >>>>>>
> >>>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>> --
> >>>>> Lance Norskog
> >>>>> goksron@gmail.com
> >>>>>
> >>>>
> >>>
> >>>
> >>>
> >>> --
> >>> Niall Riddell
> >>> *xSpace Analytics Ltd*
> >>> *
> >>>
> >>>
> ------------------------------------------------------------------------------------------------------------
> >>> *
> >>> T: +44 161 408 3830
> >>> M:+44 778 696 3830
> >>> Skype: niall.riddell
> >>> *
> >>>
> >>>
> ------------------------------------------------------------------------------------------------------------
> >>> *
> >>>
> >>
> >
>
> --------------------------
> Grant Ingersoll
> http://www.lucidimagination.com/
>
> Search the Lucene ecosystem docs using Solr/Lucene:
> http://www.lucidimagination.com/search
>
>
