mahout-user mailing list archives

From Sean Owen <sro...@gmail.com>
Subject Re: RecommenderJob uses indirection for ItemIDs
Date Sun, 12 Jun 2011 23:28:48 GMT
No, all vectors here use an int to express dimension. It has nothing to do
with sparseness.
On Jun 13, 2011 12:26 AM, "Lance Norskog" <goksron@gmail.com> wrote:
> Ah! So if it was a sparse vector it could be indexed directly. Or the
> mapping could be with a hash-indexed representation as used with
> Lucene vectors.
>
> On Sun, Jun 12, 2011 at 3:43 AM, Sean Owen <srowen@gmail.com> wrote:
>> The keys have to be hashed to be used as int offsets into a vector. While
>> loading the mapping isn't ideal, it only scales with the number of items
>> and users.
>>  On Jun 12, 2011 3:47 AM, "Lance Norskog" <goksron@gmail.com> wrote:
>>> The RecommenderJob makes a "side" file which maps a fabricated integer
>>> index to a long ItemID. Why is this needed? Couldn't the
>>> RecommenderJob propagate the long ItemID directly? Note that this
>>> forces all instances of AggregateAndReduceRecommender to load the
>>> entire map. One of the Map/Reduce rules is 'nothing needs to know
>>> everything'.
>>>
>>> Is this a sparse/dense optimization? If so, have the distributed
>>> algorithms advanced enough to make this indirection unnecessary?
>>>
>>> --
>>> Lance Norskog
>>> goksron@gmail.com
>>
>
>
>
> --
> Lance Norskog
> goksron@gmail.com
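
The indirection discussed in this thread can be illustrated with a minimal sketch. It assumes only what the messages state: vector dimensions are ints, item IDs are longs, so each ID is hashed to an int index and a side mapping is kept to recover the original ID. The class name IdIndexSketch, the method names, and the particular hash are hypothetical choices for illustration and are not taken from the RecommenderJob source.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only; names and hash choice are assumptions.
class IdIndexSketch {

    // Fold a 64-bit ID into a non-negative 32-bit value usable as a
    // vector dimension. Masking with 0x7FFFFFFF clears the sign bit.
    public static int idToIndex(long id) {
        return 0x7FFFFFFF & Long.hashCode(id);
    }

    // Build the "side" mapping from int index back to the original long ID.
    // Its size grows with the number of distinct IDs, not with the number
    // of preferences. Distinct IDs can collide on the same index; this
    // sketch simply lets the later ID overwrite the earlier one.
    public static Map<Integer, Long> buildIndexToId(long[] ids) {
        Map<Integer, Long> indexToId = new HashMap<>();
        for (long id : ids) {
            indexToId.put(idToIndex(id), id);
        }
        return indexToId;
    }

    public static void main(String[] args) {
        long[] itemIds = {42L, 5_000_000_000L, -7L};
        Map<Integer, Long> indexToId = buildIndexToId(itemIds);
        for (long id : itemIds) {
            int index = idToIndex(id);
            // Round trip: long ID -> int index -> long ID via the side map.
            System.out.println(id + " -> " + index + " -> " + indexToId.get(index));
        }
    }
}
```

Because the hash discards information, the int index alone cannot be turned back into the long ID, which is why a lookup table of this kind has to be loaded wherever final item IDs are emitted.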
