mahout-user mailing list archives

From Sean Owen <sro...@gmail.com>
Subject Re: How to index by long ID in RandomAccessSparseVector
Date Tue, 08 May 2012 08:29:29 GMT
That's right. It ought to be uncommon but can happen. For recommenders, it
"only" means that you start to treat two users or two items as the same
thing. That doesn't do much harm though. Maybe one user's recs are a little
funny.
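
To make the collision concrete, here is a minimal sketch in Java. It assumes
the mapping folds the 64-bit ID down to a non-negative int by XORing the high
and low 32 bits and clearing the sign bit; that is my assumption about what
TasteHadoopUtils.idToIndex(long) does, not a copy of it, and the class name
and sample IDs below are purely illustrative.

```java
public class IdToIndexCollision {

    // Assumed stand-in for the long-to-int mapping: XOR the two 32-bit
    // halves of the ID, then mask off the sign bit so the result is >= 0.
    public static int idToIndex(long id) {
        return 0x7FFFFFFF & ((int) id ^ (int) (id >>> 32));
    }

    public static void main(String[] args) {
        long first = 1L;            // high half 0, low half 1
        long second = 1L << 32;     // high half 1, low half 0
        long third = 0x80000001L;   // differs from 'first' only in bit 31

        // All three distinct IDs land on the same index under this mapping.
        System.out.println(idToIndex(first));   // 1
        System.out.println(idToIndex(second));  // 1
        System.out.println(idToIndex(third));   // 1
    }
}
```

Any mapping from 64 bits down to 31 bits has to collide somewhere; with IDs
that are spread out reasonably well, the chance that two particular IDs share
an index is small, which is why the effect is usually limited to the odd
merged user or item.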

I do think it would have been useful to index by long, but that would have
significantly increased memory requirements too.

(In developing Myrrix, though, I have switched to a data structure indexed
by long, because avoiding the mapping becomes more necessary there.)

On Tue, May 8, 2012 at 9:13 AM, 冯伟 <whitepapers824@gmail.com> wrote:

> I have read some of the item-based recommendation code in version 0.6,
> starting from "org.apache.mahout.cf.taste.hadoop.item.RecommenderJob".
> I found that there is a long-to-int mapping provided by the function
> "int TasteHadoopUtils.idToIndex(long)". The long-to-int mapping is applied
> to both userId and itemId. I wonder whether it is possible for two longs
> to be mapped to the same int? If so, then we would likely merge vectors
> from different item IDs or user IDs, right? This is quite confusing.
>
> Would it be better to provide a RandomAccessSparseVector backed by
> OpenLongDoubleHashMap instead of OpenIntDoubleHashMap? Thanks in advance.
>
> ----------------------
> Wei Feng
>
