mahout-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Dmitriy Lyubimov <dlie...@gmail.com>
Subject Re: Top items in a vector
Date Mon, 25 Apr 2011 21:55:54 GMT
assuming of course that you don't have enough space to put all
distinct items into the heap?

On Mon, Apr 25, 2011 at 2:03 PM, Dmitriy Lyubimov <dlieu.7@gmail.com> wrote:
> if you "forget" some of the items you saw along with the counters,
> how'd you add them back to the heap?
>
>
> On Mon, Apr 25, 2011 at 1:08 PM, Jake Mannix <jake.mannix@gmail.com> wrote:
>> On Mon, Apr 25, 2011 at 1:03 PM, Dmitriy Lyubimov <dlieu.7@gmail.com> wrote:
>>
>>> There's nothing in Mahout to do this (afaik).
>>>
>>> There's a statistical method for this doing that in linear time, which
>>> is very easy to implement (and btw would make an excellent addition to
>>> Mahout), google up 'countsketch'.
>>>
>>
>> Taking top k items in a list of length n, when n >> k, is already O(n), if
>> you check the bottom of heap first before inserting.  Not sure if
>> countsketch
>> is necessary for the usual use case.
>>
>>  -jake
>>
>

Mime
View raw message