mahout-user mailing list archives

From Sean Owen <sro...@gmail.com>
Subject Re: Big Longs in RecommenderJob
Date Tue, 08 Jun 2010 11:42:55 GMT
I committed a change that touches only Varint. It is clever enough that I
award myself a pat on the back:

  public static long readSignedVarLong(DataInput in) throws IOException {
    long raw = readUnsignedVarLong(in);
    return (((raw << 63) >> 63) ^ raw) >> 1;
  }

becomes

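  public static long readSignedVarLong(DataInput in) throws IOException {
    long raw = readUnsignedVarLong(in);
    // Undo the zig-zag encoding applied on write
    long temp = (((raw << 63) >> 63) ^ raw) >> 1;
    // If raw had its top bit set, the arithmetic shift above leaves the
    // result's top bit inverted; flip it back
    return temp ^ ((raw >> 63) << 63);
  }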

and likewise for writing. Basically, when asked to write an unsigned
value, it now treats a negative long as the unsigned 64-bit value with
the same bits, and all is well.

Obvious right?
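
For anyone who wants to poke at it, here is a minimal, self-contained sketch of the round trip. Only the two decode expressions are taken from the snippets above. The class name VarintSketch, the roundTrip helper, and the bodies of the unsigned read/write methods are assumptions made for illustration and are not necessarily what was committed.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical stand-in for the Varint class, for illustration only.
class VarintSketch {

  // Assumed unsigned writer: 7 bits per byte, low-order bits first.
  // The unsigned shift (>>>) makes a negative long behave like a
  // 64-bit unsigned value, so the loop always terminates.
  static void writeUnsignedVarLong(long value, DataOutput out) throws IOException {
    while ((value & ~0x7FL) != 0L) {
      out.writeByte((int) ((value & 0x7F) | 0x80));
      value >>>= 7;
    }
    out.writeByte((int) (value & 0x7F));
  }

  // Zig-zag encode, then write the result as unsigned.
  static void writeSignedVarLong(long value, DataOutput out) throws IOException {
    writeUnsignedVarLong((value << 1) ^ (value >> 63), out);
  }

  // Assumed unsigned reader; may return a value with the top bit set.
  static long readUnsignedVarLong(DataInput in) throws IOException {
    long value = 0L;
    int shift = 0;
    long b;
    while (((b = in.readByte()) & 0x80L) != 0L) {
      value |= (b & 0x7F) << shift;
      shift += 7;
      if (shift > 63) {
        throw new IOException("Variable-length value is too long");
      }
    }
    return value | (b << shift);
  }

  // The decode expression before the change.
  static long zigZagDecodeOriginal(long raw) {
    return (((raw << 63) >> 63) ^ raw) >> 1;
  }

  // The decode expression after the change: re-flip the top bit
  // when the raw value had it set.
  static long zigZagDecodeFixed(long raw) {
    long temp = (((raw << 63) >> 63) ^ raw) >> 1;
    return temp ^ ((raw >> 63) << 63);
  }

  static long readSignedVarLong(DataInput in) throws IOException {
    return zigZagDecodeFixed(readUnsignedVarLong(in));
  }

  // Writes a value to an in-memory buffer and reads it back.
  static long roundTrip(long value) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      writeSignedVarLong(value, new DataOutputStream(bytes));
      return readSignedVarLong(
          new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    long[] values = {0L, -1L, 1L << 62, Long.MAX_VALUE, Long.MIN_VALUE};
    for (long v : values) {
      System.out.println(v + " -> " + roundTrip(v));
    }
  }
}
```

The values that used to break are those whose zig-zag form has its top bit set, that is, anything at or above 2^62 or below -2^62. For those, zigZagDecodeOriginal and zigZagDecodeFixed disagree; for everything smaller they return the same result.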


On Tue, Jun 8, 2010 at 1:46 AM, Sean Owen <srowen@gmail.com> wrote:
> Really, the mistake here (which is mine) is writing these IDs as signed
> values. As used in the recommender bit, the IDs are already
> nonnegative longs and so can be written with the current
> implementation just fine, if encoded as unsigned.
>
> That is part 2 of what I should change here since it will increase
> encoding efficiency a little.
>
> On Tue, Jun 8, 2010 at 12:36 AM, Ted Dunning <ted.dunning@gmail.com> wrote:
>> The other solution would be to be satisfied with 62 bits of id space and
>> only generate "small" longs.
>>
>> On Mon, Jun 7, 2010 at 3:39 PM, Sean Owen <srowen@gmail.com> wrote:
>>
>>> Yeah the problem is that signed values are zig-zag encoded into an
>>> unsigned value, which loses 1 bit, in addition to losing another bit
>>> by mapping to unsigned values.
>>>
>>> Still there is definitely a way to make it work; the encoding is
>>> certainly defined for larger values and there is a need for it. I can
>>> work on the right fix.
>>>
>>
>
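
To make the efficiency remark in the quoted text concrete, here is a small sketch that counts how many bytes a value takes when written as an unsigned varint versus zig-zag encoded first. It assumes the usual layout of 7 payload bits per byte; the class and method names are hypothetical and do not come from Mahout.

```java
// Hypothetical helper for comparing encoded sizes, for illustration only.
class EncodedSizeSketch {

  // Bytes needed for an unsigned varint with 7 payload bits per byte.
  static int unsignedSize(long value) {
    int size = 1;
    while ((value & ~0x7FL) != 0L) {
      value >>>= 7;
      size++;
    }
    return size;
  }

  // Bytes needed when the value is zig-zag encoded before writing.
  static int signedSize(long value) {
    return unsignedSize((value << 1) ^ (value >> 63));
  }

  public static void main(String[] args) {
    long[] ids = {63L, 100L, Long.MAX_VALUE};
    for (long id : ids) {
      System.out.println(id + ": unsigned=" + unsignedSize(id)
          + " bytes, signed=" + signedSize(id) + " bytes");
    }
  }
}
```

Zig-zag doubles a nonnegative value, so it costs one extra bit. That extra bit pushes the encoding to one more byte only when the value sits in the upper half of a 7-bit step, which is why the gain from writing nonnegative IDs as unsigned is small.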
