mahout-user mailing list archives

From Dean Jones <dean.m.jo...@gmail.com>
Subject Re: Alternative Naive Bayes Datastore?
Date Thu, 16 Sep 2010 12:07:35 GMT
Hi Drew,

On 15 September 2010 13:58, Drew Farris <drew.farris@gmail.com> wrote:
> Hi Dean,
>
> Does jdbm only support java-based serialization? From my experience
> I've seen that java's serialization is generally an order of magnitude
> slower and less space efficient than the equivalent hand-rolled
> serialization such as you'd find in implementations of the Writable
> class. That is precisely why you won't see Serializable used much in
> Mahout. Perhaps you could use RandomAccessSparseVector combined with
> VectorWritable to read/write to/from a byte array backed
> DataOutput/DataInput stream?
>

As far as I can tell, the jdbm HTree and BTree stores only support
Java serialization. I've patched RandomAccessSparseVector locally to
implement Serializable, and this is now working for me. I absolutely
agree that native Java serialization is not the best option, but it is
widely used by popular persistence and IPC mechanisms, so supporting
it may be something you want to consider.
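
To make the alternative concrete, here is a minimal sketch of the
hand-rolled approach you describe: writing a sparse vector to a byte
array through a DataOutput stream and reading it back. It uses only
java.io, and the SparseVecDemo class and its Map-based representation
are hypothetical stand-ins, not Mahout's RandomAccessSparseVector or
VectorWritable.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for a sparse vector codec; NOT a Mahout class.
public class SparseVecDemo {

    // Layout: entry count (int), then one (int index, double value) pair per entry.
    static byte[] toBytes(Map<Integer, Double> entries) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeInt(entries.size());
            for (Map.Entry<Integer, Double> e : entries.entrySet()) {
                out.writeInt(e.getKey());
                out.writeDouble(e.getValue());
            }
        }
        return buf.toByteArray();
    }

    // Reads back the layout written by toBytes.
    static Map<Integer, Double> fromBytes(byte[] bytes) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            int count = in.readInt();
            Map<Integer, Double> entries = new TreeMap<>();
            for (int i = 0; i < count; i++) {
                int index = in.readInt();
                double value = in.readDouble();
                entries.put(index, value);
            }
            return entries;
        }
    }

    public static void main(String[] args) throws IOException {
        Map<Integer, Double> vec = new TreeMap<>();
        vec.put(3, 1.5);
        vec.put(10, -2.0);
        byte[] bytes = toBytes(vec);
        // 4 bytes for the count plus 12 bytes per entry.
        System.out.println(bytes.length + " bytes, round trip ok: "
                + fromBytes(bytes).equals(vec));
    }
}
```

Since byte[] is itself Serializable, a store that only accepts
Serializable values could, in principle, hold the encoded array
instead of the vector object, though I have not tried that with jdbm.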

We have our own sparse vector implementations, so I could switch to
using those, but that would obviously limit my ability to contribute
this work back to Mahout, which is something I'd like to do (and have
clearance from my employer for).

Hmm, maybe I should create an issue on the Mahout JIRA, attach a
patch, and we could take it from there?

Dean.
