james-server-dev mailing list archives

From Mihai Soloi <mihai.so...@gmail.com>
Subject Re: GSoC: Avro Serialization over HBase
Date Tue, 12 Jun 2012 11:32:36 GMT
On 12.06.2012 11:30, Eric Charles wrote:
> Hi Mihai,
> Glad to hear your exams are over (I hope they went fine) :)
Hi Eric,

Thanks, they went very well, I got high marks.
> As Ioan said, Avro serialization HBase will be deprecated in favor of 
> Protobuf (if I understand well...).

I think Avro could be replaced with Protobuf rather easily, as they 
both do basically the same thing; the difference is that Avro uses JSON 
schemas and can be used from any other language, which is of no value to the
> I also like Avro because it gives you serialization & storage format 
> in one box, but is this what we want? The key point here is more an 
> effective access to the persisted data.

If the data is passed through Avro, serialization and deserialization 
are basically handled by Avro, but we'll always have to interact with 
the schemas. In Protobuf we have the objects compiled into our classes; 
from what I gather it's mostly useful for RPC. But Avro also has the 
protocol, and by using the avro-maven-plugin you can generate your own 
classes with which to interact. I can't say I'm an expert in either, 
but I fancy Avro.
> There has been a few tentatives so far to marry HBase and Lucene (see 
> [1], [2], [3] and [4] for example, see also [5] for a more recent 
> article).
Thank you for the GitHub links, I will look thoroughly through the 
projects. I was already aware of HBasene and Solandra (formerly 
Lucandra); they have similar approaches.
> The questions I am wondering:
> 1. Will you focus on a 'generic' solution (reusable outside James), or 
> on a very specific one tuned/optimized only for James mailbox needs?
I was thinking of writing generic code so that maybe it could be used 
outside of James but the data format would be specific to James mailbox 
needs, so the answer in the end is that it will be tuned for James.
> 2. What strategy will you take (custom Directory or custom 
> IndexReader/Writer, usage of Coprocessor or not...)?
I was thinking that a custom Directory was the way to go, but I soon 
realized that it's not as simple as it sounds, and that overriding the 
higher-level IndexReader and IndexWriter classes would be more 
appropriate (as in article [5]). So by bypassing the Directory I would 
have to make use of HBase Coprocessors. As far as I can tell, a 
RegionObserver would be employed to gather data on the operations 
frequently performed for the Lucene queries, along with Endpoints.
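
To make the observer idea above concrete, here is a minimal plain-Java sketch of a hook that runs after every write and keeps an inverted index in step with the stored rows. It deliberately does not use the HBase or Lucene APIs: `PutObserver`, `ObservedStore` and `IndexingObserver` are hypothetical names invented for illustration, and the in-memory map only stands in for a table. It shows the general pattern, not the design proposed in this thread.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical callback invoked after a row is written (not an HBase interface). */
interface PutObserver {
    void postPut(String rowKey, String value);
}

/** Hypothetical in-memory stand-in for a table that notifies observers on each put. */
class ObservedStore {
    private final Map<String, String> rows = new HashMap<>();
    private final List<PutObserver> observers = new ArrayList<>();

    void register(PutObserver observer) {
        observers.add(observer);
    }

    void put(String rowKey, String value) {
        rows.put(rowKey, value);
        // Notify every registered observer once the write has been applied.
        for (PutObserver observer : observers) {
            observer.postPut(rowKey, value);
        }
    }

    String get(String rowKey) {
        return rows.get(rowKey);
    }
}

/** Maintains a term -> row keys inverted index as rows are written. */
class IndexingObserver implements PutObserver {
    private final Map<String, Set<String>> postings = new HashMap<>();

    @Override
    public void postPut(String rowKey, String value) {
        // Naive tokenization: lower-case and split on non-word characters.
        // Simplification: terms from an overwritten value are not removed.
        for (String term : value.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (term.isEmpty()) {
                continue;
            }
            postings.computeIfAbsent(term, t -> new TreeSet<>()).add(rowKey);
        }
    }

    /** Returns the row keys whose values contained the given term. */
    Set<String> search(String term) {
        return postings.getOrDefault(term.toLowerCase(Locale.ROOT), new TreeSet<>());
    }
}
```

After registering an `IndexingObserver` on an `ObservedStore`, each `put` updates the index as a side effect, so a later `search` needs no separate indexing pass. In a real deployment the hook would run inside the region server and the index would have to be persisted; both are out of scope for this sketch.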

[1] https://github.com/akkumar/hbasene
[2] https://github.com/thkoch2001/lucehbase
[3] https://github.com/jasonrutherglen/HBASE-SEARCH
[4] https://github.com/jasonrutherglen/LUCENE-FOR-HBASE
[5] http://www.infoq.com/articles/LuceneHbase
