james-server-dev mailing list archives

From Ioan Eugen Stan <stan.ieu...@gmail.com>
Subject Re: GSoC: Avro Serialization over HBase
Date Tue, 12 Jun 2012 13:17:59 GMT
2012/6/12 Eric Charles <eric@apache.org>:
> True.
> What do you intend to store in Avro format (these bytes being retrieved by
> any means on the RPC side)?
> Thx, Eric
>

Well, if we stay true to the article [1] (see the end): the info about
Terms and Fields. I'm hoping Mihai can get a version up and running
by mid-term and then see what we can improve (coprocessors, etc.).
It should be a general enough implementation to be used outside James.
We won't get close to Elasticsearch/plain Lucene performance, though.
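To make "the info about Terms and Fields" more concrete, here is a minimal sketch of one possible layout. It rests on stated assumptions. An in-memory TreeMap stands in for an HBase table, since both keep rows sorted by key. The row key is "field/term", the column qualifier is a document id, and the cell value is a term frequency. The class and method names are hypothetical, and this is not necessarily the exact schema from the article.

```java
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

/**
 * Hypothetical sketch: an in-memory stand-in for an HBase table holding
 * Lucene-style terms. Row key = "field/term", column qualifier = document id,
 * cell value = term frequency. The TreeMap imitates HBase's sorted row keys.
 */
class TermTableSketch {
    // row key -> (document id -> term frequency)
    private final TreeMap<String, TreeMap<String, Integer>> rows = new TreeMap<>();

    static String rowKey(String field, String term) {
        return field + "/" + term; // assumed separator; must not occur in field names
    }

    /** Records one occurrence of a term in a field of a document. */
    void addOccurrence(String field, String term, String docId) {
        rows.computeIfAbsent(rowKey(field, term), k -> new TreeMap<>())
            .merge(docId, 1, Integer::sum);
    }

    /** Postings for an exact term: document ids mapped to frequencies. */
    Map<String, Integer> postings(String field, String term) {
        return rows.getOrDefault(rowKey(field, term), new TreeMap<>());
    }

    /** Range scan over sorted row keys: all terms of a field with a given prefix. */
    SortedMap<String, TreeMap<String, Integer>> scanPrefix(String field, String termPrefix) {
        String start = rowKey(field, termPrefix);
        // '\uffff' sorts after ordinary characters, so it closes the prefix range.
        return rows.subMap(start, true, start + '\uffff', false);
    }
}
```

Because rows are sorted, a prefix query (e.g. all "subject" terms starting with "hb") becomes a range scan rather than a full table scan.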

We'll see.

[1] http://www.infoq.com/articles/LuceneHbase

> On 06/12/2012 02:14 PM, Ioan Eugen Stan wrote:
>>
>> Hi,
>>
>> From what I know, the Avro deprecation concerns RPC communication: the
>> Put/Delete/etc. operations are serialized with Avro instead of the
>> usual Writables. Regardless of which serialization the RPC subsystem
>> uses, the data stored by the operations (Put/Get/Delete) is treated as a
>> byte array. If we store Avro objects as binary blobs in HBase, then we
>> have no issues.
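Here is a minimal sketch of the point above. It assumes a hypothetical TermStats record with the schema {"type":"record","name":"TermStats","fields":[{"name":"term","type":"string"},{"name":"docFreq","type":"long"}]}. The Avro binary encoding is written out by hand, following the Avro specification's rules for long and string values, only to show that what lands in an HBase cell is an opaque byte[]. Real code would use the Avro library's writer classes instead. The class and method names are made up for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

/**
 * Hand-written sketch of Avro binary encoding for a hypothetical record
 * TermStats { string term; long docFreq; }. The output is a plain byte[]
 * that HBase would store as-is, whatever its RPC layer uses.
 */
class AvroBytesSketch {
    /** Avro long: zig-zag encode, then 7 bits per byte, low-order group first. */
    static void writeLong(ByteArrayOutputStream out, long n) {
        long z = (n << 1) ^ (n >> 63);             // zig-zag: small magnitudes -> small values
        while ((z & ~0x7FL) != 0) {
            out.write((int) ((z & 0x7F) | 0x80));  // high bit set: more bytes follow
            z >>>= 7;
        }
        out.write((int) z);
    }

    /** Avro string: byte length as an Avro long, then the UTF-8 bytes. */
    static void writeString(ByteArrayOutputStream out, String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        writeLong(out, utf8.length);
        out.write(utf8, 0, utf8.length);
    }

    /** A record is just its fields in schema order; no names or tags are written. */
    static byte[] encodeTermStats(String term, long docFreq) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeString(out, term);
        writeLong(out, docFreq);
        return out.toByteArray();
    }
}
```

For "mail" and 3, this yields six bytes: 0x08, the four UTF-8 bytes of "mail", then 0x06.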
>>
>> Cheers,
>>
>> 2012/6/12 Mihai Soloi <mihai.soloi@gmail.com>:
>>>
>>> On 12.06.2012 11:30, Eric Charles wrote:
>>>>
>>>>
>>>> Hi Mihai,
>>>>
>>>> Glad to hear your exams are over (I hope they went fine) :)
>>>
>>>
>>> Hi Eric,
>>>
>>> Thanks, they went very well, I got high marks.
>>>
>>>>
>>>> As Ioan said, Avro serialization in HBase will be deprecated in favor of
>>>> Protobuf (if I understand correctly...).
>>>
>>>
>>>
>>> I think Avro could rather easily be replaced with Protobuf, as they both do
>>> basically the same thing; the difference is that Avro uses JSON schemas and
>>> can be used with any other language, which is of no value to the project.
>>>
>>>>
>>>> I also like Avro because it gives you serialization & storage format in
>>>> one box, but is this what we want? The key point here is rather
>>>> effective access to the persisted data.
>>>
>>>
>>>
>>> If the data is passed through Avro, serialization and deserialization are
>>> basically handled by Avro, but we'll always have to interact with the
>>> schemas. In Protobuf the objects are compiled into our classes; from what
>>> I gather it's mostly useful for RPC. But Avro also has protocols, and by
>>> using the avro-maven-plugin you can generate your own classes to interact
>>> with. I can't say I'm an expert in either, but I fancy Avro.
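The reading side illustrates why "we'll always have to interact with the schemas". Avro's binary form carries no field names or type tags, so a decoder must walk the fields in the writer's schema order. Below is a hand-written sketch of the decoding rules for a hypothetical TermStats record whose schema is a string field "term" followed by a long field "docFreq". The names are illustrative. In practice, Avro's reader classes or classes generated by avro-maven-plugin would do this work.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/**
 * Sketch of decoding Avro binary data for a hypothetical record
 * TermStats { string term; long docFreq; }. Without the schema the bytes
 * are meaningless: nothing in them says "string first, then long".
 */
class AvroReadSketch {
    static final class TermStats {
        final String term;
        final long docFreq;

        TermStats(String term, long docFreq) {
            this.term = term;
            this.docFreq = docFreq;
        }
    }

    /** Reads a variable-length zig-zag Avro long. */
    static long readLong(ByteBuffer in) {
        long z = 0;
        int shift = 0;
        int b;
        do {
            b = in.get() & 0xFF;
            z |= (long) (b & 0x7F) << shift;  // low-order 7-bit group first
            shift += 7;
        } while ((b & 0x80) != 0);            // high bit set: another byte follows
        return (z >>> 1) ^ -(z & 1);          // undo zig-zag
    }

    /** Reads an Avro string: a long length, then that many UTF-8 bytes. */
    static String readString(ByteBuffer in) {
        int len = (int) readLong(in);
        byte[] utf8 = new byte[len];
        in.get(utf8);
        return new String(utf8, StandardCharsets.UTF_8);
    }

    /** Decodes fields strictly in the (assumed) schema order. */
    static TermStats decodeTermStats(byte[] bytes) {
        ByteBuffer in = ByteBuffer.wrap(bytes);
        String term = readString(in);   // field 1: "term" (string)
        long docFreq = readLong(in);    // field 2: "docFreq" (long)
        return new TermStats(term, docFreq);
    }
}
```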
>>>
>>>>
>>>>
>>>> There have been a few attempts so far to marry HBase and Lucene (see
>>>> [1], [2], [3] and [4] for example; see also [5] for a more recent
>>>> article).
>>>>
>>> Thank you for the GitHub links, I will look thoroughly through the
>>> projects. I was already aware of HBasene and Solandra (formerly Lucandra);
>>> they have similar approaches.
>>>
>>>> The questions I am wondering:
>>>>
>>>> 1. Will you focus on a 'generic' solution (reusable outside James), or on
>>>> a very specific one tuned/optimized only for James mailbox needs?
>>>
>>>
>>> I was thinking of writing generic code so that maybe it could be used
>>> outside of James, but the data format would be specific to James mailbox
>>> needs, so in the end the answer is that it will be tuned for James.
>>>
>>>>
>>>> 2. What strategy will you take (custom Directory or custom
>>>> IndexReader/Writer, usage of Coprocessor or not...)?
>>>
>>>
>>> I was thinking that a custom Directory was the way to go, but I soon
>>> realized that it's not as simple as it sounds, and that overriding the
>>> higher-level IndexReader and IndexWriter classes would be more
>>> appropriate (as in article [5]). So, by bypassing the Directory, I would
>>> have to make use of HBase coprocessors. As far as I can tell, a
>>> RegionObserver would be employed to handle the operations frequently
>>> performed on the data for the Lucene queries, together with Endpoints.
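The RegionObserver idea can be illustrated with a toy simulation in plain Java. A hypothetical ToyRegion calls a hypothetical PutObserver after every write, and one observer keeps a term-to-row-key index up to date. None of these names are the HBase coprocessor API. The sketch only shows the hook-on-write pattern. In HBase, such a hook would run inside the region server, next to the data.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical hook interface, loosely modeled on the "observe each put" idea. */
interface PutObserver {
    void postPut(String rowKey, String text);
}

/** Toy stand-in for a region: stores rows and notifies observers after each put. */
class ToyRegion {
    private final Map<String, String> rows = new TreeMap<>();
    private final List<PutObserver> observers = new ArrayList<>();

    void addObserver(PutObserver observer) {
        observers.add(observer);
    }

    String get(String rowKey) {
        return rows.get(rowKey);
    }

    void put(String rowKey, String text) {
        rows.put(rowKey, text);                  // the actual write
        for (PutObserver observer : observers) {
            observer.postPut(rowKey, text);      // hook invoked after the write
        }
    }
}

/** Observer that maintains a term -> row keys index as data is written. */
class TermIndexObserver implements PutObserver {
    final Map<String, Set<String>> index = new TreeMap<>();

    @Override
    public void postPut(String rowKey, String text) {
        // Naive tokenizer: lower-case, then split on runs of non-word characters.
        for (String token : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!token.isEmpty()) {
                index.computeIfAbsent(token, k -> new TreeSet<>()).add(rowKey);
            }
        }
    }
}
```

The design choice is where the indexing work runs. With a real RegionObserver it would happen server-side on each write, and Endpoint coprocessors could serve the query side.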
>>>
>>>
>>>
>>> [1] https://github.com/akkumar/hbasene
>>> [2] https://github.com/thkoch2001/lucehbase
>>> [3] https://github.com/jasonrutherglen/HBASE-SEARCH
>>> [4] https://github.com/jasonrutherglen/LUCENE-FOR-HBASE
>>> [5] http://www.infoq.com/articles/LuceneHbase
>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: server-dev-unsubscribe@james.apache.org
>>> For additional commands, e-mail: server-dev-help@james.apache.org
>>>
>>
>>
>>
>
> --
> eric | http://about.echarles.net | @echarles
>
>
>



-- 
Ioan Eugen Stan / http://axemblr.com / Tools for Clouds


