gora-dev mailing list archives

From Renato Marroquín Mogrovejo <renatoj.marroq...@gmail.com>
Subject Re: Updated GORA-174 HBase information - unions
Date Tue, 12 Feb 2013 16:31:06 GMT

2013/2/10 Alfonso Nishikawa <alfonso.nishikawa@gmail.com>:
> Hi Renato,
>> So what you are proposing is to store an extra index at the beginning
>> of the actual value? or does HBase do this automatically? What about
>> if bytes were being written? couldn't some type of corruption happen
>> and make this unusable?
> The extra byte at the beginning of the actual value is part of Avro :)
> Gora-hbase must adhere to the Avro spec, so that is really what the union
> source code update is about.
> In the case of bytes, a 'long' with the length of the bytes is encoded
> first, followed by the bytes data.
> I got all of this from the Avro spec at [2].
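
For readers following along, here is a minimal sketch of the encoding described above, written from the Avro binary encoding rules at [2] rather than taken from Gora or the Avro library. The function names (zigzag_varint, encode_bytes, encode_union) are made up for illustration. It shows that the union index is a zig-zag varint (one byte for small indices) written before the value, and that bytes are a length followed by the raw data.

```python
def zigzag_varint(n: int) -> bytes:
    """Encode an integer as Avro encodes int/long: zig-zag, then base-128 varint."""
    z = (n << 1) ^ (n >> 63)  # zig-zag maps small negatives to small positives
    out = bytearray()
    while z & ~0x7F:
        out.append((z & 0x7F) | 0x80)  # low 7 bits, continuation bit set
        z >>= 7
    out.append(z)
    return bytes(out)


def encode_bytes(data: bytes) -> bytes:
    """Avro 'bytes' (and UTF-8 'string'): length as a long, then the raw data."""
    return zigzag_varint(len(data)) + data


def encode_union(branch_index: int, encoded_value: bytes) -> bytes:
    """Avro union: zero-based branch index first, then the value in that branch's encoding."""
    return zigzag_varint(branch_index) + encoded_value


# Hypothetical schema ["null", "string"]:
null_value = encode_union(0, b"")                  # null itself is zero bytes
string_value = encode_union(1, encode_bytes(b"a"))
print(null_value.hex(), string_value.hex())
```

The index is only a single byte while it stays below 64; larger values spill into additional varint bytes, which is unlikely to matter for real union schemas.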

Thanks! I overlooked the binary encoding specification ;)
The problem with Cassandra is that not everything is written down as
bytes (well it probably is but deeper down in the code). Please look
at column types [1].
So what would you suggest to do in cases where non-appendable column
types are used e.g. BooleanType, UUIDType, and others? I mean in
columns storing integers or decimals, I think we could append a single
value to determine what type of serializer to use, but I dunno what to
do in those other cases.

>>> think now is better expressed.
>>> If no one thinks it is wrong, I will implement solution-1 and solution-2 (this
>>> may mean quite some work, so do we maintain it? I vote yes).
>> So does your solution have two parts? or are they two separate
>> possible solutions?
> There are two potential, different problems (incompatibilities with
> legacy data), so we can choose to leave both behind, only one, or
> none. Lewis voted for facing both (same as I), so I guess we will
> maintain data compatibility until version 1.0.

This is a part I am not understanding very well. You guys are saying
that legacy data is a problem, but why is this a problem if we haven't
been supporting Avro Union in the past? This is a new feature, not an
upgrade. And from what I understand, the second issue was about
marking support for Union data types as deprecated. But then
again, if we are able to support Union data types, this would be the
first time.
Am I understanding things correctly here? Lewis? Alfonso? anyone else?

>> You said on another email that HBase could persist Union data types
>> directly without having to modify it (did I get that right? or am I
>> confusing stuff? ) so implementing this would be just to tell HBase to
>> save the union data type but not actually writing this extra byte? I
>> wasn't able to find the avro documentation talking about this, could
>> you please point me to where this is?
> Sorry, surely my fault because I always express myself wrong. You need
> to write that index. Solution 1 [3] avoids writing that index but is
> an exception only for null-or-one-type unions.
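
A rough sketch of how such an exception could look, based only on the sentence above and not on the actual proposal at [3]. It assumes that a two-branch union containing "null" stores the raw value with no index, and that null is represented by storing nothing at all. Both points are assumptions, and every name here (serialize_union_value, write_index) is hypothetical.

```python
from typing import Optional, Sequence


def write_index(index: int) -> bytes:
    """Zig-zag varint of a non-negative union index, as in Avro's binary encoding."""
    z = index << 1
    out = bytearray()
    while z & ~0x7F:
        out.append((z & 0x7F) | 0x80)
        z >>= 7
    out.append(z)
    return bytes(out)


def serialize_union_value(branches: Sequence[str], index: int,
                          encoded_value: bytes) -> Optional[bytes]:
    """Return the bytes to persist for a union value, or None to store nothing."""
    if len(branches) == 2 and "null" in branches:
        # Assumed special case: null-or-one-type unions skip the index.
        if branches[index] == "null":
            return None           # assumption: null means no stored value
        return encoded_value      # raw value, no leading index
    # General case: index first, then the value.
    return write_index(index) + encoded_value


print(serialize_union_value(["null", "long"], 1, b"\x06"))
print(serialize_union_value(["null", "long", "string"], 1, b"\x06"))
```

Under these assumptions, data written for a plain type stays readable if its schema is later changed to a null-or-that-type union, while any union with more branches still needs the index.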

Ok, I see. But what about unions with more than one type? Shouldn't we
think about solving this once and for all?
We also have to keep in mind that the same solution might not be
applicable to all data stores, but we should be able to provide the
same features across all the supported data stores.

>>> I had to restore my git server, but in this case not all went right, so now
>>> is up again at [1].
>> Thanks! and great work documenting this issue! (:

Renato M.

[1] http://www.datastax.com/docs/1.0/ddl/column_family#about-data-types-comparators-and-validators

>> Renato M.
> Thank you for your comments and questions! :)
> Best regards,
> Alfonso Nishikawa
> [2] - http://avro.apache.org/docs/current/spec.html#binary_encoding
> [3] - https://people.apache.org/~alfonsonishikawa/gora-174.html
