gora-dev mailing list archives

From Alfonso Nishikawa <alfonso.nishik...@gmail.com>
Subject Re: Gora-174 in Gora-Cassandra
Date Wed, 06 Feb 2013 23:19:41 GMT
Hi Renato,

I saw in the code that Cassandra has its own serializers. Can you give us a
small summary of how it works and what it affects, as things stood before your
modifications? This will help us understand your approaches.

Does Cassandra incur any penalty for the new column? In HBase that
approach is not necessary, since the union index gets serialized (by Avro)
and stored before the actual data (I know you know that :) just a
reminder).

About generating classes, there's no need to modify the compiler (check whether
you really need to modify it). Taking into account that a union can't contain
the same type twice (Avro spec):
- When writing, you can implement Avro's approach shown in
GenericData#resolveUnion():333 [0] (Avro 1.3.3), called from [1], which
iterates over the union types until one matches the type of the data being
written.
- When reading, you know the index. Avro's approach is in [2].
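
To make the two bullets above concrete, here is a minimal sketch in plain Java. It is not Avro or Gora code: the class UnionResolutionSketch, the BranchType enum, and the methods matches, resolveUnion and branchAt are hypothetical names used only for illustration. It assumes a union whose branches are all of different types, so the first matching branch is the only matching branch.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of union resolution; not Avro or Gora code. */
final class UnionResolutionSketch {

    /** Stand-in for the branch types of a union schema (assumption). */
    enum BranchType { NULL, STRING, LONG }

    /** Returns true when the value belongs to the given branch type. */
    static boolean matches(BranchType type, Object datum) {
        switch (type) {
            case NULL:   return datum == null;
            case STRING: return datum instanceof CharSequence;
            case LONG:   return datum instanceof Long;
            default:     return false;
        }
    }

    /** Write side: scan the branches in order and return the first matching index. */
    static int resolveUnion(List<BranchType> union, Object datum) {
        for (int i = 0; i < union.size(); i++) {
            if (matches(union.get(i), datum)) {
                return i;
            }
        }
        throw new IllegalArgumentException("Value does not match any branch of " + union);
    }

    /** Read side: the index is already known, so the branch is a direct lookup. */
    static BranchType branchAt(List<BranchType> union, int index) {
        return union.get(index);
    }

    public static void main(String[] args) {
        List<BranchType> union =
            Arrays.asList(BranchType.NULL, BranchType.STRING, BranchType.LONG);
        System.out.println(resolveUnion(union, "hello")); // prints 1
        System.out.println(resolveUnion(union, 42L));     // prints 2
        System.out.println(branchAt(union, 0));           // prints NULL
    }
}
```

Since the index can be recomputed from the value at write time, nothing extra has to live in the generated class; only the data store needs to persist it.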

I suggest not modifying it (if possible), because HBase would end up with
duplicated state, one copy of which is ignored and becomes noise in the
structures.
My opinion, of course :)

Thanks for all!!

Best regards,

Alfonso Nishikawa

[0] -
http://grepcode.com/file/repo1.maven.org/maven2/org.apache.hadoop/avro/1.3.3/org/apache/avro/generic/GenericData.java?av=f#333
[1] - GenericDatumWriter#write():59 -
http://grepcode.com/file/repo1.maven.org/maven2/org.apache.hadoop/avro/1.3.3/org/apache/avro/generic/GenericDatumWriter.java?av=f#59
[2] - GenericDatumReader#read():84 -
http://grepcode.com/file/repo1.maven.org/maven2/org.apache.hadoop/avro/1.3.3/org/apache/avro/generic/GenericDatumReader.java?av=f#77


2013/2/6 Renato Marroquín Mogrovejo <renatoj.marroquin@gmail.com>

> Hi all,
>
> This is a really long overdue email. Finally I got the time to get
> around to this while I am on holidays (:
>
> I've made some changes to Gora-Cassandra to support Avro Union data
> types even though Cassandra doesn't rely on Avro for serializing data.
> So what has been done is a workaround to save specialized data
> types, e.g. UNIONS. I faced the same problems and doubts that Alfonso
> described, and Alfonso, your post was very illustrative mate ;)
>
> I will just explain the general approach so the changes can be
> understood and the changes themselves can be found inside the code, or
> reply to this email to talk about it.
>
> ** For storing Union data **
> We create a new column only at the moment we flush the data into the
> data store. This generated column stores the index of the schema used
> within the Union data type.
>
> ** For retrieving Union data **
> Gora can already retrieve the data directly from Cassandra by itself.
> The problem here was to determine which serializer to use while
> getting this data back. So the first thing to do is to get the value
> stored within the generated column, and use that value to select the
> appropriate serializer. After that, it is just a matter of using what
> Gora already has.
>
> ** For generating classes **
> I am not particularly happy with the changes I've made here. I changed
> GoraCompiler directly to create the extra field to store the selected
> schema of the Union data type. I tried to only add a new field to the
> schema before compiling and then let the compiler work but I kept on
> getting a lock exception from Avro which didn't let me get through
> this change as I wanted. If anybody could help me out on how to do it,
> then  give me a shout! :)
>
> I didn't know where to upload this patch: either to Gora-174, because
> it addresses an issue caused by it, or to a new issue created to
> handle the Avro Union per data store.
> Thanks for reading until the end!
>
>
> Renato M.
>
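
For readers following the quoted description, the store and retrieve steps could be sketched roughly as below. This is only an illustration under stated assumptions, not the actual Gora-Cassandra patch: a row is modelled as an in-memory Map, the union is assumed to be ["null", "string", "long"], and the class UnionIndexColumnSketch, the INDEX_SUFFIX column naming and all method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of the extra index column idea; not the Gora-Cassandra patch. */
final class UnionIndexColumnSketch {

    /** Assumed naming for the generated companion column. */
    static final String INDEX_SUFFIX = "_unionIndex";

    /** Branch index for the assumed union ["null", "string", "long"]. */
    static int indexOf(Object value) {
        if (value == null) return 0;
        if (value instanceof CharSequence) return 1;
        if (value instanceof Long) return 2;
        throw new IllegalArgumentException("Value is not part of the union: " + value);
    }

    /** Serializes the value with the serializer that belongs to the given branch. */
    static byte[] serialize(int index, Object value) {
        switch (index) {
            case 0: return new byte[0];
            case 1: return value.toString().getBytes(StandardCharsets.UTF_8);
            case 2: return ByteBuffer.allocate(Long.BYTES).putLong((Long) value).array();
            default: throw new IllegalArgumentException("Unknown union index " + index);
        }
    }

    /** Deserializes the bytes with the serializer selected by the stored index. */
    static Object deserialize(int index, byte[] bytes) {
        switch (index) {
            case 0: return null;
            case 1: return new String(bytes, StandardCharsets.UTF_8);
            case 2: return ByteBuffer.wrap(bytes).getLong();
            default: throw new IllegalArgumentException("Unknown union index " + index);
        }
    }

    /** Flush time: write the value and, next to it, the index of the branch used. */
    static void put(Map<String, byte[]> row, String field, Object value) {
        int index = indexOf(value);
        row.put(field, serialize(index, value));
        row.put(field + INDEX_SUFFIX,
                ByteBuffer.allocate(Integer.BYTES).putInt(index).array());
    }

    /** Read time: fetch the index column first, then pick the matching serializer. */
    static Object get(Map<String, byte[]> row, String field) {
        int index = ByteBuffer.wrap(row.get(field + INDEX_SUFFIX)).getInt();
        return deserialize(index, row.get(field));
    }

    public static void main(String[] args) {
        Map<String, byte[]> row = new HashMap<>();
        put(row, "content", "some text");
        put(row, "size", 1024L);
        System.out.println(get(row, "content")); // prints some text
        System.out.println(get(row, "size"));    // prints 1024
    }
}
```

The point of the sketch is only that the index travels with the row in the store, so the record class itself does not need an additional field.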



-- 
"Drinking bloody marys all night will make you feel like a corpse in the
morning."
