gora-dev mailing list archives

From Renato Marroquín Mogrovejo <renatoj.marroq...@gmail.com>
Subject Re: Changes to GORA-174 tests
Date Tue, 07 May 2013 23:18:33 GMT
Hi Alfonso,

First of all, thanks for pushing this issue!

2013/5/7 Alfonso Nishikawa <alfonso.nishikawa@gmail.com>:
> Hi all,
> In order to resolve GORA-174 ([0] "GORA compiler does not handle
> ["string", "null"] unions in the AVRO schema"), Lewis pointed out that
> we ("I" especially ;) should stick to the requirements of the issue.
> No doubt this is true!
> I would like to open a short (short short!) debate about that specification,
> because I feel reluctant to go on until it is acknowledged (and Lewis
> suggested asking everyone). Here is Nutch's WebPage schema as an example:
> {
>   "type": "record",
>   "name": "WebPage",
>   "namespace": "org.apache.gora.examples.generated",
>   "fields" : [
>     {"name": "url", "type": "string"},
>     {"name": "content", "type": ["null","bytes"]},
>     {"name": "parsedContent", "type": {"type":"array", "items": "string"}},
>     {"name": "outlinks", "type": {"type":"map", "values":"string"}},
>     {"name": "metadata", "type": {
>       "name": "Metadata",
>       "type": "record",
>       "namespace": "org.apache.gora.examples.generated",
>       "fields": [
>         {"name": "version", "type": "int"},
>         {"name": "data", "type": {"type": "map", "values": "string"}}
>       ]
>     }}
>   ]
> }
> Just now I noticed that in the original issue, NUTCH-1477 [1], the problem
> was with a ["null","bytes"] union, so I think we should not limit ourselves
> to solving only ["null","string"].

I thought we were solving single-type union types, so there shouldn't
be a difference between persisting ["null","bytes"] and ["null","string"],
as they are both single-type unions. In Gora-Cassandra, we serialize
everything into bytes, and then, depending on the schema, we retrieve
the value as required. We don't need any metadata at this point because
the value will be either null or of the union's other (non-null) type.
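That ["null","bytes"] and ["null","string"] need no separate handling can also be seen in Avro's own binary encoding, where a union is written as the zero-based index of the chosen branch (a zig-zag varint) followed by the value encoded with that branch's schema. The Python sketch below hand-rolls that encoding for two-branch nullable unions only; it is an illustration based on the Avro specification, not Gora-Cassandra's serializer, and the helper names (`zigzag_varint`, `encode_nullable_union`, `decode_nullable_union`) are hypothetical.

```python
def zigzag_varint(n: int) -> bytes:
    """Encode a signed long as Avro does: zig-zag, then 7-bit varint."""
    z = (n << 1) ^ (n >> 63)
    out = bytearray()
    while z & ~0x7F:  # more than 7 significant bits remain
        out.append((z & 0x7F) | 0x80)
        z >>= 7
    out.append(z)
    return bytes(out)


def read_zigzag_varint(buf: bytes, pos: int):
    """Decode a zig-zag varint at buf[pos]; return (value, position after it)."""
    z, shift = 0, 0
    while True:
        b = buf[pos]
        pos += 1
        z |= (b & 0x7F) << shift
        if not b & 0x80:
            break
        shift += 7
    return (z >> 1) ^ -(z & 1), pos


def encode_nullable_union(union: list, value) -> bytes:
    """Encode `value` for a two-branch union such as ["null", "bytes"] (sketch only)."""
    if len(union) != 2 or "null" not in union:
        raise ValueError("only single-type nullable unions are handled here")
    if value is None:
        # null branch: just the branch index, no payload
        return zigzag_varint(union.index("null"))
    branch = 1 - union.index("null")
    kind = union[branch]
    if kind == "bytes":
        payload = bytes(value)
    elif kind == "string":
        payload = value.encode("utf-8")
    else:
        raise ValueError(f"branch type {kind!r} is outside this sketch")
    # branch index, then the value as length-prefixed bytes
    return zigzag_varint(branch) + zigzag_varint(len(payload)) + payload


def decode_nullable_union(union: list, buf: bytes):
    """Inverse of encode_nullable_union; returns None, bytes or str."""
    branch, pos = read_zigzag_varint(buf, 0)
    kind = union[branch]
    if kind == "null":
        return None
    length, pos = read_zigzag_varint(buf, pos)
    payload = buf[pos:pos + length]
    return payload.decode("utf-8") if kind == "string" else payload


print(encode_nullable_union(["null", "bytes"], b"hi"))  # b'\x02\x04hi'
print(encode_nullable_union(["null", "string"], "hi"))  # b'\x02\x04hi'
```

With "null" listed first, both unions produce the same bytes for the same content: branch index 1, length 2, then the raw bytes. Only the order of the branches changes the index, which is why the ["string","null"] form in the GORA-174 title encodes null as `b"\x02"` rather than `b"\x00"`.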

> In the schema shown here, "metadata" is mandatory, and GORA-174 does not
> mention optional records. Maybe we should fix that too.

Sorry, but why is the metadata field required? Is it because of Nutch,
or something implicit in Avro?
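Going by the Avro specification, a field can hold null only when its type is "null" or a union that includes "null"; "metadata" is declared as a plain Metadata record, so Avro itself treats it as required, independently of Nutch. The minimal Python check below runs over the WebPage schema quoted above. `is_nullable` and `make_optional` are hypothetical helpers (not Gora compiler code); the latter shows the usual ["null", ...] wrapping, with "null" first so that a null default is valid for the union.

```python
import json

# WebPage schema as quoted in the message above.
WEBPAGE_SCHEMA = json.loads("""
{"type": "record", "name": "WebPage",
 "namespace": "org.apache.gora.examples.generated",
 "fields": [
  {"name": "url", "type": "string"},
  {"name": "content", "type": ["null","bytes"]},
  {"name": "parsedContent", "type": {"type":"array", "items": "string"}},
  {"name": "outlinks", "type": {"type":"map", "values":"string"}},
  {"name": "metadata", "type": {
    "name": "Metadata", "type": "record",
    "namespace": "org.apache.gora.examples.generated",
    "fields": [
      {"name": "version", "type": "int"},
      {"name": "data", "type": {"type": "map", "values": "string"}}]}}
 ]}
""")


def is_nullable(field_type) -> bool:
    """True if an Avro type admits null: "null" itself, or a union (JSON array) containing it."""
    if isinstance(field_type, list):
        return any(is_nullable(branch) for branch in field_type)
    if isinstance(field_type, dict):
        return field_type.get("type") == "null"
    return field_type == "null"


def make_optional(field: dict) -> dict:
    """Hypothetical helper: copy `field`, wrapping its type as ["null", <type>] with a null default."""
    if is_nullable(field["type"]):
        return dict(field)
    return {**field, "type": ["null", field["type"]], "default": None}


required = [f["name"] for f in WEBPAGE_SCHEMA["fields"] if not is_nullable(f["type"])]
print(required)  # ['url', 'parsedContent', 'outlinks', 'metadata']
```

Only "content" is nullable, because it is the only field declared as a union with "null"; the Metadata record would need the same kind of wrapping to become optional.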

> One more thing: the ["null","string"] requirement implies that nested
> records must handle it too. In the example above, "Metadata : data" should
> allow a map of ["null","string"] values, and, *supposing "Metadata : version"
> were a string*, "Metadata : version" should be allowed to be of type ["null","string"].

This is true; we should test whether our current approaches handle this as
well, and if not, then they are incomplete. We will have to go over that
again in Gora-Cassandra ):
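To check whether an implementation covers the nested case, it helps to enumerate every position in a schema where a single-type nullable union can appear, not just the top-level fields. The Python sketch below walks a parsed Avro schema recursively through records, arrays and maps. The schema is a hypothetical variant of WebPage with the changes described above ("Metadata : version" as ["null","string"] and "Metadata : data" as a map of ["null","string"]), and `nullable_paths` is an illustrative helper, not part of Gora.

```python
# Hypothetical WebPage variant with the nested nullable unions discussed in the thread.
NESTED_VARIANT = {
    "type": "record", "name": "WebPage",
    "fields": [
        {"name": "url", "type": "string"},
        {"name": "content", "type": ["null", "bytes"]},
        {"name": "metadata", "type": {
            "type": "record", "name": "Metadata",
            "fields": [
                {"name": "version", "type": ["null", "string"]},
                {"name": "data", "type": {"type": "map", "values": ["null", "string"]}},
            ]}},
    ],
}


def nullable_paths(node, path=""):
    """Return paths of every two-branch union containing "null" reachable from `node`.

    Records add ".field", arrays add "[]", maps add "{}" to the path.
    Named-type references (plain strings) are not followed in this sketch.
    """
    found = []
    if isinstance(node, list):  # a JSON array is an Avro union
        non_null = [b for b in node if b != "null"]
        if len(node) == 2 and len(non_null) == 1:
            found.append(path)
        for branch in non_null:
            found.extend(nullable_paths(branch, path))
    elif isinstance(node, dict):
        kind = node.get("type")
        if kind == "record":
            for field in node["fields"]:
                child = f"{path}.{field['name']}" if path else field["name"]
                found.extend(nullable_paths(field["type"], child))
        elif kind == "array":
            found.extend(nullable_paths(node["items"], path + "[]"))
        elif kind == "map":
            found.extend(nullable_paths(node["values"], path + "{}"))
    return found


paths = nullable_paths(NESTED_VARIANT)
top_level = [p for p in paths if not any(s in p for s in (".", "[]", "{}"))]
print(paths)      # ['content', 'metadata.version', 'metadata.data{}']
print(top_level)  # ['content']
```

A test such as testGetNested() would need to exercise each of the three paths; keeping only the top-level ones leaves just "content", which corresponds to the narrower "top-level record fields only" reading of the issue.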

> If this is not desired, we will have to redefine the issue's requirements,
> for example to something like: "allow ["null","string"] on top-level record fields".
> ===============
> Taking ONLY the GORA-174 title (["null","string"]), I will have to make these
> modifications:
> - Modify Nutch's webpage.avsc: "content" will have to be mandatory :(

Why is this? I mean, why make "content" mandatory?

> - Modify the tests, specifically testGetNested(), to check nested
> ["null","string"] unions. I think the Cassandra module will not pass this test.

Yeah mate, this is true. I think if we are "supporting" single-type
unions, then nested records should be supported for this feature as well.
> ===============
> Lewis suggested creating separate issues for nested and multi-type unions.
> It's not my view, but I accept the common decision :)
> Opinions?

I think it is better to create another issue for the nested case as
well; that way we can trace changes back more easily and make patches
more digestible for people. We could simply link them in JIRA to show
that the issues are related, or mark one as depending on the other.

> Thanks, at least, for reading all the way to this line! :)

Thank you for taking the time to write this! (:

Renato M.

> Regards,
> Alfonso Nishikawa
> [0] - https://issues.apache.org/jira/browse/GORA-174
> [1] - https://issues.apache.org/jira/browse/NUTCH-1477
> --
> "Drinking bloody marys all night will make you feel like a corpse in the
> morning."
