gora-dev mailing list archives

From Alfonso Nishikawa <alfonso.nishik...@gmail.com>
Subject Re: Changes to GORA-174 tests
Date Tue, 07 May 2013 23:35:56 GMT
Hi, Renato.

Thanks for your answer.

About "content": ["null","bytes"]: I was reasoning from implementing only
["null","string"], but if we implement ["null","type"] as the Cassandra
module does, all is fine. *But* then the title of the issue must be changed :)
So accumulo, dynamodb, etc. would have to fulfil ["null","type"]. And this
leads us to decide what "type" means: are records included or excluded?

About "metadata" in Nutch's WebPage: at the moment it is mandatory, but in
NUTCH-1477 [1] it was declared optional. This is related to the question
just below this line.

If ["null","type"] includes records, nesting and recursion come into
play.

So the question is: where to cut?

-in order of completeness-

1.- Only ["null","string"] in the first level of the record (no nesting)?
2.- Full ["null","string"]?
3.- ["null","type"] except records?
4.- ["null","type"] and only 1 level of nested records? (if records are
optional, Nutch will need at least this)
5.- Full ["null","type"]?
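To make the five options concrete, here is a rough Java sketch that reports which option a given record schema would need. It does not use the Avro API: `Kind`, `Node`, `nullableBranch`, `requiredLevel` and `webPage` are hypothetical names for a tiny stand-in schema model, and the mapping from a union's position to an option number (top-level field vs. anything nested; string vs. other type vs. record) is just one possible reading of the list.

```java
import java.util.List;

public class UnionSupportLevel {

    // Hypothetical stand-in for Avro schema types; this is NOT the Avro API.
    enum Kind { NULL, STRING, BYTES, INT, RECORD, MAP, ARRAY, UNION }

    // A schema node: union branches, record field types, or the map/array element type.
    record Node(Kind kind, List<Node> children) {
        static Node of(Kind kind, Node... children) {
            return new Node(kind, List.of(children));
        }
    }

    // Returns X for a ["null", X] or [X, "null"] union, otherwise null.
    static Node nullableBranch(Node n) {
        if (n.kind() != Kind.UNION || n.children().size() != 2) {
            return null;
        }
        Node a = n.children().get(0);
        Node b = n.children().get(1);
        if (a.kind() == Kind.NULL && b.kind() != Kind.NULL) return b;
        if (b.kind() == Kind.NULL && a.kind() != Kind.NULL) return a;
        return null;
    }

    // Smallest option (1..5) covering every nullable union in the record; 0 if there is none.
    static int requiredLevel(Node record) {
        int level = 0;
        for (Node fieldType : record.children()) {
            level = Math.max(level, visit(fieldType, true));
        }
        return level;
    }

    // topLevel is true only for the field types of the root record itself.
    private static int visit(Node n, boolean topLevel) {
        int level = 0;
        Node inner = nullableBranch(n);
        if (inner != null) {
            if (inner.kind() == Kind.STRING) {
                level = topLevel ? 1 : 2;   // options 1 and 2
            } else if (inner.kind() == Kind.RECORD) {
                level = topLevel ? 4 : 5;   // options 4 and 5
            } else {
                level = 3;                  // any other nullable type, at any depth
            }
        }
        for (Node child : n.children()) {
            level = Math.max(level, visit(child, false));
        }
        return level;
    }

    // WebPage as in the quoted schema; optionalMetadata turns "metadata" into ["null", Metadata].
    static Node webPage(boolean optionalMetadata) {
        Node string = Node.of(Kind.STRING);
        Node metadata = Node.of(Kind.RECORD,
                Node.of(Kind.INT),             // version
                Node.of(Kind.MAP, string));    // data
        return Node.of(Kind.RECORD,
                string,                                                       // url
                Node.of(Kind.UNION, Node.of(Kind.NULL), Node.of(Kind.BYTES)), // content
                Node.of(Kind.ARRAY, string),                                  // parsedContent
                Node.of(Kind.MAP, string),                                    // outlinks
                optionalMetadata
                        ? Node.of(Kind.UNION, Node.of(Kind.NULL), metadata)
                        : metadata);                                          // metadata
    }

    public static void main(String[] args) {
        System.out.println(requiredLevel(webPage(false)));  // 3
        System.out.println(requiredLevel(webPage(true)));   // 4
    }
}
```

Under this reading, the WebPage schema with a mandatory "metadata" needs option 3 (because of "content": ["null","bytes"]), and making "metadata" optional as in NUTCH-1477 raises it to option 4.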

Lewis talked about ["null","string"]; I guess he meant "2.- Full ["null","string"]".
What you said sounds like "3.- ["null","type"] except records".
I always (wrongly) assumed "5.- Full ["null","type"]".

I proposed modifications to the test classes for "2.- Full ["null","string"]",
but I think you would prefer test modifications for "3.- ["null","type"] except
records". Making the modifications for "3.-" is no problem.

Am I wrong in my thoughts?

Thanks!

Regards,

Alfonso Nishikawa

[1] - https://issues.apache.org/jira/browse/NUTCH-1477



2013/5/7 Renato Marroquín Mogrovejo <renatoj.marroquin@gmail.com>

> Hi Alfonso,
>
> First of all, thanks for pushing this issue!
>
>
> 2013/5/7 Alfonso Nishikawa <alfonso.nishikawa@gmail.com>:
> > Hi all,
> >
> > In order to accomplish GORA-174 ([0] GORA compiler does not handle
> > ["string", "null"] unions in the AVRO schema), Lewis has pointed out
> > that we ("I" especially ;) should stick to the requirements of the issue.
> > There is no doubt this is true!
> >
> > I would like to open a short (short short!) debate about that
> > specification, because I feel reluctant until it is acknowledged (and
> > Lewis suggested asking everyone). Here is Nutch's WebPage schema as an
> > example:
> >
> > {
> >   "type": "record",
> >   "name": "WebPage",
> >   "namespace": "org.apache.gora.examples.generated",
> >   "fields" : [
> >     {"name": "url", "type": "string"},
> >     {"name": "content", "type": ["null","bytes"]},
> >     {"name": "parsedContent", "type": {"type":"array", "items": "string"}},
> >     {"name": "outlinks", "type": {"type":"map", "values":"string"}},
> >     {"name": "metadata", "type": {
> >       "name": "Metadata",
> >       "type": "record",
> >       "namespace": "org.apache.gora.examples.generated",
> >       "fields": [
> >         {"name": "version", "type": "int"},
> >         {"name": "data", "type": {"type": "map", "values": "string"}}
> >       ]
> >     }}
> >   ]
> > }
> >
> > Just now I noticed that in the original issue NUTCH-1477 [1] the problem
> > was about a ["null","bytes"] union, so I think we should not limit
> > ourselves to solving only ["null","string"].
>
> I thought we were solving single-type unions. So there shouldn't
> be a difference in persisting ["null","bytes"] or ["null","string"], as
> they are both single-type unions. In Gora-Cassandra, we serialize
> everything into bytes, and then, depending on the schema, we retrieve
> it as required. We don't need metadata at this point because the value
> will be either null or something else.
>
> > In the schema shown here, "metadata" is mandatory, and GORA-174 does
> > not talk about optional records. Maybe we should fix that too.
>
> Sorry but why is the metadata field required? Is it because of Nutch
> or anything implicit in Avro?
>
> > One more thing: the ["null","string"] requirement implies that nested
> > records must handle it too. In the example above, "Metadata : data"
> > should allow a map of ["null","string"], and (*let's suppose "Metadata :
> > version" was a string*) "Metadata : version" should allow type
> > ["null","string"].
>
> This is true; we should test whether our current approaches solve this as
> well, and if not, they are incomplete. We will have to go
> over that again in Gora-Cassandra ):
>
> > If this is not desired, we will have to redefine the issue requirements,
> > for example to something like: "allow [null,String] on top-level record fields".
> >
> > ===============
> > Taking ONLY the GORA-174 title (["null","string"]), I will have to make
> > these modifications:
> >
> > - Modify Nutch's webpage.avsc. "Content" will have to be mandatory :(
>
> Why is this? I mean, why make "content" mandatory?
>
> > - Modify tests, specifically testGetNested(), to check nested
> > ["null","string"]. I think the Cassandra module will not pass this test.
>
> Yeah mate, this is true. I think if we are "supporting" single-type
> unions, then nested records for this feature should be supported as
> well.
>
> > ===============
> >
> > Lewis suggested creating separate issues for nested and multi-type
> > unions. It's not my view, but I agree with the common decision :)
> >
> > Opinions?
>
> I think it is better to create another issue for the nested cases as
> well, so that we can trace changes back more easily and make
> patches more digestible for people. Maybe we should just link them
> in JIRA to show that those issues are related, or mark one as
> depending on the other.
>
> > Thanks at least for reading and getting to this line! :)
>
> Thank you for taking the time to write this! (:
>
>
> Renato M.
>
> > Regards,
> >
> > Alfonso Nishikawa
> >
> > [0] - https://issues.apache.org/jira/browse/GORA-174
> > [1] - https://issues.apache.org/jira/browse/NUTCH-1477
> >
> > --
> > "Drinking bloody marys all night will make you feel like a corpse in the
> > morning."
>



-- 
"Drinking bloody marys all night will make you feel like a corpse in the
morning."
