gora-user mailing list archives

From Furkan KAMACI <furkankam...@gmail.com>
Subject Re: GSoC Ideas
Date Thu, 15 Mar 2018 10:57:04 GMT
Hi Fellows,

I'll also mentor a project this year, and I can help with the topics already
mentioned. I think that https://issues.apache.org/jira/browse/GORA-532 could
be another issue for GSoC. Also,
https://issues.apache.org/jira/browse/GORA-450 can be a warm-up issue for
any GSoC project on which I can collaborate.

Kind Regards,

On Thu, Mar 15, 2018 at 12:59 PM, Renato Marroquín Mogrovejo <
renatoj.marroquin@gmail.com> wrote:

> Hi Lewis,
> Thanks for pointing out [0]! I guess it makes sense, and there might be
> some performance to be gained when doing the transformation directly from
> Avro to Arrow.
> Yes, Lewis, I totally agree with you that having Gora serialize all
> Hadoop metrics would be an awesome project. Is that a project for GSoC
> already? Are you planning to mentor any projects?
> Also, regarding this project integration topic, have you thought about
> providing Any23 with a way to read/write XML, HTML, and JSON objects through
> Gora? Do you think that would be an interesting project for the Any23 community?
> Best,
> Renato M.
> 2018-03-15 8:26 GMT+01:00 lewis john mcgibbney <lewismc@apache.org>:
>> I should also say, ALL of the projects below which I have named require
>> the Gora dependency to be upgraded.
>> Lewis
>> On Thu, Mar 15, 2018 at 12:24 AM, lewis john mcgibbney <
>> lewismc@apache.org> wrote:
>>> Hi Renato,
>>> On Wed, Mar 14, 2018 at 3:22 PM, Renato Marroquín Mogrovejo <
>>> renatoj.marroquin@gmail.com> wrote:
>>>> Hey guys,
>>>> There might not be an integration/converter from Arrow to Avro (and/or
>>>> vice versa) because there are Parquet readers that can take Avro, and once
>>>> stuff is in Parquet, Arrow can be used directly.
>>> Yes there might not be. I actually raised this issue [0] a wee while ago
>>> on the Arrow list. At that time I was told, "...The use case you outline
>>> makes a lot of sense for Arrow to help out with. We don't yet have an AVRO
>>> <> Arrow converter written but it is something that would be great to
>>> have." So maybe that would be something to keep in mind.
>>> [0] https://s.apache.org/2GwS
>>>> Regarding an integration of Parquet with Gora, I think it would be
>>>> interesting to make it easier for people to read and write Parquet files
>>>> by providing a higher-level API, as Gora does. However, for you @Talat,
>>>> who knows Gora pretty well, maybe you could take another project that
>>>> helps Gora more. For example, fixing the integration with Nutch. There are
>>>> multiple loose ends in Nutch 2.x and Gora that we have neglected as a
>>>> community.
>>>> IMHO that should be a GSoC project.
>>> ACK, other existing projects which consume Gora are (off the top of my
>>> head),
>>>    - Chukwa - https://s.apache.org/cW6a
>>>    - Giraph - https://github.com/apache/giraph/tree/trunk/giraph-gora
>>>    - Camel - https://camel.apache.org/gora.html
>>>    - Nutch 2.X - https://github.com/apache/nutch/tree/2.x
>>> An interesting idea I had for where Gora could be used would be
>>> Hadoop metrics
>>> https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/Metrics.html
>>> This would provide a textbook use case for Gora: storing Hadoop
>>> metrics in some datastore which would then be exposed for query and
>>> analysis.
>>>> I can't mentor it because I do not have enough insight into this, but
>>>> @Lewis and @Talat, you can probably tackle this as mentor and student. This
>>>> would be an awesome contribution to the project as there are quite a lot
>>>> of people going over Nutch and trying to use it with Gora.
>>>> Just my 2c
>>> Understood Renato, no biggie. Thanks for your input. I know you are
>>> working with Parquet a lot these days so your input is appreciated.
>>> Lewis
>> --
>> http://home.apache.org/~lewismc/
>> http://people.apache.org/keys/committer/lewismc
