gora-user mailing list archives

From lewis john mcgibbney <lewi...@apache.org>
Subject Re: GSoC Ideas
Date Fri, 16 Mar 2018 00:48:46 GMT
Hi Talat,
In all honesty, I don't have as much time as I used to for looking into this.
I have been experimenting using Arrow with multi-dimensional array-based
data but nothing else.
I would therefore probably be learning as much as you if this project were
to go ahead.

On Thu, Mar 15, 2018 at 3:46 PM, Talat Uyarer <talat@uyarer.com> wrote:

> @Lewis I found a PR [0] on the Arrow Git repo. I guess they stuck with the
> avro-c library. Do you know whether they need to implement the same thing
> for every language they support, or just need to implement a wrapper?
> If we can use Arrow for our internal serialization, Gora will be super
> fast with zero-copy support. :)
> [0] https://github.com/apache/arrow/pull/1026
> My 2 cents
> On Thu, Mar 15, 2018 at 12:24 AM, lewis john mcgibbney <lewismc@apache.org
> > wrote:
>> Hi Renato,
>> On Wed, Mar 14, 2018 at 3:22 PM, Renato Marroquín Mogrovejo <
>> renatoj.marroquin@gmail.com> wrote:
>>> Hey guys,
>>> There might not be an integration or converter from Arrow to Avro (and/or
>>> vice versa) because there are Parquet readers that can take Avro, and once
>>> the data is in Parquet, Arrow can be used directly.
>> Yes there might not be. I actually raised this issue [0] a wee while ago
>> on the Arrow list. At that time I was told, "...The use case you outline
>> makes a lot of sense for Arrow to help out with. We don't yet have an AVRO
>> <> Arrow converter written but it is something that would be great to
>> have." So maybe that would be something to keep in mind.
>> [0] https://s.apache.org/2GwS
>>> Regarding an integration of Parquet with Gora, I think it would be
>>> interesting to make it easier for people to read and write Parquet files by
>>> providing a higher-level API, as Gora does. However, for you @Talat,
>>> who know Gora pretty well, maybe you could take another project that
>>> helps Gora more. For example, fixing the integration with Nutch. There are
>>> multiple loose ends in Nutch 2.x and Gora that we have neglected as a
>>> community.
>>> IMHO that should be a GSoC project.
>> ACK, other existing projects which consume Gora are (off the top of my
>> head),
>>    - Chukwa - https://s.apache.org/cW6a
>>    - Giraph - https://github.com/apache/giraph/tree/trunk/giraph-gora
>>    - Camel - https://camel.apache.org/gora.html
>>    - Nutch 2.X - https://github.com/apache/nutch/tree/2.x
>> One interesting place I thought Gora could be used is Hadoop metrics:
>> https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/Metrics.html
>> This would provide a textbook use case for Gora: storing Hadoop
>> metrics in some datastore which would then be exposed for query and
>> analysis.
>>> I can't mentor it because I don't have enough insight into this, but
>>> @Lewis and @Talat, you can probably tackle this as mentor and student. This
>>> would be an awesome contribution to the project, as quite a lot of
>>> people are going over Nutch and trying to use it with Gora.
>>> Just my 2c
>> Understood Renato, no biggie. Thanks for your input. I know you are
>> working with Parquet a lot these days so your input is appreciated.
>> Lewis
> --
> Talat UYARER
> Website: http://talat.uyarer.com
> Twitter: http://twitter.com/talatuyarer
> Linkedin: http://tr.linkedin.com/pub/talat-uyarer/10/142/304

