gora-dev mailing list archives

From Renato Marroquín Mogrovejo <renatoj.marroq...@gmail.com>
Subject Re: [DISCUSS] Abstracting away Hadoop/MapReduce as Data Processing Layer
Date Wed, 09 Jul 2014 09:45:10 GMT
2014-07-09 11:10 GMT+02:00 Henry Saputra <henry.saputra@gmail.com>:

> Internally, Apache Spark can use Hadoop input formats for its
> distributed data structure (a.k.a. RDD).
> So, I guess we could still join the cool kids with Spark via our input
> format implementation.

Cool, Henry! I didn't know we could use Hadoop input formats for
Spark's RDDs :)
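To see why an input format is enough for Spark, here is a minimal, self-contained Java sketch of the idea: the input format describes the data as independent splits, and an RDD-like dataset treats each split as a partition and applies transformations lazily. All names below (`SimpleInputFormat`, `SplitBackedDataset`, `readSplit`, etc.) are hypothetical simplifications, not Hadoop, Spark, or Gora APIs. In Spark itself the real entry point is `SparkContext.newAPIHadoopRDD`, which takes a Hadoop `Configuration`, the `InputFormat` class, and the key and value classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

public class InputFormatRddSketch {

    /** A key/value pair, as a record reader would produce it. */
    static final class Record<K, V> {
        final K key;
        final V value;
        Record(K key, V value) { this.key = key; this.value = value; }
    }

    /**
     * Hypothetical, heavily simplified stand-in for a Hadoop InputFormat:
     * the source is cut into splits that can be read independently.
     */
    interface SimpleInputFormat<K, V> {
        int numSplits();
        List<Record<K, V>> readSplit(int split); // plays the RecordReader role
    }

    /**
     * Hypothetical RDD-like dataset: one partition per input split, with
     * transformations composed lazily and only run on collect().
     */
    static final class SplitBackedDataset<K, V, T> {
        private final SimpleInputFormat<K, V> format;
        private final Function<Record<K, V>, T> fn;

        SplitBackedDataset(SimpleInputFormat<K, V> format, Function<Record<K, V>, T> fn) {
            this.format = format;
            this.fn = fn;
        }

        /** Composes another transformation without reading any data yet. */
        <U> SplitBackedDataset<K, V, U> map(Function<T, U> next) {
            return new SplitBackedDataset<>(format, fn.andThen(next));
        }

        /** Reads every split (sequentially here; Spark would use executors) and collects results. */
        List<T> collect() {
            List<T> out = new ArrayList<>();
            for (int split = 0; split < format.numSplits(); split++) {
                for (Record<K, V> r : format.readSplit(split)) {
                    out.add(fn.apply(r));
                }
            }
            return out;
        }
    }

    static <K, V> SplitBackedDataset<K, V, Record<K, V>> fromInputFormat(SimpleInputFormat<K, V> format) {
        return new SplitBackedDataset<>(format, Function.<Record<K, V>>identity());
    }

    /** Toy "datastore" with two splits of (url, fetchCount) rows. */
    static List<String> demo() {
        SimpleInputFormat<String, Integer> pages = new SimpleInputFormat<String, Integer>() {
            public int numSplits() { return 2; }
            public List<Record<String, Integer>> readSplit(int split) {
                List<Record<String, Integer>> rows = new ArrayList<>();
                if (split == 0) {
                    rows.add(new Record<>("a.com", 3));
                    rows.add(new Record<>("b.com", 5));
                } else {
                    rows.add(new Record<>("c.com", 7));
                }
                return rows;
            }
        };
        return fromInputFormat(pages).map(r -> r.key + "=" + r.value).collect();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [a.com=3, b.com=5, c.com=7]
    }
}
```

The only contract the dataset relies on is "list the splits, then read each one", which is why an existing input format implementation can be reused as-is.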

> However, I could think of other improvements that could be useful
> (apologies to Lewis if I am hijacking his discussion):
> 1. Pluggable serialization mechanism to allow others like Thrift or
> Protocol Buffers instead of just Avro.

Yes, we have been talking about this for quite some time. I think we
have two options here: a) changing the way we hold objects in memory so
that it is not tied to Avro; b) keeping the Avro objects for in-memory
processing and serializing them using different formats (including the
native datastore format). I think both options should be doable at some
point.
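Option b) can be pictured as a serializer interface sitting between a fixed in-memory type and several interchangeable byte formats. The following is a minimal, stdlib-only Java sketch under that assumption; `WebPage`, `Serializer`, `formats()` and both implementations are hypothetical illustrations, not Gora or Avro APIs, and the binary and text layouts merely stand in for Thrift, Protocol Buffers, or a datastore's native format.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

public class PluggableSerializationSketch {

    /** Hypothetical in-memory record; in Gora this role is played by Avro-generated classes. */
    static final class WebPage {
        final String url;
        final int fetchCount;
        WebPage(String url, int fetchCount) { this.url = url; this.fetchCount = fetchCount; }
        @Override public boolean equals(Object o) {
            if (!(o instanceof WebPage)) return false;
            WebPage w = (WebPage) o;
            return url.equals(w.url) && fetchCount == w.fetchCount;
        }
        @Override public int hashCode() { return 31 * url.hashCode() + fetchCount; }
    }

    /** Hypothetical pluggable serializer: the in-memory type stays fixed, only the byte format varies. */
    interface Serializer<T> {
        byte[] serialize(T value);
        T deserialize(byte[] bytes);
    }

    /** Compact binary layout via DataOutputStream (standing in for Thrift, Protocol Buffers, ...). */
    static final class BinarySerializer implements Serializer<WebPage> {
        public byte[] serialize(WebPage p) {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            try (DataOutputStream out = new DataOutputStream(buf)) {
                out.writeUTF(p.url);        // 2-byte length prefix + modified UTF-8
                out.writeInt(p.fetchCount); // 4 bytes, big-endian
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            return buf.toByteArray();
        }
        public WebPage deserialize(byte[] bytes) {
            try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
                return new WebPage(in.readUTF(), in.readInt()); // same order as written
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }

    /** Plain-text "url|count" layout, standing in for a datastore-native format. */
    static final class TextSerializer implements Serializer<WebPage> {
        public byte[] serialize(WebPage p) {
            return (p.url + "|" + p.fetchCount).getBytes(StandardCharsets.UTF_8);
        }
        public WebPage deserialize(byte[] bytes) {
            String s = new String(bytes, StandardCharsets.UTF_8);
            int sep = s.lastIndexOf('|'); // the count never contains '|', so split on the last one
            return new WebPage(s.substring(0, sep), Integer.parseInt(s.substring(sep + 1)));
        }
    }

    /** Formats selectable by name, e.g. from a (hypothetical) configuration property. */
    static Map<String, Serializer<WebPage>> formats() {
        Map<String, Serializer<WebPage>> m = new LinkedHashMap<>();
        m.put("binary", new BinarySerializer());
        m.put("text", new TextSerializer());
        return m;
    }

    /** True if every registered format reproduces an equal in-memory object. */
    static boolean allRoundTrip(WebPage page) {
        for (Serializer<WebPage> s : formats().values()) {
            if (!page.equals(s.deserialize(s.serialize(page)))) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        WebPage page = new WebPage("http://gora.apache.org", 2);
        for (Map.Entry<String, Serializer<WebPage>> e : formats().entrySet()) {
            System.out.println(e.getKey() + ": " + e.getValue().serialize(page).length + " bytes");
        }
        System.out.println("round trip ok: " + allRoundTrip(page));
    }
}
```

Swapping formats then only means choosing another `Serializer` implementation; the code that processes `WebPage` objects in memory does not change, which is the main appeal of option b).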

> 2. Work directly with DAG frameworks like Spark or Flink (incubating)
> to provide a client module that uses Gora directly via their abstraction,
> i.e. RDD for Spark and DataSet for Flink.

Yes! We have to continue integrating with other projects, especially
popular ones that could give Gora more visibility in the open source
community. So what do you think is the "low-hanging" fruit here, Henry? I
mean, there is a lot to do, but we should start putting things into our
Roadmap so at least we know what we have to do.

Renato M.

> - Henry
> On Mon, Jul 7, 2014 at 8:19 AM, Lewis John Mcgibbney
> <lewis.mcgibbney@gmail.com> wrote:
> > Hi Folks,
> > Many people know the way things are going with regard to in-memory
> > computing being 'the' hot topic on the planet right now (outside of the
> > World Cup).
> > We have made good strides in Gora to get it to where it is as a top-level
> > project. It has also become apparent to me that something we embrace very
> > well is the notion of abstraction and flexibility in the way our modules
> > are implemented via the DataStore API.
> > One thing which is apparent to me, though, is that we may be restricting
> > the project's scope and capabilities if we do not embrace new technologies
> > within our development model.
> > I am of course talking about embracing the Spark paradigm within Gora and
> > abstracting ourselves away from the traditional MapReduce Input/Output
> > Formats which we currently use.
> > A colleague of mine was at Spark Summit last week in San Francisco and
> > mentioned that there is ongoing work to move towards a connector-based
> > approach for IO so that different datastores can be used within Spark
> > SQL.
> > The point I want to pose here is: where can we take advantage of this in
> > an attempt to further grow the Gora community and improve the project?
> > Thanks in advance for any thoughts, folks.
> > Lewis
> >
> >
> >
> > --
> > *Lewis*
