flink-issues mailing list archives

From nirvanesque <...@git.apache.org>
Subject [GitHub] incubator-flink pull request: hadoopcompatibility: Implementations...
Date Wed, 27 Aug 2014 13:30:33 GMT
Github user nirvanesque commented on the pull request:

    Hello Artem and mentors,
    First of all, greetings from INRIA, France.
    I hope you had an enjoyable experience in GSoC!
    Thanks to Robert (rmetzger) for forwarding me here ...
    At INRIA, we are starting to adopt Stratosphere / Flink.
    Our top-level goal is to improve the performance of User Defined Functions (UDFs) in long
workflows that chain multiple MapReduce (M-R) jobs, by using the larger set of Second Order Functions (SOFs) in
Stratosphere / Flink.
    We will demonstrate this improvement by implementing some Use Cases for business purposes.
    For this purpose, we have chosen some customer analysis Use Cases using weblogs and related
data, for 2 companies (both of which appear interested in trying Stratosphere / Flink):
    - a mobile phone app developer: http://www.tribeflame.com
    - an anti-virus & Internet security software company: www.f-secure.com
    I will be happy to share these Use Cases with you if you are interested. Just ask me.
    At present, we typically fit the profiles of Alice-Bob-Sam, as described in [Artem's
GSoC proposal](https://github.com/stratosphere/stratosphere/wiki/GSoC-2014-Project-Proposal-Draft-by-Artem-Tsikiridis).
    Hadoop seems to be the starting square for our Stratosphere / Flink journey.
    The situation is the same for the developers in the above 2 companies :-)
    We have installed and run some example programmes from Flink / Stratosphere (versions
0.5.2 and 0.6). We use a cluster (the grid5000) for our Hadoop & Stratosphere installations.
    We have a good understanding of Hadoop and of its use through Streaming and Pipes in conjunction
with scripting languages (Python & R specifically).
    In the first phase, we would like to run some "Hadoop-like" jobs (mainly multiple M-R
workflows) on Stratosphere, preferably with extensive Java or Scala programming.
    I refer to your [GSoC project map](https://github.com/stratosphere/stratosphere/wiki/%5BGSoC-14%5D-A-Hadoop-abstraction-layer-for-Stratosphere-%28Project-Map-and-Notes%29)
which seems very interesting.
    If we could have a Hadoop abstraction as you have mentioned, that would be ideal for our
first phase.
    In later phases, when we implement complex join and group operations, we would dive deeper
into the Stratosphere / Flink Java or Scala APIs.
    Hence, I would like to know: what is the current status in this direction?
    What has been implemented already, and from which version onwards? How can we try it?
    What is yet to be implemented, and in which versions is it planned?
    You may also like to see [my discussion with Robert on this page](http://flink.incubator.apache.org/docs/0.6-incubating/cli.html#comment-1558297261).
    I am still digging through the different discussions, here as well as on JIRA.
    Please do refer me to the relevant links, JIRA tickets, etc., if that saves you the time of
re-typing long replies.
    It will help us to catch up quickly with the collective thinking on the Stratosphere
/ Flink roadmap, and eventually to contribute to the project.
    Thanks in advance,
    PS: Apologies for using both the old and the new names (e.g. Flink / Stratosphere), as I am
not sure which name to use currently.

