flink-issues mailing list archives

From "Artem Tsikiridis (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-838) GSoC Summer Project: Implement full Hadoop Compatibility Layer for Stratosphere
Date Mon, 04 Aug 2014 06:23:12 GMT

    [ https://issues.apache.org/jira/browse/FLINK-838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14084352#comment-14084352
] 

Artem Tsikiridis commented on FLINK-838:
----------------------------------------

Hello,

The approach suggested in the comment above seems nice and should solve the problem. It is
also not hard to implement on the driver's side. However, I am not sure how best to add this
API hook to handle a {{Tuple3}} in a grouping-partitioning. What do you think?

Here is a report of what is going on:

1) I can't seem to get the custom classloader right so that it replaces any {{JobClient}} with a {{FlinkJobClient}}.
The problem is that the classloader that loads the {{JobClient}} is not actually the user's classloader
for Flink but its parent. I can't stop this delegation process and handle my case. Do you
have any advice? I have spent several days on this one (I should have asked earlier) and I am
still a bit stuck.

2) I have addressed all of the latest comments on the PR (https://github.com/apache/incubator-flink/pull/37#discussion_r15390287).
The only thing I am unsure of is what to do when no number of slots has been set (can we really
assume this is an IDE run?). I added 3 more test cases: a test job where the reducer has different
types, a map-only job without sorting (no reducers or combiners launched) and a test for {{MultipleInputs}}
(it is supported by our current code, as the driver only deals with its product,
which is a {{DelegatingInputFormat}}, still an {{InputFormat}}). I have also refactored
{{FlinkHadoopJobClient}} so that we don't have to repeat code in a {{HadoopJobOperation}}
(I made a prototype). You can see it as soon as we decide what should be done with the
slot number in the case of the IDE run (today?).

3) I'm trying to wrap the other features of {{JobClient}} and {{JobConf}} and to support
as much as possible. It's finished. I must show you results in the next couple of days, as
we have limited time and you'll probably have comments.

4) I finished support for sorting (custom {{Comparators}}) a while ago.
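
Regarding the classloader delegation described in point 1, here is a minimal sketch of a "child-first" classloader in plain Java. It only illustrates how the default parent-first delegation can be inverted for a chosen set of class names. {{ChildFirstClassLoader}} and {{childFirstNames}} are hypothetical names and not part of Flink or Hadoop, and reading the class bytes from the parent's resources is a stand-in for reading them from a user jar. Note the limitation: code that was itself loaded by the parent still resolves its references through the parent, so this may not solve the {{JobClient}} case on its own.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Set;

/**
 * Hypothetical child-first classloader: classes whose names are listed in
 * childFirstNames are defined by this loader instead of being delegated
 * to the parent. Everything else uses the normal parent-first lookup.
 */
final class ChildFirstClassLoader extends ClassLoader {
    private final Set<String> childFirstNames;

    // Assumes a non-null parent, since findClass reads bytes through it.
    ChildFirstClassLoader(ClassLoader parent, Set<String> childFirstNames) {
        super(parent);
        this.childFirstNames = childFirstNames;
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        synchronized (getClassLoadingLock(name)) {
            if (!childFirstNames.contains(name)) {
                // Default behaviour: ask the parent first.
                return super.loadClass(name, resolve);
            }
            // Inverted behaviour: define the class here, never ask the parent.
            Class<?> c = findLoadedClass(name);
            if (c == null) {
                c = findClass(name);
            }
            if (resolve) {
                resolveClass(c);
            }
            return c;
        }
    }

    @Override
    protected Class<?> findClass(String name) throws ClassNotFoundException {
        // Assumption for this sketch: the bytes are visible as a resource of
        // the parent. A real setup would read them from the user's jar.
        String resource = name.replace('.', '/') + ".class";
        try (InputStream in = getParent().getResourceAsStream(resource)) {
            if (in == null) {
                throw new ClassNotFoundException(name);
            }
            byte[] bytes = in.readAllBytes();
            return defineClass(name, bytes, 0, bytes.length);
        } catch (IOException e) {
            throw new ClassNotFoundException(name, e);
        }
    }
}
```

A class loaded this way is a different {{Class}} object from the parent's copy, even though both have the same name, so instances cannot be cast from one to the other.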

There are 2 weeks left, we should make them count! :)

> GSoC Summer Project: Implement full Hadoop Compatibility Layer for Stratosphere
> -------------------------------------------------------------------------------
>
>                 Key: FLINK-838
>                 URL: https://issues.apache.org/jira/browse/FLINK-838
>             Project: Flink
>          Issue Type: Improvement
>            Reporter: GitHub Import
>              Labels: github-import
>             Fix For: pre-apache
>
>
> This is a meta issue for tracking @atsikiridis's progress with implementing a full Hadoop
Compatibility Layer for Stratosphere.
> Some documentation can be found in the Wiki: https://github.com/stratosphere/stratosphere/wiki/%5BGSoC-14%5D-A-Hadoop-abstraction-layer-for-Stratosphere-(Project-Map-and-Notes)
> As well as the project proposal: https://github.com/stratosphere/stratosphere/wiki/GSoC-2014-Project-Proposal-Draft-by-Artem-Tsikiridis
> Most importantly, there is the following **schedule**:
> *19 May - 27 June (Midterm)*
> 1) Work on the Hadoop tasks, their Context and the mapping of Hadoop's Configuration
to the one of Stratosphere. By successfully bridging the Hadoop tasks with Stratosphere, we
already cover the most basic Hadoop Jobs. This can be determined by running some popular Hadoop
examples on Stratosphere (e.g. WordCount, k-means, join) (4 - 5 weeks)
> 2) Understand how the running of these jobs works (e.g. command line interface) for the
wrapper. Implement how the user will run them. (1 - 2 weeks).
> *27 June - 11 August*
> 1) Continue wrapping more "advanced" Hadoop Interfaces (Comparators, Partitioners, Distributed
Cache etc.) There are quite a few interfaces and it will be a challenge to support all of
them. (5 full weeks)
> 2) Profiling of the application and optimizations (if applicable)
> *11 August - 18 August*
> Write documentation on code, write a README with care and add more unit-tests. (1 week)
> ---------------- Imported from GitHub ----------------
> Url: https://github.com/stratosphere/stratosphere/issues/838
> Created by: [rmetzger|https://github.com/rmetzger]
> Labels: core, enhancement, parent-for-major-feature, 
> Milestone: Release 0.7 (unplanned)
> Created at: Tue May 20 10:11:34 CEST 2014
> State: open



--
This message was sent by Atlassian JIRA
(v6.2#6252)
