flink-user mailing list archives

From Flavio Pompermaier <pomperma...@okkam.it>
Subject Re: HBase 0.7.0 addon
Date Sun, 02 Nov 2014 07:41:23 GMT
See my answers inline.

On Nov 1, 2014 8:19 PM, "Stephan Ewen" <sewen@apache.org> wrote:
>
> Hi Flavio!
>
> Here are a few comments:
>
>  - Concerning the count operator: I think we can hack this in a very simple way. It would be good to spend a few thought cycles on keeping the API consistent, though. Flink does not pull data back to the client as eagerly as Spark, but leaves it in the cluster more. That has paid off in various situations. Let me draft a proposal for how to include such operations in the next few days. I think we can have this very soon.

Ok

>  - Concerning the Region Splitting: Can you elaborate a little bit on that and give a few more details about the problem? In general, the input splitting in Flink happens when the job is started and the splits are dynamically assigned to the sources as the job runs. You can customize all that behavior by overwriting the two methods "createInputSplits" and "getInputSplitAssigner" in the input format.

I just wanted to know if and how region splitting is handled. Can you explain to me in detail how Flink and HBase work together? What is not fully clear to me is when computation is done by the region servers, when data starts flowing to a Flink worker (which in my test job is only my PC), and how to better understand the important logged info, so I can tell whether my job is performing well.

>  - Concerning the pull request: There are sometimes build stalls on Travis that no one has encountered outside Travis so far. Not exactly sure what causes them, but if that happens for one build and the others work, I would consider the pull request passed.

It would be great to contribute. I just forgot to mention that one can specify a different HBase version at compile time using -Dhbase.version=0.98.xxxx
>
> Greetings,
> Stephan
>
>
>
> On Sat, Nov 1, 2014 at 2:03 PM, Flavio Pompermaier <pompermaier@okkam.it> wrote:
>>
>> My pull request seems to build correctly right now, except for one case (PROFILE="-Dhadoop.profile=2 -Dhadoop.version=2.2.0") where Travis stops the job during the tests saying:
>>
>> No output has been received in the last 10 minutes, this potentially indicates a stalled build or something wrong with the build itself. The build has been terminated
>>
>> Can someone help me finalize this PR? I also removed some classes that I think are obsolete now (i.e. GenericTableOutputFormat, HBaseUtil and HBaseDataSink).
>>
>>
>> On Fri, Oct 31, 2014 at 5:04 PM, Flavio Pompermaier <pompermaier@okkam.it> wrote:
>>>
>>> The current implementation of the HBase splitting policy cannot deal with region splitting during the job execution.
>>> Do you think it is possible to overcome this issue?
>>>
>>> On Fri, Oct 31, 2014 at 2:22 PM, Flavio Pompermaier <pompermaier@okkam.it> wrote:
>>>>
>>>> Is this feature far from being released?
>>>>
>>>> On Fri, Oct 31, 2014 at 1:51 PM, Kostas Tzoumas <ktzoumas@apache.org> wrote:
>>>>>
>>>>> I was wrong. This feature is actually coming up and tracked here: https://issues.apache.org/jira/browse/FLINK-758
>>>>>
>>>>> On Fri, Oct 31, 2014 at 1:14 PM, Flavio Pompermaier <pompermaier@okkam.it> wrote:
>>>>>>
>>>>>> I don't have time for this; we're working on upgrading HBase to the 0.98 APIs (and it's already working :))
>>>>>> However, we should discuss how to properly manage the version of HBase and its Hadoop dependencies..
>>>>>>
>>>>>> Best,
>>>>>> Flavio
>>>>>>
>>>>>> On Fri, Oct 31, 2014 at 11:32 AM, Kostas Tzoumas <ktzoumas@apache.org> wrote:
>>>>>>>
>>>>>>> Agreed 100%.
>>>>>>>
>>>>>>> I created a JIRA for this: https://issues.apache.org/jira/browse/FLINK-1200
>>>>>>>
>>>>>>> Flavio, would you like to give it a go? Otherwise I will assign it to myself.
>>>>>>>
>>>>>>> On Fri, Oct 31, 2014 at 10:12 AM, Flavio Pompermaier <pompermaier@okkam.it> wrote:
>>>>>>>>
>>>>>>>> I think that a count operator is very useful for people wanting to run a HelloWorld with Flink;
>>>>>>>> it's always the first test I do (and with Spark that is very easy..)
>>>>>>>>
>>>>>>>> Best,
>>>>>>>> Flavio
>>>>>>>>
>>>>>>>> On Fri, Oct 31, 2014 at 9:57 AM, Fabian Hueske <fhueske@apache.org> wrote:
>>>>>>>>>
>>>>>>>>> Hi Flavio,
>>>>>>>>>
>>>>>>>>> right now, there is no dedicated count operator in the API.
>>>>>>>>> You can work around this by appending a 1 to each record and summing it up (see the WordCount example [1]).
>>>>>>>>> This is also what a dedicated count operator would do internally.
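
For example, the work-around could look roughly like this in the Java API (a sketch only, written against the 0.7-incubating API as I understand it; imports are omitted and exact package names may differ):

    ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
    DataSet<String> input = env.fromElements("a", "b", "c");   // any DataSet works here

    DataSet<Tuple1<Long>> count = input
            .map(new MapFunction<String, Tuple1<Long>>() {
                @Override
                public Tuple1<Long> map(String value) {
                    return new Tuple1<Long>(1L);               // append a 1 per record
                }
            })
            .aggregate(Aggregations.SUM, 0);                   // sum the 1s = total count

    count.print();
    env.execute("count work-around");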
>>>>>>>>>
>>>>>>>>> It would be awesome to get some contributions for the HBase addon :-)
>>>>>>>>>
>>>>>>>>> Best, Fabian
>>>>>>>>>
>>>>>>>>> [1] http://flink.incubator.apache.org/docs/0.7-incubating/examples.html
>>>>>>>>>
>>>>>>>>> 2014-10-31 9:46 GMT+01:00 Flavio Pompermaier <pompermaier@okkam.it>:
>>>>>>>>>>
>>>>>>>>>> We are trying to connect to HBase 0.98, so we'll probably contribute to the HBase addon :)
>>>>>>>>>> Is there a count API for DataSet? What is the fastest way to run a count on a dataset?
>>>>>>>>>>
>>>>>>>>>> Best,
>>>>>>>>>> Flavio
>>>>>>>>>>
>>>>>>>>>> On Fri, Oct 31, 2014 at 6:19 AM, Robert Metzger <rmetzger@apache.org> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Okay, I've deployed the missing artifacts to Maven Central. It will take some hours until they are synchronized.
>>>>>>>>>>> The example in the "flink-hbase" module is still using the old Java API.
>>>>>>>>>>> But you should be able to use the HBase input format like this:
>>>>>>>>>>>         ExecutionEnvironment ee = ExecutionEnvironment.getExecutionEnvironment();
>>>>>>>>>>>         DataSet<Record> t = ee.createInput(new MyTableInputFormat());
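
For context, a minimal job skeleton around those two lines might look like this (just a sketch; "MyTableInputFormat" stands for a user-written subclass of the addon's input format, and the sink and job name are placeholders):

    public class HBaseReadJob {

        public static void main(String[] args) throws Exception {
            ExecutionEnvironment ee = ExecutionEnvironment.getExecutionEnvironment();

            // MyTableInputFormat is assumed to be your own subclass of the
            // flink-hbase input format, configured with table name, scan, etc.
            DataSet<Record> t = ee.createInput(new MyTableInputFormat());

            t.print();                      // any data sink works; print() is the simplest
            ee.execute("HBase read job");   // nothing runs until execute() is called
        }
    }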
>>>>>>>>>>>
>>>>>>>>>>> I think the Flink HBase module is not very well tested, so it's likely that you'll find issues while using it.
>>>>>>>>>>>
>>>>>>>>>>> The only documentation on logging we have is this one: http://flink.incubator.apache.org/docs/0.7-incubating/internal_logging.html
>>>>>>>>>>>
>>>>>>>>>>> Are you only seeing the log messages from Flink, or no messages at all?
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Thu, Oct 30, 2014 at 4:10 PM, Flavio Pompermaier <pompermaier@okkam.it> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Ok, thanks! I was trying to run a MapReduce-style Flink job using an HBase dataset, but I wasn't able to run it locally. The one in the addons just specifies a plan but does not say how to test it.
>>>>>>>>>>>> Moreover, I tried to put a log4j.properties in the classpath to debug what's going on, but I can't see any debug info. Do you have any hook/guide?
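
For reference, a minimal log4j.properties that usually makes debug output show up on the console could look like this (assuming a log4j 1.x setup, which is what Flink used at the time; the level and pattern are just examples):

    log4j.rootLogger=DEBUG, console

    log4j.appender.console=org.apache.log4j.ConsoleAppender
    log4j.appender.console.layout=org.apache.log4j.PatternLayout
    log4j.appender.console.layout.ConversionPattern=%d{HH:mm:ss,SSS} %-5p %c - %m%n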
>>>>>>>>>>>>
>>>>>>>>>>>> On Oct 30, 2014 11:58 PM, "Robert Metzger" <rmetzger@apache.org> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>
>>>>>>>>>>>>> No, there is no reason for that. It actually seems like something went wrong while releasing Flink 0.7.0. I'll deploy the missing artifacts.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Thu, Oct 30, 2014 at 9:26 AM, Flavio Pompermaier <pompermaier@okkam.it> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi to all,
>>>>>>>>>>>>>> is there a reason why the 0.7.0 HBase addon is not deployed on Maven Central?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks in advance,
>>>>>>>>>>>>>> Flavio
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>
>
