spark-dev mailing list archives

From Ted Yu <yuzhih...@gmail.com>
Subject Re: RFC: Remove "HBaseTest" from examples?
Date Tue, 19 Apr 2016 18:21:11 GMT
Clarification: in my previous email, I was not talking about the
spark-streaming-flume or spark-streaming-kafka artifacts.

I was talking about the examples for these projects, such as
examples/src/main/python/streaming/flume_wordcount.py

On Tue, Apr 19, 2016 at 11:10 AM, Marcelo Vanzin <vanzin@cloudera.com>
wrote:

> On Tue, Apr 19, 2016 at 11:07 AM, Ted Yu <yuzhihong@gmail.com> wrote:
>
>> The same question can be asked w.r.t. examples for other projects, such
>> as flume and kafka.
>>
>
> The main difference being that flume and kafka integration are part of
> Spark itself. HBase integration is not.
>
>
>
>> On Tue, Apr 19, 2016 at 11:01 AM, Marcin Tustin <mtustin@handybook.com>
>> wrote:
>>
>>> Let's posit that the spark example is much better than what is available
>>> in HBase. Why is that a reason to keep it within Spark?
>>>
>>> On Tue, Apr 19, 2016 at 1:59 PM, Ted Yu <yuzhihong@gmail.com> wrote:
>>>
>>>> bq. HBase's current support, even if there are bugs or things that
>>>> still need to be done, is much better than the Spark example
>>>>
>>>> In my opinion, a simple example that works is better than a buggy
>>>> package.
>>>>
>>>> I hope before long the hbase-spark module in HBase can arrive at a
>>>> state which we can advertise as mature - but we're not there yet.
>>>>
>>>> On Tue, Apr 19, 2016 at 10:50 AM, Marcelo Vanzin <vanzin@cloudera.com>
>>>> wrote:
>>>>
>>>>> You're completely missing my point. I'm saying that HBase's current
>>>>> support, even if there are bugs or things that still need to be done,
>>>>> is much better than the Spark example, which is basically a call to
>>>>> "SparkContext.hadoopRDD".
>>>>>
>>>>> Spark's example is not helpful in learning how to build an HBase
>>>>> application on Spark, and clashes head on with how the HBase
>>>>> developers think it should be done. That, and because it brings too
>>>>> many dependencies for something that is not really useful, is why I'm
>>>>> suggesting removing it.
>>>>>
>>>>>
>>>>> On Tue, Apr 19, 2016 at 10:47 AM, Ted Yu <yuzhihong@gmail.com> wrote:
>>>>> > There is an Open JIRA for fixing the documentation: HBASE-15473
>>>>> >
>>>>> > I would say the refguide link you provided should not be considered as
>>>>> > complete.
>>>>> >
>>>>> > Note it is marked as Blocker by Sean B.
>>>>> >
>>>>> > On Tue, Apr 19, 2016 at 10:43 AM, Marcelo Vanzin <vanzin@cloudera.com>
>>>>> > wrote:
>>>>> >>
>>>>> >> You're entitled to your own opinions.
>>>>> >>
>>>>> >> While you're at it, here's some much better documentation, from the
>>>>> >> HBase project themselves, than what the Spark example provides:
>>>>> >> http://hbase.apache.org/book.html#spark
>>>>> >>
>>>>> >> On Tue, Apr 19, 2016 at 10:41 AM, Ted Yu <yuzhihong@gmail.com> wrote:
>>>>> >> > bq. it's actually in use right now in spite of not being in any upstream
>>>>> >> > HBase release
>>>>> >> >
>>>>> >> > If it is not in upstream, then it is not relevant for discussion on
>>>>> >> > Apache mailing list.
>>>>> >> >
>>>>> >> > On Tue, Apr 19, 2016 at 10:38 AM, Marcelo Vanzin <vanzin@cloudera.com>
>>>>> >> > wrote:
>>>>> >> >>
>>>>> >> >> Alright, if you prefer, I'll say "it's actually in use right now in
>>>>> >> >> spite of not being in any upstream HBase release", and it's more
>>>>> >> >> useful than a single example file in the Spark repo for those who
>>>>> >> >> really want to integrate with HBase.
>>>>> >> >>
>>>>> >> >> Spark's example is really very trivial (just uses one of HBase's input
>>>>> >> >> formats), which makes it not very useful as a blueprint for developing
>>>>> >> >> HBase apps with Spark.
>>>>> >> >>
>>>>> >> >> On Tue, Apr 19, 2016 at 10:28 AM, Ted Yu <yuzhihong@gmail.com> wrote:
>>>>> >> >> > bq. I wouldn't call it "incomplete".
>>>>> >> >> >
>>>>> >> >> > I would call it incomplete.
>>>>> >> >> >
>>>>> >> >> > Please see HBASE-15333 'Enhance the filter to handle short, integer,
>>>>> >> >> > long, float and double' which is a bug fix.
>>>>> >> >> >
>>>>> >> >> > Please exclude the presence of the related module in a vendor distro
>>>>> >> >> > from this discussion.
>>>>> >> >> >
>>>>> >> >> > Thanks
>>>>> >> >> >
>>>>> >> >> > On Tue, Apr 19, 2016 at 10:23 AM, Marcelo Vanzin
>>>>> >> >> > <vanzin@cloudera.com>
>>>>> >> >> > wrote:
>>>>> >> >> >>
>>>>> >> >> >> On Tue, Apr 19, 2016 at 10:20 AM, Ted Yu <yuzhihong@gmail.com> wrote:
>>>>> >> >> >> > I want to note that the hbase-spark module in HBase is incomplete.
>>>>> >> >> >> > Zhan has several patches pending review.
>>>>> >> >> >>
>>>>> >> >> >> I wouldn't call it "incomplete". Lots of functionality is there, which
>>>>> >> >> >> doesn't mean new ones, or more efficient implementations of existing
>>>>> >> >> >> ones, can't be added.
>>>>> >> >> >>
>>>>> >> >> >> > hbase-spark module is currently only in master branch which would be
>>>>> >> >> >> > released as 2.0
>>>>> >> >> >>
>>>>> >> >> >> Just as a side note, it's part of CDH 5.7.0, not that it matters much
>>>>> >> >> >> for upstream HBase.
>>>>> >> >> >>
>>>>> >> >> >> --
>>>>> >> >> >> Marcelo
>>>>> >> >> >
>>>>> >> >> >
>>>>> >> >>
>>>>> >> >>
>>>>> >> >>
>>>>> >> >> --
>>>>> >> >> Marcelo
>>>>> >> >
>>>>> >> >
>>>>> >>
>>>>> >>
>>>>> >>
>>>>> >> --
>>>>> >> Marcelo
>>>>> >
>>>>> >
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Marcelo
>>>>>
>>>>
>>>>
>>>
>>>
>>
>
>
> --
> Marcelo
>
