spark-dev mailing list archives

From Marcelo Vanzin <van...@cloudera.com>
Subject Re: RFC: Remove "HBaseTest" from examples?
Date Tue, 19 Apr 2016 18:10:46 GMT
On Tue, Apr 19, 2016 at 11:07 AM, Ted Yu <yuzhihong@gmail.com> wrote:

> The same question can be asked w.r.t. examples for other projects, such as flume
> and kafka.
>

The main difference being that flume and kafka integration are part of
Spark itself. HBase integration is not.



> On Tue, Apr 19, 2016 at 11:01 AM, Marcin Tustin <mtustin@handybook.com>
> wrote:
>
>> Let's posit that the spark example is much better than what is available
>> in HBase. Why is that a reason to keep it within Spark?
>>
>> On Tue, Apr 19, 2016 at 1:59 PM, Ted Yu <yuzhihong@gmail.com> wrote:
>>
>>> bq. HBase's current support, even if there are bugs or things that
>>> still need to be done, is much better than the Spark example
>>>
>>> In my opinion, a simple example that works is better than a buggy
>>> package.
>>>
>>> I hope before long the hbase-spark module in HBase can arrive at a state
>>> which we can advertise as mature - but we're not there yet.
>>>
>>> On Tue, Apr 19, 2016 at 10:50 AM, Marcelo Vanzin <vanzin@cloudera.com>
>>> wrote:
>>>
>>>> You're completely missing my point. I'm saying that HBase's current
>>>> support, even if there are bugs or things that still need to be done,
>>>> is much better than the Spark example, which is basically a call to
>>>> "SparkContext.hadoopRDD".
>>>>
>>>> Spark's example is not helpful in learning how to build an HBase
>>>> application on Spark, and clashes head on with how the HBase
>>>> developers think it should be done. That, and because it brings too
>>>> many dependencies for something that is not really useful, is why I'm
>>>> suggesting removing it.
>>>>
>>>>
>>>> On Tue, Apr 19, 2016 at 10:47 AM, Ted Yu <yuzhihong@gmail.com> wrote:
>>>> > There is an Open JIRA for fixing the documentation: HBASE-15473
>>>> >
>>>> > I would say the refguide link you provided should not be considered as
>>>> > complete.
>>>> >
>>>> > Note it is marked as Blocker by Sean B.
>>>> >
>>>> > On Tue, Apr 19, 2016 at 10:43 AM, Marcelo Vanzin <vanzin@cloudera.com>
>>>> > wrote:
>>>> >>
>>>> >> You're entitled to your own opinions.
>>>> >>
>>>> >> While you're at it, here's some much better documentation, from the
>>>> >> HBase project themselves, than what the Spark example provides:
>>>> >> http://hbase.apache.org/book.html#spark
>>>> >>
>>>> >> On Tue, Apr 19, 2016 at 10:41 AM, Ted Yu <yuzhihong@gmail.com> wrote:
>>>> >> > bq. it's actually in use right now in spite of not being in any upstream
>>>> >> > HBase release
>>>> >> >
>>>> >> > If it is not in upstream, then it is not relevant for discussion on
>>>> >> > Apache mailing list.
>>>> >> >
>>>> >> > On Tue, Apr 19, 2016 at 10:38 AM, Marcelo Vanzin <vanzin@cloudera.com>
>>>> >> > wrote:
>>>> >> >>
>>>> >> >> Alright, if you prefer, I'll say "it's actually in use right now in
>>>> >> >> spite of not being in any upstream HBase release", and it's more
>>>> >> >> useful than a single example file in the Spark repo for those who
>>>> >> >> really want to integrate with HBase.
>>>> >> >>
>>>> >> >> Spark's example is really very trivial (just uses one of HBase's input
>>>> >> >> formats), which makes it not very useful as a blueprint for developing
>>>> >> >> HBase apps with Spark.
>>>> >> >>
>>>> >> >> On Tue, Apr 19, 2016 at 10:28 AM, Ted Yu <yuzhihong@gmail.com> wrote:
>>>> >> >> > bq. I wouldn't call it "incomplete".
>>>> >> >> >
>>>> >> >> > I would call it incomplete.
>>>> >> >> >
>>>> >> >> > Please see HBASE-15333 'Enhance the filter to handle short, integer,
>>>> >> >> > long, float and double' which is a bug fix.
>>>> >> >> >
>>>> >> >> > Please exclude presence of the related module in vendor distro from
>>>> >> >> > this discussion.
>>>> >> >> >
>>>> >> >> > Thanks
>>>> >> >> >
>>>> >> >> > On Tue, Apr 19, 2016 at 10:23 AM, Marcelo Vanzin
>>>> >> >> > <vanzin@cloudera.com>
>>>> >> >> > wrote:
>>>> >> >> >>
>>>> >> >> >> On Tue, Apr 19, 2016 at 10:20 AM, Ted Yu <yuzhihong@gmail.com>
>>>> >> >> >> wrote:
>>>> >> >> >> > I want to note that the hbase-spark module in HBase is incomplete.
>>>> >> >> >> > Zhan has several patches pending review.
>>>> >> >> >>
>>>> >> >> >> I wouldn't call it "incomplete". Lots of functionality is there,
>>>> >> >> >> which doesn't mean new ones, or more efficient implementations of
>>>> >> >> >> existing ones, can't be added.
>>>> >> >> >>
>>>> >> >> >> > hbase-spark module is currently only in master branch which would
>>>> >> >> >> > be released as 2.0
>>>> >> >> >>
>>>> >> >> >> Just as a side note, it's part of CDH 5.7.0, not that it matters
>>>> >> >> >> much for upstream HBase.
>>>> >> >> >>
>>>> >> >> >> --
>>>> >> >> >> Marcelo
>>>> >> >> >
>>>> >> >> >
>>>> >> >>
>>>> >> >>
>>>> >> >>
>>>> >> >> --
>>>> >> >> Marcelo
>>>> >> >
>>>> >> >
>>>> >>
>>>> >>
>>>> >>
>>>> >> --
>>>> >> Marcelo
>>>> >
>>>> >
>>>>
>>>>
>>>>
>>>> --
>>>> Marcelo
>>>>
>>>
>>>
>>
>>
>>
>


-- 
Marcelo
