spark-dev mailing list archives

From Marcelo Vanzin <van...@cloudera.com>
Subject Re: RFC: Remove "HBaseTest" from examples?
Date Tue, 19 Apr 2016 18:26:00 GMT
On Tue, Apr 19, 2016 at 11:21 AM, Ted Yu <yuzhihong@gmail.com> wrote:

> Clarification: in my previous email, I was not talking
> about spark-streaming-flume artifact or spark-streaming-kafka artifact.
> I was talking about examples for these projects, such
> as examples/src/main/python/streaming/flume_wordcount.py
>

I understand. And those examples show how to use code that is part of
Spark. HBaseTest just shows how to use a generic Spark API that can be
used to talk to HBase or to anything else that has an InputFormat, so
it's much less useful as an example.

I'd put CassandraTest in that same category, although that particular
example at least shows more functionality than the HBase one.



> On Tue, Apr 19, 2016 at 11:10 AM, Marcelo Vanzin <vanzin@cloudera.com>
> wrote:
>
>> On Tue, Apr 19, 2016 at 11:07 AM, Ted Yu <yuzhihong@gmail.com> wrote:
>>
>>> The same question can be asked w.r.t. examples for other projects, such
>>> as flume and kafka.
>>>
>>
>> The main difference being that flume and kafka integration are part of
>> Spark itself. HBase integration is not.
>>
>>
>>
>>> On Tue, Apr 19, 2016 at 11:01 AM, Marcin Tustin <mtustin@handybook.com>
>>> wrote:
>>>
>>>> Let's posit that the spark example is much better than what is
>>>> available in HBase. Why is that a reason to keep it within Spark?
>>>>
>>>> On Tue, Apr 19, 2016 at 1:59 PM, Ted Yu <yuzhihong@gmail.com> wrote:
>>>>
>>>>> bq. HBase's current support, even if there are bugs or things that
>>>>> still need to be done, is much better than the Spark example
>>>>>
>>>>> In my opinion, a simple example that works is better than a buggy
>>>>> package.
>>>>>
>>>>> I hope before long the hbase-spark module in HBase can arrive at a
>>>>> state which we can advertise as mature - but we're not there yet.
>>>>>
>>>>> On Tue, Apr 19, 2016 at 10:50 AM, Marcelo Vanzin <vanzin@cloudera.com>
>>>>> wrote:
>>>>>
>>>>>> You're completely missing my point. I'm saying that HBase's current
>>>>>> support, even if there are bugs or things that still need to be done,
>>>>>> is much better than the Spark example, which is basically a call to
>>>>>> "SparkContext.hadoopRDD".
>>>>>>
>>>>>> Spark's example is not helpful in learning how to build an HBase
>>>>>> application on Spark, and clashes head on with how the HBase
>>>>>> developers think it should be done. That, and because it brings too
>>>>>> many dependencies for something that is not really useful, is why I'm
>>>>>> suggesting removing it.
>>>>>>
>>>>>>
>>>>>> On Tue, Apr 19, 2016 at 10:47 AM, Ted Yu <yuzhihong@gmail.com> wrote:
>>>>>> > There is an Open JIRA for fixing the documentation: HBASE-15473
>>>>>> >
>>>>>> > I would say the refguide link you provided should not be considered as
>>>>>> > complete.
>>>>>> >
>>>>>> > Note it is marked as Blocker by Sean B.
>>>>>> >
>>>>>> > On Tue, Apr 19, 2016 at 10:43 AM, Marcelo Vanzin <vanzin@cloudera.com>
>>>>>> > wrote:
>>>>>> >>
>>>>>> >> You're entitled to your own opinions.
>>>>>> >>
>>>>>> >> While you're at it, here's some much better documentation, from the
>>>>>> >> HBase project themselves, than what the Spark example provides:
>>>>>> >> http://hbase.apache.org/book.html#spark
>>>>>> >>
>>>>>> >> On Tue, Apr 19, 2016 at 10:41 AM, Ted Yu <yuzhihong@gmail.com> wrote:
>>>>>> >> > bq. it's actually in use right now in spite of not being in any
>>>>>> >> > upstream HBase release
>>>>>> >> >
>>>>>> >> > If it is not in upstream, then it is not relevant for discussion on
>>>>>> >> > Apache mailing list.
>>>>>> >> >
>>>>>> >> > On Tue, Apr 19, 2016 at 10:38 AM, Marcelo Vanzin <vanzin@cloudera.com>
>>>>>> >> > wrote:
>>>>>> >> >>
>>>>>> >> >> Alright, if you prefer, I'll say "it's actually in use right now in
>>>>>> >> >> spite of not being in any upstream HBase release", and it's more
>>>>>> >> >> useful than a single example file in the Spark repo for those who
>>>>>> >> >> really want to integrate with HBase.
>>>>>> >> >>
>>>>>> >> >> Spark's example is really very trivial (just uses one of HBase's input
>>>>>> >> >> formats), which makes it not very useful as a blueprint for developing
>>>>>> >> >> HBase apps with Spark.
>>>>>> >> >>
>>>>>> >> >> On Tue, Apr 19, 2016 at 10:28 AM, Ted Yu <yuzhihong@gmail.com> wrote:
>>>>>> >> >> > bq. I wouldn't call it "incomplete".
>>>>>> >> >> >
>>>>>> >> >> > I would call it incomplete.
>>>>>> >> >> >
>>>>>> >> >> > Please see HBASE-15333 'Enhance the filter to handle short, integer,
>>>>>> >> >> > long, float and double' which is a bug fix.
>>>>>> >> >> >
>>>>>> >> >> > Please exclude the presence of the related module in vendor distros
>>>>>> >> >> > from this discussion.
>>>>>> >> >> >
>>>>>> >> >> > Thanks
>>>>>> >> >> >
>>>>>> >> >> > On Tue, Apr 19, 2016 at 10:23 AM, Marcelo Vanzin
>>>>>> >> >> > <vanzin@cloudera.com> wrote:
>>>>>> >> >> >>
>>>>>> >> >> >> On Tue, Apr 19, 2016 at 10:20 AM, Ted Yu <yuzhihong@gmail.com>
>>>>>> >> >> >> wrote:
>>>>>> >> >> >> > I want to note that the hbase-spark module in HBase is incomplete.
>>>>>> >> >> >> > Zhan has several patches pending review.
>>>>>> >> >> >>
>>>>>> >> >> >> I wouldn't call it "incomplete". Lots of functionality is there,
>>>>>> >> >> >> which doesn't mean new ones, or more efficient implementations of
>>>>>> >> >> >> existing ones, can't be added.
>>>>>> >> >> >>
>>>>>> >> >> >> > hbase-spark module is currently only in master branch which would
>>>>>> >> >> >> > be released as 2.0
>>>>>> >> >> >>
>>>>>> >> >> >> Just as a side note, it's part of CDH 5.7.0, not that it matters
>>>>>> >> >> >> much for upstream HBase.
>>>>>> >> >> >>
>>>>>> >> >> >> --
>>>>>> >> >> >> Marcelo
>>>>>> >> >> >
>>>>>> >> >> >
>>>>>> >> >>
>>>>>> >> >>
>>>>>> >> >>
>>>>>> >> >> --
>>>>>> >> >> Marcelo
>>>>>> >> >
>>>>>> >> >
>>>>>> >>
>>>>>> >>
>>>>>> >>
>>>>>> >> --
>>>>>> >> Marcelo
>>>>>> >
>>>>>> >
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Marcelo
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>
>>
>> --
>> Marcelo
>>
>
>


-- 
Marcelo
