phoenix-dev mailing list archives

From Andrew Purtell <andrew.purt...@gmail.com>
Subject Re: Moving Phoenix master to Hbase 2.2
Date Wed, 15 Jan 2020 02:40:39 GMT
It’s not necessary to abstract the HBase interfaces into a compatibility layer, at least
not to start. At each bump from one minor release to another a fix up typically touches a
handful of files. The jump from 1.x to 2.x is a bigger deal but maybe there should still be
separate branches for major HBase versions? 

Anyway let’s assume for now you want to unify all the branches for HBase 1.x. Start with
the lowest HBase version you want to support. Then iterate up to the highest HBase version
you want to support. Whenever you run into compile problems, create a new version-specific Maven
module and add logic to the parent POM that chooses the right one. Then move each implicated file
into the version-specific Maven modules, duplicating as needed, and fix up each copy where
needed.
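
A rough sketch of the shape this could take, in plain Java. Every name here (HBaseCompat,
Compat13, Compat15, forVersion) is hypothetical and not taken from Phoenix or HBase. In a
real build each implementation would live in its own Maven module compiled against one
HBase version, and the parent POM would pick the module; the forVersion method below only
stands in for that selection so the example fits in one file. The 1.5 cut-off is an
assumption for illustration.

```java
// Sketch only: all types and names are hypothetical, not Phoenix or HBase APIs.
public class CompatDemo {

    /** Narrow interface that the shared (version-independent) code programs against. */
    public interface HBaseCompat {
        String moduleName();
        // Stands in for any API that only newer HBase versions offer.
        boolean hasNewCoprocessorHook();
    }

    /** Would live in a module compiled against an older HBase 1.x release. */
    static final class Compat13 implements HBaseCompat {
        public String moduleName() { return "compat-hbase-1.3"; }
        public boolean hasNewCoprocessorHook() { return false; }
    }

    /** Would live in a module compiled against a newer HBase 1.x release. */
    static final class Compat15 implements HBaseCompat {
        public String moduleName() { return "compat-hbase-1.5"; }
        public boolean hasNewCoprocessorHook() { return true; }
    }

    /** Stand-in for the parent POM logic that chooses one module per build. */
    public static HBaseCompat forVersion(String hbaseVersion) {
        String[] parts = hbaseVersion.split("\\.");
        if (parts.length < 2) {
            throw new IllegalArgumentException("Expected major.minor[.patch]: " + hbaseVersion);
        }
        int major = Integer.parseInt(parts[0]);
        int minor = Integer.parseInt(parts[1]);
        if (major != 1) {
            throw new IllegalArgumentException("This sketch only covers HBase 1.x: " + hbaseVersion);
        }
        return minor >= 5 ? new Compat15() : new Compat13();
    }

    public static void main(String[] args) {
        String[] versions = {"1.3.6", "1.5.0"};
        for (String v : versions) {
            HBaseCompat compat = forVersion(v);
            System.out.println(v + " -> " + compat.moduleName()
                    + ", newHook=" + compat.hasNewCoprocessorHook());
        }
    }
}
```

The shared code only ever sees HBaseCompat, so a duplicated file can later be shrunk to the
few methods that really differ between versions.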

Over time you can iterate over the duplicated files and reduce duplication but there is no
need to take that on up front, so the task is not insurmountable. It can be incremental. 


> On Jan 14, 2020, at 4:46 PM, Josh Elser <elserj@apache.org> wrote:
> 
> Still not having looked at what Tephra does -- I'm intrigued by what Istvan has in-progress.
> Waiting to see what he comes up with would be my suggestion :)
> 
>> On 1/14/20 1:12 PM, larsh@apache.org wrote:
>> Does somebody volunteer to take this up?
>> I can see whether I can get a resource where I work, but it's highly uncertain.
>> It would need a bit of digging and design work to see how we would abstract the HBase
>> interface in the most effective way.
>> As mentioned below, Tephra did a good job at this and could serve as an example here.
>> (Not dinging OMID; OMID does most of its work client side and doesn't need these abstractions.)
>> -- Lars
>> On Tuesday, January 14, 2020, 01:13:36 AM PST, István Tóth <stoty@cloudera.com.invalid> wrote:
>> Yes, the HBase API signatures change between versions, so we need to
>> compile each compat module against a specific HBase.
>> Whether I can define an internal compatibility API that is switchable at
>> run (startup) time without a performance hit remains to be seen.
>> István
>>> On Tue, Jan 14, 2020 at 3:21 AM Josh Elser <elserj@apache.org> wrote:
>>> Agree that trying to wrangle branches is just too frustrating and
>>> error-prone.
>>> 
>>> It would also be great if we could have a single Phoenix jar that works
>>> across HBase versions, but would not die on that hill :)
>>> 
>>> On 12/20/19 5:04 AM, larsh@apache.org wrote:
>>>> I said _provided_ they can be isolated easily :) (I meant it in the
>>>> sense of assuming it's easy).
>>>> As I said though, Tephra has a similar problem and they did a really
>>>> good job isolating HBase versions. We can learn from them. Sometimes they
>>>> isolate the change only, and sometimes the class needs to be copied, but
>>>> even then it's the one class that is copied, not another branch that needs
>>>> to be kept in sync.
>>>> 
>>>> This may also drive the desperately necessary refactoring of Phoenix to
>>>> make these things easier to isolate, or to reduce the copying to a minimum.
>>>> And we'd need to think through testing carefully.
>>>> 
>>>> The branch per Phoenix and HBase version is too complex, IMHO. And the
>>>> complex branch to HBase version mapping that Istvan outlines below confirms
>>>> that.
>>>> 
>>>> We should all take a brief look at the Tephra solution and see whether
>>>> we can apply that. (And since Tephra is part of the fold now, perhaps
>>>> someone can help there...?)
>>>> Cheers.
>>>> -- Lars
>>>> 
>>>> On Thursday, December 19, 2019, 8:34:15 PM GMT+1, Geoffrey Jacoby <gjacoby@gmail.com> wrote:
>>>> 
>>>> Lars,
>>>> 
>>>> I'm curious why you say the differences are easily isolated -- many of the
>>>> core classes of Phoenix either directly inherit HBase classes or implement
>>>> HBase interfaces, and those can vary between minor versions. (See my above
>>>> example of a new coprocessor hook on BaseRegionObserver.)
>>>> 
>>>> Geoffrey
>>>> 
>>>> On Thu, Dec 19, 2019 at 10:54 AM larsh@apache.org <larsh@apache.org> wrote:
>>>> 
>>>>> Yep. The differences are pretty minimal - provided they can be isolated
>>>>> easily.
>>>>> Tephra might be a pretty good model. It supports various versions of HBase
>>>>> in a single branch and has similar issues as Phoenix (coprocessors, etc).
>>>>> -- Lars
>>>>> On Thursday, December 19, 2019, 7:07:51 PM GMT+1, Josh Elser <elserj@apache.org> wrote:
>>>>> 
>>>>> To clarify, you think that compat modules are better than that
>>>>> separate-branches model in 4.x?
>>>>> 
>>>>> On 12/18/19 11:29 AM, larsh@apache.org wrote:
>>>>>> This is really hard to follow.
>>>>>> 
>>>>>> I think we should do the same with HBase dependencies in Phoenix that
>>>>>> HBase does with Hadoop dependencies.
>>>>>> 
>>>>>> That is: We could have a maven module with the specific HBase version
>>>>>> dependent code.
>>>>>> Btw. Tephra does the same... A module for HBase version specific code.
>>>>>> -- Lars
>>>>>> 
>>>>>> On Tuesday, December 17, 2019, 10:00:31 AM GMT+1, Istvan Toth <stoty@apache.org> wrote:
>>>>>> 
>>>>>> What do you think about tying the minor releases to HBase minor releases
>>>>>> (not necessarily one-to-one)
>>>>>> 
>>>>>> for example (provided 5.1 is 2020H1)
>>>>>> 
>>>>>> 5.0.0 -> HB 2.0
>>>>>> 5.1.0 -> HB 2.2.2 (and whatever 2.1 is API compatible with it)
>>>>>> 5.1.x -> HB 2.2.x (treat as maintenance branch, no major new features)
>>>>>> 5.2.0 -> HB 2.3.0 (if released by that time)
>>>>>> 5.2.x -> HB 2.3.x (treat as maintenance branch, no major new features)
>>>>>> 5.3.0 -> HB 2.3.x (if there is no new major/minor HBase release)
>>>>>> master -> latest released HBase version
>>>>>> 
>>>>>> Alternatively, we could stick with the same HBase version for patch
>>>>>> releases that we used for the first minor release.
>>>>>> 
>>>>>> This would limit the number of branches that we have to maintain in
>>>>>> parallel, while providing maintenance branches for older releases, and
>>>>>> timely-ish Phoenix releases.
>>>>>> 
>>>>>> The drawback is that users of old HBase versions won't get the latest
>>>>>> features; on the other hand, they can expect more polish.
>>>>>> 
>>>>>> Istvan
>>>>>> 
>>>>>> On Thu, Dec 12, 2019 at 8:05 PM Geoffrey Jacoby <gjacoby@apache.org> wrote:
>>>>>> 
>>>>>>> Since HBase 2.0 is EOM'ed, I'm +1 for not worrying about 2.0.x
>>>>>>> compatibility with the 5.x branch going forward.
>>>>>>> 
>>>>>>> Given how coupled Phoenix is to the implementation details of HBase, though,
>>>>>>> I'm not sure trying to abstract those away to keep one Phoenix branch per
>>>>>>> HBase major version is practical. At the least, it would be really
>>>>>>> complex.
>>>>>>> 
>>>>>>> For example, in the new year I plan to return to working on the change data
>>>>>>> capture and Phoenix-level replication features, both of which depend on
>>>>>>> WALKey interface changes and a new RegionObserver coprocessor hook
>>>>>>> introduced in HBASE-22622 and HBASE-22623. These were released in HBase 1.5
>>>>>>> and will be in the forthcoming HBase 2.3. While the HBase community is
>>>>>>> discussing EOMing 1.3 right now, and maybe 1.4 will go in the medium term,
>>>>>>> I don't see all pre-2.3 branch-2's getting deprecated anytime soon.
>>>>>>> 
>>>>>>> So there will be at least two significant features that can only exist in
>>>>>>> some but not all of our 4.x and 5.x branches.
>>>>>>> 
>>>>>>> Geoffrey
>>>>>>> 
>>>>>>> On Thu, Dec 12, 2019 at 8:21 AM Josh Elser <elserj@apache.org> wrote:
>>>>>>> 
>>>>>>>> As much as possible, I'd like to avoid us getting into another situation
>>>>>>>> with 5.x where we have multiple branches. My hope was/is that we can
>>>>>>>> keep one Phoenix5 branch that works against an acceptable set of HBase
>>>>>>>> branches.
>>>>>>>> 
>>>>>>>> To me, that acceptable set of HBase branches is _a_ 2.1 and 2.2 release.
>>>>>>>> I don't think we need to support all 2.1.x or 2.2.x, nor do I think we
>>>>>>>> need to keep trying to maintain 2.0.x as it's already end of support by
>>>>>>>> the HBase community.
>>>>>>>> 
>>>>>>>> Thanks for updating your PR. I'll add this to my review queue.
>>>>>>>> 
>>>>>>>> On 12/12/19 1:52 AM, Istvan Toth wrote:
>>>>>>>>> Hi!
>>>>>>>>> 
>>>>>>>>> I'd like to start a conversation about supporting HBase 2.2 in the
>>>>>>>>> master branch.
>>>>>>>>> 
>>>>>>>>> https://issues.apache.org/jira/browse/PHOENIX-5268 has a slightly out
>>>>>>>>> of date, but functional PR for HBase 2.2 support on master. (Please review
>>>>>>>>> and comment if you have the time, I'll try to update the PR in the next
>>>>>>>>> few days)
>>>>>>>>> 
>>>>>>>>> The reason that it is not a straightforward decision to merge it is that
>>>>>>>>> applying that patch breaks compatibility with HBase 2.0.1, the current
>>>>>>>>> base.
>>>>>>>>> 
>>>>>>>>> I can see the following outcomes:
>>>>>>>>> 
>>>>>>>>> - Do nothing
>>>>>>>>> - Move master to HBase 2.2.2
>>>>>>>>> - Fork master to Hbase-2.0 and Hbase-2.2 branches
>>>>>>>>> - Build time compatibility modules
>>>>>>>>> - Run time compatibility modules
>>>>>>>>> - Something that I haven't thought of
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Doing nothing is obviously not a long term solution, as the current
>>>>>>>>> master doesn't work with any of the currently supported HBase branches,
>>>>>>>>> but we may postpone the inevitable.
>>>>>>>>> 
>>>>>>>>> Simply moving master to HBase 2.2 is the most attractive solution from a
>>>>>>>>> pure developer POV, but there may be other considerations.
>>>>>>>>> 
>>>>>>>>> Having multiple masters for 2.0 and 2.2 is simple from a code
>>>>>>>>> perspective, but maintaining two branches is a non-trivial amount of
>>>>>>>>> additional work. (See the 4.x situation)
>>>>>>>>> 
>>>>>>>>> Moving the HBase version dependent stuff into a separate module, and
>>>>>>>>> choosing at build time is not pretty from a code POV, but saves us the
>>>>>>>>> hassle of maintaining multiple branches, while maintaining compatibility
>>>>>>>>> with multiple HBase versions, and can handle future API changes as well
>>>>>>>>> from a single branch. Doing something like this could have saved us the
>>>>>>>>> effort of maintaining three separate 4.x branches.
>>>>>>>>> 
>>>>>>>>> I feel that since Phoenix is closely tied to HBase, and requires
>>>>>>>>> cluster-wide HBase configuration to work anyway, handling the different
>>>>>>>>> HBase versions from the same binary/JAR is not worth the effort.
>>>>>>>>> 
>>>>>>>>> Please share your thoughts!
>>>>>>>>> 
>>>>>>>>> regards
>>>>>>>>> Istvan
>>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>>>> 
>>> 
