spark-dev mailing list archives

From Jakub Wozniak <jakub.wozn...@cern.ch>
Subject Re: [VOTE] Release Apache Spark 2.4.1 (RC2)
Date Tue, 12 Mar 2019 08:50:43 GMT
Hello,

Any more thoughts on this one?
Will it be let into 2.4.1, or not?

Thanks in advance,
Jakub


On 8 Mar 2019, at 11:26, Jakub Wozniak <jakub.wozniak@cern.ch> wrote:

Hi,

To me it is backwards compatible with older HBase versions.
The code only falls back to the newer API on exception.
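
The "try the old API, fall back to the newer one on exception" pattern can be sketched in Java with reflection. This is a minimal illustration only: `Config`, `Connection`, `LegacyTokenUtil` and `ModernTokenUtil` are hypothetical stand-ins (loosely modelled on Hadoop's `Configuration` and HBase's `Connection`), not the real HBase `TokenUtil` API or the code in the linked Spark commit.

```java
import java.lang.reflect.Method;

public class TokenFallbackSketch {
    // Hypothetical stand-in for Hadoop's Configuration.
    public static final class Config {}

    // Hypothetical stand-in for an HBase Connection built from a Config.
    public static final class Connection {
        final Config conf;
        Connection(Config conf) { this.conf = conf; }
    }

    // Hypothetical "1.x-style" utility exposing both the old and the new method.
    public static final class LegacyTokenUtil {
        public static String obtainToken(Config conf) { return "token-from-config"; }
        public static String obtainToken(Connection conn) { return "token-from-connection"; }
    }

    // Hypothetical "2.x-style" utility where the Config overload no longer exists.
    public static final class ModernTokenUtil {
        public static String obtainToken(Connection conn) { return "token-from-connection"; }
    }

    // Try the old Config-based method first; if it is missing or throws,
    // fall back to the Connection-based method.
    public static String obtainToken(Class<?> tokenUtil, Config conf) {
        try {
            Method old = tokenUtil.getMethod("obtainToken", Config.class);
            return (String) old.invoke(null, conf);
        } catch (ReflectiveOperationException oldApiFailed) {
            try {
                Method newer = tokenUtil.getMethod("obtainToken", Connection.class);
                return (String) newer.invoke(null, new Connection(conf));
            } catch (ReflectiveOperationException newApiFailed) {
                newApiFailed.addSuppressed(oldApiFailed);
                throw new IllegalStateException("No usable obtainToken method", newApiFailed);
            }
        }
    }

    public static void main(String[] args) {
        Config conf = new Config();
        System.out.println(obtainToken(LegacyTokenUtil.class, conf));  // token-from-config
        System.out.println(obtainToken(ModernTokenUtil.class, conf));  // token-from-connection
    }
}
```

Because the old signature is tried first, a 1.x-style client keeps its existing behaviour; only when that lookup or call fails does the newer path run, which is the backwards-compatibility argument made in this message.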

It would be great if this gets in.
Otherwise a setup with HBase 2 + Spark 2.4 gets a bit complicated, as we are forced to use
an older version of the HBase client (1.4.9) when running on YARN.
In theory it is compatible, but we see some performance degradation when reading from HBase
with the older client (we are investigating it now).
We have had issues in the past when HBase server and client versions were not aligned, so
this is not our preferred option.

Thanks,
Jakub


On 8 Mar 2019, at 11:15, Jakub Wozniak <jakub.wozniak@cern.ch> wrote:

I guess it is that one:
https://github.com/apache/spark/commit/dfed439e33b7bf224dd412b0960402068d961c7b#diff-9ebb59b7b008c694a8f583b94bd24e1d

Cheers,
Jakub


On 7 Mar 2019, at 17:25, Sean Owen <srowen@gmail.com> wrote:

Do you know what change fixed it?
If it's not a regression from 2.4.0 it wouldn't necessarily go into a
maintenance release. If there were no downside, maybe; does it cause
any incompatibility with older HBase versions?
It may be that this support is targeted for Spark 3 on purpose, which
is probably due in the middle of the year.

On Thu, Mar 7, 2019 at 8:57 AM Jakub Wozniak <jakub.wozniak@cern.ch> wrote:

Hello,

I have a question regarding the 2.4.1 release.

It looks like Spark 2.4 (and 2.4.1-rc) is not fully compatible with HBase 2.x+ in YARN
mode.
The problem is in the org.apache.spark.deploy.security.HbaseDelegationTokenProvider class,
which expects a specific version of HBase's TokenUtil class; that class changed between
HBase 1.x and 2.x.
On top of that, HadoopDelegationTokenManager does not use ServiceLoader, so I cannot plug in
my own provider (the providers are hard-coded).
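
The difference between a hard-coded provider list and `ServiceLoader`-based discovery can be illustrated with a minimal Java sketch. It assumes a hypothetical `TokenProvider` interface and hypothetical built-in providers; it is not Spark's actual `HadoopDelegationTokenManager` code, only a picture of how `java.util.ServiceLoader` would let a third-party provider be attached.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.ServiceLoader;

public class ProviderDiscoverySketch {
    // Hypothetical provider interface; not Spark's actual API.
    public interface TokenProvider {
        String serviceName();
    }

    // Hypothetical built-in providers, analogous to a hard-coded list.
    public static final class HadoopFsProvider implements TokenProvider {
        public String serviceName() { return "hadoopfs"; }
    }

    public static final class HiveProvider implements TokenProvider {
        public String serviceName() { return "hive"; }
    }

    public static final class HBaseProvider implements TokenProvider {
        public String serviceName() { return "hbase"; }
    }

    // Built-ins first, then any implementations registered via META-INF/services.
    // A discovered provider with the same serviceName() replaces the built-in one.
    public static Map<String, TokenProvider> loadProviders(ClassLoader loader) {
        Map<String, TokenProvider> byName = new LinkedHashMap<>();
        List<TokenProvider> builtIns =
            List.of(new HadoopFsProvider(), new HiveProvider(), new HBaseProvider());
        for (TokenProvider p : builtIns) {
            byName.put(p.serviceName(), p);
        }
        for (TokenProvider p : ServiceLoader.load(TokenProvider.class, loader)) {
            byName.put(p.serviceName(), p);
        }
        return byName;
    }

    public static void main(String[] args) {
        Map<String, TokenProvider> providers =
            loadProviders(ProviderDiscoverySketch.class.getClassLoader());
        // With no service registrations on the classpath, only the built-ins appear.
        System.out.println(providers.keySet());
    }
}
```

With this design, a user jar could register its own implementation by listing the class name in a file named `META-INF/services/ProviderDiscoverySketch$TokenProvider` (the interface's binary name), which is the standard `ServiceLoader` mechanism; keying providers by service name means a custom "hbase" provider would replace the built-in one rather than run alongside it.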

It seems that both problems are resolved on the Spark master branch.

Is there any reason not to include this fix in the 2.4.1 release?
If so, when do you plan to release it (the HBase fix)?

Or maybe there is something I’ve overlooked, please correct me if I’m wrong.

Best regards,
Jakub


On 7 Mar 2019, at 03:04, Saisai Shao <sai.sai.shao@gmail.com> wrote:

Do we have other blocker/critical issues for Spark 2.4.1, or are we waiting for something to be fixed? I
did a rough search of JIRA, and there seem to be no blocker/critical issues targeted at 2.4.1.

Thanks
Saisai

On Thu, 7 Mar 2019 at 4:57 AM, shane knapp <sknapp@berkeley.edu> wrote:

i'll be popping in to the sig-big-data meeting on the 20th to talk about stuff like this.

On Wed, Mar 6, 2019 at 12:40 PM Stavros Kontopoulos <stavros.kontopoulos@lightbend.com> wrote:

Yes, it's a tough decision, and as we discussed today (https://docs.google.com/document/d/1pnF38NF6N5eM8DlK088XUW85Vms4V2uTsGZvSp8MNIA),
"Kubernetes support window is 9 months, Spark is two years". So in the future we may end up with old client
versions on branches that are still supported, like 2.4.x.
That gives us no choice but to upgrade if we want to be on the safe side. We have tested
3.0.0 with 1.11 internally and it works, but I don't know what it means to run with old
clients.


On Wed, Mar 6, 2019 at 7:54 PM Sean Owen <srowen@gmail.com> wrote:

If the old client is basically unusable with the versions of K8S
people mostly use now, and the new client still works with older
versions, I could see including this in 2.4.1.

Looking at https://github.com/fabric8io/kubernetes-client#compatibility-matrix
it seems like the 4.1.1 client is needed for 1.10 and above. However
it no longer supports 1.7 and below.
We have 3.0.x, and versions through 4.0.x of the client support the
same K8S versions, so no real middle ground here.

1.7.0 came out June 2017, it seems. 1.10 was March 2018. Minor release
branches are maintained for 9 months per
https://kubernetes.io/docs/setup/version-skew-policy/

Spark 2.4.0 came in Nov 2018. I suppose we could say it should have
used the newer client from the start as at that point (?) 1.7 and
earlier were already at least 7 months past EOL.
If we update the client in 2.4.1, versions of K8S as recently
'supported' as a year ago won't work anymore. I'm guessing there are
still 1.7 users out there? That wasn't that long ago but if the
project and users generally move fast, maybe not.

Normally I'd say, that's what the next minor release of Spark is for;
update if you want later infra. But there is no Spark 2.5.
I presume downstream distros could modify the dependency easily (?) if
needed and maybe already do. It wouldn't necessarily help end users.

Does the 3.0.x client not work at all with 1.10+, or is it just unsupported?
If it 'basically works but no guarantees' I'd favor not updating. If
it doesn't work at all, hm. That's tough. I think I'd favor updating
the client but think it's a tough call both ways.



On Wed, Mar 6, 2019 at 11:14 AM Stavros Kontopoulos
<stavros.kontopoulos@lightbend.com> wrote:

Yes, Shane Knapp has already done the work for that, and the tests pass. I am working on
a PR now and could submit it for the 2.4 branch.
I understand that this is a major dependency update, but the problem I see is that the client
version is so old that I don't think it makes much sense for current users who are on
k8s 1.10, 1.11, etc. (https://github.com/fabric8io/kubernetes-client#compatibility-matrix;
3.0.0 does not even appear there).
I don't know what using such an old version with current k8s clusters means in terms of bugs,
etc.





--
Shane Knapp
UC Berkeley EECS Research / RISELab Staff Technical Lead
https://rise.cs.berkeley.edu






