spark-dev mailing list archives

From Matei Zaharia <matei.zaha...@gmail.com>
Subject Re: Utilize newer hadoop releases WAS: [VOTE] Release Apache Spark 1.0.2 (RC1)
Date Sun, 27 Jul 2014 20:50:29 GMT
For this particular issue, it would be good to know whether Hadoop provides an API to determine
the Hadoop version. If not, maybe that can be added to Hadoop in its next release, and we
can check for it with reflection. We recently added a SparkContext.version() method in Spark
to let you query the version.
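A reflection-based probe along the lines Matei describes might look like the sketch below. It assumes Hadoop's org.apache.hadoop.util.VersionInfo class (which Hadoop releases do ship); if the class or method is absent at runtime, the probe returns empty instead of failing. This is an illustrative sketch, not Spark's actual code.

```java
import java.util.Optional;

public class HadoopVersionProbe {
    // Hedged sketch: look up Hadoop's VersionInfo.getVersion() by reflection,
    // so the caller need not link against any particular Hadoop release at
    // compile time. Returns Optional.empty() when Hadoop is not on the
    // classpath or the method is missing.
    public static Optional<String> hadoopVersion() {
        try {
            Class<?> cls = Class.forName("org.apache.hadoop.util.VersionInfo");
            Object version = cls.getMethod("getVersion").invoke(null);
            return Optional.ofNullable((String) version);
        } catch (ReflectiveOperationException e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        // Without Hadoop on the classpath this prints "unknown".
        System.out.println(hadoopVersion().orElse("unknown"));
    }
}
```

The reflective lookup keeps the dependency soft: callers can branch on the returned version without a compile-time link to Hadoop.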

Matei

On Jul 27, 2014, at 12:19 PM, Patrick Wendell <pwendell@gmail.com> wrote:

> Hey Ted,
> 
> We always intend Spark to work with the newer Hadoop versions and
> encourage Spark users to use the newest Hadoop versions for best
> performance.
> 
> We do try to be liberal in terms of supporting older versions as well.
> This is because many people run older HDFS versions and we want Spark
> to read and write data from them. So far we've been willing to do this
> despite some maintenance cost.
> 
> The reason is that for many users it's very expensive to do a
> wholesale upgrade of HDFS, but trying out new versions of Spark is
> much easier. For instance, some of the largest scale Spark users run
> fairly old or forked HDFS versions.
> 
> - Patrick
> 
> On Sun, Jul 27, 2014 at 12:01 PM, Ted Yu <yuzhihong@gmail.com> wrote:
>> Thanks for replying, Patrick.
>> 
>> The intention of my first email was to utilize newer hadoop releases for
>> their bug fixes. I am still looking for a clean way of passing the hadoop
>> release version number to individual classes.
>> Using newer hadoop releases would encourage pushing bug fixes / new
>> features upstream. Ultimately Spark code would become cleaner.
>> 
>> Cheers
>> 
>> On Sun, Jul 27, 2014 at 8:52 AM, Patrick Wendell <pwendell@gmail.com> wrote:
>> 
>>> Ted - technically I think you are correct, although I wouldn't
>>> recommend disabling this lock. This lock is not expensive (acquired
>>> once per task, as are many other locks already). Also, we've seen some
>>> cases where Hadoop concurrency bugs ended up requiring multiple fixes
>>> - concurrency of client access is not well tested in the Hadoop
>>> codebase, since most of the Hadoop tools do not use concurrent access.
>>> So in general it's good to be conservative in what we expect of the
>>> Hadoop client libraries.
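The guard being discussed can be sketched as below. The lock name mirrors HadoopRDD's, but the Configuration stand-in and everything else here is illustrative, not Spark's actual code.

```java
import java.util.Properties;

public class ConfGuardSketch {
    // Illustrative stand-in for org.apache.hadoop.conf.Configuration, whose
    // constructor was not thread-safe in some Hadoop releases.
    static class FakeConfiguration extends Properties {}

    // Global lock mirroring HadoopRDD.CONFIGURATION_INSTANTIATION_LOCK:
    // serializes Configuration construction. It is acquired once per task,
    // so the cost is small relative to the work a task does.
    private static final Object CONFIGURATION_INSTANTIATION_LOCK = new Object();

    public static FakeConfiguration newConfiguration() {
        synchronized (CONFIGURATION_INSTANTIATION_LOCK) {
            return new FakeConfiguration();
        }
    }

    public static void main(String[] args) {
        // Each call takes the global lock briefly, then returns a fresh instance.
        System.out.println(newConfiguration() != null);
    }
}
```

Serializing only construction (not all use) is the conservative choice Patrick describes: it costs one lock acquisition per task while shielding Spark from under-tested concurrency paths in the Hadoop client.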
>>> 
>>> If you'd like to discuss this further, please fork a new thread, since
>>> this is a vote thread. Thanks!
>>> 
>>> On Fri, Jul 25, 2014 at 10:14 PM, Ted Yu <yuzhihong@gmail.com> wrote:
>>>> HADOOP-10456 is fixed in hadoop 2.4.1
>>>> 
>>>> Does this mean that synchronization
>>>> on HadoopRDD.CONFIGURATION_INSTANTIATION_LOCK can be bypassed for hadoop
>>>> 2.4.1?
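A version gate like the one Ted is asking about might be sketched as follows. The dotted-version comparison is generic, and treating 2.4.1+ as safe reflects Ted's reading of HADOOP-10456, not something Spark actually shipped; all names here are hypothetical.

```java
public class HadoopLockGate {
    // Compare dotted numeric version strings, e.g. "2.4.1" vs "2.10.0".
    // Missing components are treated as 0, so "2.4" == "2.4.0".
    static int compareVersions(String a, String b) {
        String[] pa = a.split("\\."), pb = b.split("\\.");
        int n = Math.max(pa.length, pb.length);
        for (int i = 0; i < n; i++) {
            int x = i < pa.length ? Integer.parseInt(pa[i]) : 0;
            int y = i < pb.length ? Integer.parseInt(pb[i]) : 0;
            if (x != y) return Integer.compare(x, y);
        }
        return 0;
    }

    // Hypothetical gate: keep the global Configuration lock only when the
    // detected Hadoop release predates the HADOOP-10456 fix (2.4.1).
    static boolean needsConfigurationLock(String hadoopVersion) {
        return compareVersions(hadoopVersion, "2.4.1") < 0;
    }

    public static void main(String[] args) {
        System.out.println(needsConfigurationLock("2.3.0"));  // older release
        System.out.println(needsConfigurationLock("2.4.1"));  // has the fix
    }
}
```

As the thread notes, even with such a gate it may be prudent to keep the lock: a single Hadoop concurrency fix has sometimes needed follow-up fixes, so a version check alone does not prove client access is safe.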
>>>> 
>>>> Cheers
>>>> 
>>>> 
>>>> On Fri, Jul 25, 2014 at 6:00 PM, Patrick Wendell <pwendell@gmail.com>
>>> wrote:
>>>> 
>>>>> The most important issue in this release is actually an amendment to
>>>>> an earlier fix. The original fix caused a deadlock which was a
>>>>> regression from 1.0.0->1.0.1:
>>>>> 
>>>>> Issue:
>>>>> https://issues.apache.org/jira/browse/SPARK-1097
>>>>> 
>>>>> 1.0.1 Fix:
>>>>> https://github.com/apache/spark/pull/1273/files (had a deadlock)
>>>>> 
>>>>> 1.0.2 Fix:
>>>>> https://github.com/apache/spark/pull/1409/files
>>>>> 
>>>>> I failed to correctly label this on JIRA, but I've updated it!
>>>>> 
>>>>> On Fri, Jul 25, 2014 at 5:35 PM, Michael Armbrust
>>>>> <michael@databricks.com> wrote:
>>>>>> That query is looking at "Fix Version" not "Target Version". The fact
>>>>>> that the first one is still open is only because the bug is not
>>>>>> resolved in master. It is fixed in 1.0.2. The second one is partially
>>>>>> fixed in 1.0.2, but is not worth blocking the release for.
>>>>>> 
>>>>>> 
>>>>>> On Fri, Jul 25, 2014 at 4:23 PM, Nicholas Chammas <nicholas.chammas@gmail.com> wrote:
>>>>>> 
>>>>>>> TD, there are a couple of unresolved issues slated for 1.0.2
>>>>>>> <https://issues.apache.org/jira/issues/?jql=project%20%3D%20SPARK%20AND%20fixVersion%20%3D%201.0.2%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20priority%20DESC>.
>>>>>>> Should they be edited somehow?
>>>>>>> 
>>>>>>> 
>>>>>>> On Fri, Jul 25, 2014 at 7:08 PM, Tathagata Das <tathagata.das1565@gmail.com> wrote:
>>>>>>> 
>>>>>>>> Please vote on releasing the following candidate as Apache Spark
>>>>>>>> version 1.0.2.
>>>>>>>> 
>>>>>>>> This release fixes a number of bugs in Spark 1.0.1.
>>>>>>>> Some of the notable ones are
>>>>>>>> - SPARK-2452: Known issue in Spark 1.0.1, caused by the attempted
>>>>>>>> fix for SPARK-1199. The fix was reverted for 1.0.2.
>>>>>>>> - SPARK-2576: NoClassDefFoundError when executing a Spark SQL
>>>>>>>> query on an HDFS CSV file.
>>>>>>>> The full list is at http://s.apache.org/9NJ
>>>>>>>> 
>>>>>>>> The tag to be voted on is v1.0.2-rc1 (commit 8fb6f00e):
>>>>>>>> https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=8fb6f00e195fb258f3f70f04756e07c259a2351f
>>>>>>>> 
>>>>>>>> The release files, including signatures, digests, etc. can be found at:
>>>>>>>> http://people.apache.org/~tdas/spark-1.0.2-rc1/
>>>>>>>> 
>>>>>>>> Release artifacts are signed with the following key:
>>>>>>>> https://people.apache.org/keys/committer/tdas.asc
>>>>>>>> 
>>>>>>>> The staging repository for this release can be found at:
>>>>>>>> https://repository.apache.org/content/repositories/orgapachespark-1024/
>>>>>>>> 
>>>>>>>> The documentation corresponding to this release can be found at:
>>>>>>>> http://people.apache.org/~tdas/spark-1.0.2-rc1-docs/
>>>>>>>> 
>>>>>>>> Please vote on releasing this package as Apache Spark 1.0.2!
>>>>>>>> 
>>>>>>>> The vote is open until Tuesday, July 29, at 23:00 UTC and passes if
>>>>>>>> a majority of at least 3 +1 PMC votes are cast.
>>>>>>>> [ ] +1 Release this package as Apache Spark 1.0.2
>>>>>>>> [ ] -1 Do not release this package because ...
>>>>>>>> 
>>>>>>>> To learn more about Apache Spark, please see
>>>>>>>> http://spark.apache.org/
>>>>>>>> 
