spark-dev mailing list archives

From Iulian Dragoș <iulian.dra...@typesafe.com>
Subject Re: [VOTE] Release Apache Spark 1.6.0 (RC4)
Date Wed, 23 Dec 2015 13:44:08 GMT
+1 (non-binding)

Tested Mesos deployments (client and cluster-mode, fine-grained and
coarse-grained). Things look good
<https://ci.typesafe.com/view/Spark/job/mit-docker-test-ref/8/console>.

iulian

On Wed, Dec 23, 2015 at 2:35 PM, Sean Owen <sowen@cloudera.com> wrote:

> Docker integration tests still fail for Mark and me, and should
> probably be disabled:
> https://issues.apache.org/jira/browse/SPARK-12426
>
> ... but if anyone else successfully runs these (and I assume Jenkins
> does) then not a blocker.
>
> I'm having intermittent trouble with other tests passing, but nothing
> unusual.
> Sigs and hashes are OK.
>
> We have 30 issues fixed for 1.6.1. All but those resolved in the last
> 24 hours or so should be fixed for 1.6.0 right? I can touch that up.
>
> On Tue, Dec 22, 2015 at 8:10 PM, Michael Armbrust
> <michael@databricks.com> wrote:
> > Please vote on releasing the following candidate as Apache Spark version
> > 1.6.0!
> >
> > The vote is open until Friday, December 25, 2015 at 18:00 UTC and passes
> > if a majority of at least 3 +1 PMC votes are cast.
> >
> > [ ] +1 Release this package as Apache Spark 1.6.0
> > [ ] -1 Do not release this package because ...
> >
> > To learn more about Apache Spark, please see http://spark.apache.org/
> >
> > The tag to be voted on is v1.6.0-rc4
> > (4062cda3087ae42c6c3cb24508fc1d3a931accdf)
> >
> > The release files, including signatures, digests, etc. can be found at:
> > http://people.apache.org/~pwendell/spark-releases/spark-1.6.0-rc4-bin/
> >
> > Release artifacts are signed with the following key:
> > https://people.apache.org/keys/committer/pwendell.asc
> >
> > The staging repository for this release can be found at:
> > https://repository.apache.org/content/repositories/orgapachespark-1176/
> >
> > The test repository (versioned as v1.6.0-rc4) for this release can be
> > found at:
> > https://repository.apache.org/content/repositories/orgapachespark-1175/
> >
> > The documentation corresponding to this release can be found at:
> > http://people.apache.org/~pwendell/spark-releases/spark-1.6.0-rc4-docs/
> >
> > =======================================
> > == How can I help test this release? ==
> > =======================================
> > If you are a Spark user, you can help us test this release by taking an
> > existing Spark workload and running on this release candidate, then
> > reporting any regressions.
> >
> > ================================================
> > == What justifies a -1 vote for this release? ==
> > ================================================
> > This vote is happening towards the end of the 1.6 QA period, so -1 votes
> > should only occur for significant regressions from 1.5. Bugs already
> > present in 1.5, minor regressions, or bugs related to new features will
> > not block this release.
> >
> > ===============================================================
> > == What should happen to JIRA tickets still targeting 1.6.0? ==
> > ===============================================================
> > 1. It is OK for documentation patches to target 1.6.0 and still go into
> > branch-1.6, since documentation will be published separately from the
> > release.
> > 2. New features for non-alpha-modules should target 1.7+.
> > 3. Non-blocker bug fixes should target 1.6.1 or 1.7.0, or drop the target
> > version.
> >
> >
> > ==================================================
> > == Major changes to help you focus your testing ==
> > ==================================================
> >
> > Notable changes since 1.6 RC3
> >
> >
> >   - SPARK-12404 - Fix serialization error for Datasets with
> > Timestamps/Arrays/Decimal
> >   - SPARK-12218 - Fix incorrect pushdown of filters to parquet
> >   - SPARK-12395 - Fix join columns of outer join for DataFrame using
> >   - SPARK-12413 - Fix mesos HA
> >
> >
> > Notable changes since 1.6 RC2
> >
> >
> > - SPARK_VERSION has been set correctly
> > - SPARK-12199 ML Docs are publishing correctly
> > - SPARK-12345 Mesos cluster mode has been fixed
> >
> > Notable changes since 1.6 RC1
> >
> > Spark Streaming
> >
> > SPARK-2629  trackStateByKey has been renamed to mapWithState
> >
> > Spark SQL
> >
> > SPARK-12165 SPARK-12189 Fix bugs in eviction of storage memory by
> > execution.
> > SPARK-12258 correct passing null into ScalaUDF
> >
> > Notable Features Since 1.5
> >
> > Spark SQL
> >
> > SPARK-11787 Parquet Performance - Improve Parquet scan performance when
> > using flat schemas.
> > SPARK-10810 Session Management - Isolated default database (i.e. USE mydb)
> > even on shared clusters.
> > SPARK-9999  Dataset API - A type-safe API (similar to RDDs) that performs
> > many operations on serialized binary data and code generation (i.e.
> > Project Tungsten).
> > SPARK-10000 Unified Memory Management - Shared memory for execution and
> > caching instead of exclusive division of the regions.
> > SPARK-11197 SQL Queries on Files - Concise syntax for running SQL queries
> > over files of any supported format without registering a table.
> > SPARK-11745 Reading non-standard JSON files - Added options to read
> > non-standard JSON files (e.g. single-quotes, unquoted attributes)
> > SPARK-10412 Per-operator Metrics for SQL Execution - Display statistics
> > on a per-operator basis for memory usage and spilled data size.
> > SPARK-11329 Star (*) expansion for StructTypes - Makes it easier to nest
> > and unnest arbitrary numbers of columns
> > SPARK-10917, SPARK-11149 In-memory Columnar Cache Performance -
> > Significant (up to 14x) speed up when caching data that contains complex
> > types in DataFrames or SQL.
> > SPARK-11111 Fast null-safe joins - Joins using null-safe equality (<=>)
> > will now execute using SortMergeJoin instead of computing a Cartesian
> > product.
> > SPARK-11389 SQL Execution Using Off-Heap Memory - Support for configuring
> > query execution to occur using off-heap memory to avoid GC overhead
> > SPARK-10978 Datasource API Avoid Double Filter - When implementing a
> > datasource with filter pushdown, developers can now tell Spark SQL to
> > avoid double evaluating a pushed-down filter.
> > SPARK-4849  Advanced Layout of Cached Data - storing partitioning and
> > ordering schemes in In-memory table scan, and adding distributeBy and
> > localSort to DF API
> > SPARK-9858  Adaptive query execution - Initial support for automatically
> > selecting the number of reducers for joins and aggregations.
> > SPARK-9241  Improved query planner for queries having distinct
> > aggregations - Query plans of distinct aggregations are more robust when
> > distinct columns have high cardinality.
> >
> > Spark Streaming
> >
> > API Updates
> >
> > SPARK-2629  New improved state management - mapWithState - a DStream
> > transformation for stateful stream processing, supersedes
> > updateStateByKey in functionality and performance.
> > SPARK-11198 Kinesis record deaggregation - Kinesis streams have been
> > upgraded to use KCL 1.4.0 and support transparent deaggregation of
> > KPL-aggregated records.
> > SPARK-10891 Kinesis message handler function - Allows an arbitrary
> > function to be applied to a Kinesis record in the Kinesis receiver to
> > customize what data is stored in memory.
> > SPARK-6328  Python Streaming Listener API - Get streaming statistics
> > (scheduling delays, batch processing times, etc.) in streaming.
> >
> > UI Improvements
> >
> > Made failures visible in the streaming tab, in the timelines, batch list,
> > and batch details page.
> > Made output operations visible in the streaming tab as progress bars.
> >
> > MLlib
> >
> > New algorithms/models
> >
> > SPARK-8518  Survival analysis - Log-linear model for survival analysis
> > SPARK-9834  Normal equation for least squares - Normal equation solver,
> > providing R-like model summary statistics
> > SPARK-3147  Online hypothesis testing - A/B testing in the Spark
> > Streaming framework
> > SPARK-9930  New feature transformers - ChiSqSelector,
> > QuantileDiscretizer, SQL transformer
> > SPARK-6517  Bisecting K-Means clustering - Fast top-down clustering
> > variant of K-Means
> >
> > API improvements
> >
> > ML Pipelines
> >
> > SPARK-6725  Pipeline persistence - Save/load for ML Pipelines, with
> > partial coverage of spark.ml algorithms
> > SPARK-5565  LDA in ML Pipelines - API for Latent Dirichlet Allocation in
> > ML Pipelines
> >
> > R API
> >
> > SPARK-9836  R-like statistics for GLMs - (Partial) R-like stats for
> > ordinary least squares via summary(model)
> > SPARK-9681  Feature interactions in R formula - Interaction operator ":"
> > in R formula
> >
> > Python API - Many improvements to Python API to approach feature parity
> >
> > Misc improvements
> >
> > SPARK-7685, SPARK-9642  Instance weights for GLMs - Logistic and Linear
> > Regression can take instance weights
> > SPARK-10384, SPARK-10385 Univariate and bivariate statistics in
> > DataFrames - Variance, stddev, correlations, etc.
> > SPARK-10117 LIBSVM data source - LIBSVM as a SQL data source
> >
> > Documentation improvements
> >
> > SPARK-7751  @since versions - Documentation includes initial version when
> > classes and methods were added
> > SPARK-11337 Testable example code - Automated testing for code in user
> > guide examples
> >
> > Deprecations
> >
> > In spark.mllib.clustering.KMeans, the "runs" parameter has been
> > deprecated.
> > In spark.ml.classification.LogisticRegressionModel and
> > spark.ml.regression.LinearRegressionModel, the "weights" field has been
> > deprecated, in favor of the new name "coefficients." This helps
> > disambiguate from instance (row) weights given to algorithms.
> >
> > Changes of behavior
> >
> > spark.mllib.tree.GradientBoostedTrees validationTol has changed
> > semantics in 1.6. Previously, it was a threshold for absolute change in
> > error. Now, it resembles the behavior of GradientDescent convergenceTol:
> > For large errors, it uses relative error (relative to the previous
> > error); for small errors (< 0.01), it uses absolute error.
> > spark.ml.feature.RegexTokenizer: Previously, it did not convert strings
> > to lowercase before tokenizing. Now, it converts to lowercase by default,
> > with an option not to. This matches the behavior of the simpler Tokenizer
> > transformer.
> > Spark SQL's partition discovery has been changed to only discover
> > partition directories that are children of the given path. (i.e. if
> > path="/my/data/x=1" then x=1 will no longer be considered a partition but
> > only children of x=1.) This behavior can be overridden by manually
> > specifying the basePath that partitioning discovery should start with
> > (SPARK-11678).
> > When casting a value of an integral type to timestamp (e.g. casting a
> > long value to timestamp), the value is treated as being in seconds
> > instead of milliseconds (SPARK-11724).
> > With the improved query planner for queries having distinct aggregations
> > (SPARK-9241), the plan of a query having a single distinct aggregation
> > has been changed to a more robust version. To switch back to the plan
> > generated by Spark 1.5's planner, please set
> > spark.sql.specializeSingleDistinctAggPlanning to true (SPARK-12077).
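For anyone checking the validationTol change listed above, here is a rough sketch in plain Python of the two readings. It follows the wording of the notes literally and is not Spark's implementation; the function names and the small_error_cutoff parameter are made up for illustration.

```python
def keeps_improving_old(previous_error, current_error, validation_tol):
    """Pre-1.6 reading: compare the absolute drop in error to the tolerance."""
    return (previous_error - current_error) > validation_tol


def keeps_improving_new(previous_error, current_error, validation_tol,
                        small_error_cutoff=0.01):
    """1.6 reading, as worded in the release notes (not Spark's actual code)."""
    improvement = previous_error - current_error
    if previous_error < small_error_cutoff:
        # Small errors: fall back to an absolute comparison.
        return improvement > validation_tol
    # Large errors: compare the drop relative to the previous error.
    return improvement / previous_error > validation_tol


if __name__ == "__main__":
    # Validation error moves from 100.0 to 99.5 with a tolerance of 0.01.
    print(keeps_improving_old(100.0, 99.5, 0.01))
    print(keeps_improving_new(100.0, 99.5, 0.01))
```

With these inputs the absolute rule still counts the step as progress (a drop of 0.5 exceeds 0.01), while the relative rule does not (0.5 / 100.0 = 0.005), so a job that relied on the old threshold may stop boosting earlier under 1.6.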
>
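The integral-to-timestamp change (SPARK-11724) is easy to underestimate, so here is a small plain-Python illustration of how far apart the two interpretations land for the same long value. This does not call Spark; the helper names are hypothetical and the sample value is arbitrary.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def timestamp_from_seconds(value):
    """Interpret an integer as seconds since the epoch (the 1.6 behavior)."""
    return EPOCH + timedelta(seconds=value)


def timestamp_from_millis(value):
    """Interpret an integer as milliseconds since the epoch (the 1.5 behavior)."""
    return EPOCH + timedelta(milliseconds=value)


if __name__ == "__main__":
    raw = 1450878248  # arbitrary sample value
    print(timestamp_from_seconds(raw))  # a date in December 2015
    print(timestamp_from_millis(raw))   # a date in January 1970
```

Data written as epoch milliseconds would therefore need dividing by 1000 before the cast to keep the same meaning under 1.6.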


--
Iulian Dragos

------
Reactive Apps on the JVM
www.typesafe.com
