www-announce mailing list archives

From Sally Khudairi ...@apache.org>
Subject The Apache® Software Foundation Announces Apache Arrow™ as a Top-Level Project
Date Wed, 17 Feb 2016 11:54:21 GMT
 >> this announcement is available online at https://s.apache.org/5Mc8

Open source Big Data in-memory columnar layer accelerates analytical processing and interchange
by more than 100x. 

Forest Hill, MD --17 Feb 2016-- The Apache Software Foundation (ASF), the all-volunteer developers,
stewards, and incubators of more than 350 Open Source projects and initiatives, announced
today Apache Arrow as a new Top-Level Project. 

A high-performance cross-system data layer for columnar in-memory analytics, Apache Arrow
provides the following benefits for Big Data workloads: 
- Accelerates the performance of analytical workloads by more than 100x in some cases
- Enables multi-system workloads by eliminating cross-system communication overhead

Initially seeded by code from the Apache Drill project, Apache Arrow was built on top of a
number of Open Source collaborations, and establishes a de-facto standard for columnar in-memory
processing and interchange. 

"The Open Source community has joined forces on Apache Arrow," said Jacques Nadeau, Vice President
of Apache Arrow and Vice President of Apache Drill. "Developers from 13 major Open Source Big
Data projects are already on board --by introducing a new era of columnar in-memory analytics,
we anticipate the majority of the world's data will be processed through Arrow within the
next few years." 

Code committers to Apache Arrow include developers from Apache Big Data projects Calcite,
Cassandra, Drill, Hadoop, HBase, Impala, Kudu (incubating), Parquet, Phoenix, Spark, and Storm
as well as established and emerging Open Source projects such as Pandas and Ibis. 

"Arrow's cross platform and cross system strengths will enable Python and R to become first-class
languages across the entire Big Data stack," said Wes McKinney, creator of Pandas. 

Apache Arrow accelerates analytical processing by providing a high-performance columnar in-memory
representation. A number of processing algorithms benefit greatly from this memory design.
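
To make the idea of a columnar in-memory representation concrete, the following is a minimal sketch in standard-library Python. It is not Arrow's API or memory format; the record fields (`id`, `price`, `qty`) and the helper `to_columns` are hypothetical names chosen for illustration. It only shows the general shape of the technique: each field is stored in its own contiguous typed buffer, so an aggregate over one field scans a single buffer.

```python
from array import array

# Hypothetical row-oriented records; the field names are illustrative only.
rows = [
    {"id": 1, "price": 9.5, "qty": 3},
    {"id": 2, "price": 4.0, "qty": 10},
    {"id": 3, "price": 7.25, "qty": 1},
]


def to_columns(records):
    """Pivot a list of row dicts into one contiguous typed buffer per field."""
    return {
        "id": array("q", (r["id"] for r in records)),        # 64-bit integers
        "price": array("d", (r["price"] for r in records)),  # 64-bit floats
        "qty": array("q", (r["qty"] for r in records)),      # 64-bit integers
    }


columns = to_columns(rows)

# An aggregate over one field reads a single contiguous buffer
# instead of visiting every field of every row.
total_qty = sum(columns["qty"])
```

In the row-oriented form, summing `qty` touches every record object; in the columnar form it walks one array of fixed-width values, which is the kind of layout that vectorized processing can exploit.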

"A columnar in-memory data layer enables systems and applications to process data at full
hardware speeds," said Todd Lipcon, original Apache Kudu creator and member of the Apache
Arrow Project Management Committee. "Modern CPUs are designed to exploit data-level parallelism
via vectorized operations and SIMD instructions. Arrow facilitates such processing." 

In many workloads, 70-80% of CPU cycles are spent serializing and deserializing data. Arrow
solves this problem by enabling data to be shared between systems and processes with no serialization,
deserialization or memory copies. 
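
The general principle of sharing data without copying can be sketched with Python's built-in buffer protocol. This is an illustration within a single process only, assuming a hypothetical `prices` column; it does not use Arrow and does not show how Arrow shares memory between systems or processes.

```python
from array import array

# One column of 64-bit floats held in a single contiguous buffer.
prices = array("d", [9.5, 4.0, 7.25, 1.0])

# A memoryview exposes the same bytes without copying them.
shared = memoryview(prices)

# A consumer can take a slice of the column; this is still a view, not a copy.
tail = shared[2:]

# Writing through the original buffer is visible through the view,
# which shows that both names refer to the same memory.
prices[2] = 8.0
same_memory = tail[0] == 8.0
```

Because the producer and the consumer agree on the buffer layout, nothing is encoded, decoded, or duplicated when the data changes hands; a shared columnar format extends the same idea across systems.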

"An industry-standard columnar in-memory data layer enables users to combine multiple systems,
applications and programming languages in a single workload without the usual overhead," said
Ted Dunning, Vice President of the Apache Incubator and member of the Apache Arrow Project
Management Committee. 

In addition to traditional relational data, Arrow supports complex data with dynamic schemas.
For example, Arrow can handle JSON data which is commonly used in IoT workloads, modern applications
and log files. Implementations are also available (or underway) for a number of programming
languages including Java, C++ and Python to allow greater interoperability among a number
of Big Data solutions.

"Real world use cases often include complex combinations of structured and rapidly growing
complex-data. Already tested with Apache Drill, the efficient in-memory columnar representation
and processing in Arrow will enable users to enjoy the performance of columnar processing
with the flexibility of JSON," said Parth Chandra, member of the Apache Drill and Apache Arrow
Project Management Committees. 

Catch Apache Arrow in action at Strata + Hadoop World (San Jose: 30 March 2016, and London:
1-3 June 2016), as well as at upcoming MeetUps and local events: http://arrow.apache.org/events

Availability and Oversight 
Apache Arrow software is released under the Apache License v2.0 and is overseen by a self-selected
team of active contributors to the project. A Project Management Committee (PMC) guides the
Project's day-to-day operations, including community development and product releases. For
downloads, documentation, and ways to become involved with Apache Arrow, visit http://arrow.apache.org/

About The Apache Software Foundation (ASF) 
Established in 1999, the all-volunteer Foundation oversees more than 350 leading Open Source
projects, including Apache HTTP Server --the world's most popular Web server software. Through
the ASF's meritocratic process known as "The Apache Way," more than 550 individual Members
and 5,300 Committers successfully collaborate to develop freely available enterprise-grade
software, benefiting millions of users worldwide: thousands of software solutions are distributed
under the Apache License; and the community actively participates in ASF mailing lists, mentoring
initiatives, and ApacheCon, the Foundation's official user conference, trainings, and expo.
The ASF is a US 501(c)(3) charitable organization, funded by individual donations and corporate
sponsors including Alibaba Cloud Computing, ARM, Bloomberg, Budget Direct, Cerner, Cloudera,
Comcast, Confluent, Facebook, Google, Hortonworks, HP, Huawei, IBM, InMotion Hosting, iSigma,
LeaseWeb, Matt Mullenweg, Microsoft, PhoenixNAP, Pivotal, Private Internet Access, Produban,
Red Hat, Serenata Flowers, WANdisco, and Yahoo. For more information, visit http://www.apache.org/
or follow @TheASF on Twitter. 

© The Apache Software Foundation. "Apache", "Apache Arrow", "Arrow", "Apache Calcite", "Calcite",
"Apache Cassandra", "Cassandra", "Apache Drill", "Drill", "Apache Hadoop", "Hadoop", "Apache
HBase", "HBase", "Apache Impala", "Impala", "Apache Kudu (incubating)", "Kudu (incubating)",
"Apache Parquet", "Parquet", "Apache Phoenix", "Phoenix", "Apache Spark", "Spark", "Apache
Storm", "Storm", "ApacheCon", and their logos are registered trademarks or trademarks of The
Apache Software Foundation in the U.S. and/or other countries. All other brands and trademarks
are the property of their respective owners. 

# # # 

