www-announce mailing list archives

From Sally Khudairi ...@apache.org>
Subject The Apache Software Foundation Announces Apache® CarbonData™ as a Top-Level Project
Date Mon, 01 May 2017 11:02:12 GMT
[this announcement is available online at https://s.apache.org/QmTI ]

Open Source Big Data analytics accelerator in use at Bank of
Communications, Hulu, Huawei, SAIC Motor, Zhejiang Mobile, among others.

Forest Hill, MD –1 May 2017– The Apache Software Foundation (ASF), the
all-volunteer developers, stewards, and incubators of more than 350 Open
Source projects and initiatives, announced today that Apache®
CarbonData™ has graduated from the Apache Incubator to become a
Top-Level Project (TLP), signifying that the project's community and
products have been well-governed under the ASF's meritocratic process
and principles.

Apache CarbonData is an indexed columnar store file format for fast
analytics on Big Data platforms (including Apache Hadoop and Apache
Spark, among others) that helps speed up queries by an order of
magnitude over petabytes of data.

"We are very proud to complete the incubation process and graduate as an
Apache Top-Level Project," said Liang Chen, Vice President of Apache
CarbonData. "The CarbonData community grew rapidly over last ten months,
both in terms of size and diversity. Since entering the Apache
Incubator, we have completed 4 releases, and exceeded 90 contributors
from 10 different organizations."

With the aim of using a unified file format to satisfy all kinds of data
analysis cases, Apache CarbonData seamlessly integrates with Hadoop and
Spark to improve Big Data analysis efficiency. In benchmarks,
CarbonData's interactive queries run approximately 10x faster than
those on standard column-oriented SQL on Hadoop data stores.

Highlights include:

 - Unique data organization to allow faster filtering and better
 compression;
 - Multi-level Indexing to enable faster search and speed up query
 processing;
 - Deep Apache Spark Integration for dataframe + SQL compliance;
 - Advanced push down optimization to minimize the amount of data being
 read, processed, converted, transmitted, and shuffled;
 - Efficient compression and global encoding schemes to further improve
 aggregation query performance;
 - Dictionary encoding for reduced storage space and faster processing;
 and
 - Data update + delete support using standard SQL syntax.
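
As a rough illustration of two of the ideas above (dictionary encoding and
index-based filtering), here is a minimal Python sketch. It is a toy model
only and does not reflect CarbonData's actual file layout or API; the
function names, the fixed block size, and the sample data are all
hypothetical and chosen purely for illustration.

```python
from bisect import bisect_left

# Toy model only: names and layout are hypothetical, not CarbonData's format.

def dictionary_encode(values):
    """Replace each value with an integer code into a sorted dictionary."""
    dictionary = sorted(set(values))
    codes = [bisect_left(dictionary, v) for v in values]
    return dictionary, codes

def build_block_index(codes, block_size):
    """Record (start, min_code, max_code) for each fixed-size block of codes."""
    index = []
    for start in range(0, len(codes), block_size):
        block = codes[start:start + block_size]
        index.append((start, min(block), max(block)))
    return index

def filter_equals(dictionary, codes, index, block_size, target):
    """Return row positions equal to target, skipping blocks via min/max."""
    pos = bisect_left(dictionary, target)
    if pos == len(dictionary) or dictionary[pos] != target:
        return []  # value is absent from the dictionary, so no row can match
    rows = []
    for start, lo, hi in index:
        if lo <= pos <= hi:  # scan only blocks whose code range can match
            for offset, code in enumerate(codes[start:start + block_size]):
                if code == pos:
                    rows.append(start + offset)
    return rows

cities = ["Beijing", "Beijing", "Hangzhou", "Hangzhou", "Shanghai", "Shanghai"]
dictionary, codes = dictionary_encode(cities)
index = build_block_index(codes, block_size=2)
matches = filter_equals(dictionary, codes, index, 2, "Hangzhou")
print(matches)
```

In this sketch the column stores small integers instead of repeated
strings, and the per-block minimum and maximum let the filter skip blocks
that cannot contain the requested value. That is the general reason such
encodings and indexes can reduce both storage and the amount of data read.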

Apache CarbonData is in use at an array of organizations, including Bank
of Communications, medical/pharma social platform DXY, Hulu, Huawei,
group online retailer MEITUAN, SAIC Motor, Zhejiang Mobile, among
others.

"CarbonData has very good performance as a ‘SQL on Hadoop’ solution,"
said Tan Sheng, Director of SAIC Motor’s Big Data team. "It is suitable
for SAIC Motor to adopt as a central Big Data platform component. Not
only do we use Apache CarbonData, we also actively participate in its
community as contributors." 

"Apache CarbonData is great, as helped our audit business to improve
7-10X performance based on 14 billion rows of data," said Wei Zhao,
Senior Engineer at Bank of Communications.

"Apache CarbonData is very suitable for our filter query cases, and has
averaged 20x improvement on performance," said William Zhu, Architecture
team member at DXY. "And, as CarbonData supports data update and delete,
this feature is very useful. We would consider CarbonData as our
all-in-one solution to unify all analysis data."

CarbonData was first developed at Huawei in 2013. The project was
submitted to the Apache Incubator in June 2016, and had its first
official release two months later. The project won top honors in the
Big Data category of the Black Duck 2016 Open Source Rookies of the Year.

"Apache CarbonData is a great example of the value of the incubation
process," said Jean-Baptiste Onofré, Apache CarbonData Incubator Mentor
and Project Management Committee member. "Helping grow the CarbonData
developer and user communities has increased our visibility, which
allowed us to extend our use cases and tests, and gather new ideas. The
initial CarbonData committers did (and are still doing) great work to
welcome new users and contributors, clearly understanding it's a step
forward for the project."

"We will continue to put our efforts towards optimizing data format
efficiency for Big Data ecosystem and provide an unified and high
performance data storage solution," added Liang. "The Apache CarbonData
community welcomes interested contributors to work with us on our
journey forward."

Catch Apache CarbonData in action at ApacheCon (16-18 May/Miami), and
Spark Summit (5-7 June/San Francisco).

Availability and Oversight
Apache CarbonData software is released under the Apache License v2.0 and
is overseen by a self-selected team of active contributors to the
project. A Project Management Committee (PMC) guides the Project's
day-to-day operations, including community development and product
releases. For downloads, documentation, and ways to become involved with
Apache CarbonData, visit http://carbondata.apache.org/ ,
https://twitter.com/ApacheCarbonDat , and
https://www.facebook.com/carbondata/

About the Apache Incubator
The Apache Incubator is the entry path for projects and codebases
wishing to become part of the efforts at The Apache Software Foundation.
All code donations from external organizations and existing external
projects wishing to join the ASF enter through the Incubator to: 1)
ensure all donations are in accordance with the ASF legal standards; and
2) develop new communities that adhere to our guiding principles.
Incubation is required of all newly accepted projects until a further
review indicates that the infrastructure, communications, and decision
making process have stabilized in a manner consistent with other
successful ASF projects. While incubation status is not necessarily a
reflection of the completeness or stability of the code, it does
indicate that the project has yet to be fully endorsed by the ASF. For
more information, visit http://incubator.apache.org/

About The Apache Software Foundation (ASF)
Established in 1999, the all-volunteer Foundation oversees more than 350
leading Open Source projects, including Apache HTTP Server, the world's
most popular Web server software. Through the ASF's meritocratic process
known as "The Apache Way," more than 620 individual Members and 6,000
Committers successfully collaborate to develop freely available
enterprise-grade software, benefiting millions of users worldwide:
thousands of software solutions are distributed under the Apache
License; and the community actively participates in ASF mailing lists,
mentoring initiatives, and ApacheCon, the Foundation's official user
conference, trainings, and expo. The ASF is a US 501(c)(3) charitable
organization, funded by individual donations and corporate sponsors
including Alibaba Cloud Computing, ARM, Bloomberg, Budget Direct,
Capital One, Cash Store, Cerner, Cloudera, Comcast, Confluent, Facebook,
Google, Hortonworks, HP, Huawei, IBM, InMotion Hosting, iSigma,
LeaseWeb, Microsoft, ODPi, PhoenixNAP, Pivotal, Private Internet Access,
Produban, Red Hat, Serenata Flowers, Target, WANdisco, and Yahoo. For
more information, visit http://www.apache.org/ and
https://twitter.com/TheASF

© The Apache Software Foundation. "Apache", "CarbonData", "Apache
CarbonData", "Hadoop", "Apache Hadoop", "Spark", "Apache Spark", and
"ApacheCon" are registered trademarks or trademarks of the Apache
Software Foundation in the United States and/or other countries. All
other brands and trademarks are the property of their respective owners.

# # # 


NOTE: you are receiving this message because you are subscribed to the
announce@apache.org distribution list. To unsubscribe, send email from
the recipient account to announce-unsubscribe@apache.org with the word
"Unsubscribe" in the subject line.
