www-announce mailing list archives

From Sally Khudairi ...@apache.org>
Subject The Apache Software Foundation Announces Apache™ Tajo™ v0.9
Date Tue, 21 Oct 2014 10:00:22 GMT
>> this announcement is also available online at http://s.apache.org/qr

Robust, Open Source "SQL-on-Hadoop" Big Data warehouse solution now faster, with improved
performance and enhanced integration with Apache Hadoop™. 

Forest Hill, MD – 21 October 2014 – The Apache Software Foundation (ASF), the all-volunteer
developers, stewards, and incubators of more than 200 Open Source projects and initiatives,
announced today the availability of Apache™ Tajo™ v0.9, the advanced Open Source data
warehousing system in Apache Hadoop™.

"With Apache Tajo v0.9, our goal of bringing traditional SQL performance to massive data is
a step closer," said Hyunsik Choi, Vice President of Apache Tajo. "We really enjoyed working
to improve Tajo's leading-edge native SQL support, and its lightning performance across divergent

Dubbed an "SQL-on-Hadoop" solution, Apache Tajo is used for low-latency and scalable ad-hoc
queries, online aggregation, and ETL (extract-transform-load process) on large data sets stored
on HDFS (Hadoop Distributed File System) and other data sources. By supporting SQL standards
and leveraging advanced database techniques, Tajo allows direct control of distributed execution
and data flow across a variety of query evaluation strategies and optimization opportunities.
Overall, Apache Tajo v0.9 delivers more powerful native SQL support on an even faster platform.

"We have been determined from the outset to find ways of boosting query processing speed without
compromising system robustness and solution accessibility," said Jihoon Son, member of the
Apache Tajo Project Management Committee. "In practice, that means using cutting-edge query
techniques and processing algorithms as our source of 'speed', meanwhile maintaining three
key features: Fault tolerance, the ability to fully utilize working memory and write to disk,
and data source neutrality. We think those design choices give Apache Tajo long-run flexibility
and coherence." 

Features and enhancements in Apache Tajo v0.9 include:

 - More comprehensive and powerful SQL capabilities, such as TIMESTAMP, DATE, TIME, and INTERVAL
type support, as well as WINDOW functions, OVER clause support, and multiple distinct aggregation;

 - Performance improvements, such as an off-heap sort algorithm for ORDER BY and runtime code
generation for evaluating expressions, that push the boundaries of massive data query speeds;

 - Improvements to the hash shuffle I/O, boosting bottom-line speeds by 200-300% on "heavy",
complex queries; 

 - Enhanced Hadoop integration, including support for Hadoop 2.2.0 up to Hadoop 2.5.1, and
expanded Hive Metastore access; 

 - An improved catalog backup and restore feature, as well as accessibility enhancements that
streamline performance across disparate technology environments.

Apache Tajo is part of the Apache Hadoop ecosystem at a variety of organizations, including
Gruter, Korea University, and NASA JPL's Radio Astronomy and Airborne Snow Observatory projects,
among others. At SK Telecom, South Korea's largest wireless carrier, Apache Tajo has undergone
a brutal testing regimen, where it has had to deal with telco-sized data stores, node growth
and cluster expansion, and a grueling company-wide data analysis and reporting schedule. "The
fast processing capabilities of Apache Tajo have allowed us to build an entirely new big data
warehouse and OLAP system," said Eddy Park, Hadoop-based Data Warehouse Project Manager at
SK Telecom. "Apache Tajo now plays a vital role in data-driven decision making at our company."

Hyoungjun Kim, CTO of Gruter, said, "We run Apache Tajo in-house on 30 cluster nodes in order
to power Seenal, our social network analysis service that supplies social media insight to
government and corporate clients. On the one hand, this involves running complex ETL processes
on hundreds of gigabytes of data per day in order to detect market and opinion signals. On
the other hand, analysts and project teams often need to run very specific analyses on much
smaller data sets. Tajo is able to handle the full spectrum of Seenal’s data processing
and query needs at high speed and with minimal fuss."

"We're very excited about the release of Apache Tajo 0.9," added Choi. "The Apache Tajo community,
committers, and supporters have really done our mission proud."

Availability and Oversight
As with all Apache products, Apache Tajo software is released under the Apache License v2.0,
and is overseen by a self-selected team of active contributors to the project. A Project Management
Committee (PMC) guides the Project's day-to-day operations, including community development
and product releases. For downloads, documentation, and ways to become involved with Apache
Tajo, visit http://tajo.apache.org/ and https://twitter.com/ApacheTajo

About The Apache Software Foundation (ASF) 
Established in 1999, the all-volunteer Foundation oversees more than two hundred leading Open
Source projects, including Apache HTTP Server, the world's most popular Web server software.
Through the ASF's meritocratic process known as "The Apache Way," more than 450 individual
Members and 4,000 Committers successfully collaborate to develop freely available enterprise-grade
software, benefiting millions of users worldwide: thousands of software solutions are distributed
under the Apache License; and the community actively participates in ASF mailing lists, mentoring
initiatives, and ApacheCon, the Foundation's official user conference, trainings, and expo.
The ASF is a US 501(c)(3) charitable organization, funded by individual donations and corporate
sponsors including Budget Direct, Citrix, Cloudera, Comcast, Facebook, Google, Hortonworks,
HP, Huawei, IBM, InMotion Hosting, Matt Mullenweg, Microsoft, Pivotal, Produban, WANdisco,
and Yahoo. For more information, visit http://www.apache.org/ or follow @TheASF on Twitter.

"Apache", "Apache Hadoop", "Hadoop", "Apache Tajo", "Tajo", "ApacheCon", and the Apache Tajo
logo are trademarks of The Apache Software Foundation. All other brands and trademarks are
the property of their respective owners. 

# # #

NOTE: you are receiving this message because you are subscribed to the announce@apache.org
distribution list. To unsubscribe, send email from the recipient account to announce-unsubscribe@apache.org
with the word "Unsubscribe" in the subject line.
