hawq-dev mailing list archives

From Ming Li <...@pivotal.io>
Subject Support orc format
Date Fri, 17 Jun 2016 10:02:36 GMT
Hi Guys,

ORC (Optimized Row Columnar) is a very popular open source format adopted
by several major components of the Hadoop ecosystem, and it is used by many
users. Supporting ORC storage in HAWQ has two advantages: first, it makes
HAWQ more Hadoop native, so it interacts with other components more
easily; second, ORC stores meta information for query optimization, so it
could potentially outperform HAWQ's two native formats (i.e., AO and
Parquet) once it is available.

Since many popular formats are available in the HDFS community, and more
advanced formats are emerging frequently, it is a good option for HAWQ to
design a general framework that supports pluggable C/C++ formats such as
ORC, as well as native formats such as AO and Parquet. In designing this
framework, we also need to support data stored in different file systems:
HDFS, local disk, Amazon S3, etc. Thus, it is better to offer a framework
that supports both pluggable formats and pluggable file systems.
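
To make the idea concrete, here is a minimal sketch of how formats and file
systems could be registered independently and combined at scan time. It is
written in Python only for brevity and is not the design in the spec. All
names (FileSystem, Format, register_file_system, register_format, scan, the
"mem" scheme, and the "toycsv" format) are hypothetical and used purely for
illustration.

```python
from abc import ABC, abstractmethod
from urllib.parse import urlparse


class FileSystem(ABC):
    """Hypothetical storage backend interface (HDFS, local disk, S3, ...)."""

    @abstractmethod
    def read_bytes(self, path: str) -> bytes:
        ...


class Format(ABC):
    """Hypothetical format interface (ORC, AO, Parquet, ...)."""

    @abstractmethod
    def decode(self, raw: bytes) -> list:
        ...


# Two independent registries: one keyed by URI scheme, one by format name.
FILE_SYSTEMS = {}
FORMATS = {}


def register_file_system(scheme: str, fs: FileSystem) -> None:
    FILE_SYSTEMS[scheme] = fs


def register_format(name: str, fmt: Format) -> None:
    FORMATS[name] = fmt


def scan(uri: str, format_name: str) -> list:
    """Pick the file system from the URI scheme and the format by name."""
    parsed = urlparse(uri)
    fs = FILE_SYSTEMS[parsed.scheme]
    fmt = FORMATS[format_name]
    return fmt.decode(fs.read_bytes(parsed.path))


# Toy plugins standing in for real ones (assumptions, not HAWQ code).
class InMemoryFileSystem(FileSystem):
    def __init__(self, files: dict):
        self.files = files

    def read_bytes(self, path: str) -> bytes:
        return self.files[path]


class ToyCsvFormat(Format):
    def decode(self, raw: bytes) -> list:
        # Split each line on commas; real formats would parse binary layouts.
        return [line.split(",") for line in raw.decode().splitlines()]


register_file_system("mem", InMemoryFileSystem({"/t/part-0": b"1,a\n2,b\n"}))
register_format("toycsv", ToyCsvFormat())

rows = scan("mem:///t/part-0", "toycsv")
```

Because the two registries are separate, adding a new format does not
require touching any file system code, and vice versa.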

We are proposing ORC support in JIRA (
https://issues.apache.org/jira/browse/HAWQ-786). Please see the design spec
in the JIRA.

Your comments are appreciated!

Ming Li
