Yes, you should use ORC; it is much faster and more compact than text. Additionally, you can apply compression (Snappy) to increase performance. Your data processing pipeline does not seem to be very optimized. You should use the newest Hive version and enable storage indexes and bloom filters on appropriate columns. Ideally, you should insert the data sorted appropriately. Partitioning and setting the execution engine to Tez are also beneficial.
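As a rough sketch of how these suggestions could fit together in HiveQL: the table names (events_text, events_orc), the columns, and the choice of customer_id as the sort and bloom filter column are hypothetical and only for illustration. Bloom filters on ORC need a reasonably recent Hive release, so check what your version supports before relying on this.

```sql
-- Session settings (assumes Tez is installed on the cluster)
SET hive.execution.engine=tez;
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
-- Push predicates down to the ORC reader so its indexes can skip data
SET hive.optimize.index.filter=true;

-- Hypothetical intermediate table stored as ORC with Snappy compression
CREATE TABLE events_orc (
  customer_id BIGINT,
  amount      DECIMAL(12,2),
  event_ts    TIMESTAMP
)
PARTITIONED BY (load_date STRING)
STORED AS ORC
TBLPROPERTIES (
  'orc.compress'='SNAPPY',
  'orc.create.index'='true',                   -- min/max storage indexes
  'orc.bloom.filter.columns'='customer_id',    -- column used in joins/filters
  'orc.bloom.filter.fpp'='0.05'
);

-- Load from the existing text table, sorted on the column used for lookups
-- so that the min/max indexes become selective
INSERT OVERWRITE TABLE events_orc PARTITION (load_date)
SELECT customer_id, amount, event_ts, load_date
FROM events_text
DISTRIBUTE BY load_date
SORT BY customer_id;
```

Which columns to sort on and put bloom filters on depends on your join keys and WHERE conditions, so treat the choices above as placeholders.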
HBase with Phoenix should currently only be used if you do few joins, no very complex queries, and few full table scans. Your workload, with many joins and grouping functions, does not fit that profile.
Hi, I have two questions.

FIRST: In our project we use Hive. We get new data daily. We need to process this new data only once and then send the processed data to an RDBMS. The processing mostly uses complex queries with joins, WHERE conditions, and grouping functions. Around 50 intermediate tables are generated during processing. Until now we have used text format for storage. We came across the ORC file format. Since each table is queried only once, is it worth storing it in ORC format?

SECOND: I came to know about HBase, which is said to be faster. Can I replace Hive with HBase to make the daily processing faster? Currently it takes 15 hours daily with Hive.

Please let me know if any other information is needed.

Thanks & regards,
Venkatesh