From Azuryy Yu <azury...@gmail.com>
Subject Feedback for tajo-0.10.0
Date Mon, 16 Mar 2015 05:44:31 GMT
I compiled the tajo-0.10 source against hadoop-2.6.0. Here is some feedback.

My cluster:
1 tajo-master, 9 tajo-worker
24 CPU(logic), 64GB mem, 4TB*12 HDD

1) Tajo's task progress estimate now looks correct on partitioned tables;
it was sometimes incorrect in tajo-0.9.0.
2) Tajo configuration doesn't support hostname in tajo-site.xml.
for example:


This works under tajo-0.9.0, but tajo-0.10 complains that "1-1-1-1:2601" is
not a valid network address.

I have to change to:

But we don't use IPs in our cluster, only hostnames, so I did a little in the
hostnamePattern = Pattern.compile("\\d*-\\d*-\\d*-\\d");
Then it works.
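
To illustrate which host strings the pattern above accepts, here is a minimal Java sketch. The class name HostnamePatternDemo and the helper matchesPattern are hypothetical, and the use of matches() (a whole-string match) is an assumption: the message does not say where or how Tajo applies the pattern.

```java
import java.util.regex.Pattern;

public class HostnamePatternDemo {
    // Pattern copied verbatim from the message above.
    static final Pattern HOSTNAME_PATTERN = Pattern.compile("\\d*-\\d*-\\d*-\\d");

    // Hypothetical helper: true when the WHOLE host string matches the pattern.
    // Whether Tajo uses matches() or find() is not stated in the message.
    static boolean matchesPattern(String host) {
        return HOSTNAME_PATTERN.matcher(host).matches();
    }

    public static void main(String[] args) {
        // Sample host strings chosen for illustration only.
        String[] hosts = {"1-1-1-1", "10-0-0-12", "1.1.1.1", "worker01"};
        for (String host : hosts) {
            System.out.println(host + " -> " + matchesPattern(host));
        }
    }
}
```

Under that assumption, "1-1-1-1" matches, while "1.1.1.1" and "worker01" do not. The pattern ends in a single \d, so a whole-string match would also reject a name whose last segment has more than one digit, such as "10-0-0-12".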

3) I ran some tests on parquet, RCFILE(snappy compressed), RCFILE(GZIP

They hold the same data; only the file format differs.
The table has six partitions and 20 RCFILEs; each parquet file is 1GB.

RCFILE with snappy performs similarly to RCFILE with gzip, but
both perform two to three times better than parquet.

4) I compared tajo-0.10 with Impala-2.1.2.
Impala has very good support for parquet, much better than Tajo.

But Impala is *slower* than Tajo with other formats.
For example (I don't use WHERE because I want to query all six partitions together):

 > select sum (cast(movie_vv as bigint)), sum(cast(movie_cv as bigint)),sum(cast(movie_pt as bigint)) from par;
| sum(cast(movie_vv as bigint)) | sum(cast(movie_cv as bigint)) | sum(cast(movie_pt as bigint)) |
| 22557920                      | 19648838                      | 2005366694576                 |
Fetched 1 row(s) in 6.02s


default> select sum (cast(movie_vv as bigint)), sum(cast(movie_cv as bigint)),sum(cast(movie_pt as bigint)) from snappy;
Progress: 0%, response time: 1.598 sec
Progress: 0%, response time: 1.6 sec
Progress: 0%, response time: 2.003 sec
Progress: 0%, response time: 2.806 sec
Progress: 37%, response time: 3.808 sec
Progress: 100%, response time: 4.792 sec
?sum_3,  ?sum_4,  ?sum_5
22557920,  19648838,  2005366694576
(1 rows, 4.792 sec, 32 B selected)

