tajo-dev mailing list archives

From Azuryy Yu <azury...@gmail.com>
Subject Feedback for tajo-0.10.0
Date Mon, 16 Mar 2015 05:44:31 GMT
Hi,
I compiled the tajo-0.10 source against hadoop-2.6.0, and I'd like to post
some feedback here.

My cluster:
1 tajo-master, 9 tajo-workers
24 logical CPUs, 64 GB memory, 12 x 4 TB HDDs

Feedback:
1) Tajo's task progress estimate is now correct on partitioned tables; in
tajo-0.9.0 it was sometimes incorrect.
2) Tajo's configuration doesn't accept our hostnames in tajo-site.xml.
For example:

  <property>
    <name>tajo.master.umbilical-rpc.address</name>
    <value>1-1-1-1:26001</value>
  </property>

This works under tajo-0.9.0, but tajo-0.10.0 complains that
"1-1-1-1:26001" is not a valid network address.

I had to change it to:
  <property>
    <name>tajo.master.umbilical-rpc.address</name>
    <value>1.1.1.1:26001</value>
  </property>

But we don't use IP addresses in our cluster, only hostnames, so I made a
small change to the code in
org.apache.tajo.validation.NetworkAddressValidator.java:

  hostnamePattern = Pattern.compile("\\d*-\\d*-\\d*-\\d");

With that change, it works.

3) I ran some tests on Parquet, RCFILE (Snappy-compressed), and RCFILE
(GZIP-compressed).

All three tables contain the same data and differ only in file format.
The table has six partitions; there are 20 RCFILE files, and each Parquet
file is 1 GB.

RCFILE with Snappy performs similarly to RCFILE with GZIP, and both are two
to three times faster than Parquet.

4) I compared tajo-0.10 with Impala-2.1.2.
Impala supports Parquet very well, much better than Tajo does.

But Impala is *slower* than Tajo with other formats. For example (I don't
use a WHERE clause because I want to query all six partitions together):

*Impala*:
 > select sum (cast(movie_vv as bigint)), sum(cast(movie_cv as bigint)),sum(cast(movie_pt as bigint)) from par;
+-------------------------------+-------------------------------+-------------------------------+
| sum(cast(movie_vv as bigint)) | sum(cast(movie_cv as bigint)) | sum(cast(movie_pt as bigint)) |
+-------------------------------+-------------------------------+-------------------------------+
| 22557920                      | 19648838                      | 2005366694576                 |
+-------------------------------+-------------------------------+-------------------------------+
Fetched 1 row(s) in 6.02s

*Tajo*:

default> select sum (cast(movie_vv as bigint)), sum(cast(movie_cv as bigint)),sum(cast(movie_pt as bigint)) from snappy;
Progress: 0%, response time: 1.598 sec
Progress: 0%, response time: 1.6 sec
Progress: 0%, response time: 2.003 sec
Progress: 0%, response time: 2.806 sec
Progress: 37%, response time: 3.808 sec
Progress: 100%, response time: 4.792 sec
?sum_3,  ?sum_4,  ?sum_5
-------------------------------
22557920,  19648838,  2005366694576
(1 rows, 4.792 sec, 32 B selected)
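
To make the NetworkAddressValidator change in 2) more concrete, here is a
minimal, self-contained Java sketch. The message doesn't show how the
validator applies hostnamePattern, so this assumes it takes the host part
before the last ':' and requires a whole-string match with
Matcher.matches(). The class HostnamePatternDemo, the hostMatches helper and
the GENERALIZED pattern are hypothetical illustrations, not Tajo code. Under
that assumption, the pattern as quoted ends in a single \\d, so a host such
as 10-20-30-40 would still be rejected, while \\d+ in every group accepts it.

```java
import java.util.regex.Pattern;

public class HostnamePatternDemo {
    // Pattern quoted in the message: three groups of zero or more digits,
    // then a final group of exactly one digit, separated by hyphens.
    static final Pattern QUOTED = Pattern.compile("\\d*-\\d*-\\d*-\\d");

    // Hypothetical generalization: one or more digits in every group.
    static final Pattern GENERALIZED = Pattern.compile("\\d+-\\d+-\\d+-\\d+");

    // Assumption: only the host part (before the last ':') is checked,
    // and the whole host string must match (Matcher.matches()).
    static boolean hostMatches(Pattern p, String address) {
        int colon = address.lastIndexOf(':');
        String host = colon >= 0 ? address.substring(0, colon) : address;
        return p.matcher(host).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"1-1-1-1:26001", "10-20-30-40:26001", "1.1.1.1:26001"};
        for (String s : samples) {
            System.out.printf("%-20s quoted=%b generalized=%b%n",
                    s, hostMatches(QUOTED, s), hostMatches(GENERALIZED, s));
        }
    }
}
```

Running main prints one line per sample showing whether each pattern
accepts its host part; 1.1.1.1 matches neither, since both patterns only
describe the hyphenated form.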
