There was a typo in my previous email. Here is the corrected text.
For example:
<property>
<name>tajo.master.umbilical-rpc.address</name>
<value>1-1-1-1:26001</value>
</property>
which works under tajo-0.9.0, but tajo-0.10.0 complains that "1-1-1-1:26001" is not a
valid network address.
I have to change it to:
<property>
<name>tajo.master.umbilical-rpc.address</name>
<value>1.1.1.1:26001</value>
</property>
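For reference, 1-1-1-1 is a syntactically valid hostname under RFC 1123, which allows a label to start with a digit. Below is a minimal sketch of a host:port check that accepts such names as well as dotted IPv4 literals. It is not Tajo's NetworkAddressValidator; the class name HostPortCheck and its rules are assumptions for illustration only.

```java
import java.util.regex.Pattern;

/** Hypothetical host:port syntax check; not part of Tajo. */
public class HostPortCheck {
    // One RFC 1123 label: 1-63 letters, digits or hyphens, not starting or ending with a hyphen.
    private static final Pattern LABEL =
            Pattern.compile("[A-Za-z0-9]([A-Za-z0-9-]{0,61}[A-Za-z0-9])?");
    // Port: 1 to 5 ASCII digits; the numeric range is checked separately.
    private static final Pattern PORT = Pattern.compile("[0-9]{1,5}");

    public static boolean isValid(String address) {
        int colon = address.lastIndexOf(':');
        if (colon <= 0) {
            return false; // no "host:" part
        }
        String host = address.substring(0, colon);
        String portText = address.substring(colon + 1);

        if (!PORT.matcher(portText).matches()) {
            return false;
        }
        int port = Integer.parseInt(portText);
        if (port < 1 || port > 65535) {
            return false;
        }

        // Host: at most 253 characters, made of valid labels separated by dots.
        // Dotted IPv4 literals such as 1.1.1.1 pass this check too.
        if (host.length() > 253) {
            return false;
        }
        for (String label : host.split("\\.", -1)) {
            if (!LABEL.matcher(label).matches()) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String[] samples = {"1-1-1-1:26001", "1.1.1.1:26001", "1-1-1-1", "-bad-:26001", "host:70000"};
        for (String s : samples) {
            System.out.println(s + " -> " + isValid(s));
        }
    }
}
```

This only checks syntax: it does not resolve the name, does not handle IPv6 literals, and does not reject out-of-range IPv4-looking strings such as 999.1.1.1, which also satisfy the label rules.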
On Mon, Mar 16, 2015 at 1:44 PM, Azuryy Yu <azuryyyu@gmail.com> wrote:
> Hi,
> I compiled the tajo-0.10 source against hadoop-2.6.0, and here is some
> feedback.
>
> My cluster:
> 1 tajo-master, 9 tajo-workers
> 24 logical CPUs, 64 GB memory, 12 x 4 TB HDDs
>
> Feedback:
> 1) Tajo's task progress estimate is now correct on partitioned tables; in
> tajo-0.9.0 it was sometimes incorrect.
> 2) Tajo configuration doesn't support hostnames in tajo-site.xml.
> For example:
>
> <property>
> <name>tajo.master.umbilical-rpc.address</name>
> <value>1-1-1-1:26001</value>
> </property>
>
> which works under tajo-0.9.0, but tajo-0.10.0 complains that "1-1-1-1:26001" is not
> a valid network address.
>
> I have to change it to:
> <property>
> <name>tajo.master.umbilical-rpc.address</name>
> <value>1.1.1.1:26001</value>
> </property>
>
> But we don't use IP addresses in our cluster, only hostnames, so I made a
> small change to the code in
> org.apache.tajo.validation.NetworkAddressValidator.java:
> hostnamePattern = Pattern.compile("\\d*-\\d*-\\d*-\\d");
> Then it works.
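A note on the pattern above: the patch alone does not show whether NetworkAddressValidator applies it with Matcher.matches() or Matcher.find(), so the hypothetical sketch below tries both. It also compares a hypothetical wider pattern, \d+-\d+-\d+-\d+, which is not part of Tajo. The class name HostnamePatternDemo is made up for illustration.

```java
import java.util.regex.Pattern;

/** Hypothetical demo comparing hostname patterns; not part of Tajo. */
public class HostnamePatternDemo {
    // Pattern from the patch above: the last group is a single digit, not \\d*.
    public static final Pattern REPORTED = Pattern.compile("\\d*-\\d*-\\d*-\\d");
    // Hypothetical variant that requires one or more digits in every group.
    public static final Pattern WIDER = Pattern.compile("\\d+-\\d+-\\d+-\\d+");

    // The whole string must match the pattern.
    public static boolean fullMatch(Pattern p, String host) {
        return p.matcher(host).matches();
    }

    // Any substring may match the pattern.
    public static boolean partialMatch(Pattern p, String host) {
        return p.matcher(host).find();
    }

    public static void main(String[] args) {
        String[] hosts = {"1-1-1-1", "10-20-30-40", "---1", "node1"};
        for (String h : hosts) {
            System.out.printf("%-12s reported.matches=%b reported.find=%b wider.matches=%b%n",
                    h, fullMatch(REPORTED, h), partialMatch(REPORTED, h), fullMatch(WIDER, h));
        }
    }
}
```

With matches(), the reported pattern accepts 1-1-1-1 but rejects 10-20-30-40, because its last group allows exactly one digit, and it accepts ---1, because each \d* group may be empty. With find(), 10-20-30-40 passes via the substring 10-20-30-4. The wider pattern accepts 10-20-30-40 and rejects ---1 under matches().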
>
> 3) I did some tests on Parquet, RCFile (Snappy compressed) and RCFile (GZIP
> compressed).
>
> The tables contain the same data and differ only in file format.
> The table has six partitions and 20 RCFiles, and each Parquet file is 1 GB.
>
> RCFile with Snappy performs similarly to RCFile with GZIP, and both are two
> to three times faster than Parquet.
>
> 4) I compared tajo-0.10 with Impala-2.1.2.
> Impala has very good Parquet support, much better than Tajo's.
>
> But Impala is slower than Tajo with other formats.
> For example (I don't use WHERE because I want to query all six partitions
> together):
>
> *Impala*:
> > select sum (cast(movie_vv as bigint)), sum(cast(movie_cv as bigint)),sum(cast(movie_pt as bigint)) from par;
>
> +-------------------------------+-------------------------------+-------------------------------+
> | sum(cast(movie_vv as bigint)) | sum(cast(movie_cv as bigint)) | sum(cast(movie_pt as bigint)) |
> +-------------------------------+-------------------------------+-------------------------------+
> | 22557920                      | 19648838                      | 2005366694576                 |
> +-------------------------------+-------------------------------+-------------------------------+
> Fetched 1 row(s) in 6.02s
>
> *Tajo*:
>
> default> select sum (cast(movie_vv as bigint)), sum(cast(movie_cv as bigint)),sum(cast(movie_pt as bigint)) from snappy;
> Progress: 0%, response time: 1.598 sec
> Progress: 0%, response time: 1.6 sec
> Progress: 0%, response time: 2.003 sec
> Progress: 0%, response time: 2.806 sec
> Progress: 37%, response time: 3.808 sec
> Progress: 100%, response time: 4.792 sec
> ?sum_3, ?sum_4, ?sum_5
> -------------------------------
> 22557920, 19648838, 2005366694576
> (1 rows, 4.792 sec, 32 B selected)