tajo-dev mailing list archives

From Jihoon Son <jihoon...@apache.org>
Subject Re: Feedback for tajo-0.10.0
Date Mon, 16 Mar 2015 09:16:36 GMT
Interesting.
Here are some reasons I can think of.

Impala can process Parquet files in a distributed manner even if their size
is very large. This is because of its DBMS-like architecture. Impala has a
storage manager and a query executor, which have completely separate roles
during query processing. Once a query is executed, the storage manager
continuously loads data from HDFS into memory, and the query executors simply
process the data loaded in memory according to the query plan. As a result,
even though the number of data files is small, a large number of query
executors can process them simultaneously.
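
To make this concrete, here is a rough sketch of such a decoupled model
(hypothetical class names and numbers, not actual Impala or Tajo code): one
loader thread keeps filling an in-memory buffer while many executor threads
consume from it, so the degree of parallelism does not depend on the number
of input files.

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class DecoupledScanSketch {
  // End-of-input marker pushed by the loader once all blocks are queued.
  static final byte[] POISON = new byte[0];

  public static void main(String[] args) {
    BlockingQueue<byte[]> buffer = new ArrayBlockingQueue<>(64);
    int numExecutors = 8;

    // "Storage manager": continuously loads blocks into memory,
    // regardless of how few files the data is packed into.
    Thread loader = new Thread(() -> {
      try {
        for (int block = 0; block < 100; block++) {
          buffer.put(new byte[1024]); // stands in for a block read from HDFS
        }
        for (int i = 0; i < numExecutors; i++) {
          buffer.put(POISON);
        }
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    });
    loader.start();

    // "Query executors": process whatever the storage manager has loaded,
    // so many executors can work on data that came from only a few files.
    ExecutorService executors = Executors.newFixedThreadPool(numExecutors);
    for (int i = 0; i < numExecutors; i++) {
      executors.submit(() -> {
        try {
          byte[] block;
          while ((block = buffer.take()) != POISON) {
            // apply the query plan to the in-memory block here
          }
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
        }
      });
    }
    executors.shutdown();
  }
}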

In Tajo, however, each query executor (i.e., task) has its own scanner to read
data from HDFS, and during query processing these roles are tightly coupled.
Thus, the number of tasks is mainly decided by the number of input files.
(Of course, when the input data are dispersed across the entire cluster, the
number of tasks is decided based on the number of CPU cores and disks per
worker.)
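
For illustration only, a minimal sketch of how that bound plays out (a made-up
method, not the actual Tajo planner logic), using the cluster size and file
counts from the thread below:

public class TaskCountSketch {
  // Hypothetical helper: in a per-file scanner model, each task owns its own
  // scanner, so the task count is capped by the number of input files as well
  // as by the per-worker resources (CPU cores and disks).
  static int estimateTaskCount(int numFiles, int numWorkers,
                               int coresPerWorker, int disksPerWorker) {
    int fileLimit = numFiles;
    int resourceLimit = numWorkers * Math.min(coresPerWorker, disksPerWorker);
    return Math.min(fileLimit, resourceLimit);
  }

  public static void main(String[] args) {
    // 4 x 1GB Parquet files on a 9-worker cluster (24 cores, 12 disks each):
    // only 4 tasks can scan in parallel.
    System.out.println(estimateTaskCount(4, 9, 24, 12));   // -> 4
    // 20 smaller RCFiles: 20 tasks can scan in parallel.
    System.out.println(estimateTaskCount(20, 9, 24, 12));  // -> 20
  }
}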

So, I expect that Tajo's performance with Parquet will improve when there are
sufficiently many input files.
As mentioned above, this is just my suspicion; I will investigate further.

Thanks,
Jihoon

On Mon, Mar 16, 2015 at 5:45 PM Azuryy Yu <azuryyyu@gmail.com> wrote:

> HDFS block size is also 1GB
>
>
> On Mon, Mar 16, 2015 at 4:18 PM, Jihoon Son <jihoonson@apache.org> wrote:
>
> > Right. A large file size can improve the sequential scan on Parquet.
> > However, if you want to use a large file size, it is recommended to also
> > increase the HDFS block size to reduce the remote read cost.
> > How large did you set the HDFS block size?
> >
> > As for Impala's good performance, I will also investigate it.
> > It seems to be related to Impala's storage manager.
> >
> > Best,
> > Jihoon
> >
> > On Mon, Mar 16, 2015 at 5:05 PM Azuryy Yu <azuryyyu@gmail.com> wrote:
> >
> > > Hi Jihoon,
> > >
> > > Impala works much faster on Parquet than on other file formats, and Impala
> > > advises against making many small Parquet files; 1GB would be better.
> > >
> > >
> > >
> > >
> > > On Mon, Mar 16, 2015 at 3:57 PM, Jihoon Son <jihoonson@apache.org> wrote:
> > >
> > > > Thanks!
> > > > It is really interesting.
> > > > I suspect that the large file size of Parquet makes Tajo slower. This is
> > > > because Parquet is non-splittable, which means that only 4 workers read
> > > > data from HDFS. In addition, if the HDFS block size is smaller than 1GB,
> > > > a lot of data can be moved over the network during the scan phase.
> > > >
> > > > But, I have no idea why Impala shows good performance.
> > > > Maybe, its cache scheme improved it.
> > > >
> > > > Best regards,
> > > > Jihoon
> > > >
> > > > On Mon, Mar 16, 2015 at 4:16 PM Azuryy Yu <azuryyyu@gmail.com> wrote:
> > > >
> > > > > PS. my Parquet data was generated by Impala: "Insert into a parquet table
> > > > > [SHUFFLE] ... AS select .... from a text table"
> > > > >
> > > > > On Mon, Mar 16, 2015 at 3:11 PM, Azuryy Yu <azuryyyu@gmail.com> wrote:
> > > > >
> > > > > > Hi Jihoon,
> > > > > >
> > > > > > Here is an example:
> > > > > > My data (Parquet files are limited to 1GB):
> > > > > >  hadoop fs -ls /data/basetable/par/dt=20150301/pf=pc
> > > > > >
> > > > > > -rw-r--r--   9 hadoop tajo 1062932057 2015-03-12 15:08 /data/basetable/par/dt=20150301/pf=pc/cc456c9d427c88a3-3ead7e35ecf0da8_448517166_data.0.parq
> > > > > > -rw-r--r--   9 hadoop tajo 1063205684 2015-03-12 15:11 /data/basetable/par/dt=20150301/pf=pc/cc456c9d427c88a3-3ead7e35ecf0da8_448517166_data.1.parq
> > > > > > -rw-r--r--   9 hadoop tajo 1063236005 2015-03-12 15:14 /data/basetable/par/dt=20150301/pf=pc/cc456c9d427c88a3-3ead7e35ecf0da8_448517166_data.2.parq
> > > > > > -rw-r--r--   9 hadoop tajo  543786632 2015-03-12 15:16 /data/basetable/par/dt=20150301/pf=pc/cc456c9d427c88a3-3ead7e35ecf0da8_448517166_data.3.parq
> > > > > >
> > > > > > hadoop fs -ls /data/basetable/snappy/dt=20150301/pf=pc
> > > > > >
> > > > > > -rw-r--r--   9 tajo tajo  144059045 2015-03-16 11:48
> > > > > > /data/basetable/snappy/dt=20150301/pf=pc/part-r-00000
> > > > > > -rw-r--r--   9 tajo tajo  144178118 2015-03-16 11:48
> > > > > > /data/basetable/snappy/dt=20150301/pf=pc/part-r-00001
> > > > > > -rw-r--r--   9 tajo tajo  143642438 2015-03-16 11:48
> > > > > > /data/basetable/snappy/dt=20150301/pf=pc/part-r-00002
> > > > > > -rw-r--r--   9 tajo tajo  143553142 2015-03-16 11:48
> > > > > > /data/basetable/snappy/dt=20150301/pf=pc/part-r-00003
> > > > > > -rw-r--r--   9 tajo tajo  143849627 2015-03-16 11:48
> > > > > > /data/basetable/snappy/dt=20150301/pf=pc/part-r-00004
> > > > > > -rw-r--r--   9 tajo tajo  144648456 2015-03-16 11:48
> > > > > > /data/basetable/snappy/dt=20150301/pf=pc/part-r-00005
> > > > > > -rw-r--r--   9 tajo tajo  144647502 2015-03-16 11:48
> > > > > > /data/basetable/snappy/dt=20150301/pf=pc/part-r-00006
> > > > > > -rw-r--r--   9 tajo tajo  144551053 2015-03-16 11:48
> > > > > > /data/basetable/snappy/dt=20150301/pf=pc/part-r-00007
> > > > > > -rw-r--r--   9 tajo tajo  144017287 2015-03-16 11:48
> > > > > > /data/basetable/snappy/dt=20150301/pf=pc/part-r-00008
> > > > > > -rw-r--r--   9 tajo tajo  144205111 2015-03-16 11:48
> > > > > > /data/basetable/snappy/dt=20150301/pf=pc/part-r-00009
> > > > > > -rw-r--r--   9 tajo tajo  145066506 2015-03-16 11:48
> > > > > > /data/basetable/snappy/dt=20150301/pf=pc/part-r-00010
> > > > > > -rw-r--r--   9 tajo tajo  144740791 2015-03-16 11:48
> > > > > > /data/basetable/snappy/dt=20150301/pf=pc/part-r-00011
> > > > > > -rw-r--r--   9 tajo tajo  144198266 2015-03-16 11:48
> > > > > > /data/basetable/snappy/dt=20150301/pf=pc/part-r-00012
> > > > > > -rw-r--r--   9 tajo tajo  143575440 2015-03-16 11:48
> > > > > > /data/basetable/snappy/dt=20150301/pf=pc/part-r-00013
> > > > > > -rw-r--r--   9 tajo tajo  143922343 2015-03-16 11:48
> > > > > > /data/basetable/snappy/dt=20150301/pf=pc/part-r-00014
> > > > > > -rw-r--r--   9 tajo tajo  143930019 2015-03-16 11:48
> > > > > > /data/basetable/snappy/dt=20150301/pf=pc/part-r-00015
> > > > > > -rw-r--r--   9 tajo tajo  144253019 2015-03-16 11:48
> > > > > > /data/basetable/snappy/dt=20150301/pf=pc/part-r-00016
> > > > > > -rw-r--r--   9 tajo tajo  144175506 2015-03-16 11:48
> > > > > > /data/basetable/snappy/dt=20150301/pf=pc/part-r-00017
> > > > > > -rw-r--r--   9 tajo tajo  143072995 2015-03-16 11:48
> > > > > > /data/basetable/snappy/dt=20150301/pf=pc/part-r-00018
> > > > > > -rw-r--r--   9 tajo tajo  143818118 2015-03-16 11:48
> > > > > > /data/basetable/snappy/dt=20150301/pf=pc/part-r-00019
> > > > > >
> > > > > > Result:
> > > > > >
> > > > > > default> select sum (cast(movie_vv as bigint)), sum(cast(movie_cv as bigint)),sum(cast(movie_pt as bigint)) from snappy where pf='pc';
> > > > > > Progress: 19%, response time: 1.87 sec
> > > > > > Progress: 19%, response time: 1.873 sec
> > > > > > Progress: 19%, response time: 2.276 sec
> > > > > > Progress: 100%, response time: 2.372 sec
> > > > > > ?sum_3,  ?sum_4,  ?sum_5
> > > > > > -------------------------------
> > > > > > 6928463,  6183665,  6055494385
> > > > > > (1 rows, 2.372 sec, 27 B selected)
> > > > > > default> select sum (cast(movie_vv as bigint)), sum(cast(movie_cv as bigint)),sum(cast(movie_pt as bigint)) from par where pf='pc';
> > > > > > Progress: 0%, response time: 0.751 sec
> > > > > > Progress: 0%, response time: 0.753 sec
> > > > > > Progress: 0%, response time: 1.155 sec
> > > > > > Progress: 0%, response time: 1.959 sec
> > > > > > Progress: 0%, response time: 2.962 sec
> > > > > > Progress: 0%, response time: 3.965 sec
> > > > > > Progress: 0%, response time: 4.968 sec
> > > > > > Progress: 0%, response time: 5.97 sec
> > > > > > Progress: 12%, response time: 6.974 sec
> > > > > > Progress: 12%, response time: 7.977 sec
> > > > > > Progress: 12%, response time: 8.979 sec
> > > > > > Progress: 12%, response time: 9.982 sec
> > > > > > Progress: 25%, response time: 10.985 sec
> > > > > > Progress: 100%, response time: 11.14 sec
> > > > > > ?sum_3,  ?sum_4,  ?sum_5
> > > > > > -------------------------------
> > > > > > 6928463,  6183665,  6055494385
> > > > > > (1 rows, 11.14 sec, 27 B selected)
> > > > > >
> > > > > > On Mon, Mar 16, 2015 at 2:58 PM, Jihoon Son <jihoonson@apache.org> wrote:
> > > > > >
> > > > > >> Azuryy, thanks for your feedback.
> > > > > >> They are very interesting results.
> > > > > >> Would you mind telling me how Tajo with Parquet is slower than Tajo
> > > > > >> with RCFile?
> > > > > >>
> > > > > >> Thanks,
> > > > > >> Jihoon
> > > > > >>
> > > > > >> On Mon, Mar 16, 2015 at 3:39 PM Hyunsik Choi <hyunsik@apache.org> wrote:
> > > > > >>
> > > > > >> > Hi Azuryy,
> > > > > >> >
> > > > > >> > Thanks for sharing the test results. They are very inspiring to us.
> > > > > >> > Also, I'll file some JIRAs for the problems that you found.
> > > > > >> >
> > > > > >> > Best regards,
> > > > > >> > Hyunsik
> > > > > >> >
> > > > > >> > On Sun, Mar 15, 2015 at 10:58 PM, Azuryy Yu <azuryyyu@gmail.com> wrote:
> > > > > >> > > Another fix:
> > > > > >> > > My test result comparing Impala-2.1.2 and Tajo-0.10.0 was unfair,
> > > > > >> > > because I used Parquet with Impala and snappy-compressed RCFile with
> > > > > >> > > Tajo. I should have used the same file format to compare.
> > > > > >> > >
> > > > > >> > > Because I had already reached a clear conclusion that Impala works
> > > > > >> > > better on Parquet than Tajo does, I used RCFile as the test data here.
> > > > > >> > >
> > > > > >> > > *Tajo*:
> > > > > >> > > default> select sum (cast(movie_vv as bigint)), sum(cast(movie_cv as bigint)),sum(cast(movie_pt as bigint)) from snappy;
> > > > > >> > > Progress: 0%, response time: 1.598 sec
> > > > > >> > > Progress: 0%, response time: 1.6 sec
> > > > > >> > > Progress: 0%, response time: 2.003 sec
> > > > > >> > > Progress: 0%, response time: 2.806 sec
> > > > > >> > > Progress: 37%, response time: 3.808 sec
> > > > > >> > > Progress: 100%, response time: 4.792 sec
> > > > > >> > > ?sum_3,  ?sum_4,  ?sum_5
> > > > > >> > > -------------------------------
> > > > > >> > > 22557920,  19648838,  2005366694576
> > > > > >> > > (1 rows, 4.792 sec, 32 B selected)
> > > > > >> > >
> > > > > >> > > *Impala*:
> > > > > >> > >  > select sum (cast(movie_vv as bigint)), sum(cast(movie_cv as bigint)),sum(cast(movie_pt as bigint)) from snappy;
> > > > > >> > > +-------------------------------+-------------------------------+-------------------------------+
> > > > > >> > > | sum(cast(movie_vv as bigint)) | sum(cast(movie_cv as bigint)) | sum(cast(movie_pt as bigint)) |
> > > > > >> > > +-------------------------------+-------------------------------+-------------------------------+
> > > > > >> > > | 22557920                      | 19648838                      | 2005366694576                 |
> > > > > >> > > +-------------------------------+-------------------------------+-------------------------------+
> > > > > >> > > Fetched 1 row(s) in 11.12s
> > > > > >> > >
> > > > > >> > >
> > > > > >> > >
> > > > > >> > > On Mon, Mar 16, 2015 at 1:49 PM, Azuryy Yu <azuryyyu@gmail.com> wrote:
> > > > > >> > >
> > > > > >> > >> There is a typo in my email. I have corrected it here:
> > > > > >> > >>
> > > > > >> > >> for example:
> > > > > >> > >>
> > > > > >> > >>   <property>
> > > > > >> > >>     <name>tajo.master.umbilical-rpc.address</name>
> > > > > >> > >>     <value>1-1-1-1:26001</value>
> > > > > >> > >>   </property>
> > > > > >> > >>
> > > > > >> > >> which does work under tajo-0.9.0, but tajo-0.10.0 complains that
> > > > > >> > >> "1-1-1-1:26001" is not a valid network address.
> > > > > >> > >>
> > > > > >> > >> I have to change to:
> > > > > >> > >>   <property>
> > > > > >> > >>     <name>tajo.master.umbilical-rpc.address</name>
> > > > > >> > >>     <value>1.1.1.1:26001</value>
> > > > > >> > >>   </property>
> > > > > >> > >>
> > > > > >> > >>
> > > > > >> > >> On Mon, Mar 16, 2015 at 1:44 PM, Azuryy Yu <azuryyyu@gmail.com> wrote:
> > > > > >> > >>
> > > > > >> > >>> Hi,
> > > > > >> > >>> I compiled the tajo-0.10 source against hadoop-2.6.0, and here is some
> > > > > >> > >>> feedback.
> > > > > >> > >>>
> > > > > >> > >>> My cluster:
> > > > > >> > >>> 1 tajo-master, 9 tajo-workers
> > > > > >> > >>> 24 logical CPUs, 64GB memory, 12 x 4TB HDDs
> > > > > >> > >>>
> > > > > >> > >>> Feedback:
> > > > > >> > >>> 1) The Tajo task progress estimate is now correct on partitioned
> > > > > >> > >>> tables; it was sometimes incorrect in tajo-0.9.0.
> > > > > >> > >>> 2) The Tajo configuration doesn't support hostnames in tajo-site.xml.
> > > > > >> > >>> For example:
> > > > > >> > >>>
> > > > > >> > >>>   <property>
> > > > > >> > >>>     <name>tajo.master.umbilical-rpc.address</name>
> > > > > >> > >>>     <value>1-1-1-1:26001</value>
> > > > > >> > >>>   </property>
> > > > > >> > >>>
> > > > > >> > >>> which does work under tajo-0.9.0, but it complains that "1-1-1-1:26001"
> > > > > >> > >>> is not a valid network address.
> > > > > >> > >>>
> > > > > >> > >>> I have to change to:
> > > > > >> > >>>   <property>
> > > > > >> > >>>     <name>tajo.master.umbilical-rpc.address</name>
> > > > > >> > >>>     <value>1.1.1.1:26001</value>
> > > > > >> > >>>   </property>
> > > > > >> > >>>
> > > > > >> > >>> But we don't use IPs in our cluster, only hostnames, so I made a small
> > > > > >> > >>> change in the code, in
> > > > > >> > >>> org.apache.tajo.validation.NetworkAddressValidator.java:
> > > > > >> > >>> hostnamePattern = Pattern.compile("\\d*-\\d*-\\d*-\\d");
> > > > > >> > >>> Then it works.
> > > > > >> > >>>
> > > > > >> > >>> 3) I did some tests on Parquet, RCFile (snappy compressed), and
> > > > > >> > >>> RCFile (gzip compressed).
> > > > > >> > >>>
> > > > > >> > >>> They are the same data, differing only in file format.
> > > > > >> > >>> The table has six partitions and 20 RCFiles; each Parquet file is 1GB.
> > > > > >> > >>>
> > > > > >> > >>> The performance of RCFile with snappy is similar to RCFile with gzip,
> > > > > >> > >>> but both are two to three times better than Parquet.
> > > > > >> > >>>
> > > > > >> > >>> 4) I compared tajo-0.10 and Impala-2.1.2.
> > > > > >> > >>> Impala provides very good support for Parquet, much better than Tajo.
> > > > > >> > >>>
> > > > > >> > >>> But Impala is *slower* than Tajo with other formats.
> > > > > >> > >>> For example (I don't use WHERE because I want to query all six
> > > > > >> > >>> partitions together):
> > > > > >> > >>>
> > > > > >> > >>> *Impala*:
> > > > > >> > >>>  > select sum (cast(movie_vv as bigint)), sum(cast(movie_cv as bigint)),sum(cast(movie_pt as bigint)) from par;
> > > > > >> > >>>
> > > > > >> > >>> +-------------------------------+-------------------------------+-------------------------------+
> > > > > >> > >>> | sum(cast(movie_vv as bigint)) | sum(cast(movie_cv as bigint)) | sum(cast(movie_pt as bigint)) |
> > > > > >> > >>> +-------------------------------+-------------------------------+-------------------------------+
> > > > > >> > >>> | 22557920                      | 19648838                      | 2005366694576                 |
> > > > > >> > >>> +-------------------------------+-------------------------------+-------------------------------+
> > > > > >> > >>> Fetched 1 row(s) in 6.02s
> > > > > >> > >>>
> > > > > >> > >>> *Tajo:*
> > > > > >> > >>>
> > > > > >> > >>> *default*> select sum (cast(movie_vv as bigint)), sum(cast(movie_cv as bigint)),sum(cast(movie_pt as bigint)) from snappy;
> > > > > >> > >>> Progress: 0%, response time: 1.598 sec
> > > > > >> > >>> Progress: 0%, response time: 1.6 sec
> > > > > >> > >>> Progress: 0%, response time: 2.003 sec
> > > > > >> > >>> Progress: 0%, response time: 2.806 sec
> > > > > >> > >>> Progress: 37%, response time: 3.808 sec
> > > > > >> > >>> Progress: 100%, response time: 4.792 sec
> > > > > >> > >>> ?sum_3,  ?sum_4,  ?sum_5
> > > > > >> > >>> -------------------------------
> > > > > >> > >>> 22557920,  19648838,  2005366694576
> > > > > >> > >>> (1 rows, 4.792 sec, 32 B selected)
> > > > > >> > >>>
> > > > > >> > >>
> > > > > >> >
> > > > > >>
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
