MIME-Version: 1.0
Date: Mon, 16 Mar 2015 13:58:48 +0800
Subject: Re: Feedback for tajo-0.10.0
From: Azuryy Yu
To: "dev@tajo.apache.org"
Content-Type: multipart/alternative; boundary=001a1147ebda35bfd90511618b8b

--001a1147ebda35bfd90511618b8b
Content-Type: text/plain; charset=UTF-8

Another correction:

My comparison of Impala-2.1.2 and Tajo-0.10.0 was unfair, because I used
Parquet with Impala and Snappy-compressed RCFILE with Tajo. I should compare
them on the same file format. Since I already have a clear conclusion that
Impala works better on Parquet than Tajo does, I used RCFILE as the test
data for both this time.

*Tajo*:
default> select sum (cast(movie_vv as bigint)), sum(cast(movie_cv as bigint)),sum(cast(movie_pt as bigint)) from snappy;
Progress: 0%, response time: 1.598 sec
Progress: 0%, response time: 1.6 sec
Progress: 0%, response time: 2.003 sec
Progress: 0%, response time: 2.806 sec
Progress: 37%, response time: 3.808 sec
Progress: 100%, response time: 4.792 sec
?sum_3, ?sum_4, ?sum_5
-------------------------------
22557920, 19648838, 2005366694576
(1 rows, 4.792 sec, 32 B selected)

*Impala*:
> select sum (cast(movie_vv as bigint)), sum(cast(movie_cv as bigint)),sum(cast(movie_pt as bigint)) from snappy;
+-------------------------------+-------------------------------+-------------------------------+
| sum(cast(movie_vv as bigint)) | sum(cast(movie_cv as bigint)) | sum(cast(movie_pt as bigint)) |
+-------------------------------+-------------------------------+-------------------------------+
| 22557920                      | 19648838                      | 2005366694576                 |
+-------------------------------+-------------------------------+-------------------------------+
Fetched 1 row(s) in 11.12s

On Mon, Mar 16, 2015 at 1:49 PM, Azuryy Yu wrote:

> There is a typo in my email. Corrected here:
>
> for example:
>
> <property>
>   <name>tajo.master.umbilical-rpc.address</name>
>   <value>1-1-1-1:26001</value>
> </property>
>
> which does work under tajo-0.9.0, but tajo-0.10.0 complains that
> "1-1-1-1:26001" is not a valid network address.
>
> I have to change it to:
>
> <property>
>   <name>tajo.master.umbilical-rpc.address</name>
>   <value>1.1.1.1:26001</value>
> </property>
>
> On Mon, Mar 16, 2015 at 1:44 PM, Azuryy Yu wrote:
>
>> Hi,
>> I compiled the tajo-0.10 source against hadoop-2.6.0, and am posting some
>> feedback here.
>>
>> My cluster:
>> 1 tajo-master, 9 tajo-worker
>> 24 CPU(logic), 64GB mem, 4TB*12 HDD
>>
>> Feedback:
>> 1) Tajo task progress estimation is correct on partitioned tables; it was
>> sometimes incorrect in tajo-0.9.0.
>> 2) Tajo configuration doesn't support hostnames in tajo-site.xml.
>> for example:
>>
>> <property>
>>   <name>tajo.master.umbilical-rpc.address</name>
>>   <value>1-1-1-1:26001</value>
>> </property>
>>
>> which does work under tajo-0.9.0, but it complains that "1-1-1-1:26001"
>> is not a valid network address.
>>
>> I have to change it to:
>>
>> <property>
>>   <name>tajo.master.umbilical-rpc.address</name>
>>   <value>1.1.1.1:26001</value>
>> </property>
>>
>> But we don't use IPs in our cluster, only hostnames, so I made a small
>> change in the code:
>> org.apache.tajo.validation.NetworkAddressValidator.java:
>> hostnamePattern = Pattern.compile("\\d*-\\d*-\\d*-\\d");
>> Then it works.
>>
>> 3) I ran some tests on Parquet, RCFILE (Snappy compressed) and
>> RCFILE (GZIP compressed).
>>
>> They hold the same data and differ only in file format.
>> The table has six partitions, 20 RCFILES, each parquet file is 1GB.
>>
>> RCFILE with Snappy performs similarly to RCFILE with GZIP,
>> but both are two to three times better than Parquet.
>>
>> 4) I compared tajo-0.10 and Impala-2.1.2.
>> Impala provides very good support for Parquet, much better than Tajo.
>>
>> But Impala is *slower* than Tajo with other formats,
>> for example (I don't use WHERE because I want to query all six
>> partitions together):
>>
>> *Impala*:
>> > select sum (cast(movie_vv as bigint)), sum(cast(movie_cv as bigint)),sum(cast(movie_pt as bigint)) from par;
>> +-------------------------------+-------------------------------+-------------------------------+
>> | sum(cast(movie_vv as bigint)) | sum(cast(movie_cv as bigint)) | sum(cast(movie_pt as bigint)) |
>> +-------------------------------+-------------------------------+-------------------------------+
>> | 22557920                      | 19648838                      | 2005366694576                 |
>> +-------------------------------+-------------------------------+-------------------------------+
>> Fetched 1 row(s) in 6.02s
>>
>> *Tajo*:
>> default> select sum (cast(movie_vv as bigint)), sum(cast(movie_cv as bigint)),sum(cast(movie_pt as bigint)) from snappy;
>> Progress: 0%, response time: 1.598 sec
>> Progress: 0%, response time: 1.6 sec
>> Progress: 0%, response time: 2.003 sec
>> Progress: 0%, response time: 2.806 sec
>> Progress: 37%, response time: 3.808 sec
>> Progress: 100%, response time: 4.792 sec
>> ?sum_3, ?sum_4, ?sum_5
>> -------------------------------
>> 22557920, 19648838, 2005366694576
>> (1 rows, 4.792 sec, 32 B selected)

--001a1147ebda35bfd90511618b8b--
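
A note on item 2) of the quoted feedback: the patched hostnamePattern is tailored to names shaped like 1-1-1-1, so other hostnames might still be rejected. A more general check could follow the RFC 1123 label rules (letters, digits and hyphens, with no leading or trailing hyphen in a label). The sketch below is standalone Java and is not Tajo's actual NetworkAddressValidator code; the class name HostPortCheck and the method isValidHostPort are hypothetical, and the 253-character limit on the whole name is not checked.

```java
import java.util.regex.Pattern;

// Hypothetical standalone helper, NOT part of Tajo.
final class HostPortCheck {
    // One RFC 1123 label: starts and ends with a letter or digit,
    // may contain hyphens in between, at most 63 characters.
    private static final String LABEL =
        "[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?";

    // A hostname is one or more labels separated by dots.
    private static final Pattern HOSTNAME =
        Pattern.compile(LABEL + "(?:\\." + LABEL + ")*");

    // Accepts "host:port" where host is a syntactically valid hostname
    // and port is an integer in 1..65535.
    static boolean isValidHostPort(String address) {
        int colon = address.lastIndexOf(':');
        if (colon <= 0 || colon == address.length() - 1) {
            return false; // missing host or missing port
        }
        String host = address.substring(0, colon);
        String portText = address.substring(colon + 1);
        if (!portText.matches("\\d{1,5}")) {
            return false; // port must be 1 to 5 digits
        }
        int port = Integer.parseInt(portText);
        return port >= 1 && port <= 65535 && HOSTNAME.matcher(host).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValidHostPort("1-1-1-1:26001"));
        System.out.println(isValidHostPort("1.1.1.1:26001"));
        System.out.println(isValidHostPort("-bad-host:26001"));
    }
}
```

With this sketch both "1-1-1-1:26001" and "1.1.1.1:26001" pass, since a dotted IPv4 address also satisfies the hostname syntax, while a name with a leading hyphen or a port above 65535 is rejected.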