Could anyone kindly give me an answer? Thanks.

On Wed, Mar 4, 2015 at 6:12 PM, Azuryy Yu wrote:
> Hi,
>
> I read the Tajo-0.9.0 source code and found that Tajo uses a simple FIFO
> scheduler.
>
> I can accept this at the current stage. But when Tajo peeks a query from the
> scheduler queue and then allocates workers for it, the allocator only
> considers the available resources on a random worker list and then picks a
> set of workers.
>
> 1)
> So my question is: why don't we consider HDFS locality? Otherwise the
> network will be the bottleneck.
>
> I understand that Tajo doesn't use YARN as a scheduler currently and has a
> temporary simple FIFO scheduler instead. I also looked at
> https://issues.apache.org/jira/browse/TAJO-540 , and I hope the new Tajo
> scheduler will be similar to Sparrow.
>
> 2) Performance related.
> I set up a 10-node cluster (1 master, 9 workers):
>
> 64GB mem, 24 CPU, 12*4TB HDD, 1.6GB test data (160 million records).
>
> It works well for some aggregation SQL tests, except count(distinct).
>
> count(distinct) is very slow: ten minutes.
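>
> Who can give me a simple explanation of how Tajo works with
> count(distinct)? I can share my tajo-site here:
>
> <configuration>
>   <property>
>     <name>tajo.rootdir</name>
>     <value>hdfs://realtime-cluster/tajo</value>
>   </property>
>
>   <property>
>     <name>tajo.master.umbilical-rpc.address</name>
>     <value>xx:26001</value>
>   </property>
>   <property>
>     <name>tajo.master.client-rpc.address</name>
>     <value>xx:26002</value>
>   </property>
>   <property>
>     <name>tajo.master.info-http.address</name>
>     <value>xx:26080</value>
>   </property>
>   <property>
>     <name>tajo.resource-tracker.rpc.address</name>
>     <value>xx:26003</value>
>   </property>
>   <property>
>     <name>tajo.catalog.client-rpc.address</name>
>     <value>xx:26005</value>
>   </property>
>
>   <property>
>     <name>tajo.worker.tmpdir.locations</name>
>     <value>file:///data/hadoop/data1/tajo,file:///data/hadoop/data2/tajo,file:///data/hadoop/data3/tajo,file:///data/hadoop/data4/tajo,file:///data/hadoop/data5/tajo,file:///data/hadoop/data6/tajo,file:///data/hadoop/data7/tajo,file:///data/hadoop/data8/tajo,file:///data/hadoop/data9/tajo,file:///data/hadoop/data10/tajo,file:///data/hadoop/data11/tajo,file:///data/hadoop/data12/tajo</value>
>   </property>
>   <property>
>     <name>tajo.worker.tmpdir.cleanup-at-startup</name>
>     <value>true</value>
>   </property>
>   <property>
>     <name>tajo.worker.history.expire-interval-minutes</name>
>     <value>60</value>
>   </property>
>   <property>
>     <name>tajo.worker.resource.tajo.worker.resource.cpu-cores</name>
>     <value>24</value>
>   </property>
>   <property>
>     <name>tajo.worker.resource.memory-mb</name>
>     <value>60512</value>
>   </property>
>   <property>
>     <name>tajo.task.memory-slot-mb.default</name>
>     <value>3000</value>
>   </property>
>   <property>
>     <name>tajo.task.disk-slot.default</name>
>     <value>1.0f</value>
>   </property>
>   <property>
>     <name>tajo.shuffle.fetcher.parallel-execution.max-num</name>
>     <value>5</value>
>   </property>
>   <property>
>     <name>tajo.executor.external-sort.thread-num</name>
>     <value>2</value>
>   </property>
>
>   <property>
>     <name>tajo.rpc.client.worker-thread-num</name>
>     <value>4</value>
>   </property>
>   <property>
>     <name>tajo.cli.print.pause</name>
>     <value>false</value>
>   </property>
> </configuration>
>
> tajo-env:
>
> export TAJO_WORKER_HEAPSIZE=60000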
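
To make question 1) concrete, here is a minimal sketch of what locality-aware worker selection could look like, as opposed to picking from a random worker list. This is not Tajo's code: the function name `pick_workers`, the `block_hosts` list (replica hosts per HDFS block) and the `free_slots` map are all hypothetical names made up for the illustration, and a real scheduler would also have to handle fairness, stragglers and delayed scheduling.

```python
from collections import Counter


def pick_workers(block_hosts, free_slots, num_workers):
    """Pick workers that hold the most of the query's HDFS blocks locally.

    block_hosts: list of lists, the replica hosts of each input block
                 (hypothetical input, e.g. gathered from HDFS block locations).
    free_slots:  dict mapping worker host -> free task slots
                 (hypothetical view of the resource tracker).
    num_workers: how many workers to allocate for the query.
    """
    # Count how many input blocks each host stores a replica of.
    local_blocks = Counter(host for hosts in block_hosts for host in hosts)

    # Only workers that still have capacity are candidates.
    candidates = [host for host, slots in free_slots.items() if slots > 0]

    # Prefer more local blocks, then more free slots, then host name
    # so that the result is deterministic.
    candidates.sort(key=lambda h: (-local_blocks[h], -free_slots[h], h))
    return candidates[:num_workers]


if __name__ == "__main__":
    blocks = [["w1", "w2"], ["w2", "w3"], ["w2", "w4"]]
    slots = {"w1": 2, "w2": 1, "w3": 0, "w4": 2, "w5": 8}
    print(pick_workers(blocks, slots, 2))
```

In this toy example `w5` has the most free slots but stores none of the input blocks, so a locality-aware choice ranks it last; a random pick could easily select it and force every read over the network.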