drill-user mailing list archives

From Ivan Kovacevic <ivankov....@gmail.com>
Subject Drill data and database locality
Date Sat, 10 Jun 2017 17:02:18 GMT
Greetings,

I have two servers. The first server (A) hosts ZooKeeper, a MongoDB
database and a drillbit. The second server (B) hosts a Hadoop
distribution with several Hive tables, a PostgreSQL database and the other
drillbit. Both drillbits can see each other on the Drill home pages, since
they are both connected to the ZooKeeper on server A. When a query like the
one below is run (the view dfs.tmp.lineorder consists of the Hive and
PostgreSQL data on server B), drillbit B becomes the foreman and executes
everything. It does not delegate the parts dealing with MongoDB collections
to the drillbit on server A, so 90% of the time required to run the query
(approx. 30 minutes) is spent on sending the MongoDB collection from server
A to server B. Is there a way to force drillbit B to delegate the MongoDB
part of the query to drillbit A, or could this be a misconfiguration
problem?
Also, can drillbits be configured to access different databases (e.g. when
each drillbit has access to a distinct subnet, so that not all drillbits
can access the same databases)?
Thank you.

Sample query:
SELECT SUM(revenue) AS revenue
FROM (
    SELECT SUM(lo_extendedprice*lo_discount) AS revenue
    FROM dfs.tmp.lineorder, dfs.tmp.`date`
    WHERE lo_orderdate = d_datekey
    AND d_year = 1993
    AND lo_discount BETWEEN 1 AND 3
    AND lo_quantity < 25
UNION ALL
    SELECT SUM(lo_extendedprice * lo_discount) AS revenue
    FROM mongo.test.ssb_europe ssb
    WHERE ssb.orderdate.d_year = 1993
    AND lo_discount BETWEEN 1 AND 3
    AND lo_quantity < 25
);
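
For reference, a minimal sketch of how the setup could be checked from
either drillbit, assuming the sys.drillbits system table and EXPLAIN PLAN
FOR behave in this Drill version as they do in the standard documentation.
The first statement lists the drillbits registered in ZooKeeper; the second
prints the plan Drill generates for the MongoDB branch of the query alone.

```sql
-- List the drillbits currently registered under this cluster-id.
SELECT * FROM sys.drillbits;

-- Show the plan for the MongoDB branch only (same filters as the sample query).
EXPLAIN PLAN FOR
SELECT SUM(lo_extendedprice * lo_discount) AS revenue
FROM mongo.test.ssb_europe ssb
WHERE ssb.orderdate.d_year = 1993
AND lo_discount BETWEEN 1 AND 3
AND lo_quantity < 25;
```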

Configuration (drill-override.conf):

On drillbit A (Windows Server 2008):

drill.exec: {
  cluster-id: "drillbits1",
  zk.connect: "serverA:2181",
  impersonation: {
    enabled: true,
    max_chained_user_hops: 3
  }
}

On drillbit B (Cloudera CDH 5.8.0):
drill.exec: {
  cluster-id: "drillbits1",
  zk.connect: "serverA:2181",
  impersonation: {
    enabled: true,
    max_chained_user_hops: 3
  }
}


Best regards,
Ivan
