drill-user mailing list archives

From Vitalii Diravka <vitalii.dira...@gmail.com>
Subject Re: Query local files in different machines?
Date Mon, 06 Aug 2018 09:21:59 GMT
If other queries are acceptable, you can use something similar to:
0: jdbc:drill:> select sum(`ROWS`) `TOTAL_NUMBER` from (select count(*) as
`ROWS` from cp.`tpch/nation.parquet` union all select count(*) as `ROWS`
from cp.`tpch/region.parquet`);
+---------------+
| TOTAL_NUMBER  |
+---------------+
| 30            |
+---------------+
1 row selected (0.324 seconds)
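
The same pattern extends to more than two tables. As a minimal sketch (the helper name `build_total_count_query` is hypothetical and not part of Drill; it only assembles the query text), the statement could be generated in Python from a list of table paths and then pasted into sqlline:

```python
def build_total_count_query(tables):
    """Build a SQL string that sums count(*) over several tables.

    `tables` is a list of already-quoted table references, for example
    "cp.`tpch/nation.parquet`". This helper is an illustrative assumption,
    not a Drill API; it performs no escaping or validation beyond
    rejecting an empty list.
    """
    if not tables:
        raise ValueError("at least one table is required")
    # One count(*) subquery per table, each exposing the same column alias.
    subqueries = [f"select count(*) as `ROWS` from {t}" for t in tables]
    # union all keeps every per-table count so the outer sum adds them up.
    inner = " union all ".join(subqueries)
    return f"select sum(`ROWS`) as `TOTAL_NUMBER` from ({inner})"


query = build_total_count_query(
    ["cp.`tpch/nation.parquet`", "cp.`tpch/region.parquet`"]
)
print(query)
```

Each entry in the list still has to be a table that Drill can resolve on its own, so this only saves typing; it does not change how files are located.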

Kind regards
Vitalii


On Sun, Aug 5, 2018 at 9:56 PM 王亮 <wangliang.f@gmail.com> wrote:

> Hi all,
>
> I have Apache HTTP server logs on different machines and want to query
> these log files.
>
> So I installed Drill (distributed mode) on these machines, for example,
> node1 and node2.
>
> I use this command:
> sqlline -u jdbc:drill:zk:node1,node2
> or
> sqlline -u jdbc:drill:drillbit:node1,node2
>
> Then I run a query like: select count(*) from dfs.`/apache/logs/access_log`
> but I only get the data from one machine.
>
> Maybe I can upload all the log files to S3 or Hadoop.
> But is there an easy way to query all local files on different machines
> with Drill?
>
> If new features need to be developed to support this requirement, how much
> work would that be? For example, would it be enough to revise the physical
> plan distribution code, or would a completely new data source plugin be
> needed?
>
> I found these discussions, but there seems to be no clear answer.
>
>
> https://stackoverflow.com/questions/29365320/apache-drill-in-distributed-mode
>
> http://mail-archives.apache.org/mod_mbox/drill-user/201506.mbox/thread
>
>
> https://stackoverflow.com/questions/33952568/how-to-configure-drill-to-use-all-the-nodes-for-a-query-by-creating-multiple-fr
>
> Thanks,
>
> Wang Liang
>
