drill-user mailing list archives

From Padma Penumarthy <penumarthy.pa...@gmail.com>
Subject Re: Query local files in different machines?
Date Sun, 05 Aug 2018 23:22:48 GMT
You need a DFS, i.e. Hadoop, with the global file system namespace provided
by the NameNode.
Planning is done by a single Drill node, which is the foreman for the query.
It looks for files through the file system API, and a local file system only
knows about the files on that node.
So I don't think what you want to do is possible.
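
To illustrate the point about planning, here is a minimal sketch in Python. It is not Drill code: the `make_node` and `plan_scan` helpers, the temporary directories standing in for node1 and node2, and the file names are all hypothetical. It only models the idea that whichever node acts as foreman lists files through its own local file system and so never learns about files on other nodes.

```python
import os
import tempfile


def make_node(name, files):
    """Create a temporary directory that stands in for one machine's local disk.

    Hypothetical helper: each "node" gets its own /apache/logs directory.
    """
    root = tempfile.mkdtemp(prefix=name + "_")
    log_dir = os.path.join(root, "apache", "logs")
    os.makedirs(log_dir)
    for file_name in files:
        with open(os.path.join(log_dir, file_name), "w") as handle:
            handle.write("GET / 200\n")
    return root


def plan_scan(foreman_root, table_path):
    """List the files under table_path as seen from the foreman's local disk only.

    Hypothetical stand-in for the planning step: there is no shared namespace,
    so only files below foreman_root are visible.
    """
    local_dir = os.path.join(foreman_root, table_path.lstrip("/"))
    if not os.path.isdir(local_dir):
        return []
    return sorted(os.listdir(local_dir))


# Two simulated machines, each holding a different log file at the same path.
node1 = make_node("node1", ["access_log.node1"])
node2 = make_node("node2", ["access_log.node2"])

# Whichever node plans the query sees only its own files.
files_seen_by_node1 = plan_scan(node1, "/apache/logs")
files_seen_by_node2 = plan_scan(node2, "/apache/logs")

print("foreman on node1 plans over:", files_seen_by_node1)
print("foreman on node2 plans over:", files_seen_by_node2)
```

With a shared namespace (HDFS, S3, or similar), every node would resolve the same path to the same set of files, which is why the count in the original question only covers one machine.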

Thanks
Padma




On Sun, Aug 5, 2018 at 1:55 AM, 王亮 <wangliang.f@gmail.com> wrote:

> Hi all,
>
> I have Apache HTTP server logs on different machines and want to query
> these log files.
>
> So I installed Drill (distributed mode) on these machines, for example,
> node1 and node2.
>
> I use this command:
> sqlline -u jdbc:drill:zk:node1,node2
> or
> sqlline -u jdbc:drill:drillbit:node1,node2
>
> Then I run a query like: select count(*) from dfs.`/apache/logs/access_log`
> but I only get the data from one machine.
>
> Maybe I can upload all the log files to S3 or Hadoop.
> But is there an easy way to query all the local files on different machines
> with Drill?
>
> If we need to develop new features to support this requirement, how much
> work would it take? For example, would revising only the physical plan
> distribution code be enough, or would we need to write a completely new
> data source plugin?
>
> I found these discussions, but there seems to be no clear answer.
>
> https://stackoverflow.com/questions/29365320/apache-drill-in-distributed-mode
>
> http://mail-archives.apache.org/mod_mbox/drill-user/201506.mbox/thread
>
> https://stackoverflow.com/questions/33952568/how-to-configure-drill-to-use-all-the-nodes-for-a-query-by-creating-multiple-fr
>
> Thanks,
>
> Wang Liang
>
