drill-user mailing list archives

From Charles Givre <cgi...@gmail.com>
Subject Re: Drill storage plugin for IPFS, any suggestion is welcome :)
Date Sun, 07 Jul 2019 14:24:43 GMT
Hi Wang, 
This looks interesting.  Would you consider submitting this as a PR once you are satisfied
with the performance?

> On Jul 6, 2019, at 5:31 AM, 王亮 <wangliang.f@gmail.com> wrote:
> Hi all,
> After reading that excellent book "Learning Apache Drill: Query and Analyze
> Distributed Data Sources with SQL", my classmate and I also wanted to write
> a Drill storage plugin. We found that most DFS and NFS are already supported
> by Drill, so we chose a relatively new and promising distributed file system,
> IPFS. We built Minerva, a Drill storage plugin that connects IPFS's
> decentralized storage and Drill's flexible query engine. Any data file
> stored on IPFS can be easily accessed from Drill's query interface, just
> like a file stored on a local disk. The basic idea is very simple: run a
> Drill instance alongside the IPFS daemon, and you can connect to other users on
> IPFS who are also using Minerva. If one of the users happens to have stored
> the file you are trying to query, then Drill can send the execution plan to
> that node, which executes the operations locally and returns the results.
> Of course, other users can benefit from your node as well, if you are
> sharing the data they want. If there are enough people running Minerva,
> data sharing and querying can be made distributed and more efficient!
> The query process is as follows:
> 0. The user inputs an SQL statement, referencing a file on IPFS by its CID;
> 1. The Foreman resolves the CIDs of the "pieces" of the data file, as well
> as the IPFS providers of these pieces, by querying the DHT of IPFS;
> 2. The Foreman distributes jobs to drillbits running on the providers;
> 3. Drillbits on the providers read data from the piece of file on their
> local disk, perform any necessary relational operations, and return results
> to the Foreman;
> 4. The Foreman returns the results to the user.
> Thanks to the modular design of Drill, we could rather "easily" write this
> storage plugin. The plugin currently supports basic query operations, both read
> and write, but only works with JSON and CSV files. It is not very stable
> for now, and the performance is still poor, mainly because it takes too
> long to do DHT queries on IPFS. We are trying to address these problems in
> the future.
> If you are interested, we have made a few slides that explain the ideas in
> detail:
> https://www.slideshare.net/BowenDing4/minerva-ipfs-storage-plugin-for-ipfs
> Any suggestion is welcome. ^_^
> Find the code on GitHub: https://github.com/bdchain/Minerva
> Best,
> Wang Liang
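
The query process in the quoted message (steps 0 to 4) can be illustrated with a small, self-contained sketch. This is not Minerva's code and does not call any IPFS or Drill API: the dictionaries `PIECES`, `PROVIDERS` and `LOCAL_ROWS`, the CIDs, the node names and every function name are hypothetical stand-ins for the DHT and the drillbits. The rule for choosing among several providers (pick the least-loaded one) is also an assumption, since the message does not say how the Foreman decides.

```python
# Hypothetical in-memory stand-ins for the IPFS DHT (not a real IPFS API).
PIECES = {"QmFile": ["QmPieceA", "QmPieceB", "QmPieceC"]}
PROVIDERS = {
    "QmPieceA": ["node1", "node2"],
    "QmPieceB": ["node2"],
    "QmPieceC": ["node3", "node1"],
}
# Hypothetical rows stored in each piece on a provider's local disk.
LOCAL_ROWS = {
    "QmPieceA": [{"id": 1, "v": 10}, {"id": 2, "v": 25}],
    "QmPieceB": [{"id": 3, "v": 40}],
    "QmPieceC": [{"id": 4, "v": 5}],
}


def plan_query(file_cid):
    """Steps 1-2: resolve pieces and providers, then assign each piece to a node."""
    assignments = {}
    for piece in PIECES[file_cid]:
        candidates = PROVIDERS[piece]
        # Assumption: prefer the provider with the fewest pieces assigned so far.
        target = min(candidates, key=lambda n: len(assignments.get(n, [])))
        assignments.setdefault(target, []).append(piece)
    return assignments


def run_fragment(pieces, predicate):
    """Step 3: a drillbit scans its local pieces and applies the filter."""
    return [row for p in pieces for row in LOCAL_ROWS[p] if predicate(row)]


def execute(file_cid, predicate):
    """Step 4: the Foreman gathers the partial results from every provider."""
    results = []
    for node, pieces in plan_query(file_cid).items():
        results.extend(run_fragment(pieces, predicate))
    return results


if __name__ == "__main__":
    print(plan_query("QmFile"))
    print(execute("QmFile", lambda row: row["v"] >= 10))
```

In a real deployment the two lookups at the top of `plan_query` would be DHT queries, which the message identifies as the main source of latency, and `run_fragment` would run remotely on each provider rather than in the same process.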
