drill-user mailing list archives

From "Updike, Clark" <Clark.Upd...@jhuapl.edu>
Subject Drill querying for non-existing objects on S3 interface
Date Mon, 15 Jun 2020 21:57:46 GMT
Does Drill normally query an S3 interface for objects that might not exist? Consider the
following setup:

Storage plugin name: s3-quobyte
Bucket name: drill-bucket
File Under Query: drill-bucket/0_0_0.parquet

Storage plugin highlights:
    "connection": "s3a://drill-bucket",
    ...
       "root": {
          "location": "/",

Drill interaction... Can successfully "USE" the storage plugin:

    apache drill> use `s3-quobyte`.root;
    +------+---------------------------------------------+
    |  ok  |                   summary                   |
    +------+---------------------------------------------+
    | true | Default schema changed to [s3-quobyte.root] |
    +------+---------------------------------------------+
    1 row selected (0.143 seconds)

But a query of the file hangs:

    select * from `s3-quobyte`.root.`0_0_0.parquet` limit 2;

On the S3 backend, we see the following requests for non-existing objects:

   /drill-bucket/s3-quobyte.view.drill: NoSuchKey(404
   /drill-bucket/s3-quobyte: NoSuchKey(404
   /drill-bucket/user/<usersname>/0_0_0.parquet.stats.drill/: NoSuchKey(404
   /drill-bucket/user/<usersname>/0_0_0.parquet.stats.drill: NoSuchKey(404

Are these spurious requests benign and expected, or are they unexpected and likely the source
of the query hanging?