drill-user mailing list archives

From Christopher Matta <cma...@mapr.com>
Subject Re: Connecting to multiple S3 locations
Date Thu, 26 Mar 2015 21:11:30 GMT
You can define multiple “file” type storage plugins, one for each S3
location, and then join across them.

My storage plugins carry the auth keys in the connection string of each
“file” plugin configuration (not super secure, I know):

s3test1:

{
  "type": "file",
  "enabled": true,
  "connection": "s3n://<accesskey>:<secret>@cmatta",
  "workspaces": {
  },
  "formats": {
    "psv": {
      "type": "text",
      "extensions": [
        "tbl"
      ],
      "delimiter": "|"
    },
    "csv": {
      "type": "text",
      "extensions": [
        "csv"
      ],
      "delimiter": ","
    },
    "tsv": {
      "type": "text",
      "extensions": [
        "tsv"
      ],
      "delimiter": "\t"
    },
    "parquet": {
      "type": "parquet"
    },
    "json": {
      "type": "json"
    }
  }
}

s3test2:

{
  "type": "file",
  "enabled": true,
  "connection": "s3n://<accesskey>:<secret>@cmatta-test1",
  "workspaces": {
    "root": {
      "location": "/yelp",
      "writable": false,
      "defaultInputFormat": null
    }
  },
  "formats": {
    "psv": {
      "type": "text",
      "extensions": [
        "tbl"
      ],
      "delimiter": "|"
    },
    "csv": {
      "type": "text",
      "extensions": [
        "csv"
      ],
      "delimiter": ","
    },
    "tsv": {
      "type": "text",
      "extensions": [
        "tsv"
      ],
      "delimiter": "\t"
    },
    "parquet": {
      "type": "parquet"
    },
    "json": {
      "type": "json"
    }
  }
}
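Since the two configs differ only in the bucket name, the credentials and the
workspaces, they could be generated from one template. This is just a minimal
sketch: the helper name build_s3_plugin is hypothetical, the placeholders
<accesskey> and <secret> stay placeholders, and it only prints the JSON to
paste into the Drill storage config page.

```python
import copy
import json

# Format definitions shared by both plugins, copied from the configs above.
SHARED_FORMATS = {
    "psv": {"type": "text", "extensions": ["tbl"], "delimiter": "|"},
    "csv": {"type": "text", "extensions": ["csv"], "delimiter": ","},
    "tsv": {"type": "text", "extensions": ["tsv"], "delimiter": "\t"},
    "parquet": {"type": "parquet"},
    "json": {"type": "json"},
}


def build_s3_plugin(bucket, access_key, secret, workspaces=None):
    """Return a "file" storage plugin config for one S3 bucket (hypothetical helper)."""
    return {
        "type": "file",
        "enabled": True,
        # Credentials are embedded in the s3n URI, as in the configs above.
        "connection": "s3n://{}:{}@{}".format(access_key, secret, bucket),
        "workspaces": copy.deepcopy(workspaces) if workspaces else {},
        # Deep copy so that editing one plugin's formats does not affect the other.
        "formats": copy.deepcopy(SHARED_FORMATS),
    }


plugins = {
    "s3test1": build_s3_plugin("cmatta", "<accesskey>", "<secret>"),
    "s3test2": build_s3_plugin(
        "cmatta-test1",
        "<accesskey>",
        "<secret>",
        workspaces={
            "root": {
                "location": "/yelp",
                "writable": False,
                "defaultInputFormat": None,
            }
        },
    ),
}

for name, config in plugins.items():
    print(name + ":")
    print(json.dumps(config, indent=2))
```

Each bucket would get its own access key and secret in practice; the same
placeholder is used twice here only because the original configs elide them.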

I have the yelp_academic_dataset_business.json file in s3test1 and the
yelp_academic_dataset_review.json file in s3test2, and the join seems to
work okay:

0: jdbc:drill:zk=172.16.1.175:5181,172.16.1.1> select a.`name`,
b.`text` FROM s3test1.`default`.`yelp1/yelp_academic_dataset_business.json`
a JOIN s3test2.`default`.`yelp/yelp_academic_dataset_review.json` b ON
a.`business_id` = b.`business_id` limit 1;
+------------+------------+
|    name    |    text    |
+------------+------------+
| Thai Pan Fresh Exotic Cuisine | Lately i have been feeling homesick
for asian food and been hitting up places that i haven't been to in
awhile.  Recently re-visited Thai Pan for a quick lunch and quickly
ordered without spending too much time perusing the menu.  It looked
more diverse than I remembered including some Vietnamese additions.  I
remembered the curries and stir-fry dishes were ok but nothing really
memorable.  A quick summary for my latest visit:

Pros:
- convenient order-at-the counter setup
- self-serve drink station
- brown and white rice mixture
- friendly and gracious owners

Cons:
- too much napa cabbage in comparison to green vegetables
- wish the owner/chef would be back in the kitchen vs. managing
- spice level on the weak side |
+------------+------------+
1 row selected (6.606 seconds)
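The query above generalizes to any pair of plugins: each side of the join is
addressed as plugin.workspace.`path`. Here is a rough sketch that builds the
same statement from its parts. The function names are hypothetical, and the
payload shape (queryType plus query) assumes the JSON body that Drill's REST
query endpoint accepts; check that against your Drill version before relying
on it. Nothing is sent over the network here.

```python
import json


def table_ref(plugin, workspace, path):
    """Build a Drill table reference such as s3test1.`default`.`dir/file.json`."""
    return "{}.`{}`.`{}`".format(plugin, workspace, path)


def build_join_query(left, right, key, limit=1):
    """Join two tables on a shared key column, selecting name and text as above."""
    return (
        "SELECT a.`name`, b.`text` "
        "FROM {left} a JOIN {right} b "
        "ON a.`{key}` = b.`{key}` LIMIT {limit}"
    ).format(left=left, right=right, key=key, limit=limit)


business = table_ref("s3test1", "default", "yelp1/yelp_academic_dataset_business.json")
review = table_ref("s3test2", "default", "yelp/yelp_academic_dataset_review.json")
sql = build_join_query(business, review, "business_id")

# Assumed request body for a SQL query submitted over Drill's REST interface.
payload = {"queryType": "SQL", "query": sql}
print(json.dumps(payload, indent=2))
```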

Storing the access keys and secrets in core-site.xml would require distinct
property names for each set of credentials, but I don’t know how to reference
them from the Drill storage configs. If anyone can chime in on how to
reference entries in core-site.xml from the Drill storage configs, that
would be helpful.

On Thursday, March 26, 2015, Ganesha Muthuraman <mganesh123@outlook.com>
wrote:

> All,
> Does anyone know how to connect to multiple S3 locations using Drill (with
> different sets of secret and access keys)? What does core-site.xml look like
> in that case? My intention is to join two different files in these two S3
> locations with different S3 credentials.
> Ganesh

​
