drill-user mailing list archives

From Ganesha Muthuraman <mganesh...@outlook.com>
Subject RE: Connecting to multiple S3 locations
Date Fri, 27 Mar 2015 03:39:23 GMT
Cool. Thank you, Christopher! That helps.
Regards, Ganesh

> From: cmatta@mapr.com
> Date: Thu, 26 Mar 2015 17:11:30 -0400
> Subject: Re: Connecting to multiple S3 locations
> To: user@drill.apache.org
> 
> You can specify multiple “file” type storage plugins, one for each S3
> bucket, and then join across them.
> 
> My storage plugins have the auth keys embedded in the connection string of
> the “file” plugin configuration (not super secure, I know):
> 
> s3test1:
> 
> {
>   "type": "file",
>   "enabled": true,
>   "connection": "s3n://<accesskey>:<secret>@cmatta",
>   "workspaces": {
>   },
>   "formats": {
>     "psv": {
>       "type": "text",
>       "extensions": [
>         "tbl"
>       ],
>       "delimiter": "|"
>     },
>     "csv": {
>       "type": "text",
>       "extensions": [
>         "csv"
>       ],
>       "delimiter": ","
>     },
>     "tsv": {
>       "type": "text",
>       "extensions": [
>         "tsv"
>       ],
>       "delimiter": "\t"
>     },
>     "parquet": {
>       "type": "parquet"
>     },
>     "json": {
>       "type": "json"
>     }
>   }
> }
> 
> s3test2:
> 
> {
>   "type": "file",
>   "enabled": true,
>   "connection": "s3n://<accesskey>:<secret>@cmatta-test1",
>   "workspaces": {
>     "root": {
>       "location": "/yelp",
>       "writable": false,
>       "defaultInputFormat": null
>     }
>   },
>   "formats": {
>     "psv": {
>       "type": "text",
>       "extensions": [
>         "tbl"
>       ],
>       "delimiter": "|"
>     },
>     "csv": {
>       "type": "text",
>       "extensions": [
>         "csv"
>       ],
>       "delimiter": ","
>     },
>     "tsv": {
>       "type": "text",
>       "extensions": [
>         "tsv"
>       ],
>       "delimiter": "\t"
>     },
>     "parquet": {
>       "type": "parquet"
>     },
>     "json": {
>       "type": "json"
>     }
>   }
> }
> 
> I have the yelp_academic_dataset_business.json file in s3test1 and the
> yelp_academic_dataset_review.json file in s3test2, and the join seems to
> work okay:
> 
> 0: jdbc:drill:zk=172.16.1.175:5181,172.16.1.1> select a.`name`,
> b.`text` FROM s3test1.`default`.`yelp1/yelp_academic_dataset_business.json`
> a JOIN s3test2.`default`.`yelp/yelp_academic_dataset_review.json` b ON
> a.`business_id` = b.`business_id` limit 1;
> +------------+------------+
> |    name    |    text    |
> +------------+------------+
> | Thai Pan Fresh Exotic Cuisine | Lately i have been feeling homesick
> for asian food and been hitting up places that i haven't been to in
> awhile.  Recently re-visited Thai Pan for a quick lunch and quickly
> ordered without spending too much time perusing the menu.  It looked
> more diverse than I remembered including some Vietnamese additions.  I
> remembered the curries and stir-fry dishes were ok but nothing really
> memorable.  A quick summary for my latest visit:
> 
> Pros:
> - convenient order-at-the counter setup
> - self-serve drink station
> - brown and white rice mixture
> - friendly and gracious owners
> 
> Cons:
> - too much napa cabbage in comparison to green vegetables
> - wish the owner/chef would be back in the kitchen vs. managing
> - spice level on the weak side |
> +------------+------------+
> 1 row selected (6.606 seconds)
> 
> Storing the access keys and secrets in core-site.xml instead would require
> a different property name for each set of credentials, but I don’t know how
> to reference those properties from the Drill storage configs. If anyone
> would like to chime in on how to do that, it would be helpful.
> 
> On Thursday, March 26, 2015, Ganesha Muthuraman <mganesh123@outlook.com>
> wrote:
> 
> > All,
> > Does anyone know how to connect to multiple S3 locations using Drill (with
> > different sets of secret and access keys)? What does core-site.xml look like
> > in that case? My intention is to join two different files in these two S3
> > locations with different S3 credentials.
> > Ganesh
> 
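
The two plugin definitions quoted above are identical except for the connection string and the workspaces. As a minimal sketch (not part of the original thread), the Python below generates both configs from one shared template so the format list only has to be maintained once. The names `SHARED_FORMATS` and `build_s3_plugin` are hypothetical, and the `<accesskey>`/`<secret>` placeholders are left unfilled exactly as in the original message.

```python
import copy
import json

# Format definitions shared by both plugins, copied from the configs quoted above.
SHARED_FORMATS = {
    "psv": {"type": "text", "extensions": ["tbl"], "delimiter": "|"},
    "csv": {"type": "text", "extensions": ["csv"], "delimiter": ","},
    "tsv": {"type": "text", "extensions": ["tsv"], "delimiter": "\t"},
    "parquet": {"type": "parquet"},
    "json": {"type": "json"},
}


def build_s3_plugin(bucket, access_key, secret, workspaces=None):
    """Return a "file" storage plugin config dict for one S3 bucket.

    Hypothetical helper: it only mirrors the JSON layout shown in the thread.
    """
    return {
        "type": "file",
        "enabled": True,
        # Credentials embedded in the URI, as in the original (not secure).
        "connection": f"s3n://{access_key}:{secret}@{bucket}",
        "workspaces": copy.deepcopy(workspaces) if workspaces else {},
        # Deep copy so editing one plugin's formats never affects the other.
        "formats": copy.deepcopy(SHARED_FORMATS),
    }


if __name__ == "__main__":
    plugins = {
        "s3test1": build_s3_plugin("cmatta", "<accesskey>", "<secret>"),
        "s3test2": build_s3_plugin(
            "cmatta-test1",
            "<accesskey>",
            "<secret>",
            workspaces={
                "root": {
                    "location": "/yelp",
                    "writable": False,
                    "defaultInputFormat": None,
                }
            },
        ),
    }
    for name, config in plugins.items():
        print(f"{name}:")
        print(json.dumps(config, indent=2))
```

Running the script prints one JSON document per plugin name, which can then be used as the body of the corresponding storage plugin definition. Each bucket keeps its own credentials because each plugin carries its own connection string.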