hadoop-common-issues mailing list archives

From "Yuzhou Sun (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-15364) Add support for S3 Select to S3A
Date Tue, 01 May 2018 19:57:00 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-15364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16460069#comment-16460069 ]

Yuzhou Sun commented on HADOOP-15364:
-------------------------------------

Hello Steve, regarding “For wider use we'll need to implement the HADOOP-15229 so that callers
can pass down the expression along with any other parameters”: do you mean writing an FSDataInputStreamBuilder
class, similar to FSDataOutputStreamBuilder, that accepts custom options through its `opt`
or `must` methods, and adding a new method to FileSystem like
{code:java}
public FSDataInputStreamBuilder openFile(Path path) {
  return new FileSystemDataInputStreamBuilder(this, path).create();
}
{code}
After that, callers could use it as
{code:java}
InputStream in = fs.openFile(path)
   .must("query", " SELECT s.entityId FROM S3OBJECT s WHERE s.cloudCover = '0.0' ")
   .build();
{code}
Do I understand it correctly? Thank you.
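
To make the `opt`/`must` distinction in the question concrete, here is a minimal, self-contained sketch. Everything in it is hypothetical: `SketchInputStreamBuilder` is not a Hadoop class, the only key it recognizes is the "query" key from the example above, and `build()` returns a description string rather than a real stream. It assumes the same convention FSDataOutputStreamBuilder documents, where an unsupported mandatory option makes `build()` throw IllegalArgumentException while an unsupported optional one is ignored.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for the proposed input stream builder; NOT a Hadoop class.
final class SketchInputStreamBuilder {
  // Keys this imaginary filesystem understands ("query" mirrors the example above).
  private static final Set<String> SUPPORTED_KEYS = Collections.singleton("query");

  private final String path;
  private final Map<String, String> options = new HashMap<>();
  private final Set<String> mandatoryKeys = new HashSet<>();

  SketchInputStreamBuilder(String path) {
    this.path = path;
  }

  // Optional setting: a filesystem that does not know the key may ignore it.
  SketchInputStreamBuilder opt(String key, String value) {
    mandatoryKeys.remove(key);
    options.put(key, value);
    return this;
  }

  // Mandatory setting: build() must fail if the key is not understood.
  SketchInputStreamBuilder must(String key, String value) {
    mandatoryKeys.add(key);
    options.put(key, value);
    return this;
  }

  // Validates mandatory keys, then describes what would be opened.
  String build() {
    for (String key : mandatoryKeys) {
      if (!SUPPORTED_KEYS.contains(key)) {
        throw new IllegalArgumentException("Unsupported mandatory option: " + key);
      }
    }
    String query = options.get("query");
    if (query == null) {
      return "open " + path;
    }
    return "select from " + path + " using [" + query.trim() + "]";
  }
}
```

With this sketch, `.opt("some.unknown.key", "x")` is silently ignored, whereas `.must("some.unknown.key", "x")` causes `build()` to throw, so a caller who needs the query to be honoured would pass it through `must`.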

> Add support for S3 Select to S3A
> --------------------------------
>
>                 Key: HADOOP-15364
>                 URL: https://issues.apache.org/jira/browse/HADOOP-15364
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>            Priority: Major
>         Attachments: HADOOP-15364-001.patch, HADOOP-15364-002.patch
>
>
> Expect a PoC patch for this in a couple of days:
> * It'll depend on an SDK update to work, plus a couple of other minor changes
> * Adds a command line option too
> {code}
> hadoop s3guard select -header use -compression gzip -limit 100 s3a://landsat-pds/scene_list.gz \
> "SELECT s.entityId FROM S3OBJECT s WHERE s.cloudCover = '0.0' "
> {code}
> For wider use we'll need to implement the HADOOP-15229 so that callers can pass down
> the expression along with any other parameters



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org

