hadoop-mapreduce-dev mailing list archives

From "Steve Loughran (JIRA)" <j...@apache.org>
Subject [jira] [Created] (MAPREDUCE-7182) MapReduce input format/record readers to support S3 select queries
Date Mon, 04 Feb 2019 17:33:00 GMT
Steve Loughran created MAPREDUCE-7182:

             Summary: MapReduce input format/record readers to support S3 select queries
                 Key: MAPREDUCE-7182
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7182
             Project: Hadoop Map/Reduce
          Issue Type: New Feature
          Components: mrv2
    Affects Versions: 3.3.0
            Reporter: Steve Loughran

HADOOP-15229 adds S3 Select through the (new) async openFile API, but the classic RecordReader
etc. can't handle it because:

# the files are shorter than the length reported by getFileStatus, and the readers assume
that an EOFException is an error in that situation
# everything assumes plain text is splittable
# everything assumes that if a file has a .gz extension, the gunzip codec should be used,
which breaks when the select results come back transcoded/uncompressed
To handle S3 Select data sources we need to be able to address these issues, either through
changes to the existing code (danger?) or through some new readers.
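
As a rough illustration of what a select-aware reader would have to do differently, here is a minimal sketch in plain Java. It deliberately avoids the Hadoop classes: the class SelectLineReaderSketch and its methods isSplittable, wrapWithCodec and readAll are hypothetical names invented for this example, not part of Hadoop or of any proposed patch. It only models the three points listed above: read until the stream ends instead of trusting the length from getFileStatus, never split the input, and do not pick a codec from the file extension.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch, not Hadoop code: models a line reader for a
 * select-style source whose result stream is shorter than the object itself.
 */
public class SelectLineReaderSketch {

    /** Point 2: filtered results have no usable byte offsets, so never split. */
    static boolean isSplittable(String path) {
        return false;
    }

    /** Point 3: results arrive already decoded, so an extension such as ".gz" is ignored. */
    static InputStream wrapWithCodec(String path, InputStream in) {
        return in; // deliberately no extension-based codec lookup
    }

    /**
     * Point 1: the reader never consults the length from the file status;
     * reaching the end of the stream is treated as the normal end of input.
     */
    static List<String> readAll(String path, InputStream raw) throws IOException {
        List<String> records = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(wrapWithCodec(path, raw), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                records.add(line);
            }
        }
        return records;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the (short, uncompressed) bytes a select query might return
        // for an object whose name ends in ".gz".
        byte[] selected = "a,1\nb,2\n".getBytes(StandardCharsets.UTF_8);
        List<String> records =
                readAll("s3a://bucket/data.csv.gz", new ByteArrayInputStream(selected));
        System.out.println(records.size() + " records, splittable="
                + isSplittable("s3a://bucket/data.csv.gz"));
    }
}
```

Whether this behaviour belongs in new readers or in guarded changes to the existing ones is exactly the open question of this issue; the sketch takes no position on that.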

This message was sent by Atlassian JIRA

To unsubscribe, e-mail: mapreduce-dev-unsubscribe@hadoop.apache.org
For additional commands, e-mail: mapreduce-dev-help@hadoop.apache.org
