beam-commits mailing list archives

From "Daniel Halperin (JIRA)" <>
Subject [jira] [Commented] (BEAM-1065) FileBasedSource: replace SeekableByteChannel with open(spec, startingPosition)
Date Sat, 03 Dec 2016 02:20:58 GMT


Daniel Halperin commented on BEAM-1065:

More impressions based on your list, and in reverse order :).

3. Yes, giving the source implementation the ability to control the starting offset is a clear
win, and can save a seek -- love it! However, this can (and should) be done independently of
any changes to seekability.
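
A minimal sketch of what "open at a starting offset" could look like for a local file. The class {{OpenAtSketch}} and the method {{openAt}} are hypothetical names used for illustration only, not Beam's API. The sketch assumes the underlying store can position a channel, which is exactly the assumption questioned in 2A below.

```java
import java.io.IOException;
import java.nio.channels.ReadableByteChannel;
import java.nio.channels.SeekableByteChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class OpenAtSketch {
    /**
     * Hypothetical helper: returns a channel whose first read starts at
     * startingPosition. For a local file this is done by positioning a
     * SeekableByteChannel once, before handing it to the reader.
     */
    static ReadableByteChannel openAt(Path file, long startingPosition) throws IOException {
        SeekableByteChannel channel = Files.newByteChannel(file, StandardOpenOption.READ);
        try {
            channel.position(startingPosition);
        } catch (IOException | RuntimeException e) {
            // Do not leak the channel if positioning fails (e.g. negative offset).
            channel.close();
            throw e;
        }
        return channel;
    }
}
```

The caller only sees a {{ReadableByteChannel}}, so it reads forward from the offset without issuing a seek of its own.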

2. Two concerns:
    A) I am not certain that a file system that cannot provide a seek can provide an open-at-a-nonzero-offset.
So I'm not so convinced this is a trivial change.
    B) Just because the stream is opened at a specific place does not mean the user would
not want to seek. For example, consider a very efficient reader for PDF files. They have an
index at the beginning, so you know exactly where every page starts. Maybe the "open offset"
would be the start of the file, and then we would immediately seek to the first page in range.
So I think seekability is useful.
  Considering the combination of A/B, I would actually be supportive of the other direction
-- just change the return value of {{open}} to {{SeekableByteChannel}} -- requiring that seek
be supported. I'm not sure we have any examples of filesystems that don't support seeking
in practice.
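
To make concern B concrete, here is a rough sketch of a reader that opens at the start of the file, reads an index, and then seeks. The file layout is invented for illustration (an 8-byte big-endian offset at byte 0 pointing at the first record, and each record stored as a 4-byte length followed by a UTF-8 payload); it is not the PDF format, and {{IndexedReadSketch}} is a hypothetical name.

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.ReadableByteChannel;
import java.nio.channels.SeekableByteChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class IndexedReadSketch {
    /** Reads the index at offset 0, then seeks straight to the first record. */
    static String readFirstRecord(Path file) throws IOException {
        try (SeekableByteChannel ch = Files.newByteChannel(file, StandardOpenOption.READ)) {
            // Step 1: the "open offset" is the start of the file, where the index lives.
            ByteBuffer index = ByteBuffer.allocate(Long.BYTES);
            readFully(ch, index);
            index.flip();
            long recordOffset = index.getLong();

            // Step 2: seek to the record. This is the step that needs seekability.
            ch.position(recordOffset);

            // Step 3: read the length-prefixed payload.
            ByteBuffer length = ByteBuffer.allocate(Integer.BYTES);
            readFully(ch, length);
            length.flip();
            ByteBuffer payload = ByteBuffer.allocate(length.getInt());
            readFully(ch, payload);
            return new String(payload.array(), StandardCharsets.UTF_8);
        }
    }

    /** Fills the buffer completely or fails if the channel ends first. */
    static void readFully(ReadableByteChannel ch, ByteBuffer buf) throws IOException {
        while (buf.hasRemaining()) {
            if (ch.read(buf) < 0) {
                throw new EOFException("channel ended before buffer was filled");
            }
        }
    }
}
```

With only a forward-reading {{ReadableByteChannel}}, step 2 would have to be replaced by reading and discarding every byte between the index and the record.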

1. This is true, but (see item 2 above) I think that {{SeekableByteChannel}} is still important.

> FileBasedSource: replace SeekableByteChannel with open(spec, startingPosition)
> ------------------------------------------------------------------------------
>                 Key: BEAM-1065
>                 URL:
>             Project: Beam
>          Issue Type: Sub-task
>          Components: sdk-java-core
>            Reporter: Pei He
>            Assignee: Pei He
> FileBasedReader should be able to open the file with the Source.getStartOffset(), and
> then read forward to find the first input element.
> The benefits are:
> 1. It is easier to implement a ReadableByteChannel.
> 2. Dynamically splitting won't require file systems to support seeking.
> 3. Doesn't need to seek to position twice, which is what current API does.

