flink-issues mailing list archives

From "Stefan Richter (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (FLINK-11775) Introduce MemorySegmentWritable to let DataOutputView direct copy to internal bytes
Date Thu, 11 Apr 2019 08:00:00 GMT

    [ https://issues.apache.org/jira/browse/FLINK-11775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16815161#comment-16815161
] 

Stefan Richter edited comment on FLINK-11775 at 4/11/19 7:59 AM:
-----------------------------------------------------------------

[~StephanEwen] I found that an implementation in {{AbstractPagedOutputView}} was already
merged recently as part of a >4k-line PR via FLINK-11856, for now without introducing
an interface. It uses {{Segment#copyTo}}, which looks like it should work for all kinds of
segment implementations.

We could consider two alternatives: something in the direction of Piotr's proposal, or casting
to a special {{DataInputView}} for an optimized path in the already existing method of {{DataOutputView}}.
One concern is that, according to the docs, {{DataOutputView}} is already supposed to be the
interface for interacting with memory segments, so here we bypass that abstraction. We might
therefore reconsider whether the abstraction still fits or (probably applicable here) consider
this new interface a secondary, lower-level access interface available to code that already
operates on the {{MemorySegment}} level. But in that case, I wonder what the benefit currently is
of introducing an interface over only implementing this on {{AbstractPagedOutputView}}, as is currently done.
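
To make the second alternative concrete, here is a minimal sketch of an optimized path chosen by a type check inside an existing write method. It is not Flink code: {{InputView}}, {{SegmentInput}} and {{OutputView}} are hypothetical stand-ins, and {{java.nio.ByteBuffer}} is used in place of {{MemorySegment}} so that the example is self-contained.

```java
import java.nio.ByteBuffer;

public class FastPathSketch {

    /** Simplified stand-in for DataInputView: bulk reads only (assumption). */
    interface InputView {
        void readFully(byte[] dst, int off, int len);
    }

    /** Hypothetical input view that exposes the memory it reads from. */
    static final class SegmentInput implements InputView {
        final ByteBuffer segment; // stand-in for a MemorySegment
        int position;

        SegmentInput(byte[] data) {
            this.segment = ByteBuffer.wrap(data);
        }

        @Override
        public void readFully(byte[] dst, int off, int len) {
            ByteBuffer view = segment.duplicate();
            view.position(position);
            view.get(dst, off, len);
            position += len;
        }
    }

    /** Simplified stand-in for an output view backed by one target buffer. */
    static final class OutputView {
        private final ByteBuffer target = ByteBuffer.allocate(64);
        boolean usedFastPath;

        void write(InputView source, int numBytes) {
            if (source instanceof SegmentInput) {
                // Optimized path: copy buffer to buffer, no temporary byte[].
                SegmentInput in = (SegmentInput) source;
                ByteBuffer slice = in.segment.duplicate();
                slice.clear(); // reset position/limit only; data is untouched
                slice.position(in.position);
                slice.limit(in.position + numBytes);
                target.put(slice);
                in.position += numBytes;
                usedFastPath = true;
            } else {
                // Generic path: stage the bytes in a temporary array first.
                byte[] tmp = new byte[numBytes];
                source.readFully(tmp, 0, numBytes);
                target.put(tmp);
                usedFastPath = false;
            }
        }

        /** Returns a copy of everything written so far. */
        byte[] written() {
            ByteBuffer view = target.duplicate();
            view.flip();
            byte[] out = new byte[view.remaining()];
            view.get(out);
            return out;
        }
    }
}
```

The trade-off in this sketch is that callers keep using the one existing method, while the output view has to know about a concrete input type. That is the kind of abstraction bypass raised as a concern above.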


> Introduce MemorySegmentWritable to let DataOutputView direct copy to internal bytes
> -----------------------------------------------------------------------------------
>
>                 Key: FLINK-11775
>                 URL: https://issues.apache.org/jira/browse/FLINK-11775
>             Project: Flink
>          Issue Type: New Feature
>          Components: Runtime / Operators
>            Reporter: Jingsong Lee
>            Assignee: Jingsong Lee
>            Priority: Major
>
> Blink's new binary format is based on MemorySegment.
> Introduce MemorySegmentWritable to let DataOutputView copy directly to its internal bytes.
> {code:java}
> /**
>  * Provides the interface for write(Segment).
>  */
> public interface MemorySegmentWritable {
>  /**
>  * Writes {@code len} bytes from memory segment {@code segment} starting at offset {@code off}, in order,
>  * to the output.
>  *
>  * @param segment memory segment to copy the bytes from.
>  * @param off the start offset in the memory segment.
>  * @param len The number of bytes to copy.
>  * @throws IOException if an I/O error occurs.
>  */
>  void write(MemorySegment segment, int off, int len) throws IOException;
> }{code}
>  
> If we want to write a MemorySegment to a DataOutputView, we need to copy the bytes to a byte[] and then write that, which is inefficient.
> If we let AbstractPagedOutputView have a write(MemorySegment) interface, we can copy it directly.
> We need to ensure this in network serialization, batch operator calculation serialization, and streaming state serialization to avoid allocating a new byte[] and copying.
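
The direct copy described in the issue can be sketched as follows. This is only an illustration under stated assumptions: {{PagedOutput}} is a hypothetical, much simplified stand-in for a paged output view, and {{java.nio.ByteBuffer}} stands in for {{MemorySegment}}. It shows a {{write(segment, off, len)}} that copies straight into fixed-size pages, spilling across page boundaries without an intermediate byte[].

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

public class PagedWriteSketch {

    /** Hypothetical paged output: fixed-size pages, a new one is started when full. */
    static final class PagedOutput {
        private final int pageSize;
        private final List<ByteBuffer> pages = new ArrayList<>();
        private ByteBuffer current;

        PagedOutput(int pageSize) {
            this.pageSize = pageSize;
            advance();
        }

        private void advance() {
            current = ByteBuffer.allocate(pageSize);
            pages.add(current);
        }

        /** Copies len bytes of segment, starting at off, directly into the pages. */
        void write(ByteBuffer segment, int off, int len) {
            if (off < 0 || len < 0 || off + len > segment.capacity()) {
                throw new IndexOutOfBoundsException();
            }
            while (len > 0) {
                if (!current.hasRemaining()) {
                    advance();
                }
                int chunk = Math.min(len, current.remaining());
                ByteBuffer slice = segment.duplicate();
                slice.clear(); // reset position/limit only; data is untouched
                slice.position(off);
                slice.limit(off + chunk);
                current.put(slice); // buffer-to-buffer copy, no temporary byte[]
                off += chunk;
                len -= chunk;
            }
        }

        int pageCount() {
            return pages.size();
        }

        /** Concatenates the written part of every page (for inspection only). */
        byte[] toByteArray() {
            int total = 0;
            for (ByteBuffer page : pages) {
                total += page.position();
            }
            byte[] out = new byte[total];
            int pos = 0;
            for (ByteBuffer page : pages) {
                ByteBuffer view = page.duplicate();
                view.flip();
                int n = view.remaining();
                view.get(out, pos, n);
                pos += n;
            }
            return out;
        }
    }
}
```

For example, with a page size of 4, writing 7 bytes starting at offset 2 of a 10-byte source fills the first page and puts the remaining 3 bytes on a second page.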



