flink-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Jingsong Lee (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-11775) Introduce MemorySegmentWritable to let DataOutputView direct copy to internal bytes
Date Thu, 11 Apr 2019 08:28:00 GMT

    [ https://issues.apache.org/jira/browse/FLINK-11775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16815190#comment-16815190

Jingsong Lee commented on FLINK-11775:

I think my goal is to optimize the serialization of BinaryRow, which currently occurs on two

1. AbstractPagedOutputView: In Sort, HashTable, etc.

2. DataOutputSerializer: (Because bytes is saved to byte[] in DataOutputSerializer, it can
be directly copied from MemorySegment.)

Scenario 1: It happened in RecordWriter and is about to be sent to the network. 

Scenario 2: In the serialization of RocksDBValueState.


My original intention was to optimize the serialization of BinaryRow on both views.

The current idea is:

Let AbstractPagedOutputView and DataOutputSerializer implement MemorySegmentWritable.

In AbstractPagedOutputView, implement write(MemorySegment segment, int off, int len) to
use MemorySegment.copyTo(MemorySegment)

In DataOutputSerializer, implement write(MemorySegment segment, int off, int len) to use

Then in BinaryRowSerializer.serialize(), if the outputView isInstanceOf MemorySegmentWritable,
call write(MemorySegment), or whether it is serialized using the DataOutputView interface.


Thanks [~srichter] and [~StephanEwen] and [~pnowojski] for your advice:

1.let DataOutputView implement MemorySegmentWritable is a bad idea. Not every DataOutputView
has the ability to deal directly with MemorySegment.

2.keep MemorySegmentWritable as internal is good. Only our Table can touch it.


> Introduce MemorySegmentWritable to let DataOutputView direct copy to internal bytes
> -----------------------------------------------------------------------------------
>                 Key: FLINK-11775
>                 URL: https://issues.apache.org/jira/browse/FLINK-11775
>             Project: Flink
>          Issue Type: New Feature
>          Components: Runtime / Operators
>            Reporter: Jingsong Lee
>            Assignee: Jingsong Lee
>            Priority: Major
> Blink new binary format is based on MemorySegment.
> Introduce MemorySegmentWritable to let DataOutputView direct copy to internal bytes
> {code:java}
> /**
>  * Provides the interface for write(Segment).
>  */
> public interface MemorySegmentWritable {
>  /**
>  * Writes {@code len} bytes from memory segment {@code segment} starting at offset {@code
off}, in order,
>  * to the output.
>  *
>  * @param segment memory segment to copy the bytes from.
>  * @param off the start offset in the memory segment.
>  * @param len The number of bytes to copy.
>  * @throws IOException if an I/O error occurs.
>  */
>  void write(MemorySegment segment, int off, int len) throws IOException;
> }{code}
> If we want to write a Memory Segment to DataOutputView, we need to copy bytes to byte[]
and then write it in, which is less effective.
> If we let AbstractPagedOutputView have a write(MemorySegment) interface, we can copy
it directly.
> We need to ensure this in network serialization, batch operator calculation serialization,
Streaming State serialization to avoid new byte[] and copy.

This message was sent by Atlassian JIRA

View raw message