flink-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Jingsong Lee (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-11775) Introduce MemorySegmentWritable to let DataOutputView direct copy to internal bytes
Date Thu, 11 Apr 2019 08:28:00 GMT

    [ https://issues.apache.org/jira/browse/FLINK-11775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16815190#comment-16815190
] 

Jingsong Lee commented on FLINK-11775:
--------------------------------------

I think my goal is to optimize the serialization of BinaryRow, which currently occurs on two
views:

1. AbstractPagedOutputView: In Sort, HashTable, etc.

2. DataOutputSerializer: (Because bytes is saved to byte[] in DataOutputSerializer, it can
be directly copied from MemorySegment.)

Scenario 1: It happened in RecordWriter and is about to be sent to the network. 

Scenario 2: In the serialization of RocksDBValueState.

 

My original intention was to optimize the serialization of BinaryRow on both views.

The current idea is:

Let AbstractPagedOutputView and DataOutputSerializer implement MemorySegmentWritable.

In AbstractPagedOutputView, implement write(MemorySegment segment, int off, int len) to
use MemorySegment.copyTo(MemorySegment)

In DataOutputSerializer, implement write(MemorySegment segment, int off, int len) to use
MemorySegment.get(byte[])

Then in BinaryRowSerializer.serialize(), if the outputView isInstanceOf MemorySegmentWritable,
call write(MemorySegment), or whether it is serialized using the DataOutputView interface.

 

Thanks [~srichter] and [~StephanEwen] and [~pnowojski] for your advice:

1.let DataOutputView implement MemorySegmentWritable is a bad idea. Not every DataOutputView
has the ability to deal directly with MemorySegment.

2.keep MemorySegmentWritable as internal is good. Only our Table can touch it.

 

> Introduce MemorySegmentWritable to let DataOutputView direct copy to internal bytes
> -----------------------------------------------------------------------------------
>
>                 Key: FLINK-11775
>                 URL: https://issues.apache.org/jira/browse/FLINK-11775
>             Project: Flink
>          Issue Type: New Feature
>          Components: Runtime / Operators
>            Reporter: Jingsong Lee
>            Assignee: Jingsong Lee
>            Priority: Major
>
> Blink new binary format is based on MemorySegment.
> Introduce MemorySegmentWritable to let DataOutputView direct copy to internal bytes
> {code:java}
> /**
>  * Provides the interface for write(Segment).
>  */
> public interface MemorySegmentWritable {
>  /**
>  * Writes {@code len} bytes from memory segment {@code segment} starting at offset {@code
off}, in order,
>  * to the output.
>  *
>  * @param segment memory segment to copy the bytes from.
>  * @param off the start offset in the memory segment.
>  * @param len The number of bytes to copy.
>  * @throws IOException if an I/O error occurs.
>  */
>  void write(MemorySegment segment, int off, int len) throws IOException;
> }{code}
>  
> If we want to write a Memory Segment to DataOutputView, we need to copy bytes to byte[]
and then write it in, which is less effective.
> If we let AbstractPagedOutputView have a write(MemorySegment) interface, we can copy
it directly.
> We need to ensure this in network serialization, batch operator calculation serialization,
Streaming State serialization to avoid new byte[] and copy.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Mime
View raw message