flink-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From GitBox <...@apache.org>
Subject [GitHub] zhijiangW commented on a change in pull request #6417: [FLINK-9913][runtime] Improve output serialization only once in RecordWriter
Date Mon, 13 Aug 2018 09:59:37 GMT
zhijiangW commented on a change in pull request #6417: [FLINK-9913][runtime] Improve output
serialization only once in RecordWriter
URL: https://github.com/apache/flink/pull/6417#discussion_r209551509
 
 

 ##########
 File path: flink-runtime/src/main/java/org/apache/flink/runtime/io/network/api/serialization/RecordSerializer.java
 ##########
 @@ -66,29 +66,33 @@ public boolean isFullBuffer() {
 	}
 
 	/**
-	 * Starts serializing and copying the given record to the target buffer
-	 * (if available).
+	 * Starts serializing the given record to an intermediate data buffer.
 	 *
 	 * @param record the record to serialize
-	 * @return how much information was written to the target buffer and
-	 *         whether this buffer is full
 	 */
-	SerializationResult addRecord(T record) throws IOException;
+	void serializeRecord(T record) throws IOException;
 
 Review comment:
   The previous `RecordSerializer` also confuses me a lot and I have the same experience with
you, because the previous `addRecord` and `continueWritingWithNextBufferBuilder`  methods
can be called in arbitrary sequence and both returned `SerializationResult`.
   
   In my current reconstruction, the method `serializeRecord` must be called firstly, and
then the method `copyToBufferBuilder` is called to return the final `SerializationResult`.
I think it seems a bit clearer than before.
   
   I agree your above idea is good for separating these two methods further. But the `RecordSerializer`
and `SerializedRecord` may be still close with each other. I think there are two ways to realize
`SerializedRecord#copyToBufferBuilder`:
   1. 
   ```
      public SerializedRecord(ByteBuffer lengthBuffer, ByteBuffer dataBuffer) {
   	}
   
       CopingResult copyToBufferBuilder(BufferBuilder bufferBuilder) {
           // copy  lengthBuffer
          // copy dataBuffer
         //  get CopingResult
   	}
   ```
   
   So this way the `SerializedRecord` can only see `lengthBuffer` and `dataBuffer`, and can
not interact with `RecordSerializer`. Maybe we do not need do anything in `SerializedRecord#close()`.
   
   2. 
     ```
    public SerializedRecord(RecordSerializer serializer) {
      }
   
       CopingResult copyToBufferBuilder(BufferBuilder bufferBuilder) {
           serializer.copyToBufferBuilder();
           //  get CopingResult
   	}
   ```
   
   This way the `SerializedRecord` can see and interact with `RecordSerializer`, but the only
difference seems we separate the `SerializedRecord` and `CopyingResult`. And my current implementation
is we hide the `SerializedRecord` and return `SerializationResult` which corresponds to `CopyingResult`
as final result.
   
   What do you think of the above ways?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

Mime
View raw message