flink-user mailing list archives

From Stephan Ewen <se...@apache.org>
Subject Re: DelimitedInputFormat reads entire buffer when splitLength is 0
Date Fri, 10 Jul 2015 16:55:12 GMT
Hi Robert!

This clearly sounds like unintended behavior. Thanks for reporting this.

Apparently, a split length of 0 was supposed to have a double meaning, but it
goes haywire in this case.

Let me try to come up with a fix for this...


On Fri, Jul 10, 2015 at 6:05 PM, Robert Schmidtke <ro.schmidtke@gmail.com> wrote:

> Hey everyone,
> I just noticed that when processing input splits from a
> DelimitedInputFormat (specifically, I have a text file with words in it),
> that if the splitLength is 0, the entire read buffer is filled (see
> https://github.com/apache/flink/blob/master/flink-core/src/main/java/org/apache/flink/api/common/io/DelimitedInputFormat.java#L577).
> I'm using XtreemFS as underlying file system, which stripes files in blocks
> of 128 KB across storage servers. I have 8 physically separate nodes, and my
> input file is 1 MB, such that each node stores 128 KB of data. This is
> reported accurately to Flink (e.g. split sizes and hostnames). Now when the
> splitLength is 0 at some point during processing (which it will become
> eventually), the entire file is read in again, which kind of defeats the
> point of processing a split of length 0. Is this intended behavior? I've
> tried multiple hot-fixes, but they ended up in the file not being read in
> its entirety. I would like to know the rationale behind this
> implementation, and maybe figure out a way around it. Thanks in advance,
> Robert
> --
> My GPG Key ID: 336E2680
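
To make the ambiguity discussed in this thread concrete, here is a minimal Java sketch. It is not Flink's actual code: the class name, both method names, and the recordOpen flag are hypothetical and exist only for illustration. It assumes a reader that tracks how many bytes of its split remain and asks the stream for at most one buffer's worth at a time. The first method treats "0 bytes remaining" as permission to fill the whole buffer, which reproduces the behavior reported above. The second shows one possible way to separate the two meanings, and is not necessarily the fix that was eventually applied.

```java
/** Hypothetical, simplified model of a split reader; NOT Flink's actual code. */
final class SplitReadSketch {

    /**
     * Ambiguous variant: a remaining length of 0 is taken to mean
     * "read past the split boundary", so a full buffer is requested
     * even if nothing is left to finish.
     */
    public static int ambiguousToRead(long remaining, int bufferSize) {
        if (remaining <= 0) {
            return bufferSize; // over-reads whenever the split is exhausted
        }
        return (int) Math.min(remaining, bufferSize);
    }

    /**
     * Explicit variant (an assumption, for illustration): read past the
     * boundary only when a record is still open and must be completed.
     */
    public static int explicitToRead(long remaining, int bufferSize, boolean recordOpen) {
        if (remaining <= 0) {
            return recordOpen ? bufferSize : 0;
        }
        return (int) Math.min(remaining, bufferSize);
    }

    public static void main(String[] args) {
        int buffer = 1 << 20;        // 1 MB read buffer (assumed size)
        long stripe = 128L * 1024L;  // 128 KB split, as in the report above

        System.out.println(ambiguousToRead(stripe, buffer));
        System.out.println(ambiguousToRead(0, buffer));
        System.out.println(explicitToRead(0, buffer, false));
        System.out.println(explicitToRead(0, buffer, true));
    }
}
```

With the explicit variant, a split whose remaining length is 0 and that has no partially read record requests no further bytes, while a split that ends in the middle of a record may still read on to complete it.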
