beam-commits mailing list archives

From "Kenneth Knowles (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (BEAM-3572) Reduce inefficient allocations in coders
Date Tue, 30 Jan 2018 22:20:00 GMT

    [ https://issues.apache.org/jira/browse/BEAM-3572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16345912#comment-16345912 ]

Kenneth Knowles commented on BEAM-3572:
---------------------------------------

I think I see what you mean in terms of excess allocation. The buffering was added as an
optimization :-)

While the {{Coder}} itself should be observably immutable, there is no problem with mutation
under the hood to manage a pool of buffers. The real issue, which you alluded to, is that
coders are required to be thread safe. The reason that {{BufferedElementCountingOutputStream}}
can be used despite its lack of thread safety is that each instance is local to a single
{{encode()}} call and is never shared between threads.

Having either {{IterableLikeCoder}} or {{BufferedElementCountingOutputStream}} do its own
suballocation makes sense, with the usual caveats about bugs and leaks in that sort of code.
It is definitely better encapsulation for {{BufferedElementCountingOutputStream}} to own the
suballocation, unless it doesn't have enough info to do it well. I'm willing to trust that you
came to this because you actually hit this in practice, or are at least driven by a benchmark.
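
To make the suballocation idea concrete, here is a rough sketch of a thread-safe, bounded
buffer pool. This is only an illustration under assumptions: {{BufferPool}}, {{acquire()}},
{{release()}} and both constructor parameters are hypothetical names, not existing Beam APIs,
and this is not necessarily how the change should be implemented.

```java
import java.util.concurrent.ArrayBlockingQueue;

/** Hypothetical bounded pool of fixed-size byte buffers; not part of Beam. */
final class BufferPool {
  // Thread-safe queue of idle buffers; its capacity bounds how many are retained.
  private final ArrayBlockingQueue<byte[]> free;
  private final int bufferSize;

  BufferPool(int maxPooled, int bufferSize) {
    this.free = new ArrayBlockingQueue<>(maxPooled);
    this.bufferSize = bufferSize;
  }

  /** Returns an idle buffer if one is available, otherwise allocates a new one. */
  byte[] acquire() {
    byte[] buf = free.poll(); // non-blocking; null when the pool is empty
    return buf != null ? buf : new byte[bufferSize];
  }

  /** Hands a buffer back; it is dropped if the pool is full or the size does not match. */
  void release(byte[] buf) {
    if (buf != null && buf.length == bufferSize) {
      free.offer(buf); // non-blocking; returns false (buffer is discarded) when full
    }
  }
}
```

A caller would pair {{acquire()}} with {{release()}} in a try/finally so that a buffer is
handed back even when encoding throws. A buffer that is never released is simply garbage
collected, so a missed release costs an allocation rather than leaking; releasing a buffer
that is still in use is the bug to watch for. Whether this beats plain allocation would need
a benchmark.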

> Reduce inefficient allocations in coders
> ----------------------------------------
>
>                 Key: BEAM-3572
>                 URL: https://issues.apache.org/jira/browse/BEAM-3572
>             Project: Beam
>          Issue Type: Improvement
>          Components: sdk-java-core
>            Reporter: Bill Neubauer
>            Assignee: Bill Neubauer
>            Priority: Minor
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> BufferedElementCountingOutputStream's constructor allocates a new buffer to wrap the
> input OutputStream. This gets called on each invocation of encode() from IterableLikeCoder.
> Since Coder is designed to be stateless, but this buffer holds state and isn't threadsafe,
> we can't just have the caller manage the buffer. Modifying the constructor to use a pool of
> buffers to reduce the number of allocations will help performance.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
