jclouds-notifications mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Andrea Turli (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (JCLOUDS-1187) Avoid excessive memory usage when processing massive http response message
Date Wed, 23 Nov 2016 15:56:58 GMT

     [ https://issues.apache.org/jira/browse/JCLOUDS-1187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel

Andrea Turli resolved JCLOUDS-1187.
    Resolution: Fixed

> Avoid excessive memory usage when processing massive http response message
> --------------------------------------------------------------------------
>                 Key: JCLOUDS-1187
>                 URL: https://issues.apache.org/jira/browse/JCLOUDS-1187
>             Project: jclouds
>          Issue Type: Improvement
>          Components: jclouds-core
>    Affects Versions: 1.9.2
>            Reporter: Aled Sage
>             Fix For: 2.0.0
> With jclouds 1.9.3-SNAPSHOT and 2.0.0-SNAPSHOT (and all GA versions)...
> If jclouds receives a crazily big http response (e.g. see https://github.com/jclouds/jclouds/pull/1020),
then jclouds consumes a huge amount of memory while processing that response. It holds multiple
copies of the response in-memory at the same time.
> As reported in https://issues.apache.org/jira/browse/BROOKLYN-364, the memory usage (according
to {{jmap -histo:live <pid>}}) is most char arrays (which is what backs the StringBuilder
and String). When processing a 154MB response payload, the memory used by char arrays goes
something like this: 66MB -> 104MB -> 323MB -> 625MB -> 939MB -> 461MB ->
633MB -> 327MB -> 21MB.
> (I don't know how much to believe the {{:live}}: is that definitely the size of char
arrays that could not be GC'ed?)
> There are two main areas where this memory is consumed.
> *Wire Log*
> In the jclouds wire log, the following happens (approximately?!):
> * We read the response input stream, writing it to a {{FileBackedOutputStream}} - nice!
> * In {{Wire.wire(String, InputStream)}}, we read the {{FileBackedOutputStream}} into
a {{StringBuilder}}, and then call {{getWireLog().debug(buffer.toString())}}:  
>   The StringBuilder holds the {{char[]}}; the toString() duplicates it - so two copies
in memory.  
>   Unfortunately in the Softlayer example, it's all one huge line, so we logged it all
in one go.  
>   I think (but need to dig into it more) that the logging framework (slf4j -> log4j
in my case) ends up with multiple copies as well, while processing the call to {{log.debug(String)}}.
(Hence peaking at 939MB of char arrays in memory).
>   When the method returns, all these copies can be GC'ed.
> * The response payload has now been switched to the {{FileBackedOutputStream}}, so that
will be used subsequently.
> *Json Parsing*
> To parse the HTTP response:
> * The response is passed to {{org.jclouds.http.functions.ParseJson.apply(HttpResponse)}}
> * This calls {{json.fromJson(Strings2.toStringAndClose(stream), type)}}.  
> * The {{Strings2.toStringAndClose}} calls {{CharStreams.toString(new InputStreamReader(input,
>   This reads the stream into a StringBuilder, then calls toString - so holds two copies
in memory.  
>   This explains the second spike in memory usage (though I'm surprised it went as high
as 633MB of char arrays in memory).  
>   When the method returns, we have our one String.  
> *Possible Improvements to Wire Log*
> {{org.jclouds.logging.internal.Wire}} could be configurable to only log the first X bytes
of a response (so crazily long messages would be truncated in the log).
> Alternatively/additionally, {{Wire.wire}} could force splitting a huge line into multiple
log messages when calling {{getWireLog().debug(buffer.toString())}}. Again this could be configurable.
> In production usage, I personally would always configure it to truncate: better to miss
the end of the response rather than risk an {{OutOfMemoryError}}. Note this particular part
isn't an issue if jclouds.wire is set to INFO or higher.
> *Possible Improvements to Json Parsing*
> I think {{ParseJson.apply(InputStream stream, Type type)}} should pass the {{new InputStreamReader(inputStream)}}
to {{org.jclouds.json.Json.fromJson()}} (in a try-finally to close the stream, obviously).
> This would require adding a new method to the interface {{org.jclouds.json.Json}}.
> The implementation of {{GsonWrapper}} can handle that easily: it can call {{gson.fromJson(Reader,
Type)}} instead of {{gson.fromJson(String, Type)}}. It looks to me like that will parse the
stream as it reads it, rather than having to hold the whole string in memory at once.
> If we do these improvements, I don't think we'll ever hold the full 154MB char array
in memory.

This message was sent by Atlassian JIRA

View raw message