jclouds-notifications mailing list archives

From "Ignasi Barrera (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (JCLOUDS-1187) Avoid excessive memory usage when processing massive http response message
Date Sun, 23 Oct 2016 18:08:58 GMT

    [ https://issues.apache.org/jira/browse/JCLOUDS-1187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15600074#comment-15600074 ]

Ignasi Barrera commented on JCLOUDS-1187:
-----------------------------------------

Thanks for the detailed report, [~aled.sage]. It is much appreciated.

I agree with Andrea and would say that wire logs at debug level are not meant for production.
They are too verbose and often carry sensitive information. Apart from that, they may be convenient
for compute APIs, but when you use the blob store providers, payloads usually contain binary
data that just pollutes the logs. Wire logs are fine for debugging purposes for short periods
of time, but should generally be disabled. Most logging frameworks support hot reloading of
the configuration file, so in most cases the wire logs can be enabled without even stopping
the application.
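
For illustration, a minimal sketch of one way to set that up with log4j 1.x (the file path and
reload interval below are just examples): editing the properties file at runtime then takes
effect without a restart.
{code:java}
import org.apache.log4j.PropertyConfigurator;

public class LoggingBootstrap {
    public static void main(String[] args) {
        // Re-read log4j.properties every 30 seconds, so toggling
        // log4j.logger.jclouds.wire=DEBUG in the file takes effect
        // without restarting the application.
        PropertyConfigurator.configureAndWatch("/etc/myapp/log4j.properties", 30000L);
        // ... start the rest of the application ...
    }
}
{code}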

I've opened a PR with your suggested improvement to the default json serialization: https://github.com/jclouds/jclouds/pull/1025
Could you give it a try and report back on whether disabling the wire logs and applying the patch
fix the issue? This would be a nice improvement to have in 2.0.



> Avoid excessive memory usage when processing massive http response message
> --------------------------------------------------------------------------
>
>                 Key: JCLOUDS-1187
>                 URL: https://issues.apache.org/jira/browse/JCLOUDS-1187
>             Project: jclouds
>          Issue Type: Improvement
>          Components: jclouds-core
>    Affects Versions: 1.9.2
>            Reporter: Aled Sage
>
> With jclouds 1.9.3-SNAPSHOT and 2.0.0-SNAPSHOT (and all GA versions)...
> If jclouds receives a crazily big HTTP response (e.g. see https://github.com/jclouds/jclouds/pull/1020),
then jclouds consumes a huge amount of memory while processing that response. It holds multiple
copies of the response in memory at the same time.
> As reported in https://issues.apache.org/jira/browse/BROOKLYN-364, the memory usage (according
to {{jmap -histo:live <pid>}}) is mostly char arrays (which is what backs {{StringBuilder}}
and {{String}}). When processing a 154MB response payload, the memory used by char arrays goes
something like this: 66MB -> 104MB -> 323MB -> 625MB -> 939MB -> 461MB ->
633MB -> 327MB -> 21MB.
> (I don't know how much to trust the {{:live}} option: is that definitely the size of the char
arrays that could not be GC'ed?)
> There are two main areas where this memory is consumed.
> *Wire Log*
> In the jclouds wire log, the following happens (approximately?!):
> * We read the response input stream, writing it to a {{FileBackedOutputStream}} - nice!
> * In {{Wire.wire(String, InputStream)}}, we read the {{FileBackedOutputStream}} into
a {{StringBuilder}}, and then call {{getWireLog().debug(buffer.toString())}}:  
>   The StringBuilder holds the {{char[]}}; the toString() duplicates it - so two copies
in memory.  
>   Unfortunately, in the SoftLayer example it's all one huge line, so we logged it all
in one go.  
>   I think (but need to dig into it more) that the logging framework (slf4j -> log4j
in my case) ends up with multiple copies as well, while processing the call to {{log.debug(String)}}.
(Hence peaking at 939MB of char arrays in memory).
>   When the method returns, all these copies can be GC'ed.
> * The response payload has now been switched to the {{FileBackedOutputStream}}, so that
will be used subsequently.
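> To make the duplication concrete, here is a rough, self-contained sketch of the pattern
described above (names and structure are illustrative only, not the actual jclouds source):
{code:java}
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class WireLogSketch {
    private static final Logger WIRE_LOG = LoggerFactory.getLogger("jclouds.wire");

    // Roughly what happens when the buffered response body is written to the wire log.
    static void logWire(String header, InputStream bufferedBody) throws IOException {
        StringBuilder buffer = new StringBuilder();            // copy #1: the char[] inside the builder
        BufferedReader reader = new BufferedReader(
                new InputStreamReader(bufferedBody, StandardCharsets.UTF_8));
        String line;
        while ((line = reader.readLine()) != null) {           // one huge line => one huge append
            buffer.append(header).append(" ").append(line).append('\n');
        }
        WIRE_LOG.debug(buffer.toString());                     // copy #2: toString() duplicates the char[];
                                                               // the logging framework may copy it again
    }
}
{code}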
> *Json Parsing*
> To parse the HTTP response:
> * The response is passed to {{org.jclouds.http.functions.ParseJson.apply(HttpResponse)}}
> * This calls {{json.fromJson(Strings2.toStringAndClose(stream), type)}}.  
> * The {{Strings2.toStringAndClose}} calls {{CharStreams.toString(new InputStreamReader(input,
Charsets.UTF_8))}}.  
>   This reads the stream into a StringBuilder, then calls toString - so holds two copies
in memory.  
>   This explains the second spike in memory usage (though I'm surprised it went as high
as 633MB of char arrays in memory).  
>   When the method returns, we have our one String.  
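> For reference, a condensed sketch of that call chain, with the jclouds plumbing collapsed
into one method and Gson used directly (not the actual implementation):
{code:java}
import com.google.common.base.Charsets;
import com.google.common.io.CharStreams;
import com.google.gson.Gson;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.lang.reflect.Type;

public class JsonParseSketch {
    private static final Gson gson = new Gson();

    // Roughly the current call chain: the whole payload becomes a single String
    // before Gson ever sees it.
    static <T> T parse(InputStream stream, Type type) throws IOException {
        String payload;
        try {
            // CharStreams.toString buffers the stream into a StringBuilder (copy #1);
            // its toString() then duplicates the char[] (copy #2).
            payload = CharStreams.toString(new InputStreamReader(stream, Charsets.UTF_8));
        } finally {
            stream.close();
        }
        return gson.fromJson(payload, type);  // parses the full in-memory String
    }
}
{code}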
> *Possible Improvements to Wire Log*
> {{org.jclouds.logging.internal.Wire}} could be configurable to only log the first X bytes
of a response (so crazily long messages would be truncated in the log).
> Alternatively/additionally, {{Wire.wire}} could force splitting a huge line into multiple
log messages when calling {{getWireLog().debug(buffer.toString())}}. Again this could be configurable.
> In production usage, I personally would always configure it to truncate: better to miss
the end of the response rather than risk an {{OutOfMemoryError}}. Note this particular part
isn't an issue if jclouds.wire is set to INFO or higher.
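> A minimal sketch of the truncation idea (the method and the size limit are hypothetical;
no such option exists in jclouds today):
{code:java}
import org.slf4j.Logger;

public class WireLogTruncation {
    // "maxLogged" would come from a new, configurable property.
    static void debugTruncated(Logger wireLog, StringBuilder buffer, int maxLogged) {
        if (buffer.length() > maxLogged) {
            wireLog.debug("{}... [truncated, {} chars total]",
                    buffer.substring(0, maxLogged), buffer.length());
        } else {
            wireLog.debug(buffer.toString());
        }
    }
}
{code}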
> *Possible Improvements to Json Parsing*
> I think {{ParseJson.apply(InputStream stream, Type type)}} should pass the {{new InputStreamReader(inputStream)}}
to {{org.jclouds.json.Json.fromJson()}} (in a try-finally to close the stream, obviously).
> This would require adding a new method to the interface {{org.jclouds.json.Json}}.
> The implementation of {{GsonWrapper}} can handle that easily: it can call {{gson.fromJson(Reader,
Type)}} instead of {{gson.fromJson(String, Type)}}. It looks to me like that will parse the
stream as it reads it, rather than having to hold the whole string in memory at once.
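> A minimal sketch of the Reader-based approach, using Gson directly here rather than showing
the actual {{Json}}/{{GsonWrapper}} change:
{code:java}
import com.google.gson.Gson;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.lang.reflect.Type;
import java.nio.charset.StandardCharsets;

public class StreamingJsonParseSketch {
    private static final Gson gson = new Gson();

    // The proposed streaming variant: Gson reads from the Reader incrementally,
    // so the full payload never has to exist as one String/char[].
    static <T> T parse(InputStream stream, Type type) throws IOException {
        Reader reader = new InputStreamReader(stream, StandardCharsets.UTF_8);
        try {
            return gson.fromJson(reader, type);   // gson.fromJson(Reader, Type) parses as it reads
        } finally {
            reader.close();                       // also closes the underlying stream
        }
    }
}
{code}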
> If we do these improvements, I don't think we'll ever hold the full 154MB char array
in memory.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
