synapse-dev mailing list archives

From Andreas Veithen <andreas.veit...@gmail.com>
Subject Re: VFS - Synapse Memory Leak
Date Mon, 09 Mar 2009 11:54:43 GMT
The changes I did in the VFS transport and the message builders for
text/plain and application/octet-stream certainly don't provide an
out-of-the-box solution for your use case, but they are the
prerequisite.

Concerning your first proposed solution (let the VFS write the content
to a temporary file), I don't like this because it would create a
tight coupling between the VFS transport and the mediator. A design
goal should be that the solution will still work if the file comes
from another source, e.g. an attachment in an MTOM or SwA message.

I think that an all-Synapse solution (2 or 3) should be possible, but
this will require development of a custom mediator. This mediator
would read the content, split it up (and store the chunks in memory or
on disk) and execute a sub-sequence for each chunk. The execution of
the sub-sequence would happen synchronously to limit the memory/disk
space consumption (to the maximum chunk size) and to avoid flooding
the destination service.

Note that it is probably not possible to implement the mediator
using a script because of the problematic String handling. Also,
Spring, POJO and class mediators don't support sub-sequences (I
think). Therefore it should be implemented as a full-featured Java
mediator, probably taking the existing iterate mediator as a template.
I can contribute the required code to get the text content in the form
of a java.io.Reader.
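
To illustrate the splitting step, here is a rough sketch in plain Java. It
is not Synapse code: RecordChunker and forEachChunk are hypothetical names,
and the Consumer callback only stands in for the synchronous execution of
the sub-sequence. It assumes line-oriented records and counts the chunk
limit in characters, not bytes.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.function.Consumer;

/** Hypothetical helper (not part of Synapse): splits line-oriented text into bounded chunks. */
public class RecordChunker {

    /**
     * Reads line-terminated records from the reader and passes them to the
     * handler in chunks of at most maxChunkChars characters. A single record
     * longer than the limit is passed on its own. The handler is invoked
     * synchronously, so only one chunk is held in memory at a time.
     *
     * @return the number of chunks handed to the handler
     */
    public static int forEachChunk(Reader in, int maxChunkChars, Consumer<String> handler)
            throws IOException {
        BufferedReader reader = new BufferedReader(in);
        StringBuilder chunk = new StringBuilder();
        int count = 0;
        String line;
        while ((line = reader.readLine()) != null) {
            // Flush the current chunk if adding this record (plus CRLF) would exceed the limit.
            if (chunk.length() > 0 && chunk.length() + line.length() + 2 > maxChunkChars) {
                handler.accept(chunk.toString());
                count++;
                chunk.setLength(0);
            }
            chunk.append(line).append("\r\n");
        }
        // Hand over whatever is left after the last record.
        if (chunk.length() > 0) {
            handler.accept(chunk.toString());
            count++;
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        Reader in = new StringReader("rec1\r\nrec2\r\nrec3\r\n");
        int n = forEachChunk(in, 12,
                c -> System.out.println("chunk of " + c.length() + " chars"));
        System.out.println(n + " chunks");
    }
}
```

Because the callback returns before the next chunk is read, the memory
footprint stays bounded by the chunk size and the destination only ever sees
one request at a time. A real mediator would need a byte-based limit if the
target service restricts the payload size in bytes.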

Regards,

Andreas

On Mon, Mar 9, 2009 at 03:05, kimhorn <kim.horn@icsglobal.net> wrote:
>
> Although this is a good feature it may not solve the actual problem?
> The main first issue on my list was the memory leak.
> However, the real problem is that once I get this massive file I have to send
> it to a web service that can only take it in small chunks (about 14MB).
> Streaming it straight out would just kill the destination web service. It
> would get the memory error. The text document can be split apart easily, as
> it has independent records on each line separated by <CR> <LF>.
>
> In an earlier post, that was not responded to, I mentioned:
>
> "Otherwise; for large EDI files a VFS iterator Mediator that streams through
> input file and outputs smaller
> chunks for processing, in Synapse, may be a solution ? "
>
> So I had mentioned a few solutions in prior posts; the solutions now are:
>
> 1) VFS writes straight to temporary file, then a Java mediator can process
> the file by splitting it into many smaller files. These files then trigger
> another VFS proxy that submits these to the final web Service.
> The problem is that it uses the file system (not so bad).
> 2) A Java Mediator takes the <text> package and splits it up by wrapping
> into many XML <data> elements that can then be acted on by a Synapse
> Iterator. So replace the text message with many smaller XML elements.
> Problem is that this loads whole message into memory.
> 3) Create another Iterator in Synapse that works on Regular expression (to
> split the text data) or actually uses a for loop approach to chop the file
> into chunks based on the loop index value. E.g. Index = 23 means a 14K chunk
> 23 chunks into the data.
> 4) Using the approach proposed now - just submit the file straight (stream
> it) to another web service that chops it up. It may return an XML document
> with many sub elements that allows the standard Iterator to work. Similar
> to (2) but using another service rather than Java to split document.
> 5) Using the approach proposed now - just submit the file straight (stream
> it) to another web service that chops it up but calls a Synapse proxy with
> each small packet of data that then forwards it to the final Web Service. So
> the Web Service iterates across the data; and not Synapse.
>
> Then other solutions replace Synapse with a stand alone Java program at the
> front end.
>
> Another issue here is throttling: splitting the file is one issue, but
> submitting hundreds of calls in parallel to the destination service would
> result in timeouts... So we need to work in throttling.
>
> Ruwan Linton wrote:
>>
>> I agree and can understand the time factor and also +1 for reusing stuff
>> rather than trying to reinvent the wheel :-)
>>
>> Thanks,
>> Ruwan
>>
>> On Sun, Mar 8, 2009 at 4:08 PM, Andreas Veithen
>> <andreas.veithen@gmail.com> wrote:
>>
>>> Ruwan,
>>>
>>> It's not a question of possibility, it is a question of available time
>>> :-)
>>>
>>> Also note that some of the features that we might want to implement
>>> have some similarities with what is done for attachments in Axiom
>>> (except that an attachment is only available once, while a file over
>>> VFS can be read several times). I think there is also some existing
>>> code in Axis2 that might be useful. We should not reimplement these
>>> things but try to make the existing code reusable. This however is
>>> only realistic for the next release after 1.3.
>>>
>>> Andreas
>>>
>>> On Sun, Mar 8, 2009 at 03:47, Ruwan Linton <ruwan.linton@gmail.com> wrote:
>>> > Andreas,
>>> >
>>> > Can we have the caching at the file system as a property to support the
>>> > multiple layers touching the full message, and is it possible to
>>> > specify a threshold for streaming? For example, if the message is touched
>>> > several times we might still need streaming, but not for files of 100KB
>>> > or less.
>>> >
>>> > Thanks,
>>> > Ruwan
>>> >
>>> > On Sun, Mar 8, 2009 at 1:12 AM, Andreas Veithen <andreas.veithen@gmail.com>
>>> > wrote:
>>> >>
>>> >> I've done an initial implementation of this feature. It is available
>>> >> in trunk and should be included in the next nightly build. In order to
>>> >> enable this in your configuration, you need to add the following
>>> >> property to the proxy:
>>> >>
>>> >> <parameter name="transport.vfs.Streaming">true</parameter>
>>> >>
>>> >> You also need to add the following mediators just before the <send>
>>> >> mediator:
>>> >>
>>> >> <property action="remove" name="transportNonBlocking" scope="axis2"/>
>>> >> <property action="set" name="OUT_ONLY" value="true"/>
>>> >>
>>> >> With this configuration Synapse will stream the data directly from the
>>> >> incoming to the outgoing transport without storing it in memory or in
>>> >> a temporary file. Note that this has two other side effects:
>>> >> * The incoming file (or connection in case of a remote file) will only
>>> >> be opened on demand. In this case this happens during execution of the
>>> >> <send> mediator.
>>> >> * If during the mediation the content of the file is needed several
>>> >> times (which is not the case in your example), it will be read several
>>> >> times. The reason is of course that the content is not cached.
>>> >>
>>> >> I tested the solution with a 2GB file and it worked fine. The
>>> >> performance of the implementation is not yet optimal, but at least the
>>> >> memory consumption is constant.
>>> >>
>>> >> Some additional comments:
>>> >> * The transport.vfs.Streaming property has no impact on XML and SOAP
>>> >> processing: this type of content is processed exactly as before.
>>> >> * With the changes described here, we have now two different policies
>>> >> for plain text and binary content processing: in-memory caching + no
>>> >> streaming (transport.vfs.Streaming=false) and no caching + deferred
>>> >> connection + streaming (transport.vfs.Streaming=true). Probably we
>>> >> should define a wider range of policies in the future, including file
>>> >> system caching + streaming.
>>> >> * It is necessary to remove the transportNonBlocking property
>>> >> (MessageContext.TRANSPORT_NON_BLOCKING) to prevent the <send> mediator
>>> >> (more precisely the OperationClient) from executing the outgoing
>>> >> transport in a separate thread. This property is set by the incoming
>>> >> transport. I think this is a bug since I don't see any valid reason
>>> >> why the transport that handles the incoming request should determine
>>> >> the threading behavior of the transport that sends the outgoing
>>> >> request to the target service. Maybe Asankha can comment on this?
>>> >>
>>> >> Andreas
>>> >>
>>> >> On Thu, Mar 5, 2009 at 07:21, kimhorn <kim.horn@icsglobal.net> wrote:
>>> >> >
>>> >> > That's good; as this stops us using Synapse.
>>> >> >
>>> >> >
>>> >> >
>>> >> > Asankha C. Perera wrote:
>>> >> >>
>>> >> >>
>>> >> >>> Exception in thread "vfs-Worker-4" java.lang.OutOfMemoryError: Java heap space
>>> >> >>>         at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:99)
>>> >> >>>         at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:518)
>>> >> >>>         at java.lang.StringBuffer.append(StringBuffer.java:307)
>>> >> >>>         at java.io.StringWriter.write(StringWriter.java:72)
>>> >> >>>         at org.apache.commons.io.IOUtils.copyLarge(IOUtils.java:1129)
>>> >> >>>         at org.apache.commons.io.IOUtils.copy(IOUtils.java:1104)
>>> >> >>>         at org.apache.commons.io.IOUtils.copy(IOUtils.java:1078)
>>> >> >>>         at org.apache.commons.io.IOUtils.toString(IOUtils.java:382)
>>> >> >>>         at org.apache.synapse.format.PlainTextBuilder.processDocument(PlainTextBuilder.java:68)
>>> >> >>>
>>> >> >> Since the content type is text, the plain text formatter is trying to
>>> >> >> use a String to parse as I see.. which is a problem for large content..
>>> >> >>
>>> >> >> A definite bug we need to fix ..
>>> >> >>
>>> >> >> cheers
>>> >> >> asankha
>>> >> >>
>>> >> >> --
>>> >> >> Asankha C. Perera
>>> >> >> AdroitLogic, http://adroitlogic.org
>>> >> >>
>>> >> >> http://esbmagic.blogspot.com
>>> >> >>
>>> >> >>
>>> >> >>
>>> >> >>
>>> >> >>
>>> >> >>
>>> ---------------------------------------------------------------------
>>> >> >> To unsubscribe, e-mail: dev-unsubscribe@synapse.apache.org
>>> >> >> For additional commands, e-mail: dev-help@synapse.apache.org
>>> >> >>
>>> >> >>
>>> >> >>
>>> >> >
>>> >> > --
>>> >> > View this message in context:
>>> >> >
>>> http://www.nabble.com/VFS---Synapse-Memory-Leak-tp22344176p22345904.html
>>> >> > Sent from the Synapse - Dev mailing list archive at Nabble.com.
>>> >> >
>>> >> >
>>> >> >
>>> >> >
>>> >> >
>>> >>
>>> >>
>>> >
>>> >
>>> >
>>> > --
>>> > Ruwan Linton
>>> > http://wso2.org - "Oxygenating the Web Services Platform"
>>> > http://ruwansblog.blogspot.com/
>>> >
>>>
>>>
>>>
>>
>>
>> --
>> Ruwan Linton
>> http://wso2.org - "Oxygenating the Web Services Platform"
>> http://ruwansblog.blogspot.com/
>>
>>
>
> --
> View this message in context: http://www.nabble.com/VFS---Synapse-Memory-Leak-tp22344176p22405973.html
> Sent from the Synapse - Dev mailing list archive at Nabble.com.
>
>
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@synapse.apache.org
For additional commands, e-mail: dev-help@synapse.apache.org

