synapse-dev mailing list archives

From Ruwan Linton <ruwan.lin...@gmail.com>
Subject Re: VFS - Synapse Memory Leak
Date Mon, 09 Mar 2009 15:53:25 GMT
Andreas,

On Mon, Mar 9, 2009 at 5:24 PM, Andreas Veithen
<andreas.veithen@gmail.com> wrote:

> The changes I did in the VFS transport and the message builders for
> text/plain and application/octet-stream certainly don't provide an
> out-of-the-box solution for your use case, but they are the
> prerequisite.
>
> Concerning your first proposed solution (let the VFS write the content
> to a temporary file), I don't like this because it would create a
> tight coupling between the VFS transport and the mediator. A design
> goal should be that the solution will still work if the file comes
> from another source, e.g. an attachment in an MTOM or SwA message.
>
> I think that an all-Synapse solution (2 or 3) should be possible, but
> this will require development of a custom mediator. This mediator
> would read the content, split it up (and store the chunks in memory or
> on disk) and execute a sub-sequence for each chunk. The execution of
> the sub-sequence would happen synchronously to limit the memory/disk
> space consumption (to the maximum chunk size) and to avoid flooding
> the destination service.
>
> Note that it is probably not possible to implement the mediator
> using a script because of the problematic String handling. Also,
> Spring, POJO and class mediators don't support sub-sequences (I
> think). Therefore it should be implemented as a full-featured Java
> mediator, probably taking the existing iterate mediator as a template.
> I can contribute the required code to get the text content in the form
> of a java.io.Reader.


Could you please explain this a bit? Do you mean to implement the transport
to give out text content as a java.io.Reader? If so, what is the general
usage of this beyond this particular scenario?
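
To make the question concrete, here is a minimal sketch of how a mediator
might consume text exposed as a java.io.Reader, assuming line-oriented
records like those in Kim's files. ReaderChunker and forEachChunk are
hypothetical names for illustration, not existing Synapse or Axis2 APIs,
and the limit is counted in characters, which only approximates a byte limit.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.function.Consumer;

/** Hypothetical helper (not part of Synapse): splits line-oriented text into bounded chunks. */
final class ReaderChunker {

    /**
     * Reads records line by line and passes them to the handler in chunks of at
     * most maxChars characters. A single record longer than maxChars is passed
     * on its own. The handler runs synchronously, so at most one chunk is
     * buffered at any time. Line terminators are normalized to CRLF.
     */
    static void forEachChunk(Reader in, int maxChars, Consumer<String> handler) throws IOException {
        BufferedReader reader = new BufferedReader(in);
        StringBuilder chunk = new StringBuilder();
        String line;
        while ((line = reader.readLine()) != null) {
            String record = line + "\r\n";
            // Flush the current chunk if appending this record would exceed the limit.
            if (chunk.length() > 0 && chunk.length() + record.length() > maxChars) {
                handler.accept(chunk.toString());
                chunk.setLength(0);
            }
            chunk.append(record);
        }
        // Hand over whatever is left after the last record.
        if (chunk.length() > 0) {
            handler.accept(chunk.toString());
        }
    }

    public static void main(String[] args) throws IOException {
        Reader in = new StringReader("rec1\r\nrec2\r\nrec3\r\n");
        // In a real mediator the handler would run the sub-sequence for the chunk.
        forEachChunk(in, 12, c -> System.out.println("chunk of " + c.length() + " chars"));
    }
}
```

Because the handler is invoked synchronously, the next chunk is not read
until the previous one has been processed, which is one way to keep memory
bounded and avoid flooding the destination service.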

Thanks,
Ruwan


>
>
> Regards,
>
> Andreas
>
> On Mon, Mar 9, 2009 at 03:05, kimhorn <kim.horn@icsglobal.net> wrote:
> >
> > Although this is a good feature, it may not solve the actual problem.
> > The main first issue on my list was the memory leak.
> > However, the real problem is that once I get this massive file I have to
> > send it to a web service that can only take it in small chunks (about
> > 14MB). Streaming it straight out would just kill the destination web
> > service; it would get the memory error. The text document can be split
> > apart easily, as it has independent records on each line separated by
> > <CR> <LF>.
> >
> > In an earlier post, which was not responded to, I mentioned:
> >
> > "Otherwise; for large EDI files a VFS iterator Mediator that streams
> > through input file and outputs smaller chunks for processing, in
> > Synapse, may be a solution ? "
> >
> > So I had mentioned a few solutions in prior posts; the solutions now are:
> >
> > 1) VFS writes straight to a temporary file, then a Java mediator can
> > process the file by splitting it into many smaller files. These files
> > then trigger another VFS proxy that submits them to the final web
> > service. The problem is that it uses the file system (not so bad).
> > 2) A Java Mediator takes the <text> package and splits it up by wrapping
> > it into many XML <data> elements that can then be acted on by a Synapse
> > Iterator. So replace the text message with many smaller XML elements.
> > The problem is that this loads the whole message into memory.
> > 3) Create another Iterator in Synapse that works on a regular expression
> > (to split the text data) or actually uses a for-loop approach to chop
> > the file into chunks based on the loop index value. E.g. Index = 23
> > means a 14K chunk 23 chunks into the data.
> > 4) Using the approach proposed now: just submit the file straight
> > (stream it) to another web service that chops it up. It may return an
> > XML document with many sub-elements that allows the standard Iterator
> > to work. Similar to (2) but using another service rather than Java to
> > split the document.
> > 5) Using the approach proposed now: just submit the file straight
> > (stream it) to another web service that chops it up but calls a Synapse
> > proxy with each small packet of data, which then forwards it to the
> > final web service. So the web service iterates across the data, and not
> > Synapse.
> >
> > Then other solutions replace Synapse with a stand-alone Java program at
> > the front end.
> >
> > Another issue here is throttling: splitting the file is one issue, but
> > submitting hundreds of calls in parallel to the destination service
> > would result in timeouts. So we need to work in throttling.
> >
> >
> >
> >
> >
> >
> >
> >
> > Ruwan Linton wrote:
> >>
> >> I agree and can understand the time factor, and also +1 for reusing stuff
> >> rather than trying to reinvent the wheel :-)
> >>
> >> Thanks,
> >> Ruwan
> >>
> >> On Sun, Mar 8, 2009 at 4:08 PM, Andreas Veithen
> >> <andreas.veithen@gmail.com> wrote:
> >>
> >>> Ruwan,
> >>>
> >>> It's not a question of possibility, it is a question of available time
> >>> :-)
> >>>
> >>> Also note that some of the features that we might want to implement
> >>> have some similarities with what is done for attachments in Axiom
> >>> (except that an attachment is only available once, while a file over
> >>> VFS can be read several times). I think there is also some existing
> >>> code in Axis2 that might be useful. We should not reimplement these
> >>> things but try to make the existing code reusable. This however is
> >>> only realistic for the next release after 1.3.
> >>>
> >>> Andreas
> >>>
> >>> On Sun, Mar 8, 2009 at 03:47, Ruwan Linton <ruwan.linton@gmail.com>
> >>> wrote:
> >>> > Andreas,
> >>> >
> >>> > Can we have caching on the file system as a property, to support
> >>> > multiple layers touching the full message, and is it possible to
> >>> > specify a threshold for streaming? For example, if the message is
> >>> > touched several times we might still need streaming, but not for
> >>> > files of 100KB or less.
> >>> >
> >>> > Thanks,
> >>> > Ruwan
> >>> >
> >>> > On Sun, Mar 8, 2009 at 1:12 AM, Andreas Veithen
> >>> > <andreas.veithen@gmail.com> wrote:
> >>> >>
> >>> >> I've done an initial implementation of this feature. It is available
> >>> >> in trunk and should be included in the next nightly build. In order
> >>> >> to enable this in your configuration, you need to add the following
> >>> >> property to the proxy:
> >>> >>
> >>> >> <parameter name="transport.vfs.Streaming">true</parameter>
> >>> >>
> >>> >> You also need to add the following mediators just before the <send>
> >>> >> mediator:
> >>> >>
> >>> >> <property action="remove" name="transportNonBlocking" scope="axis2"/>
> >>> >> <property action="set" name="OUT_ONLY" value="true"/>
> >>> >>
> >>> >> With this configuration Synapse will stream the data directly from
> >>> >> the incoming to the outgoing transport without storing it in memory
> >>> >> or in a temporary file. Note that this has two other side effects:
> >>> >> * The incoming file (or connection in case of a remote file) will
> >>> >> only be opened on demand. In this case this happens during execution
> >>> >> of the <send> mediator.
> >>> >> * If during the mediation the content of the file is needed several
> >>> >> times (which is not the case in your example), it will be read
> >>> >> several times. The reason is of course that the content is not cached.
> >>> >>
> >>> >> I tested the solution with a 2GB file and it worked fine. The
> >>> >> performance of the implementation is not yet optimal, but at least
> >>> >> the memory consumption is constant.
> >>> >>
> >>> >> Some additional comments:
> >>> >> * The transport.vfs.Streaming property has no impact on XML and
> >>> >> SOAP processing: this type of content is processed exactly as before.
> >>> >> * With the changes described here, we now have two different
> >>> >> policies for plain text and binary content processing: in-memory
> >>> >> caching + no streaming (transport.vfs.Streaming=false) and no caching
> >>> >> + deferred connection + streaming (transport.vfs.Streaming=true).
> >>> >> Probably we should define a wider range of policies in the future,
> >>> >> including file system caching + streaming.
> >>> >> * It is necessary to remove the transportNonBlocking property
> >>> >> (MessageContext.TRANSPORT_NON_BLOCKING) to prevent the <send>
> >>> >> mediator (more precisely the OperationClient) from executing the
> >>> >> outgoing transport in a separate thread. This property is set by the
> >>> >> incoming transport. I think this is a bug since I don't see any valid
> >>> >> reason why the transport that handles the incoming request should
> >>> >> determine the threading behavior of the transport that sends the
> >>> >> outgoing request to the target service. Maybe Asankha can comment on
> >>> >> this?
> >>> >>
> >>> >> Andreas
> >>> >>
> >>> >> On Thu, Mar 5, 2009 at 07:21, kimhorn <kim.horn@icsglobal.net> wrote:
> >>> >> >
> >>> >> > That's good, as this stops us from using Synapse.
> >>> >> >
> >>> >> >
> >>> >> >
> >>> >> > Asankha C. Perera wrote:
> >>> >> >>
> >>> >> >>
> >>> >> >>> Exception in thread "vfs-Worker-4" java.lang.OutOfMemoryError: Java heap space
> >>> >> >>>         at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:99)
> >>> >> >>>         at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:518)
> >>> >> >>>         at java.lang.StringBuffer.append(StringBuffer.java:307)
> >>> >> >>>         at java.io.StringWriter.write(StringWriter.java:72)
> >>> >> >>>         at org.apache.commons.io.IOUtils.copyLarge(IOUtils.java:1129)
> >>> >> >>>         at org.apache.commons.io.IOUtils.copy(IOUtils.java:1104)
> >>> >> >>>         at org.apache.commons.io.IOUtils.copy(IOUtils.java:1078)
> >>> >> >>>         at org.apache.commons.io.IOUtils.toString(IOUtils.java:382)
> >>> >> >>>         at org.apache.synapse.format.PlainTextBuilder.processDocument(PlainTextBuilder.java:68)
> >>> >> >>>
> >>> >> >> Since the content type is text, the plain text formatter is trying
> >>> >> >> to use a String to parse, as I see it, which is a problem for large
> >>> >> >> content.
> >>> >> >>
> >>> >> >> A definite bug we need to fix ..
> >>> >> >>
> >>> >> >> cheers
> >>> >> >> asankha
> >>> >> >>
> >>> >> >> --
> >>> >> >> Asankha C. Perera
> >>> >> >> AdroitLogic, http://adroitlogic.org
> >>> >> >>
> >>> >> >> http://esbmagic.blogspot.com
> >>> >> >>
> >>> >> >>
> >>> >> >>
> >>> >> >>
> >>> >> >>
> >>> >> >>
> >>> ---------------------------------------------------------------------
> >>> >> >> To unsubscribe, e-mail: dev-unsubscribe@synapse.apache.org
> >>> >> >> For additional commands, e-mail: dev-help@synapse.apache.org
> >>> >> >>
> >>> >> >>
> >>> >> >>
> >>> >> >
> >>> >> > --
> >>> >> > View this message in context:
> >>> >> > http://www.nabble.com/VFS---Synapse-Memory-Leak-tp22344176p22345904.html
> >>> >> > Sent from the Synapse - Dev mailing list archive at Nabble.com.
> >>> >> >
> >>> >> >
> >>> >> >
> >>> ---------------------------------------------------------------------
> >>> >> > To unsubscribe, e-mail: dev-unsubscribe@synapse.apache.org
> >>> >> > For additional commands, e-mail: dev-help@synapse.apache.org
> >>> >> >
> >>> >> >
> >>> >>
> >>> >>
> >>> >>
> >>> >
> >>> >
> >>> >
> >>> > --
> >>> > Ruwan Linton
> >>> > http://wso2.org - "Oxygenating the Web Services Platform"
> >>> > http://ruwansblog.blogspot.com/
> >>> >
> >>>
> >>>
> >>>
> >>
> >>
> >> --
> >> Ruwan Linton
> >> http://wso2.org - "Oxygenating the Web Services Platform"
> >> http://ruwansblog.blogspot.com/
> >>
> >>
> >
> > --
> > View this message in context:
> > http://www.nabble.com/VFS---Synapse-Memory-Leak-tp22344176p22405973.html
> > Sent from the Synapse - Dev mailing list archive at Nabble.com.
> >
> >
> >
> >
>
>
>


-- 
Ruwan Linton
http://wso2.org - "Oxygenating the Web Services Platform"
http://ruwansblog.blogspot.com/
