ode-dev mailing list archives

From "Maciej Szefler" <...@intalio.com>
Subject Re: Ode Performance: Round I
Date Fri, 08 Jun 2007 18:59:53 GMT
That strikes me as addressing the issue at the wrong level in the
code -- if we want things to happen in one thread, then the engine
should just do them in one thread, i.e. not call the scheduler until it
has given up on the thread. Introducing a new concept (work queue)
that is shared between the engine and integration layer would be
confusing... it's bad enough that the IL uses the scheduler, which it
really should not.

-mbs

On 6/8/07, Alex Boisvert <boisvert@intalio.com> wrote:
> As a first step, I was thinking of allowing the composition of work that is
> currently done in several unrelated threads into a single thread, by
> introducing a WorkQueue.
>
> Right now we have code in the engine, such as
> org.apache.ode.axis2.ExternalService.invoke() -> afterCompletion() that uses
> ExecutorService.submit(...) and I'd like to convert this into
> WorkQueue.submit().
>
> For example, this means that org.apache.ode.axis2.OdeService would first
> execute the transaction around odeMex.invoke() and after commit it would
> dequeue and execute any pending items in the WorkQueue.  We would also need
> to do the same in BpelEngineImpl.onScheduledJob() and other similar engine
> entrypoints.
>
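A minimal sketch of what such a WorkQueue might look like -- the class and its
methods are taken only from Alex's description above, not from anything in the
current code base:

    import java.util.Queue;
    import java.util.concurrent.ConcurrentLinkedQueue;

    // Sketch only: work submitted during a transaction is held back and is
    // executed when the entry point explicitly drains the queue after commit.
    public class WorkQueue {

        private final Queue<Runnable> pending = new ConcurrentLinkedQueue<Runnable>();

        // Used where the engine currently calls ExecutorService.submit(...).
        public void submit(Runnable work) {
            pending.add(work);
        }

        // Called by the entry point (e.g. OdeService, or
        // BpelEngineImpl.onScheduledJob) once its transaction has committed.
        public void drain() {
            Runnable work;
            while ((work = pending.poll()) != null) {
                work.run(); // serial, same-thread execution
            }
        }
    }

The entry point would then look roughly like: begin tx, odeMex.invoke(),
commit, workQueue.drain().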
> The outcome of this is that we could execute all the "non-blocking" work
> related to an external event in a single thread, if desired.   Depending on
> the WorkQueue implementation, we could have pure serial processing, parallel
> processing (like now), or even a mix in-between (e.g. limiting concurrent
> processing to N threads for a given instance).   This would allow for
> optimizing response time or throughput based on the engine policy, or if we
> want to get sophisticated, by process model.
>
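As an illustration of the "mix in-between" policy, here is one rough way to cap
concurrency at N threads; the class name and wiring are assumptions, not
existing code:

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Semaphore;

    // Sketch only: at most maxConcurrent submitted items run at the same time;
    // the rest wait for a permit (a waiting item does hold a pool thread, which
    // is acceptable for a sketch but not ideal).
    public class BoundedWorkQueue {

        private final ExecutorService executor = Executors.newCachedThreadPool();
        private final Semaphore permits;

        public BoundedWorkQueue(int maxConcurrent) {
            this.permits = new Semaphore(maxConcurrent);
        }

        public void submit(final Runnable work) {
            executor.submit(new Runnable() {
                public void run() {
                    permits.acquireUninterruptibly();
                    try {
                        work.run();
                    } finally {
                        permits.release();
                    }
                }
            });
        }
    }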
> I think this change is straightforward enough that it could happen in
> trunk without disrupting it.
>
> Thoughts?
>
> alex
>
> On 6/8/07, Maciej Szefler <mbs@intalio.com> wrote:
> >
> > sure..
> >
> >
> > On 6/7/07, Alex Boisvert <boisvert@intalio.com> wrote:
> > > Ok, got it.   Do you want to go ahead and create the "straight-through"
> > > branch?
> > >
> > > alex
> > >
> > >
> > > On 6/7/07, Maciej Szefler <mbs@intalio.com> wrote:
> > > >
> > > > If the IL supports ASYNC, then it is used, otherwise BLOCKING would be
> > > > used. We want to keep this, because if the IL does indeed use ASYNC
> > > > style (for example if this is a JMS ESB), then likely we don't have
> > > > much in the way of performance guarantees, i.e. the thread may end up
> > > > being blocked for a day, which would quickly lead to resource
> > > > problems.
> > > >
> > > > -mbs
> > > >
> > > > On 6/6/07, Alex Boisvert <boisvert@intalio.com> wrote:
> > > > > Maciej,
> > > > >
> > > > > I'm unclear about how the engine would choose between BLOCKING and
> > > > > ASYNC.
> > > > >
> > > > > I tend to think we need only BLOCKING and the IL deals with the fact
> > > > > that it might have to suspend and resume itself if the underlying
> > > > > invocation is async (e.g. JBI).   What's the use-case for ASYNC?
> > > > >
> > > > > alex
> > > > >
> > > > > On 6/6/07, Matthieu Riou <matthieu.riou@gmail.com> wrote:
> > > > > >
> > > > > > Forwarding on behalf of Maciej (mistakenly replied privately):
> > > > > >
> > > > > >
> > > > > >
> > > > > > ------------------------------------------------------------------
> > > > > >
> > > > > > ah yes. ok, here's my theory on getting the behavior alex wants;
> > > > > > this i think is a fairly concrete way to get the different use
> > > > > > cases we outlined on the white board.
> > > > > >
> > > > > > 1) create the notion of an invocation style: BLOCKING, ASYNC,
> > > > > > RELIABLE, and TRANSACTED.
> > > > > > 2) add MessageExchangeContext.isStyleSupported(PartnerMex, Style)
> > > > > > method
> > > > > > 3) modify the MessageExchangeContext.invokePartner method to take a
> > > > > > style parameter.
> > > > > >
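In code, points 1-3 might come out roughly as below; PartnerMex stands in for
whatever type ends up representing the partner message exchange, and all names
are placeholders taken from the mail rather than actual signatures:

    // Sketch only; none of this exists yet.
    public enum Style { BLOCKING, ASYNC, RELIABLE, TRANSACTED }

    // Placeholder for the partner-role message exchange type.
    public interface PartnerMex { }

    public interface MessageExchangeContext {

        // 2) the engine asks the IL whether it can handle a given style
        boolean isStyleSupported(PartnerMex partnerMex, Style style);

        // 3) invokePartner now carries the style the engine has chosen
        void invokePartner(PartnerMex partnerMex, Style style);
    }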
> > > > > > In BLOCKING style the IL simply does the invoke, right then and
> > > > > > there, blocking the thread. (our axis IL would support this style)
> > > > > >
> > > > > > In ASYNC style, the IL does not block; instead it sends us a
> > > > > > notification when the response is available. (JBI likes this style
> > > > > > the most).
> > > > > >
> > > > > > In RELIABLE, the request would be enrolled in the current TX, the
> > > > > > response delivered async as above (in a new tx).
> > > > > >
> > > > > > In TRANSACTED, the behavior is like BLOCKING, but the TX context
> > > > > > is propagated with the invocation.
> > > > > >
> > > > > > The engine would try to use the best style given the circumstances.
> > > > > > For example, for in-mem processes it would prefer to use the
> > > > > > TRANSACTED style and it could do it "in-line", i.e. as part of the
> > > > > > <invoke> or right after it runs out of reductions.  If the style is
> > > > > > not supported it could 'downgrade' to the BLOCKING style, which
> > > > > > would work in the same way. If BLOCKING were not supported, then
> > > > > > ASYNC would be the last resort, but this would force us to
> > > > > > serialize.
> > > > > >
> > > > > > For persisted processes, we'd prefer RELIABLE in general, TRANSACTED
> > > > > > when inside an atomic scope, otherwise either BLOCKING or ASYNC.
> > > > > > However, here use of BLOCKING or ASYNC would result in additional
> > > > > > transactions since we'd need to persist the fact that the invocation
> > > > > > was made. Unless of course the operation is marked as "idempotent",
> > > > > > in which case we could use the BLOCKING call without a checkpoint.
> > > > > >
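Putting the two preference orders together, the engine-side selection could be
sketched as follows (building on the Style/MessageExchangeContext sketch above;
how the in-memory and atomic-scope flags are obtained is glossed over):

    // Sketch only: walk a preference list and pick the first style the IL
    // reports as supported, i.e. the 'downgrade' behavior described above.
    public class StyleChooser {

        public Style choose(PartnerMex mex, MessageExchangeContext ctx,
                            boolean inMemory, boolean inAtomicScope) {
            Style[] preferred;
            if (inMemory || inAtomicScope) {
                // in-mem processes and atomic scopes: TRANSACTED in-line if
                // possible, else BLOCKING, with ASYNC as the last resort
                preferred = new Style[] { Style.TRANSACTED, Style.BLOCKING, Style.ASYNC };
            } else {
                // persisted processes: RELIABLE in general; BLOCKING/ASYNC imply
                // an extra checkpoint transaction unless the operation is
                // marked idempotent
                preferred = new Style[] { Style.RELIABLE, Style.BLOCKING, Style.ASYNC };
            }
            for (Style s : preferred) {
                if (ctx.isStyleSupported(mex, s)) {
                    return s;
                }
            }
            throw new IllegalStateException("IL supports none of the proposed styles");
        }
    }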
> > > > > > How does that sound?
> > > > > > -mbs
> > > > > >
> > > > > >
> > > > > > On 6/6/07, Matthieu Riou <matthieu.riou@gmail.com> wrote:
> > > > > > >
> > > > > > > Actually for in-memory processes, it would save us all reads and
> > > > > > > writes (we should never read or write it in that case). And for
> > > > > > > persistent processes, it will save a lot of reads (which are still
> > > > > > > expensive because of deserialization).
> > > > > > >
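A minimal sketch of what such a cache could look like, assuming an
instance-id-keyed LRU map; eviction sizing, clustering and invalidation are
ignored, and none of these names exist in the code today:

    import java.util.LinkedHashMap;
    import java.util.Map;

    // Sketch only: cache of deserialized Jacob state, keyed by instance id.
    // A hit skips the read + deserialization; for in-memory processes the
    // store would never be touched at all.
    public class JacobStateCache {

        private static final int MAX_ENTRIES = 1000;

        private final Map<Long, Object> cache =
            new LinkedHashMap<Long, Object>(16, 0.75f, true) {
                protected boolean removeEldestEntry(Map.Entry<Long, Object> eldest) {
                    return size() > MAX_ENTRIES; // crude LRU bound
                }
            };

        public synchronized Object get(Long instanceId) {
            return cache.get(instanceId);
        }

        public synchronized void put(Long instanceId, Object jacobState) {
            cache.put(instanceId, jacobState);
        }

        public synchronized void remove(Long instanceId) {
            cache.remove(instanceId);
        }
    }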
> > > > > > > On 6/6/07, Matthieu Riou <matthieu.riou@gmail.com> wrote:
> > > > > > > >
> > > > > > > > Two things:
> > > > > > > >
> > > > > > > > 1. We should also consider caching the Jacob state. Instead of
> > > > > > > > always serializing / writing and reading / deserializing,
> > > > > > > > caching those states could save us a lot of reads.
> > > > > > > >
> > > > > > > > 2. Cutting down the transaction count is a significant
> > > > > > > > refactoring, so I would start a new branch for that (maybe ODE
> > > > > > > > 2.0?). And we're going to need a lot of tests to chase
> > > > > > > > regressions :)
> > > > > > > >
> > > > > > > > I think 1 could go without a branch. It's not trivial, but I
> > > > > > > > don't think it would take more than a couple of weeks (I would
> > > > > > > > have to get deeper into the code to give a better evaluation).
> > > > > > > >
> > > > > > > > On 6/6/07, Alex Boisvert <boisvert@intalio.com> wrote:
> > > > > > > > >
> > > > > > > > > Howza,
> > > > > > > > >
> > > > > > > > > I started testing a short-lived process implementing a single
> > > > > > > > > request-response operation.  The process structure is as
> > > > > > > > > follows:
> > > > > > > > >
> > > > > > > > > -Receive Purchase Order
> > > > > > > > > -Do some assignments (schema mappings)
> > > > > > > > > -Invoke CRM system to record the new PO
> > > > > > > > > -Do more assignments (schema mappings)
> > > > > > > > > -Invoke ERP system to record a new work order
> > > > > > > > > -Send back an acknowledgment
> > > > > > > > >
> > > > > > > > > Some deployment notes:
> > > > > > > > > -All WS operations are SOAP/HTTP
> > > > > > > > > -The process is deployed as "in-memory"
> > > > > > > > > -The CRM and ERP systems are mocked as Axis2 services (as
> > > > > > > > > dumb as can be to avoid bottlenecks)
> > > > > > > > >
> > > > > > > > > After fixing a few minor issues (to handle the load), and
> > > > > > > > > fixing a few obvious code inefficiencies which gave us roughly
> > > > > > > > > a 20% gain, we are now at near-100% CPU utilization.  (I'm
> > > > > > > > > testing on my dual-core system.)  As it stands, Ode clocks
> > > > > > > > > about 70 transactions per second.
> > > > > > > > >
> > > > > > > > > Is this good?  I'd say there's room for improvement.  Based
> > > > > > > > > on previous work in the field, I estimate we could get up to
> > > > > > > > > 300-400 transactions/second.
> > > > > > > > >
> > > > > > > > > How do we improve this?  Well, looking at the end-to-end
> > > > > > > > > execution of the process, I counted 4 thread-switches and 4
> > > > > > > > > JTA transactions.  Those are not really necessary, if you ask
> > > > > > > > > me.  I think significant improvements could be made if we
> > > > > > > > > could run this process straight-through, meaning in a single
> > > > > > > > > thread and a single transaction.  (Not to mention it would
> > > > > > > > > make things easier to monitor and measure ;)
> > > > > > > > >
> > > > > > > > > Also, to give you an idea, the top 3 areas where we spend
> > > > > > > > > most of our CPU today are:
> > > > > > > > >
> > > > > > > > > 1) Serialization/deserialization of the Jacob state (I'm
> > > > > > > > > estimating about 40-50%)
> > > > > > > > > 2) XML marshaling/unmarshaling (about 10-20%)
> > > > > > > > > 3) XML processing: XPath evaluation + assignments (about
> > > > > > > > > 10-20%)
> > > > > > > > >
> > > > > > > > > (The rest would be about 20%; I need to load up JProbe or
> > > > > > > > > DTrace to provide more accurate measurements.  My current
> > > > > > > > > estimates are a mix of non-scientific statistical sampling of
> > > > > > > > > thread dumps and a quick run with the JVM's built-in profiler.)
> > > > > > > > >
> > > > > > > > > So my general question is...  how do we get started on the
> > > > > > > > > single thread + single transaction refactoring?  Has anybody
> > > > > > > > > already given some thought to this?  Are there any pending
> > > > > > > > > design issues before we start?  How do we work on this without
> > > > > > > > > disrupting other parts of the system?  Do we start a new
> > > > > > > > > branch?
> > > > > > > > >
> > > > > > > > > alex
> > > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
