qpid-dev mailing list archives

From "Rupert Smith" <rupertlssm...@googlemail.com>
Subject Re: Use of AMQShortString in client side code
Date Wed, 19 Sep 2007 10:47:20 GMT
Yes, but if you have a new string that may match an existing token, don't
you still have to match it?

e.g.

Queue queue1 = session.createQueue("test");
Queue queue2 = session.createQueue("test");

You need to match "test" against the token set to see if it has already been
tokenized. (OK, in this case the compiler has interned "test", so you can
look it up in a hash map, but if "test" were a dynamically generated name it
would be different.)
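
As a rough sketch of that lookup (the TokenCache class and its tokenize
method are made up for illustration and are not part of the Qpid client), a
hash map keyed on string content still finds the existing token for a
dynamically built name, but only by hashing and comparing the characters,
which is the matching step in question:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical token cache for illustration; not part of the Qpid client.
public class TokenCache {
    // Maps string content to the token assigned to it.
    private final Map<String, Integer> tokens = new HashMap<>();
    private int nextToken = 1;

    // Returns the existing token for this name, or assigns the next free one.
    public int tokenize(String name) {
        Integer existing = tokens.get(name); // uses hashCode() and equals()
        if (existing != null) {
            return existing;
        }
        int token = nextToken++;
        tokens.put(name, token);
        return token;
    }

    public static void main(String[] args) {
        TokenCache cache = new TokenCache();
        int a = cache.tokenize("test");

        // Built at run time, so this is not the interned literal object.
        String dynamic = new StringBuilder("te").append("st").toString();
        int b = cache.tokenize(dynamic);

        System.out.println(a == b);            // true: same token by content
        System.out.println(dynamic == "test"); // false: different objects
    }
}
```

Both calls return the same token even though the two String objects are
distinct, so an identity check alone would not be enough for generated names.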

Rupert

On 19/09/2007, Robert Godfrey <rob.j.godfrey@gmail.com> wrote:
>
> The idea of tokenization is to not do String matching.
>
> It is something we may try to do at AMQP...
>
> e.g.
>
> client says
>
> "From now on we should refer to the Short String
> 'very-very-very-very-long-in-fact-ridiculously-long-routing-key-name' as
> Token 1"
>
> Then, whenever it publishes a message with this string as a routing key,
> it simply uses the token instead.  This not only saves bytes over the
> wire, but can also make routing more efficient inside the broker.
>
> Hope this helps,
> Rob
>
> On 19/09/2007, Rupert Smith <rupertlssmith@googlemail.com> wrote:
> >
> > Just for the record, when short string was introduced, there was a
> > performance boost.
> >
> > I like the idea of a string mapping down onto a byte buffer; it's more
> > like how things would be done in C. That is, the incoming data would not
> > be lifted out of its frame; the string would just be a pointer into the
> > frame.
> >
> > Wrt tokenization: doing this fast may require some neat string matching
> > algorithms. With Java String, you get what the String class provides you
> > (you could start iterating over the characters, but that's not going to
> > be fast). With a string as a byte buffer, there exists the possibility
> > of handing off the cache string matching to a carefully chosen native
> > string matching algorithm.
> >
> > Quick example. You have a token cache. You want to tokenize a
> > potentially new string. So you need to match a single string against a
> > large set of possible candidates. The string is not likely to be very
> > long (it's a short string). So: multi-match a shortish string. The most
> > efficient algorithm for this scenario might involve using bit sets in
> > machine registers, I'm guessing. I'm not sure that we will really need
> > to do this super fast, though?
> >
> > On 19/09/2007, John O'Hara <john.r.ohara@gmail.com> wrote:
> > >
> > > It's also a space optimisation on the wire for when we cared about
> > > that... for high volume messaging, esp. with TCP/IP and serial
> > > unicast, those bytes start to matter.
> > >
> > > As for tokenising, to support the notion of trivial clients the idea
> > > was to let the client assert which short strings were tokenized;
> > > usually relating to routing keys which could repeat a lot.
> > >
> > > The original argument you made was to do with typing.
> > > You point out that AMQPShortString has benefits for the broker.
> > > It would also make sense to have that symmetry in the client.
> > >
> > > There seems to be no compelling reason to change this.
> > > It's late and I'm tired so I won't go on more
> > >
> > > G'd night
> > > John
> > >
> > >
> > > On 19/09/2007, Rafael Schloming <rafaels@redhat.com> wrote:
> > > >
> > > > John O'Hara wrote:
> > > > > I agree with Rob that the lower levels of the stack should be
> > > > > implemented in AMQPShortString *where it occurs in the protocol*
> > > > > for the following reasons:
> > > > >
> > > > > 1) It provides the opportunity to validate the semantics; just
> > > > > because we're not checking length today doesn't mean we shouldn't
> > > >
> > > > AMQShortString really isn't the appropriate place to validate domain
> > > > level semantics. Different uses of shortstr have different domain
> > > > level constraints. Also, any validation we put in AMQShortString is
> > > > forced to run for every single shortstr field that passes through a
> > > > broker. This isn't particularly useful because when decoding fields
> > > > off the wire, such validation is unnecessary as it is already
> > > > performed by the codec in a more efficient manner that is
> > > > specialized to the wire format.
> > > >
> > > > > 2) We may introduce AMQPShortString Tokenisation in the protocol
> > > > > in the future (has been discussed often, I think it's quite
> > > > > likely).  Doing this we can collapse a shortstring to 2 bytes and
> > > > > reduce garbage.
> > > >
> > > > I presume you're referring to some scheme for caching commonly used
> > > > strings? If so this is a decoding optimization that would equally
> > > > well apply when decoding directly to Strings, or any other type for
> > > > that matter. In fact such an optimization would likely nullify any
> > > > performance advantage rendered by AMQShortString since
> > > > decoding/encoding of anything would only be necessary when there is
> > > > a cache miss.
> > > >
> > > > > 3) I'm unsure of the memory ownership semantics but I believe the
> > > > > JMS spec explicitly requires a copy of the message to be taken to
> > > > > prevent grim race conditions on message reuse.  Some products have
> > > > > the option to turn this off, but that's not the spec.  It's like
> > > > > not DMA'ing from userspace without extreme care.
> > > >
> > > > I'm unsure how this relates to the use of AMQShortString. Any such
> > > > copying would happen well past the point where raw types are decoded
> > > > off the wire.
> > > >
> > > > > Also, Rob has said it has been proven to be faster in the past.
> > > > > In the absence of a measured, demonstrable issue why change this
> > > > > arguably more correct implementation?
> > > >
> > > > As it stands today AMQShortString is really just an optimization for
> > > > the broker, and one that comes at a pretty high cost to the client.
> > > > So if there is a better way to solve the performance issue for the
> > > > broker without encumbering the client, it's certainly worth
> > > > investigating.
> > > >
> > > > That's why I asked about the original problem being solved. For
> > > > example I'd guess that in the critical path the broker really never
> > > > needs to decode much more than the exchange name and routing key in
> > > > order to deliver a message, so it might be possible to limit the use
> > > > of AMQShortString to just those fields (or decode to specific
> > > > Exchange and RoutingKey classes) and get the necessary performance
> > > > benefit in the broker, with much less impact on the client.
> > > >
> > > > --Rafael
> > > >
> > > > > Cheers
> > > > > John
> > > > >
> > > > >
> > > > > On 19/09/2007, Rafael Schloming <rafaels@redhat.com> wrote:
> > > > >> Robert Godfrey wrote:
> > > > > >>> On 13/09/2007, Rajith Attapattu <rajith77@gmail.com> wrote:
> > > > > >>>> I am wondering why we are using AMQShortString
> > > > > >>>> indiscriminately all over the client side code?
> > > > > >>>> There is no performance benefit of using AMQShortString
> > > > > >>>> (based on the way it is used) on the client side and it is
> > > > > >>>> purely used for encoding.
> > > > >>>
> > > > >>>
> > > > >>> Rajith,
> > > > >>>
> > > > > >>> as we have discussed before - there *is* a significant
> > > > > >>> performance benefit which we have tested and proved previously.
> > > > > >> Can you point me to the previous discussion? I'd like to learn
> > > > > >> more about the original issue.
> > > > >>
> > > > >>    Many short strings are re-used
> > > > >>> frequently within the client library, and by using our own
type
> we
> > > can
> > > > >>> exploit this.
> > > > > >> Unless we're excessively copying them I don't see how this
> > > > > >> matters. For both an AMQShortString and a String we should just
> > > > > >> be passing around pointers when they are reused.
> > > > >>
> > > > >>    Further, the domain for many parameters in AMQP is *not* a
> > > > >>> unicode string, but is tightly defined as upto 255 bytes
of data
> > > with
> > > > a
> > > > >>> particular encoding.  Java Strings are not the appropriate
type
> to
> > > use
> > > > >> for
> > > > >>> this.  Encoding and decoding Java Strings is expensive, and
also
> > > prone
> > > > >> to
> > > > >>> error (i.e. you need to make sure that you *always* use the
> > correct
> > > > >> explicit
> > > > >>> encoding).
> > > > > >> Despite the name AMQShortString, I don't think the
> > > > > >> AMQShortString class actually represents the AMQP type
> > > > > >> short-string; for example, there is no length limit for an
> > > > > >> AMQShortString. It's really just a generic implementation of
> > > > > >> CharSequence that is optimized specifically for rapid decoding
> > > > > >> from a ByteBuffer. From a domain restriction perspective, using
> > > > > >> an ordinary String is just as correct.
> > > > >>
> > > > > >>>> It makes sense to use it on Broker side as you deal at bytes
> > > > > >>>> level and I can understand the performance benefit of not
> > > > > >>>> having to convert back and forth into a String.
> > > > >>>
> > > > > >>> The low level API should be using correct AMQ domains.  High
> > > > > >>> level APIs (such as JMS) will obviously want to present these
> > > > > >>> parameters as Java Strings.
> > > > >>>
> > > > >>>
> > > > > >>>> On the client side we just merely wrap/unwrap a String using
> > > > > >>>> AMQShortString. Why can't we do that at the encoding/decoding
> > > > > >>>> level for the client side?
> > > > >>>
> > > > > >>> In some cases this may be true, but in others certainly not.
> > > > > >>> When converting into JMS Destinations on receipt of a message,
> > > > > >>> for instance, one never needs to convert to a String... it is
> > > > > >>> *much* faster to simply use the correct type of AMQShortString.
> > > > > >> Unfortunately using AMQShortString imposes additional overhead
> > > > > >> whenever we need to en/decode to/from an ordinary String. It
> > > > > >> basically requires an additional copy when compared with
> > > > > >> directly encoding/decoding to/from a String. As the common case
> > > > > >> on the client side is dealing with Strings, I'm not at all
> > > > > >> convinced that ubiquitous use of AMQShortString is a net win
> > > > > >> for the client.
> > > > >>
> > > > > >> I believe what would be optimal is to use the CharSequence
> > > > > >> interface everywhere. This way String values passed to us by an
> > > > > >> application could be directly passed all the way down the stack
> > > > > >> and encoded directly onto the wire without an additional copy,
> > > > > >> and incoming data could be efficiently decoded into a private
> > > > > >> impl of CharSequence that could be converted to a String on
> > > > > >> demand.
> > > > >>
> > > > >> --Rafael
> > > > >>
> > > > >
> > > >
> > >
> >
>
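
For reference, the CharSequence idea at the bottom of the quoted thread
(decode incoming data into a private CharSequence and only build a String on
demand) might look roughly like the sketch below. The ByteSequence class is
hypothetical and is not Qpid's AMQShortString. It assumes a short string on
the wire is one length byte followed by that many bytes, and it assumes one
byte per character (ISO-8859-1), since the thread does not name the encoding.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; not Qpid's AMQShortString.
public final class ByteSequence implements CharSequence {
    private final byte[] data;
    private final int offset;
    private final int length;
    private String cached; // built lazily on the first toString() call

    public ByteSequence(byte[] data, int offset, int length) {
        this.data = data;
        this.offset = offset;
        this.length = length;
    }

    // Assumed wire layout: one unsigned length byte, then the content bytes.
    public static ByteSequence decode(ByteBuffer buf) {
        int len = buf.get() & 0xFF;
        byte[] bytes = new byte[len];
        buf.get(bytes); // copies the content out of the frame buffer
        return new ByteSequence(bytes, 0, len);
    }

    @Override
    public int length() {
        return length;
    }

    @Override
    public char charAt(int index) {
        if (index < 0 || index >= length) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        // Assumption: one byte per character.
        return (char) (data[offset + index] & 0xFF);
    }

    @Override
    public CharSequence subSequence(int start, int end) {
        if (start < 0 || end > length || start > end) {
            throw new IndexOutOfBoundsException(start + ".." + end);
        }
        // Shares the backing array instead of copying.
        return new ByteSequence(data, offset + start, end - start);
    }

    @Override
    public String toString() {
        if (cached == null) {
            cached = new String(data, offset, length, StandardCharsets.ISO_8859_1);
        }
        return cached;
    }
}
```

Code that only needs length(), charAt() or subSequence() never pays for a
String; the conversion happens once, the first time toString() is called.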