qpid-dev mailing list archives

From "Kim van der Riet" <kim.vdr...@redhat.com>
Subject Re: Review Request 23305: [C++ broker] Make memory usage consistent after broker restart
Date Tue, 22 Jul 2014 13:43:09 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/23305/#review48360
-----------------------------------------------------------


This is a fundamentally difficult problem to solve, especially since the legacy/linearstore
was conceptualized as a per-queue store rather than as a centralized message store. There
are trade-offs between these two approaches, and the ability to track the same message over
multiple queues is one of these. A centralized store design would make this easy, but the
overhead of tracking what each queue is doing with each message becomes difficult. A per-queue
store takes away the message-tracking issues, but makes cross-queue correlation difficult.
There is no consensus on which of these design approaches is best, and when what is now the
legacystore was designed, per-queue throughput was considered an important criterion, and
so the per-queue design was used.

The linear and legacy stores require locally unique ids which span recoveries. By locally,
I mean per-queue unique. This means that the same persistence id can be used in different
queues for different messages. By spanning recoveries, I mean that once a persistence id is
"used" by consuming a message, it must also be purged from the file system (in the case of
the legacystore, files with both enqueue and dequeue records must be overwritten and in the
case of linearstore, returned to the EFP) so that it will not be encountered during the recovery
process as it reads all the files to determine which messages are current. For legacystore,
it is also helpful if the persistence ids are monotonically increasing, but this does not
affect the accuracy of the recovery, only the speed of the recovery. There is nothing wrong
with the broker setting the persistence ids to globally unique values so long as these requirements
are met. As previous comments have already stated, we should make sure that no other
implementations of the store will be broken by this.
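
As an aside, a minimal sketch of what such a globally unique id source might look like (a
hypothetical illustration, not existing broker code; the class and names are mine):

    #include <atomic>
    #include <cstdint>

    // Hypothetical broker-wide persistence id source. The ids handed out
    // are globally unique and monotonically increasing, which trivially
    // satisfies per-queue uniqueness and keeps legacystore recovery fast.
    class PersistenceIdSource {
      public:
        // Seed above the highest id recovered from the journals so that
        // new ids never collide with ids still present on disk.
        explicit PersistenceIdSource(uint64_t highestRecoveredId)
            : next(highestRecoveredId + 1) {}

        uint64_t nextId() { return next.fetch_add(1); }

      private:
        std::atomic<uint64_t> next;
    };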

In your proposed solution, you introduce a std::map to contain the recovered persistence ids.
My concerns with this are threefold, mostly around recovering large numbers of messages:

1. Performance when recovering large numbers of messages: For each unique message, two methods
of logarithmic complexity must be executed: std::map::find() followed by std::map::operator[].
For small numbers of messages this may not be noticeable, but as the number of messages increases,
the accumulated cost of these lookups becomes significant (see the sketch after this list for
a way to fold them into one). Remember also that the messages in the map will not be just the
current messages, but all messages read from the journal files as the store plays back the
enqueueing and dequeueing of all visible messages.

2. Memory footprint: Once the recovery is complete, the map contains the persistence ids of
all messages seen during recovery. Although the stored persistence ids are likely small next
to the message content, the sheer number of them may reduce or even negate the memory savings
of this approach. Of course, this could be reduced to a temporary issue by simply clearing
the map once the recovery is complete, as the entries serve no purpose beyond the recovery
itself.

3. Thread safety: Currently, IIRC, only one thread is used to recover the various queue stores
serially. If that should change, and each queue is recovered independently on its own thread,
then your map will require a lock.
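
For illustration, here is a minimal sketch of such a cache (the names RecoveryCache,
RecoveredMessage and findOrInsert are mine, not taken from the patch): a single
std::map::insert() replaces the find()/operator[] pair from point 1, clear() addresses
point 2, and the lock covers point 3 should recovery ever become multi-threaded.

    #include <cstdint>
    #include <map>
    #include <memory>
    #include <mutex>

    // Illustrative stand-in for the broker's shared message state; the
    // real type would be (a pointer to) PersistableMessage or similar.
    struct RecoveredMessage {};

    // Hypothetical recovery-side cache keyed by persistence id.
    class RecoveryCache {
      public:
        typedef std::shared_ptr<RecoveredMessage> MessagePtr;

        // Returns the already-recovered instance for this id, or stores
        // and returns the candidate if the id has not been seen before.
        // One insert() call both probes and inserts: O(log n) once,
        // instead of find() followed by operator[].
        MessagePtr findOrInsert(uint64_t persistenceId, const MessagePtr& candidate) {
            std::lock_guard<std::mutex> guard(lock); // needed only with multi-threaded recovery
            return cache.insert(std::make_pair(persistenceId, candidate)).first->second;
        }

        // Drop every entry once recovery completes; the cache otherwise
        // keeps one entry per message seen during playback.
        void clear() {
            std::lock_guard<std::mutex> guard(lock);
            cache.clear();
        }

      private:
        std::mutex lock;
        std::map<uint64_t, MessagePtr> cache;
    };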

In use-cases where the message size is large and the number of queues/messages is relatively
small, this approach could still provide a benefit. Perhaps some measurements of performance
and memory footprint are needed to ascertain where these trade-offs make sense.

- Kim van der Riet


On July 8, 2014, 12:53 p.m., Pavel Moravec wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/23305/
> -----------------------------------------------------------
> 
> (Updated July 8, 2014, 12:53 p.m.)
> 
> 
> Review request for qpid, Gordon Sim and Kim van der Riet.
> 
> 
> Bugs: QPID-5880
>     https://issues.apache.org/jira/browse/QPID-5880
> 
> 
> Repository: qpid
> 
> 
> Description
> -------
> 
> Simple idea:
> - in Queue::enqueue, set PersistableMessage::persistencyID "manually" to some unique
> number that is identical for all message instances that share common SharedState - e.g. to
> the pointer to the SharedState
> - during journal recovery, if we recover a message with an already seen persistencyID, use
> the previously seen instance with its SharedState and PersistableMessage bits
> 
> Known limitation:
> - message annotations added on some queue (e.g. due to queue sequencing being enabled) will
> be either over-written or shared with all other queues during recovery
> 
> The patch contains a new QueueSettings option to enable this feature (disabled by default)
> on a per-queue basis. This somewhat limits the impact of the limitation above.
> 
> Isn't storing a pointer to SharedState on disk (via persistencyID) some sort of security
> breach? (I don't think so, but it is worth asking.)
> 
> Can't manually setting persistencyID break something in the store? (AFAIK no, as uniqueness
> of the ID is assured: 1) a new / different message with the same persistencyID can appear
> only after the previous instance is gone from memory, and 2) only queues with the option
> enabled are checked for message coupling)
> 
> Will it help in a cluster? No, it won't. When the primary broker gets 1 message to an exchange
> that distributes it to 100 queues, the broker updates backup brokers via 100 individual "enqueue
> 1 message to queue q[1-100]" events. So backup brokers consume more memory than the primary -
> the same amount as if the primary did not share SharedState at all.
> 
> So it is reasonable for standalone brokers only.
> 
> 
> Diffs
> -----
> 
>   /trunk/qpid/cpp/src/qpid/broker/Queue.cpp 1608083 
>   /trunk/qpid/cpp/src/qpid/broker/QueueSettings.h 1608083 
>   /trunk/qpid/cpp/src/qpid/broker/QueueSettings.cpp 1608083 
>   /trunk/qpid/cpp/src/qpid/legacystore/MessageStoreImpl.h 1608083 
>   /trunk/qpid/cpp/src/qpid/legacystore/MessageStoreImpl.cpp 1608083 
>   /trunk/qpid/cpp/src/qpid/linearstore/MessageStoreImpl.h 1608083 
>   /trunk/qpid/cpp/src/qpid/linearstore/MessageStoreImpl.cpp 1608083 
> 
> Diff: https://reviews.apache.org/r/23305/diff/
> 
> 
> Testing
> -------
> 
> No significant difference in memory consumption before & after restart (in a setup
> of 500 queues with qpid.store_msgID=true and thousands of messages sent via a fanout exchange
> to all of them).
> 
> Automated tests passed.
> 
> 
> Thanks,
> 
> Pavel Moravec
> 
>

