I mentioned earlier I've been working with Service Broker lately. It meets my need of being able to create a near-real time (reportable) archive of activities. More importantly, the queuing allows the archival process to survive the sometimes unpredictable peak loads found in web applications.
[Note: Those of you who know my history know I have a background in IBM's etl toolset, and may wonder why I'm not using its SOA ETL capabilities here. We've chosen to stay vendor-neutral in the etl world, a decision that affords greater flexibility to our client base.]
My first attempt at Service Broker queuing was the "fire and forget" pattern described in Remus Rusanu's old blog. Note that Reumus is pointing out it's NOT a good pattern. In the "fire and forget" pattern the caller
- opens a dialog,
- enqueues the message, and
- ends the conversation.
ARCHIVAL TIP #1: LET THE TARGET END THE CONVERSATION
First, when there's no return queue, and when the sender ends the dialog, there's no opportunity to be notified of problems. Did the target queue disable due to five errors in a row? No notice. Any foreign key errors? Too bad, you won't be notified. Worse yet, is the target unavailable or did the message fail validation? Argh! The message flat-out disappears..... By ending the conversation the caller indicated it doesn't want to be notified of anything related to the conversation, ever.
ARCHIVAL TIP #2: DON'T BUILD PER-MESSAGE CONVERSATIONS
Second, if the caller's just sending one message the overall performance suffers. My load tests showed poor overall performance and when spinning up a conversation for every message, due to memory and disk i/o issues.
Once I realized I had a performance (and potential lost message) problem, I looked at Pro SQL Server Service Broker for ideas to dequeue messages in a more performant manner. I tried the recommendations for message structure -- including marshaling the messages as binary, which was PAINFUL -- but didn't see any consistent improvements in performance. Simon Sabin tested performance changes over decreasing conversation group size and found performance to be similar across most message-handling options when conversation groups contained only one message.
ARCHIVAL TIP #3: REUSE CONVERSATION HANDLES
At this point, you really really need to spend some time in Remus' blog. His 3-part series of posts describes how to share conversation handles, end the shared conversations after a pre-determined period (I time out conversations after an hour), and resend conversations under certain error scenarios.
Sharing conversation handles moved us past message-per-conversation performance challenges. As a result of this change data throughput more than doubled under load.
I consider these blog posts required reading. Even if you decide not to implement portions of the code, it's worth taking time to understand the recommendations.
ARCHIVAL TIP #4: LOG ERROR MESSAGES
I choose to retry saves if the error is a deadlock, and log all others to an error log table. The table stores the error number, text, message handle, and the message payload itself.
I'm not comfortable yet doing an automatic resend of errored messages in the manner Remus describes. Instead, I'll monitor and use his approach manually, as needed, until I have a better sense of which errors are appropriate for automated processing.
By logging errors to a separate table I reduce the risk of disabling the queue due to "poison messages." I use TRY..CATCH around the actual data save and only move the message to a log if it failed that data save (for something other than a deadlock). Since I'm still in development mode, the occasional primary- and foreign-key error still arises. The root-cause scenarios will be tuned out over the next couple weeks.
ARCHIVAL TIP #5: TUNE THE NUMBER OF QUEUE READERS
I've yet to find a solid recommendation for choosing the number of queue readers (threads) to activate for a given queue. One post suggested activating a queue reader per cpu. In my particular situation I've seen improved throughput with a number almost triple the cpu count.
I'm NOT suggesting that you use my numbers. Test, test, and test some more to find the thread level that allows you to keep up, or catch up, under load. For me, it was useful to play with both the MAX_QUEUE_READERS setting in the queue definition and the number of messages dequeued by the activated stored procedure at any one time (via the RECEIVE TOP(x) syntax).
ARCHIVAL TIP #6: SHARE THE CONVERSATION-ENDING QUEUE (if appropriate)
I originally didn't create response queues for messages from target to source. Not surprising, since I didn't have a clue how to manage errors or conversation ends. Once I realized my goofup, I still wanted to keep the process as simple as possible. To that end, my implementation of
Remus's example uses a single "CloseConversation" queue to receive responses from the target. Error and CloseConversation messages (CloseConversation being a user-defined message) arrive on this queue.
If it's an error message, the activated procedure logs the error.
If it's a CloseConversation message, the procedure removes the conversation handle record and closes its end of the conversation. See Remus' blog for a good description of how the whole process works.
GIVE IT A TRY
SQL Server's Service Broker is a great solution for a number of technical requirements, and well worth learning. That said, it's not the most intuitive architecture for DBAs to master. These tips should definitely be combined with Remus' blog and a copy of Klaus Aschenbrenner's book. They both contain exceptionally good descriptions and code examples that are worth the study.
A final note: I've only described architectures and tips for an archival, "fire and forget" scenario. Klaus's book describes a number of other scenarios (SODA and load balancing, for example) to which the technologies apply.

0 comments:
Post a Comment