Thursday, September 3, 2009

The XLANG/s runtime failed in terminating the instance

I have a scope in my orchestration that constructs 2 different XLANG messages and adds an entry to the database.



To handle failures gracefully, I add an Exception Handler to perform the error handling and then terminate the orchestration. However, when I run the orchestration, I get the following error message:

Event Type: Error
Event Source: XLANG/s
Event Category: None
Event ID: 10001
Date: 6/9/2009
Time: 9:34:26 PM
User: N/A
Computer: TOR-12653
Description:
The XLANG/s runtime failed in terminating the instance 99f16670-471d-4cb2-a299-3c5f84547efd of service type "ArielBPM.BizTalk.RequestManagement.LOC.Orchestrations.Orch_CobraCoverageTermination". Exceptions.Messages.Preparation

Exception type: BTXServiceTerminateFailedException
Source: System.Xml
Additional error information:


The string '' is not a valid Boolean value.

Exception type: FormatException
Source: System.Xml
Target Site: Boolean ToBoolean(System.String)
The following is a stack trace that identifies the location where the exception occured


at System.Xml.XmlConvert.ToBoolean(String s)
at Microsoft.XLANGs.RuntimeTypes.XmlHelpers.ParseString(String s, Type targetType, Boolean throwIfNotParsed)
at Microsoft.XLANGs.RuntimeTypes.XmlHelpers.ChangeType(Object val, Type targetType)
at Microsoft.XLANGs.Core.XMessage.GetContentProperties()
at Microsoft.BizTalk.XLANGs.BTXEngine.BTXXlangStore.PrepareMessage(XLANGMessage msg, IList promoteProps, IList toPromote)
at Microsoft.BizTalk.XLANGs.BTXEngine.BTXXlangStore.PersistMessageState(XLANGMessage msg)
at Microsoft.XLANGs.Core.XMessage.PrePersist(Boolean dehydrate, IStateManager state)
at Microsoft.XLANGs.Core.ReferencedMessages.PrePersist(Boolean dehydrate, Context state)
at Microsoft.XLANGs.Core.Context.PrePersist(Boolean dehydrate, Context state)
at Microsoft.XLANGs.Core.Context.PrePersist(Boolean dehydrate, XlangStore store)
at Microsoft.XLANGs.Core.ExecutingSubStates.PrePersist(Boolean dehydrate, Context ctx)
at Microsoft.XLANGs.Core.Context.PrePersist(Boolean dehydrate, Context state)
at Microsoft.XLANGs.Core.ServiceContext.PrePersist(Boolean dehydrate, Context state)
at Microsoft.XLANGs.Core.Service.PrePersist(Boolean dehydrate, Context ctx)
at Microsoft.XLANGs.Core.Service.Persist(Boolean dehydrate, Context ctx, Boolean idleRequired, Boolean finalPersist, Boolean bypassCommit, Boolean terminate)
at Microsoft.BizTalk.XLANGs.BTXEngine.BTXService.TerminateSchedule(Envelope env)

For more information, see Help and Support Center at
http://go.microsoft.com/fwlink/events.asp.



As the orchestration design shows, there's no mechanism to retry or loop after the exception is caught. However, this doesn't seem to reflect reality: when I get this error message, the orchestration instance is neither terminated nor suspended. Instead, after the above XLANG error is thrown, the entire orchestration restarts all the way from the beginning and keeps looping like this indefinitely.

I try Googling it; however, the only place I can find with a similar exception is here: http://www.eggheadcafe.com/forumarchives/biztalkorchestration/Oct2005/post24389371.asp. The poster was able to fix the problem by changing the scope's transaction type from "Long Running" to "None". However, the transaction type of my failing scope is "None" to start with, so that doesn't seem to be the solution to my problem.

After studying the error message more closely with my colleague, we suspect the orchestration fails when it tries to persist its state to the database. To confirm that, I purposely put a "Delay" shape before the "Terminate" shape, which forces the orchestration to be dehydrated (and therefore persisted).
The test result confirms our theory: I get the same error message when the "Delay" shape is executed. It seems we are one step closer to finding the root cause of the problem.

So, what causes the orchestration to fail when it tries to persist its state to the database? That remains a mystery, but not for long. After looking through my orchestration and the error message a couple more times, I find a clue that points me to the offending shape.

Firstly, the error message says "The string '' is not a valid Boolean value", which indicates that something is trying to convert an empty string to a Boolean value. This reminds me that I created an empty message before the failure point (in the "Init" Expression shape):


construct MSG_ArielBPMLOC_Common {
    MSG_ArielBPMLOC_Common = ArielBPM.Biztalk.Helper.XLANG.Construction.ConstructEmptyMessage(typeof(ArielBPM.BizTalk.RequestManagement.LOC.InternalSchemas.ArielBPMLOC));
}





The reason I create an empty message before the "Decide" shape is as follows. My orchestration listens for 2 different message types as possible inputs. Depending on the message type received, the common message is constructed in the corresponding branch of the "Decide" shape. However, because of the way the "Decide" shape is implemented, even though the same message is constructed in both branches, the compiler still complains about "use of unconstructed message" when I compile the solution. Because of this, I have to explicitly construct a dummy empty common message before the "Decide" shape. The created empty message is only an instance of the schema without any data (similar to the XML below):

<RootNode>
  <BooleanTypeElement />
  <StringTypeElement />
  <IntegerTypeElement />
</RootNode>
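
Incidentally, the failure at the top of the XLANG stack trace is easy to reproduce on its own. The following console snippet is my own illustration, not taken from the log, but it throws the exact FormatException that the engine reports when it tries to read the empty Boolean element:

using System;
using System.Xml;

class Repro
{
    static void Main()
    {
        try
        {
            // This is what the XLANG engine ends up doing when it evaluates
            // the content properties of the empty message during persistence:
            XmlConvert.ToBoolean(String.Empty);
        }
        catch (FormatException ex)
        {
            // Prints: The string '' is not a valid Boolean value.
            Console.WriteLine(ex.Message);
        }
    }
}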


An error (Microsoft.XLANGs.RuntimeTypes.XPathUpdateException; see below) occurs before the common message is constructed in one of the "Decide" shape branches. The "Exception Handler" catches the exception and tries to terminate the orchestration using the "Terminate" shape.


Type: Microsoft.XLANGs.RuntimeTypes.XPathUpdateException

Message: A failure occurred while evaluating the distinguished field RequestHeader.RequestID against the message part data. The message part data does not contain at least one of the nodes specified by the XPath expression (listed below) that corresponds to the distinguished field. The cause for this error may be that the message part data has not been initialized or that the message part data does not conform to the message part schema. Ensure that the message part data is initialized correctly. XPath expression: /*[local-name()='LossOfCoverageRequest' and namespace-uri()='http://ArielBPM.Biztalk.RequestManagement.LOC.InternalSchemas']/*[local-name()='RequestHeader' and namespace-uri()='http://ArielBPM.Biztalk.RequestManagement.LOC.InternalSchemas']/*[local-name()='RequestID' and namespace-uri()='http://ArielBPM.Biztalk.RequestManagement.LOC.InternalSchemas']

StackTrace:
at Microsoft.XLANGs.Core.XSDPart.SetDistinguishedField(String dottedPath, Object val)
at ArielBPM.BizTalk.RequestManagement.LOC.Orchestrations.Orch_CobraCoverageTermination.segment6(StopConditions stopOn)
at Microsoft.XLANGs.Core.SegmentScheduler.RunASegment(Segment s, StopConditions stopCond, Exception& exp)

Source: Microsoft.XLANGs.Engine


I believe that when the "Terminate" shape is executed, the engine tries to evaluate (or validate) the empty common message before it persists its state to the database. Since the common message is still empty, that evaluation fails on an empty element that is supposed to be of type boolean. To test my theory, I make the following change in the "Init" Expression shape to construct the common message and populate it with some dummy data:


construct MSG_ArielBPMLOC_Common {
    MSG_ArielBPMLOC_Common = ArielBPM.Biztalk.Helper.XLANG.Construction.ConstructMessageWithData(typeof(ArielBPM.BizTalk.RequestManagement.LOC.InternalSchemas.ArielBPMLOC));
}
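
I won't post the helper's internals here, but a minimal sketch of the idea behind ConstructMessageWithData, written against the sample schema shown earlier, could look like the following. The implementation is my own illustration (the real ArielBPM helper takes a schema Type and builds the skeleton generically); the point is simply that every typed element receives a value the XLANG engine can parse:

using System.Xml;

public static class MessageConstruction
{
    // Illustrative sketch only -- not the actual ArielBPM helper.
    public static XmlDocument ConstructMessageWithData()
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(
            "<RootNode>" +
                "<BooleanTypeElement>false</BooleanTypeElement>" + // parseable by XmlConvert.ToBoolean
                "<StringTypeElement>N/A</StringTypeElement>" +
                "<IntegerTypeElement>0</IntegerTypeElement>" +
            "</RootNode>");
        return doc;
    }
}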



As soon as I make this change to the orchestration, although I still get the Microsoft.XLANGs.RuntimeTypes.XPathUpdateException, the "Exception Handler" is able to catch the exception and terminate the orchestration successfully.

Wednesday, August 26, 2009

There was a failure executing the receive pipeline: Reason: Unexpected event ("eos") in state "processing_header"

Error found in Application Event Log:

Event Type: Error
Event Source: BizTalk Server 2006
Event Category: BizTalk Server 2006
Event ID: 5719
Date: 8/26/2009
Time: 4:00:45 AM
User: N/A
Computer: CTOR-UBIZTALK01
Description:
There was a failure executing the receive pipeline: "Microsoft.BizTalk.DefaultPipelines.XMLReceive, Microsoft.BizTalk.DefaultPipelines, Version=3.0.1.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35" Source: "XML disassembler" Receive Port: "RP_ArielBPM_WF_Process_Input_Msg_UAT" URI: "FORMATNAME:DIRECT=OS:CTOR-UBIZTALK\ARIELBPM_WORKFLOW_UAT" Reason: Unexpected event ("eos") in state "processing_header".

For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.


Reason:

The following message was submitted to the queue:

<LossOfCoverageRequests xmlns="http://ArielBPM.Biztalk.RequestManagement.Schemas"
xmlns:i="http://www.w3.org/2001/XMLSchema-instance" />





The envelope schema definition of the expected message type is shown below. Note that the content model of the LossOfCoverageRequests root is <xs:any />, which by default requires at least one child element: the envelope is supposed to wrap one or more body documents located by the body_xpath. Since the message submitted above is an empty root element, the XML disassembler reaches the end of the stream ("eos") while it is still processing the envelope header, which produces the error above.

<xs:schema xmlns:tns="http://ArielBPM.Biztalk.RequestManagement.Schemas" xmlns:b="http://schemas.microsoft.com/BizTalk/2003" elementFormDefault="qualified" targetNamespace="http://ArielBPM.Biztalk.RequestManagement.Schemas" xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:annotation>
    <xs:appinfo>
      <schemaInfo is_envelope="yes" root_reference="LossOfCoverageRequests" xmlns="http://schemas.microsoft.com/BizTalk/2003" />
    </xs:appinfo>
  </xs:annotation>
  <xs:element name="LossOfCoverageRequests" type="tns:LossOfCoverageRequests">
    <xs:annotation>
      <xs:appinfo>
        <recordInfo body_xpath="/*[local-name()='LossOfCoverageRequests' and namespace-uri()='http://ArielBPM.Biztalk.RequestManagement.Schemas']" xmlns="http://schemas.microsoft.com/BizTalk/2003" />
      </xs:appinfo>
    </xs:annotation>
  </xs:element>
  <xs:complexType name="LossOfCoverageRequests">
    <xs:sequence>
      <xs:any />
    </xs:sequence>
  </xs:complexType>
</xs:schema>
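
For comparison, an instance that disassembles cleanly carries at least one body document inside the envelope, along these lines (the child element name and namespace here are illustrative, not taken from the actual schemas):

<LossOfCoverageRequests xmlns="http://ArielBPM.Biztalk.RequestManagement.Schemas">
  <LossOfCoverageRequest xmlns="http://ArielBPM.Biztalk.RequestManagement.LOC.InternalSchemas">
    <!-- body document content -->
  </LossOfCoverageRequest>
</LossOfCoverageRequests>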

Monday, July 20, 2009

bts_UpdateMsgbox_BizTalkServerApplication causing MessageBox deadlock

I encountered the following errors when the QA team was submitting a whole batch (about 70) of test requests to BizTalk.

-----------------------------------------------------------------------------
Event Type: Error
Event Source: BizTalk Server 2006
Event Category: BizTalk Server 2006
Event ID: 6912
Date: 7/20/2009
Time: 2:23:27 PM
User: N/A
Computer: CTOR-QBIZTALK01
Description:
The following stored procedure call failed: " { call [dbo].[bts_UpdateMsgbox_BizTalkServerApplication]( ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)}". SQL Server returned error string: "Transaction (Process ID 59) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction.".

For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.
-----------------------------------------------------------------------------
Event Type: Error
Event Source: XLANG/s
Event Category: None
Event ID: 10008
Date: 7/21/2009
Time: 1:12:15 AM
User: N/A
Computer: CTOR-QBIZTALK01
Description:
Uncaught exception (see the 'inner exception' below) has suspended an instance of service 'ArielBPM.BizTalk.RequestManagement.QueueUpAndNotify.QUAN_Orchestration(3c1decfe-a30b-14b9-c942-044cec6befed)'.
The service instance will remain suspended until administratively resumed or terminated.
If resumed the instance will continue from its last persisted state and may re-throw the same unexpected exception.
InstanceId: b2141f90-2d02-4cbd-b04e-d9de8770275f
Shape name: Start SndDocument ODX 4 Init Docs
ShapeId: 87668c55-e194-41f2-802c-34f909d271dd
Exception thrown from: segment 15, progress 11
Inner exception: Exception occurred when persisting state to the database.
Exception type: PersistenceException
Source: Microsoft.XLANGs.BizTalk.Engine
Target Site: Void Commit()
The following is a stack trace that identifies the location where the exception occured

at Microsoft.BizTalk.XLANGs.BTXEngine.BTXXlangStore.Commit()
at Microsoft.XLANGs.Core.Service.Persist(Boolean dehydrate, Context ctx, Boolean idleRequired, Boolean finalPersist, Boolean bypassCommit, Boolean terminate)
at Microsoft.XLANGs.Core.ServiceContext.PendingCommit(Boolean ignore, XMessage msg)
at Microsoft.XLANGs.Core.ExceptionHandlingContext.PendingCommit(Boolean ignoreCommit, XMessage msg)
at Microsoft.XLANGs.Core.ExceptionHandlingContext.PendingCommit(Boolean ignoreCommit, XMessage msg)
at Microsoft.XLANGs.Core.ExceptionHandlingContext.PendingCommit(Boolean ignoreCommit, XMessage msg)
at Microsoft.BizTalk.XLANGs.BTXEngine.BTXService.ExecService(Context cxt, Type serviceType, Object[] parameters, ActivityFlags flags)
at Microsoft.XLANGs.Core.Service.ExecService(Context cxt, Type serviceType, Object[] parameters)
at ArielBPM.BizTalk.RequestManagement.QueueUpAndNotify.QUAN_Orchestration.segment15(StopConditions stopOn)
at Microsoft.XLANGs.Core.SegmentScheduler.RunASegment(Segment s, StopConditions stopCond, Exception& exp)
Additional error information:

A batch item failed persistence Item-ID 26ce289e-2325-4d2e-8cd2-121a216259af OperationType MAIO_CommitBatch Status -1061151942 ErrorInfo A database failure occurred due to a deadlock. .
Exception type: PersistenceItemException


For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.
-----------------------------------------------------------------------------

Based on the volume, it didn't seem like a huge load. At first, I thought the problem was most likely caused by the BizTalk MessageBox database running out of space: it was trying to grow the database but couldn't keep up with the rate of incoming messages. Hence I tried to fix it by increasing the DB autogrowth increment (for example, from the default of 1 MB to 100 MB). Nonetheless, this attempt didn't resolve the problem; I was still seeing the same errors after the change.
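
For reference, the change was along these lines; 'BizTalkMsgBoxDb' is the default logical data file name, so verify it with sp_helpfile before running:

-- Bump the MessageBox data file autogrowth from the 1 MB default to 100 MB.
ALTER DATABASE BizTalkMsgBoxDb
MODIFY FILE (NAME = 'BizTalkMsgBoxDb', FILEGROWTH = 100MB);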


Since the first attempt wasn't successful, the search went on. A couple of forums mentioned that this issue could be related to memory, so I decided to check the specification of the server hosting BizTalk. Surprisingly, I found that the server has only 1 GB of RAM. The BizTalk Server 2006 R2 installation guide states that the minimum memory requirement is 1 GB of RAM (2 GB recommended), so the server barely meets the minimum, not to mention that it also doubles as the application DB server and the BizTalk DB server. Wondering whether this could be the cause of the deadlocks, I requested the SI team to increase the physical memory of the server hosting BizTalk 2006.

While waiting for the physical memory, I kept searching and found two Microsoft KB articles: one about how to maintain and troubleshoot BizTalk Server databases (Microsoft KB 952555), and one about blocking, deadlock conditions, or other SQL Server issues when you try to connect to the BizTalkMsgBoxDb database in BizTalk Server (Microsoft KB 917845). Based on the articles, I tried the following 3 steps:

Step 1: Disable the Auto Create Statistics option and the Auto Update Statistics option on the BizTalkMsgBoxDb database in SQL Server.

Step 2: Set the max degree of parallelism option to 1 on the computer that is running SQL Server.
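
For reference, steps 1 and 2 translate to plain T-SQL as follows (database and option names as documented in the KB articles):

-- Step 1: turn off auto statistics on the MessageBox database.
ALTER DATABASE BizTalkMsgBoxDb SET AUTO_CREATE_STATISTICS OFF;
ALTER DATABASE BizTalkMsgBoxDb SET AUTO_UPDATE_STATISTICS OFF;

-- Step 2: limit parallelism on the SQL Server instance.
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'max degree of parallelism', 1;
RECONFIGURE WITH OVERRIDE;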

Step 3: I ran "DBCC SHOWCONTIG" against the BizTalkMsgBoxDb. According to the KB article, if the Scan Density value returned by the DBCC SHOWCONTIG statement is less than 30 percent, the BizTalk Server indexes can be rebuilt during downtime. A couple of the scan results caught my eye: the Scan Density of some of the tables is less than 15%. These results indicate that the BizTalk MessageBox indexes are highly fragmented, hence the need to rebuild the indexes for the BizTalk MessageBox database.
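
The statement itself takes no arguments; run in the context of the MessageBox database, it reports on every table:

USE BizTalkMsgBoxDb;
DBCC SHOWCONTIG;  -- Scan Density below 30% makes the indexes candidates for a rebuild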

------------------------------------------------------------------
DBCC SHOWCONTIG scanning 'PredicateGroupZeroSum1' table...
Table: 'PredicateGroupZeroSum1' (50099219); index ID: 1, database ID: 10
TABLE level scan performed.
- Scan Density [Best Count:Actual Count].......: 13.64% [6:44]
------------------------------------------------------------------
DBCC SHOWCONTIG scanning 'Subscription' table...
Table: 'Subscription' (1205579333); index ID: 1, database ID: 10
TABLE level scan performed.
- Scan Density [Best Count:Actual Count].......: 14.29% [1:7]
------------------------------------------------------------------
DBCC SHOWCONTIG scanning 'EqualsPredicates' table...
Table: 'EqualsPredicates' (1381579960); index ID: 1, database ID: 10
TABLE level scan performed.
- Scan Density [Best Count:Actual Count].......: 13.47% [26:193]
------------------------------------------------------------------
DBCC SHOWCONTIG scanning 'EqualsPredicates2ndPass' table...
Table: 'EqualsPredicates2ndPass' (1397580017); index ID: 1, database ID: 10
TABLE level scan performed.
- Scan Density [Best Count:Actual Count].......: 12.79% [60:469]
------------------------------------------------------------------
DBCC SHOWCONTIG scanning 'PredicateGroup' table...
Table: 'PredicateGroup' (1733581214); index ID: 1, database ID: 10
TABLE level scan performed.
- Scan Density [Best Count:Actual Count].......: 13.11% [8:61]
------------------------------------------------------------------
DBCC SHOWCONTIG scanning 'Spool' table...
Table: 'Spool' (1781581385); index ID: 1, database ID: 10
TABLE level scan performed.
- Scan Density [Best Count:Actual Count].......: 12.52% [489:3905]
------------------------------------------------------------------
DBCC SHOWCONTIG scanning 'Parts' table...
Table: 'Parts' (1909581841); index ID: 1, database ID: 10
TABLE level scan performed.
- Scan Density [Best Count:Actual Count].......: 12.65% [76:601]
------------------------------------------------------------------
DBCC SHOWCONTIG scanning 'MessageParts' table...
Table: 'MessageParts' (1941581955); index ID: 1, database ID: 10
TABLE level scan performed.
- Scan Density [Best Count:Actual Count].......: 12.63% [49:388]
------------------------------------------------------------------
DBCC SHOWCONTIG scanning 'MessageRefCountLog1' table...
Table: 'MessageRefCountLog1' (1957582012); index ID: 1, database ID: 10
TABLE level scan performed.
- Scan Density [Best Count:Actual Count].......: 12.64% [68:538]
------------------------------------------------------------------
DBCC SHOWCONTIG scanning 'PartRefCountLog1' table...
Table: 'PartRefCountLog1' (2085582468); index ID: 1, database ID: 10
TABLE level scan performed.
- Scan Density [Best Count:Actual Count].......: 12.57% [46:366]
------------------------------------------------------------------

To rebuild the indexes, I partially stopped the BizTalk applications and application host instances, updated the bts_RebuildIndexes stored procedure according to Microsoft KB article 917845, and then ran it in the BizTalk MessageBox database (BizTalkMsgBoxDb). After rebuilding the indexes, I refreshed the BizTalk group and restarted the BizTalk applications and application host instances. Then I ran the test again, submitting about 1000 requests at the same time (about 15 times the volume of the original failed test). And guess what: no more errors.
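
In T-SQL terms, the rebuild step was simply:

USE BizTalkMsgBoxDb;
EXEC dbo.bts_RebuildIndexes;  -- run only during downtime, with host instances stopped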

Although I found the fix for the database deadlock, I still needed to find the root cause of the problem. I recalled that a set of SQL Server Agent jobs is installed along with BizTalk. The purpose of these jobs is to automate the administrative tasks for the MessageBox, the Management database and the other BizTalk databases, and they are critical to the health of the BizTalk server: if they fail to run successfully, the performance of the BizTalk server will degrade severely in the long run. One of the jobs, for example, is responsible for removing physical messages that no longer have any subscribers from the MessageBox. If this job fails, the orphaned messages remain in the MessageBox forever, which slows down the BizTalk server and eventually leaves the system in an unmanageable state that will stop the BizTalk server.

With this in mind, I connected to the database server looking for any failed SQL Server Agent jobs for the BizTalk databases. Surprisingly, I found that all of the SQL Server Agent jobs are disabled and have never run since BizTalk was installed. To prevent the same problem from happening again, we definitely need to enable these SQL Server Agent jobs and monitor them to make sure they run successfully at all times.
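
A quick way to check is to query msdb directly; any BizTalk-related job with enabled = 0 deserves attention (most of the BizTalk jobs carry "BizTalk" or the database name in their job name):

-- List BizTalk SQL Server Agent jobs and whether they are enabled.
SELECT name, enabled
FROM msdb.dbo.sysjobs
WHERE name LIKE '%BizTalk%'
ORDER BY name;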