SCOM: DeltaSynchronization Error

This error appeared in our environment recently and didn’t go away until we changed some timeout settings in a configuration file on all the management servers.

Symptoms:

  • Event ID 29181 in the Operations Manager event log.
  • Newly pushed agents show up as “Not Monitored”.
  • Changes you make to a Management Pack, e.g. overriding a rule, do not take effect.
  • Discovery of new objects is insanely slow, or does not work at all.

Example Event: 

Log Name:      Operations Manager
Source:        OpsMgr Management Configuration
Event ID:      29181
Level:         Error
User:          N/A
Computer:      server.domain.com
Description:
OpsMgr Management Configuration Service failed to execute ‘DeltaSynchronization’ engine work item due to the following exception
Microsoft.EnterpriseManagement.ManagementConfiguration.DataAccessLayer.DataAccessOperationTimeoutException: Exception of type ‘Microsoft.EnterpriseManagement.ManagementConfiguration.DataAccessLayer.DataAccessOperationTimeoutException’ was thrown.
at Microsoft.EnterpriseManagement.ManagementConfiguration.DataAccessLayer.DataAccessOperation.ExecuteSynchronously(Int32 timeoutSeconds, WaitHandle stopWaitHandle)
at Microsoft.EnterpriseManagement.ManagementConfiguration.SqlConfigurationStore.ConfigurationStore.ExecuteOperationSynchronously(IDataAccessConnectedOperation operation, String operationName)
at Microsoft.EnterpriseManagement.ManagementConfiguration.SqlConfigurationStore.ConfigurationStore.WriteConfigurationDelta(IConfigurationDeltaDataSet dataSet)
at Microsoft.EnterpriseManagement.ManagementConfiguration.Engine.DeltaSynchronizationWorkItem.TransferData(String watermark)
at Microsoft.EnterpriseManagement.ManagementConfiguration.Engine.DeltaSynchronizationWorkItem.ExecuteSharedWorkItem()
at Microsoft.EnterpriseManagement.ManagementConfiguration.Interop.SharedWorkItem.ExecuteWorkItem()
at Microsoft.EnterpriseManagement.ManagementConfiguration.Interop.ConfigServiceEngineWorkItem.Execute()

 

We went as far as opening a support case with Microsoft.
In the end, we fixed the problem by changing some timeout values in the Config Service configuration file on each and every Management Server.

Check the DeltaSynchronization work items:

First of all, run this query against the OperationsManager database and look at the WorkItemStateId column to see whether the DeltaSynchronization work items are failing:

SELECT * FROM cs.WorkItem WHERE WorkItemName = 'DeltaSynchronization'

If you see a lot of 10s (Failed) instead of only 20s (Succeeded), you have a problem with synchronization.
The states are:

WorkItemStateId   WorkItemStateName
1                 Running
10                Failed
12                Abandoned
15                Timed out
20                Succeeded
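If you export the query results (for example by copying the WorkItemStateId column out of SQL Server Management Studio), a few lines of Python can tally them using the state table above. This is a minimal sketch: the function names are our own, and counting Failed, Abandoned and Timed out all as failures is an assumption for illustration, not something from the support tip.

```python
from collections import Counter

# State IDs and names as listed in the table above.
WORK_ITEM_STATES = {
    1: "Running",
    10: "Failed",
    12: "Abandoned",
    15: "Timed out",
    20: "Succeeded",
}


def summarize_states(state_ids):
    """Count work items per state name; unknown IDs are labelled explicitly."""
    return Counter(WORK_ITEM_STATES.get(s, f"Unknown ({s})") for s in state_ids)


def failure_ratio(state_ids):
    """Fraction of finished work items (anything but Running) that did not succeed."""
    finished = [s for s in state_ids if s != 1]
    if not finished:
        return 0.0
    return sum(1 for s in finished if s != 20) / len(finished)
```

A healthy server should give a ratio at or near 0.0; a ratio that stays high over many DeltaSynchronization runs matches the “lots of 10s” symptom described above.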

How to fix the issue:

If you did what we did and googled the crap out of the problem, you may stumble upon this support tip: http://blogs.technet.com/b/momteam/archive/2013/01/29/support-tip-config-service-deltasynchronization-process-fails-with-timeout-exception.aspx

This fixes the problem for most people, but we had to change one more parameter to get stuff working again.
In addition to changing the TimeoutSeconds values under <Category Name="Cmdb">, we also had to change the TimeoutSeconds values under <Category Name="ConfigStore">.

Our config now looks like this:

<Category Name="Cmdb">
  <OperationTimeout DefaultTimeoutSeconds="300">
    <Operation Name="GetEntityChangeDeltaList" TimeoutSeconds="300" />
  </OperationTimeout>
</Category>

<Category Name="ConfigStore">
  <OperationTimeout DefaultTimeoutSeconds="300">
  </OperationTimeout>
</Category>

After changing this on EVERY management server and restarting the Config Service, things started working again.
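Because the change has to be made on every management server, it can help to verify that each server’s Config Service configuration file really ended up with the same values. Below is a minimal read-only sketch using Python’s standard xml.etree.ElementTree. It assumes the element and attribute names are exactly as shown in our config above and that these elements carry no XML namespace; the function names are hypothetical, and the file location depends on your install, so pass in whatever path applies on your servers.

```python
import xml.etree.ElementTree as ET

# Target values from our config above: (category, operation) -> seconds.
# An operation of None means the OperationTimeout DefaultTimeoutSeconds attribute.
EXPECTED = {
    ("Cmdb", None): 300,
    ("Cmdb", "GetEntityChangeDeltaList"): 300,
    ("ConfigStore", None): 300,
}


def read_timeouts(xml_data):
    """Return {(category, operation): seconds} for every EXPECTED entry found."""
    root = ET.fromstring(xml_data)  # accepts str or bytes
    found = {}
    for category, operation in EXPECTED:
        timeout = root.find(f".//Category[@Name='{category}']/OperationTimeout")
        if timeout is None:
            continue
        if operation is None:
            value = timeout.get("DefaultTimeoutSeconds")
        else:
            op = timeout.find(f"Operation[@Name='{operation}']")
            value = None if op is None else op.get("TimeoutSeconds")
        if value is not None:
            found[(category, operation)] = int(value)
    return found


def mismatches(xml_data):
    """List (setting, actual, expected) for settings that are missing or differ."""
    found = read_timeouts(xml_data)
    return [
        (key, found.get(key), want)
        for key, want in EXPECTED.items()
        if found.get(key) != want
    ]


def check_file(path):
    """Check one server's config file; an empty list means all values match."""
    with open(path, "rb") as f:
        return mismatches(f.read())
```

The script only reads the file, so it cannot damage a server’s configuration; we still made the actual edits by hand and restarted the Config Service afterwards.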