Friday, 25 December 2009

WMI Troubleshooting

Source URL:
http://blogs.technet.com/configmgrteam/archive/2009/05/08/wmi-troubleshooting-tips.aspx

WMI Troubleshooting Tips

The Windows Management Instrumentation (WMI) subsystem has evolved over time to become a key dependency for many applications, Configuration Manager being one of them. We've been intertwined since the .698 build of WMI shipped with Systems Management Server 2.0 (there was even a WBEM Inventory component in SMS 1.2).

Given this history and relationship it's important to know what to do when something goes wrong.

I've spent many years investigating and troubleshooting WMI-related issues, especially as they relate to SMS / Configuration Manager. Based on that experience I've compiled a few tips and general observations for the community. This list is by no means comprehensive.
It assumes a basic understanding of WMI, such as its general structure and terms, and of how to use the WBEMTest tool.

Don't delete the repository (though it may make problems seem to go away).

Rebuilding the WMI repository is a destructive operation that can lead to data loss, applications breaking, and a whole host of slow to appear, difficult to diagnose problems.
Generally speaking, the only time this operation should ~really~ be necessary is in the case of true corruption as indicated by tools such as WMIDiag or Winmgmt /verifyrepository.

Can lost WMI data be recovered?
Probably, but that's never a good state to be in. Therefore, I say avoid this operation whenever possible.
On the flip side, I certainly recognize there is a tradeoff between operational needs and individual investigations.
Over time some customers have seen that rebuilding the repository makes a problem seem to go away quickly. Typically this also comes with a loss of ability to find root cause, could mask other problems, and may not actually solve anything long term. On the whole I strongly recommend against deleting the repository folder as a means to resolving WMI issues.

What can I do other than rebuild the repository?

One low risk, potentially high gain operation is to recompile the MOF files and re-register the component DLLs associated with WMI operations. If an important class or component registration needed for WMI operation was somehow removed, this puts the needed structure back.

These steps can be automated easily, but aren't generally recommended on a large scale as they too can mask issues. This is just one more option to try short of rebuilding the repository. There are variations of the steps below available between XP and Vista, but this most basic version should work for either.

1. Open a CMD prompt on the server and change directory to %windir%\System32\WBEM (\SysWOW64\WBEM on x64)

2. Execute the following:
FOR /f %s in ('dir /b /s *.dll') do regsvr32 /s %s
Net stop /y winmgmt
FOR /f %s in ('dir /b *.mof *.mfl') do mofcomp %s
Net start winmgmt

Note: Don't attempt to compile the MOF files in the \bin\i386 folder on a site server. That folder contains stub files (names start with an underscore character, such as _smsprov.mof) that need to be populated with site specific data through other means.
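
As a rough illustration of how the steps above could be scripted, the Python sketch below only builds the command lines so they can be reviewed before anything is run. The function name plan_wmi_reregistration and the sample file names are hypothetical and not part of any tool. The commands mirror the ones listed above, and the underscore check reflects the note about stub files.

```python
from pathlib import PureWindowsPath


def plan_wmi_reregistration(file_names):
    """Return the command lines from the steps above, in order, without running them.

    file_names: names found in the WBEM folder (hypothetical input for this sketch).
    """
    dlls, mofs = [], []
    for name in file_names:
        path = PureWindowsPath(name)
        suffix = path.suffix.lower()
        if suffix == ".dll":
            dlls.append(name)
        elif suffix in (".mof", ".mfl") and not path.name.startswith("_"):
            # Stub files such as _smsprov.mof need site specific data; skip them.
            mofs.append(name)

    commands = [f"regsvr32 /s {dll}" for dll in dlls]
    commands.append("net stop /y winmgmt")
    commands += [f"mofcomp {mof}" for mof in mofs]
    commands.append("net start winmgmt")
    return commands


if __name__ == "__main__":
    # Sample names for illustration only.
    sample = ["wbemcore.dll", "cimwin32.mof", "cimwin32.mfl", "_smsprov.mof"]
    for line in plan_wmi_reregistration(sample):
        print(line)
```

Keeping the plan separate from execution fits the caution above: the list can be checked on one machine before anyone considers running it more widely.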

Outside of that, there are many troubleshooting options available, depending on the amount of time you can spend investigating.

Where can I find the log files and error codes?

Start here: the WMI Troubleshooting page on MSDN. This page serves as a jumping off point for many important details such as logging and tracing information, WMI Error constants, and more.

Common Errors

These errors are referenced in greater detail on the WMI Troubleshooting page and subsequent links but I still wanted to mention them here.

WBEM_E_NOT_FOUND – 0x80041002
The Not Found message was very common in XP log files, a little less so in Vista and up. Without context this one isn't very helpful, as you have no way of knowing if the requested data is supposed to be present. Simply put, it may not always be a bad thing.

Access Denied
Echoing the troubleshooting page, if you're seeing 0x80070005 (E_ACCESSDENIED) when connecting, you're being turned away by DCOM, not WMI. Similarly, 0x800706BA (RPC_S_SERVER_UNAVAILABLE) means you're being turned away before you've talked to DCOM or WMI. A network capture is often the quickest way to make progress on the RPC error.
There's also a bit more info in the Remoting and Security blog entry from the WMI team.

WBEM_E_PROVIDER_LOAD_FAILURE - 0x80041013
The Provider Event Troubleshooting Classes are a great resource, but may be a little overwhelming. The MSFT_WmiProvider_LoadOperationFailureEvent class is one that I've found useful quite often. Most Provider Load Failures I've encountered have been the result of bad component registration (either in the registry or WMI), or permissions related.
A corrupted cimwin32.dll could also be the cause.

WBEM_E_INVALID_CLASS - 0x80041010 / WBEM_E_INVALID_NAMESPACE - 0x8004100E
Similar to the Not Found error, context is important here. Some operation was being performed against a class / namespace that isn't present on the target machine.
Is that bad? Depends on the situation. It may be perfectly normal. If investigation tells you it's not, the class or namespace can usually be recovered by recompiling the appropriate MOF file.

Generic Failure - 0x80004005
Among the least helpful errors, and not WMI specific. I only bring it up here as many people see this and mistakenly think it's an Access Denied message given the 5 at the end. Remember, access denied is 0x80070005.
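
To make the difference between these codes easier to see, here is a minimal Python sketch that splits an HRESULT into its standard fields (severity bit, 11-bit facility, 16-bit code). The function names and the lookup table are my own for illustration. The table holds only the values mentioned above.

```python
# Values and names as discussed in this section.
KNOWN_ERRORS = {
    0x80041002: "WBEM_E_NOT_FOUND",
    0x80041010: "WBEM_E_INVALID_CLASS",
    0x8004100E: "WBEM_E_INVALID_NAMESPACE",
    0x80041013: "WBEM_E_PROVIDER_LOAD_FAILURE",
    0x80070005: "Access Denied",
    0x800706BA: "RPC_S_SERVER_UNAVAILABLE",
    0x80004005: "Generic Failure",
}


def split_hresult(value):
    """Split a 32-bit HRESULT into (is_failure, facility, code)."""
    value &= 0xFFFFFFFF               # accept negative (signed) values too
    is_failure = bool(value >> 31)    # top bit is the severity flag
    facility = (value >> 16) & 0x7FF  # 11-bit facility field
    code = value & 0xFFFF             # low 16 bits are the error code
    return is_failure, facility, code


def describe(value):
    """Return a one-line summary of an HRESULT for log reading."""
    value &= 0xFFFFFFFF
    _, facility, code = split_hresult(value)
    name = KNOWN_ERRORS.get(value, "unknown")
    return f"0x{value:08X} {name}: facility {facility}, code 0x{code:04X}"


if __name__ == "__main__":
    for hresult in (0x80070005, 0x80004005, 0x800706BA, 0x80041002):
        print(describe(hresult))
```

Split this way, 0x80070005 is facility 7 (Win32) with code 5, while 0x80004005 is facility 0 with code 0x4005, so the trailing 5 is a coincidence. The WBEM_E_* values all fall under facility 4.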

WMIDiag
An invaluable tool for diagnosing WMI issues, even if it's a little dated.
http://www.microsoft.com/technet/scriptcenter/topics/help/wmidiag.mspx
It has many configuration options available and can be deployed via Configuration Manager. One of the more helpful features is the report that is generated at the end. It contains details on how to correct many common issues that are found when running the tool.

Tracking resource usage of WMI
By default the core WMI service lives in the shared Network Services instance of svchost.exe. This can make debugging or identifying resource issues a little challenging. As a general rule of thumb I keep WMI separated into its own instance of svchost, and recommend that customers do the same.

On XP/Server 2003 this can be accomplished automatically via the following case-sensitive command:
RUNDLL32.EXE %Systemroot%\SYSTEM32\WBEM\WMISVC.DLL,MoveToAlone
For Vista and up this is done with
winmgmt /standalonehost

Where is the provider?
The WMI Provider host process (wmiprvse.exe) will create one instance for each different hosting (security) model defined. To find the PID of the instance that hosts a given provider (such as smsprov.dll) you can simply run
Tasklist /m smsprov.dll
It is possible to isolate a provider into its own instance by changing the hosting model.

This is fairly rare and not necessarily a best practice, but if you're running into resource or performance problems that could be traced back to multiple providers running in the same instance, it may be worth investigating a split - at least for the purpose of issue isolation. The Provider Hosting and Security page has more information.
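
If you want to track the hosting PID over time rather than eyeball it, the Tasklist lookup above can be parsed. The sketch below assumes Tasklist is run with the /fo csv and /nh switches in addition to /m, so that each row is image name, PID, modules. The function names and the sample row are made up for illustration.

```python
import csv
import io


def provider_host_pids(tasklist_csv):
    """Return {pid: image_name} from `tasklist /m <dll> /fo csv /nh` output."""
    hosts = {}
    for row in csv.reader(io.StringIO(tasklist_csv)):
        # Expect: image name, PID, modules. Skip INFO lines and blank rows.
        if len(row) >= 2 and row[1].strip().isdigit():
            hosts[int(row[1])] = row[0]
    return hosts


def host_changed(before, after):
    """True if the set of hosting PIDs differs between two snapshots."""
    return set(before) != set(after)


if __name__ == "__main__":
    # Illustrative text only; real output comes from running Tasklist.
    first = '"wmiprvse.exe","2468","smsprov.dll"\r\n'
    second = '"wmiprvse.exe","3120","smsprov.dll"\r\n'
    a = provider_host_pids(first)
    b = provider_host_pids(second)
    print(a, b, host_changed(a, b))
```

Comparing snapshots taken a few minutes apart is one simple way to notice a provider host that keeps restarting.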

WMI configuration
There are quite a few options available for tuning WMI performance. Two that I'll cover here are important for Configuration Manager Site (provider) servers - the MemoryPerHost and HandlesPerHost values that can be found in the __ProviderHostQuotaConfiguration class in the root namespace.
First a little background:
For each instance of WMIPrvse.exe that is running, the values above dictate how much virtual memory and how many handles that instance may consume. When an instance exceeds a limit, the process may terminate or, in some rare cases, hang.
As more providers for various applications are being used on server machines, and Configuration Manager environments get larger, it's expected to see increased resource usage with our provider.

Prior to Vista the limits were 128MB (134217728 bytes) and 4096 handles.

In a large Configuration Manager environment (in terms of number of objects that exist, such as collections, advertisements, AdminUI connections, as well as clients) you could definitely exceed those limits.
I recommend that all my customers quadruple the MemoryPerHost value to 512MB (536870912 is the value to enter).

512MB is even the default value now on Vista and above, a further indication that a larger limit was needed.

If performance monitoring tools indicate that you're hitting or exceeding the 4096 handle limit, you can increase that as well but be a little more conservative since handles are a shared resource. It could likely be doubled but I usually recommend 5120, again if monitoring indicates an increase is needed.
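
For anyone double-checking the numbers above before typing them in, this small Python sketch reproduces the arithmetic. The helper name quota_bytes is hypothetical. The figures are the ones quoted in this section.

```python
BYTES_PER_MB = 1024 * 1024  # the quota values are entered in bytes


def quota_bytes(megabytes):
    """Convert a size in MB to the byte value to enter."""
    return megabytes * BYTES_PER_MB


old_memory_limit = quota_bytes(128)  # pre-Vista limit quoted above
new_memory_limit = quota_bytes(512)  # recommended value

old_handle_limit = 4096
suggested_handle_limit = 5120

if __name__ == "__main__":
    print(old_memory_limit, new_memory_limit)
    print(new_memory_limit // old_memory_limit)       # growth factor for memory
    print(suggested_handle_limit / old_handle_limit)  # growth factor for handles
```

The memory value grows by a factor of four, while the suggested handle value is only a quarter higher than the old limit, in line with the more conservative advice for handles.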

It's important to remember that increased memory usage alone is not an indication of a problem state, or a leak - it's quite likely just normal behavior. In other words, many objects (and perhaps many objects from multiple remote connections) mean more resources required to handle everything.

If you see the Process ID (PID) of the wmiprvse.exe that hosts smsprov.dll changing frequently, or multiple instances of smsprov.dll loaded, you should definitely increase the MemoryPerHost value. Some customers have reported an increase helping with Administrator Console performance as well.

WMI repository stability fix
Lastly, if you're still on XP SP2 or Server 2003 SP1 or SP2 you should apply this fix to help further stabilize the repository files. Note it won't correct a system already having problems but makes for good preventative maintenance.
http://support.microsoft.com/kb/933062

In addition, if WMI does not work, SBS Monitoring and Reporting will not work either, and will probably need to be re-installed after WMI is repaired.
