As the name of the post implies, something was terribly wrong with Outlook. Or so it appeared.
This was a visit to a customer with poor user experience running Outlook 2016 on Windows 7. The level of impact was clear long before I even stepped on site. Management were under the impression that Outlook was crippling their desktops and blocking other business critical applications from working. The reported symptoms were:
- Windows was unable to connect to HTTPS sites
- IE was unable to connect to HTTPS
- Some 3rd party browsers were also unable to connect to HTTPS sites
- Outlook stopped responding and so was the cause of all of the above
Interestingly enough, HTTP traffic was not affected. That was of note as it proved base network connectivity was present, and we could rule out issues with connectivity. The customer had spent some time investigating the issue, though troubleshooting was complicated by the deployment of file system anti virus, host based intrusion protection, data loss prevention, disk encryption and other assorted third party goodies. This is typical for many enterprises, and requires that all of the technology stakeholders work together to solve the issue and determine the root cause.
Root cause had not yet been identified when I went onsite to assist with Office 365. It was reported that Outlook would go into a disconnected state, and would no longer connect to Office 365 unless the workstation was restarted. Outlook connects to Exchange, and since I do Exchange that was enough of a link for me to be "volunteered"....
Troubleshoot This
As with many issues, a choice needs to be made between relief and root cause. In order to troubleshoot the issue, a broken machine was required to investigate. Most users were simply restarting their workstation when something squirrely happened and the issue could not be easily reproduced. After a couple of days we got a winner, and a broken machine was available to investigate.
Network Monitor 3.4 was used to capture the traffic on the machine. The intent was to review what was hitting the wire, just in case the proxy server was blocking the traffic for any reason. No HTTPS traffic was present whatsoever. We could see that the TCP three way handshake would complete, but there was absolutely no SSL handshake. Nothing. That was a tad weird…
Only the Windows firewall was present. No third party firewalls were installed, and disabling Windows firewall made no difference.
Pass the Network Traffic
The client started the process to initiate traffic to the remote server, but never progressed past the initial TCP handshake. There was no SSL handshake. Why was it not processing traffic? The builds of tcpip.sys and afd.sys were not the latest, but were not terribly out of date either. They were updated, but as expected that made no difference.
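To make the "TCP completes, SSL never starts" observation concrete, here is a minimal Python sketch that tests the two stages separately. It is not what was run onsite (Network Monitor was), and the function name `probe_https` is purely illustrative. Also note that Python performs the handshake with OpenSSL rather than the Windows components that IE and Outlook rely on, so treat the result as a comparison point, not a reproduction.

```python
import socket
import ssl


def probe_https(host, port=443, timeout=5.0):
    """Report how far a connection gets: TCP only, or TCP plus TLS."""
    result = {"tcp": False, "tls": False, "error": None}
    try:
        # Stage 1: plain TCP connect (the three way handshake).
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError as exc:
        result["error"] = f"tcp: {exc}"
        return result
    result["tcp"] = True
    try:
        # Stage 2: TLS handshake on top of the already connected socket.
        context = ssl.create_default_context()
        with context.wrap_socket(sock, server_hostname=host) as tls_sock:
            result["tls"] = True
            result["protocol"] = tls_sock.version()
    except (ssl.SSLError, OSError) as exc:
        result["error"] = f"tls: {exc}"
        sock.close()
    return result
```

Calling `probe_https("outlook.office365.com")` returns a dictionary showing which stage failed. If the TLS stage succeeds here while IE still cannot connect, that points away from the network and towards something local to the Windows machine.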
What else was happening to the machine that could impact Outlook? Was there anything else running or doing something to the machine?
More Broken Than You Know
Testing the broken machine showed that there were more facets to the original problem statement than originally noted. The problem statement was updated to include:
- Windows was unable to connect to HTTPS sites
- IE was unable to connect to HTTPS
- Some 3rd party browsers were also unable to connect to HTTPS sites
- Unable to open local certificate MMC console
- Unable to open IE certificate store
- Unable to open device manager and view properties of a device driver
Being able to open IE’s certificates or view device drivers should be possible even when offline on an airplane. Understandably, certificate revocation lookups may fail, but I can still view device driver details. I know, as I’m writing this on a plane, and can view driver details and open the certificate store (highlighted box below).
Lack of network access alone will not cause all of these symptoms, so something else was badly wrong on the machine. Let's start with something simple –> Task Mangler.
Clicking View then Select columns allows us to add additional columns into Task Manager in Windows 7. In newer versions, just right click an existing column to get to the same point. Amongst others I always like to see handles and threads. This dates from when I used to have to host applications and get them working in a locked down IIS environment. Milo – I still use all the good tips you taught me!
The below is what I would expect to see on a typical Windows 7 system. Note that the view is sorted on the handles columns (as highlighted) and that the largest amount of handles associated to a single process is just over a thousand.
When reviewing the afflicted customer machine, the largest consumer of handles was a 3rd party CRM process with 37,000 handles. Yes, one app was using 37 times the handles of the above svchost process. That is not good.
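The same "sort by handles" check can be scripted when you cannot sit in front of Task Manager. The sketch below assumes you have exported process data with PowerShell's `Get-Process | Select-Object Name,Id,HandleCount | ConvertTo-Csv -NoTypeInformation` and simply ranks the rows. The function name and the sample data are made up for illustration; they are not the customer's actual processes.

```python
import csv
import io


def rank_by_handles(csv_text, top=5):
    """Return the `top` processes with the most open handles, largest first."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        try:
            entry = (row["Name"], int(row["Id"]), int(row["HandleCount"]))
        except (KeyError, TypeError, ValueError):
            continue  # skip rows without a usable Name, Id or HandleCount
        rows.append(entry)
    rows.sort(key=lambda r: r[2], reverse=True)
    return rows[:top]


# Illustrative sample only: hypothetical process names and counts.
SAMPLE = '''"Name","Id","HandleCount"
"svchost","912","1050"
"crmagent","4312","37000"
"explorer","2200","820"
'''

for name, pid, handles in rank_by_handles(SAMPLE, top=2):
    print(f"{name} (PID {pid}): {handles} handles")
```

A process sitting an order of magnitude above everything else is the one worth a closer look in Process Explorer.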
OK – now we have a potential cause. We need to get some more data on this before doing anything else. Using Sysinternals Process Explorer we were able to review the current performance characteristics of the process. No excessive disk or CPU load was found. That was part of the reason it was not initially detected.
Process Explorer also incorporates the handy dandy Find Handle or DLL feature. This is available from the Find menu as noted below.
When searching for the offending process executable, would we find the handles listed?
Yup. All 37,000 of them. Interestingly they were connecting from the 3rd party executable to CSRSS.exe. That is the client server runtime subsystem. Why was this process creating handles and not releasing them? No idea. I'm a cable-plugger, go ask the developer!
When presenting this to the customer, there was some shock as the person in question did not even use the 3rd party CRM tool which had been identified. However, this was running as a Win32 service, so regardless of whether the logged on user was actually using the CRM application or not, it was leaking handles in the background.
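A single high handle count is suspicious, but a leak shows up as a count that keeps climbing and never comes back down. As a rough sketch, if you record a process's handle count at intervals, a check like the one below flags that pattern. The function name and the growth threshold are assumptions chosen for illustration, not values from this investigation.

```python
def looks_like_leak(samples, min_growth=1000):
    """Heuristic: the count never drops and grows by at least min_growth overall."""
    if len(samples) < 3:
        return False  # too few samples to call it a trend
    never_drops = all(later >= earlier
                      for earlier, later in zip(samples, samples[1:]))
    return never_drops and (samples[-1] - samples[0]) >= min_growth


# Hypothetical readings taken some time apart.
print(looks_like_leak([1200, 5400, 11800, 20100, 37000]))
print(looks_like_leak([1040, 1010, 1055, 1030]))
```

A healthy process tends to rise and fall as work comes and goes, which is why the check requires steady growth and not just a big number.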
Killing In the Name
Now that we had identified the application which was leaking the handles, instant relief was obtained by either:
- Killing the process
- Stopping the service
- Disabling the service and restarting the workstation
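For reference, the three relief options above map onto standard Windows commands (`taskkill` and `sc`). The sketch below only builds the command lines and does not run anything. The service name `CrmAgentSvc` and the PID are hypothetical placeholders, since the real product is not named here.

```python
def relief_commands(service_name, pid=None):
    """Build (but do not run) Windows commands for the relief options."""
    commands = {
        "stop_service": ["sc", "stop", service_name],
        # sc.exe expects "start=" and its value as separate tokens.
        "disable_service": ["sc", "config", service_name, "start=", "disabled"],
    }
    if pid is not None:
        commands["kill_process"] = ["taskkill", "/PID", str(pid), "/F"]
    return commands


# Hypothetical service name and PID, for illustration only.
for action, argv in relief_commands("CrmAgentSvc", pid=4312).items():
    print(action, "->", " ".join(argv))
```

Each list could be handed to `subprocess.run` from an elevated prompt, but review the output first. Killing or disabling the wrong service is its own outage.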
The vendor was contacted, and since the customer had not updated the application for 3+ years there was already an updated build which addressed this issue. The customer is now in the process of removing the application for users who do not need it and upgrading those which do.
Just another example of critical issues which would have been solved if only the software/application was maintained as requested by the vendor.
Now that this problem was solved, it was time to address the next crippling issue with Outlook. But we will save that for another blog post…..
Cheers,
Rhoderick