Scanning Exchange databases with file system antivirus is a recipe for disaster. This really should not come as a surprise for admins running Exchange services within the enterprise, since this has been the field requirement for a long time. The documentation provided by Microsoft is very clear in what exclusions are required for file system antivirus and Exchange to coexist. For reference the relevant articles are:
If this is so well documented, then what could possibly go wrong? Plenty….
Update 30-6-2014: Please also see this post on a related issue.
Understanding File System AV Scanning
Every vendor who writes a file system AV product will implement theirs in a different way. Because of this, and the fact that I will not identify vendors by name, this article will be written in a generic style. The concepts however will apply to the vast majority of AV products.
TechNet does a good job of listing the types of file system antivirus scanners:
Memory-resident file-level scanning refers to a part of file-level antivirus software that is loaded in memory at all times. It checks all the files that are used on the hard disk and in computer memory.
On-demand file-level scanning refers to a part of file-level antivirus software that you can configure to scan files on the hard disk manually or on a schedule. Some versions of antivirus software start the on-demand scan automatically after virus signatures are updated to make sure that all files are scanned with the latest signatures.
Another term that may be encountered is On-Access. This is where AV processes a file when it is accessed. Unlike the On-Demand scan, if a file is never opened then it is never scanned. Conversely, if it is opened multiple times then it will likely get scanned each time it is accessed. The exact details of this are at the discretion of the AV vendor.
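The difference between the two triggers can be shown with a toy model. The Python below is purely illustrative: the file names, the function names and the one-scan-per-open behaviour are all assumptions, since each vendor decides for itself when a file gets rescanned.

```python
from collections import Counter

def on_demand_scan(files_on_disk):
    """Scan every file on disk once, whether or not anything ever opened it."""
    return Counter(set(files_on_disk))

def on_access_scan(access_log):
    """Scan a file each time it is opened (assumed; a real product may cache results)."""
    return Counter(access_log)

# Hypothetical files and access pattern, for illustration only.
files_on_disk = ["report.docx", "archive.zip", "app.log"]
access_log = ["app.log", "app.log", "app.log", "report.docx"]

print("On-Demand scan counts:", dict(on_demand_scan(files_on_disk)))
print("On-Access scan counts:", dict(on_access_scan(access_log)))
```

In this model archive.zip is scanned once by the On-Demand pass but never by On-Access, because nothing opened it, while app.log is scanned on every open.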
The heuristics contained within each AV product vary greatly, and they behave differently on the above point and many others. Some do not show the configured file system exclusions in their admin tool graphical interface and you have to look at the registry to see what file system paths are actually being excluded. Others allow the AV team to lock down the management application on the Exchange server so that it is harder or impossible to see what scans are running, to troubleshoot issues and to terminate the AV scan (if required) without waiting for the AV team to respond.
Please consult with your AV team and review their vendor's documentation to understand how their product works.
Issues That Can Arise Due To File System AV Scanning
Regrettably there are multiple issues that can and will arise if you allow file system AV to scan Exchange. Note that this is not just the mailbox database file; there are a range of other locations that must also be exempted from file system AV scanning. For details see the links at the start of this post.
File-level scanners may scan a file when the file is being used or at a scheduled interval. This can cause the scanners to lock or quarantine an Exchange log file or a database file while Exchange tries to use the file. This behaviour may cause a severe failure in Microsoft Exchange and may also cause -1018 ESE errors.
One thing to note is that file-level scanners do not provide protection against e-mail viruses, such as the Storm Worm. Storm Worm was a backdoor Trojan horse virus that propagated itself through e-mail messages. The worm joined the infected computer to a botnet, where the computer was used to send spam e-mail messages in periodic bursts. Such viruses can affect the performance of the computer and the network that it is attached to.
This is not a new issue. As my friend Dave McGarr puts it over on his blog, Friends don't let friends scan the M: drive! Because of this, the M:\ drive was hidden by default in Exchange 2003. Exchange 2000, which introduced the M:\ drive, was often negatively impacted by file system AV scanning M:\…
A Case In Point
This is the story of a recent engagement where I ran into some serious AV issues. The customer in question had recently completed an Exchange Server Risk Assessment (ExRAP). ExRAP looks at both technical and process aspects of managing messaging services. One interview question specifically asks if the correct AV exclusions have been implemented. The customer stated that they were.
Fast forward 4 months. The customer's stable Exchange environment suddenly started to exhibit strange behaviours. Issues included degraded database performance, database failover issues and very poor Outlook client response times. As part of initial troubleshooting Microsoft requested that the AV exclusions be checked to ensure that they were correct and were not causing any issues. Again they were stated as correct. Screen shots and remote assistance sessions showed that the settings were entered. So what was causing databases not to fail over between DAG members?
Well it turns out that only half of the puzzle was validated. Unbeknown to the Exchange admins, the AV team had implemented a weekly On-Demand scan that started late Sunday evening and scanned every single file on the server. Yes that's right — zero exclusions… It gets better! These scans were taking a very long time to complete, and in some cases the scan did not complete until Wednesday or Thursday!
The AV product in use has a feature where it will lock a file that looks suspicious for an unspecified amount of time. The lock duration is controlled by the AV engine and is entirely at its discretion. This is what caused the database failover issues. When trying to mount a database on a server, AV locked the Exchange database as it thought that MDB01.edb was suspicious. Since the file was locked, Exchange was unable to gain access to the database and mount it. If enough time elapsed then AV would release the file and the database could be mounted. Reviewing traces corroborated this, as we would see Exchange starting to read the database but not progressing further.
Not only was this an unsupported configuration as far as Microsoft is concerned, but the impact to the customer was tremendous. Some of the issues experienced were:
Multiple corrupted mailboxes
Databases would not fail over between servers
Server performance was impacted
Storage performance was impacted
Rather than just state that the required exclusions be implemented, I thought it would be more beneficial to discuss some of the areas which typically contribute to the above situation, and some resolutions.
All teams must be tightly aligned on how AV is deployed and configured. While server teams like Exchange do not need to know the exact details of implementing AV on the backend, they must understand how to communicate with the other teams effectively (more on this in a minute). For example, how do the Exchange servers get the correct AV policy assigned? Is it based on server name, location in AD, or are Exchange servers manually tagged with a policy? This sounds minor, but this knowledge is critical in understanding the impact of choosing a different server name or the steps required if reinstalling an Exchange server from scratch.
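To see why the assignment mechanism matters, here is a minimal sketch of name-based policy assignment. Everything in it is hypothetical: the rule list, the policy names and the server names are invented for illustration, and a real AV management console will have its own matching logic.

```python
from fnmatch import fnmatchcase

# Hypothetical rules: (server-name pattern, policy name). First match wins.
POLICY_RULES = [
    ("EXCH-*", "Exchange-Exclusions"),
    ("*", "Default-Full-Scan"),
]

def assigned_policy(server_name, rules=POLICY_RULES):
    """Return the first policy whose pattern matches the upper-cased server name."""
    name = server_name.upper()
    for pattern, policy in rules:
        if fnmatchcase(name, pattern):
            return policy
    return None

print(assigned_policy("EXCH-MBX01"))  # follows the naming convention
print(assigned_policy("MAIL-MBX01"))  # a rebuilt server with a different name
```

Under these assumed rules, a server rebuilt as MAIL-MBX01 silently falls through to the catch-all policy and loses its exclusions, which is exactly the kind of side effect the Exchange team needs to know about before picking a new server name.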
To assist with communicating effectively, all teams should communicate using the same terminology to minimise any potential misunderstandings. In the above example, the Exchange team understood an AV exclusion to apply to any and all AV scans. However the AV team did not share this viewpoint, and their terminology was more granular.
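One way to make that terminology gap visible is to list exclusions per scan type and check each list against what Exchange needs. The sketch below assumes the AV configuration can be exported into a simple mapping. The paths, scan type names and the exact-match comparison are all illustrative assumptions; the real list of required exclusions comes from the Microsoft articles, and real products may also honour parent-folder or wildcard exclusions.

```python
# Hypothetical required exclusions (the real list comes from Microsoft's documentation).
REQUIRED = {r"D:\ExchangeDB", r"E:\ExchangeLogs"}

# Hypothetical export of the AV configuration, one exclusion set per scan type.
SCAN_EXCLUSIONS = {
    "on_access": {r"D:\ExchangeDB", r"E:\ExchangeLogs"},
    "on_demand_weekly": set(),  # the gap from the story: no exclusions at all
}

def missing_exclusions(required, scan_exclusions):
    """Return {scan_type: [required paths not excluded]} for scan types with gaps."""
    gaps = {}
    for scan_type, excluded in scan_exclusions.items():
        # Compare case-insensitively, since Windows paths are not case-sensitive.
        excluded_lower = {path.lower() for path in excluded}
        missing = sorted(p for p in required if p.lower() not in excluded_lower)
        if missing:
            gaps[scan_type] = missing
    return gaps

for scan_type, paths in missing_exclusions(REQUIRED, SCAN_EXCLUSIONS).items():
    print(f"{scan_type} is missing exclusions for: {', '.join(paths)}")
```

Checking only the on_access entry would have reported everything as correct, which mirrors what happened when the exclusions were validated in the story.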
Teams should have defined lines of communication. This is applicable not just to escalate issues, but also to ensure that proactive knowledge is shared. For example:
If an update to the core AV product is being rolled out, then the relevant server admins must be notified.
If an AV incident is observed in APAC, then the AV team should investigate the issue and if they find that AV is scanning locations it should not, then global server teams must be notified to validate their configurations.
Communication between teams at the start of the above story was not optimal, though it did improve greatly. Enterprises must ensure that the required lines of communication and escalation are available between all the teams that work together to provide an enterprise solution. This applies to all products, applications and services that operate in an enterprise and is not limited to just Exchange.
Ensure that everyone is totally clear on what other teams expect from them and vice versa. For example, if the Exchange admin requests that a certain file be exempted, then the Exchange admin's expectation is that it is excluded from any and all scans. The AV team will expect clear and concise guidance from the Exchange admin as to what the file exclusion requirements are. Such requirements are application specific.
There must be a detailed discussion on the configuration of the AV policies that are applied to the Exchange infrastructure. Some examples include:
Action taken when a potentially malicious file is located. If the action is to automatically repair then databases could be instantly corrupted.
If the AV client UI is locked down this can prevent local server admins performing investigative work on the machine.
Typically enterprise AV products will be managed by a central tool/directory that pushes out the defined AV configuration to the agents. Normally this is set to overwrite any local changes to the AV configuration. All changes must be made to the central console.
The AV agent health must be monitored by the AV team to ensure that an agent does not "go native", and ignore its configuration. The worst possible case here would be for an agent to revert back to its default configuration which typically means that there are no exclusions and all files and processes are scanned.
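The agent health point can be sketched as a simple drift check: compare what each agent reports against the centrally defined policy and flag any mismatch. This assumes the AV team can export both sides as plain lists of paths. The function name, server names and paths below are hypothetical and not tied to any particular product.

```python
def drifted_agents(central_policy, agent_reports):
    """Return the names of agents whose reported exclusions differ from the central policy."""
    expected = frozenset(path.lower() for path in central_policy)
    return sorted(
        name
        for name, reported in agent_reports.items()
        if frozenset(path.lower() for path in reported) != expected
    )

# Hypothetical central policy and agent reports, for illustration only.
central_policy = [r"D:\ExchangeDB", r"E:\ExchangeLogs"]
agent_reports = {
    "EXCH-MBX01": [r"D:\ExchangeDB", r"E:\ExchangeLogs"],
    "EXCH-MBX02": [],  # reverted to defaults: no exclusions, everything scanned
}

print(drifted_agents(central_policy, agent_reports))
```

An agent that reports an empty exclusion list is the worst case described above, and it is the case a check like this should alert on first.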
The AV team must accept that Exchange requires certain file system exclusions to operate in a manner supported by Microsoft. There is a tendency for AV teams to perceive a security risk in the fact that MDB01.edb is never scanned by file system AV. Their concern that NaughtyFile.edb will be stored on the Exchange server needs to be tempered with:
Microsoft is not asking customers to run servers with no file system AV, rather it just needs to be configured to support the application in question – in this case Exchange.
Microsoft does not support scanning Exchange with file system AV. Doing so adds a risk as you are not in a supported configuration from the application vendor.
Only select administrators should log on to Exchange servers.
Exchange servers should not be used for file sharing.
Internet browsing should be blocked from all servers in the enterprise.
All servers in the enterprise should be at a current patch level to help prevent compromise.
All workstations in the enterprise should be at a current patch level to help prevent compromise.
All servers in the enterprise should have different local administrator passwords.
All workstations in the enterprise should have different local administrator passwords.
The above are only a few points in a typical discussion on this topic. Please engage with a security consultant to fully discuss such issues, as each enterprise will have different business requirements which translate into the underlying technical configuration. Some customers track these activities through a security sign off or waiver process.
Finally, do not assume that since a previous version of Exchange ran in a given environment, the AV conversation can be skipped! Take the time to ensure that all teams are on the same page, and that the correct exclusions are applied. Exchange 2010 has different exclusions compared to Exchange 2003! Additionally there will likely have been staff changes over the years since older AV policies were defined so have this critical conversation to prevent a critical situation – aka a CritSit!