The below environment was migrated from one hosting provider to another. Unfortunately one DC did not survive, and it was removed from the domain. This was done by using dsa.msc as it can now clean up the AD metadata rather than using NTDSUtil. That was a welcome change in Windows Server 2008.
A replacement server with the same name was built, joined to the domain and then promoted to be a DC. The Defender for Identity sensor was then installed on the newly built server. This was done using the regular sensor installation package and its associated access key.
Two issues were immediately apparent:
- There are two instances of the server – this machine is 2019-DC-2
- The ATP service failed to start – it has a status of "Start Failed"
Note the access key was redacted on purpose.
Duplicate Server
Let's get rid of the duplicate server first. This is the machine with a Disconnected status.
To remove this object, mouse over the disconnected server. Note there is now a delete icon shown at the end of the line.
This is shown by the highlighted purple arrow:
Accept the confirmation, and the server is now gone.
Service Failed to Start
The Windows System event log will only contain generic service failures. For the relevant detail, let's look in the sensor's local log files on the server. These are stored in:
%programfiles%\Azure Advanced Threat Protection Sensor\<version>\Logs
You will note there are a couple of logs present here, and it probably makes sense to start with the Microsoft.Tri.Sensor-Errors.log file.
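As a quick way to pull the most recent entries from that log, something like the below can be used. It is only a sketch: it assumes the sensor is in the default location under Program Files and searches below that folder for the log, so the version folder does not need to be typed in. The tail length of 50 is an arbitrary choice.

```powershell
# Assumption: sensor installed in the default location under Program Files.
$sensorRoot = Join-Path $env:ProgramFiles 'Azure Advanced Threat Protection Sensor'

# Locate the errors log beneath the version folder and show its last 50 lines.
# The newest entries are at the end of the file.
Get-ChildItem -Path $sensorRoot -Recurse -Filter 'Microsoft.Tri.Sensor-Errors.log' |
    ForEach-Object {
        "==== $($_.FullName) ===="
        Get-Content -Path $_.FullName -Tail 50
    }
```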
In the lab environment, the below error was noted:
2021-04-01 15:55:23.2052 Warn DirectoryServicesClient CreateLdapConnectionAsync failed to retrieve group managed service account password. [DomainControllerDnsName=2019-DC-2.wingtiptoys.ca Domain=Wingtiptoys.ca UserName=gMSA-MID ]
2021-04-01 15:55:24.2676 Error DirectoryServicesClient+<CreateLdapConnectionAsync>d__38 Microsoft.Tri.Infrastructure.ExtendedException: CreateLdapConnectionAsync failed [DomainControllerDnsName=2019-DC-2.wingtiptoys.ca]
at async Task<LdapConnection> Microsoft.Tri.Sensor.DirectoryServicesClient.CreateLdapConnectionAsync(DomainControllerConnectionData domainControllerConnectionData, bool isGlobalCatalog, bool isTraversing)
at async Task<bool> Microsoft.Tri.Sensor.DirectoryServicesClient.TryCreateLdapConnectionAsync(DomainControllerConnectionData domainControllerConnectionData, bool isGlobalCatalog, bool isTraversing)
Why was it unable to retrieve the group managed service account (gMSA) password?
The recommended configuration was used to configure the environment. This is a gMSA, which uses a separate AD group to control which machines can retrieve the managed password. The correct computer objects must be in that group to allow access. We can quickly verify that using PowerShell:
Get-ADServiceAccount -Identity gMSA-MID -Properties PrincipalsAllowedToRetrieveManagedPassword
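That command returns the distinguished names of the allowed principals, which in this setup is a group rather than the DC itself. To go one step further and check whether the replacement DC is actually in that group, a rough sketch is below. It assumes the ActiveDirectory module is available and reuses the gMSA-MID and 2019-DC-2 names from this lab.

```powershell
Import-Module ActiveDirectory

# Computer object of the replacement DC (lab name from this post).
$dc = Get-ADComputer -Identity '2019-DC-2'

$gmsa = Get-ADServiceAccount -Identity 'gMSA-MID' -Properties PrincipalsAllowedToRetrieveManagedPassword

foreach ($principal in $gmsa.PrincipalsAllowedToRetrieveManagedPassword) {
    $obj = Get-ADObject -Identity $principal

    if ($obj.ObjectClass -eq 'group') {
        # Expand the group (including nested groups) and look for the DC.
        $members  = Get-ADGroupMember -Identity $obj.DistinguishedName -Recursive
        $isMember = $members.DistinguishedName -contains $dc.DistinguishedName
        '{0} (group): contains {1} = {2}' -f $obj.Name, $dc.Name, $isMember
    }
    else {
        # Principal was granted access directly rather than via a group.
        '{0} ({1})' -f $obj.Name, $obj.ObjectClass
    }
}
```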
Since the computer object was deleted as part of the AD metadata cleanup, we need to manually add the object for the replacement DC to this group. Even though they have the same name, it is a different object.
No biggie. Just add the computer object for the new domain controller to the relevant group. Once AD has replicated, the server still has to pick up the new group membership.
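Adding the computer object can be done in dsa.msc or with PowerShell. The below is a minimal sketch of the PowerShell route. Note that the group name gMSA-MID-Hosts is a made-up placeholder, as the real group name is not shown in this post. Substitute the group returned in PrincipalsAllowedToRetrieveManagedPassword.

```powershell
Import-Module ActiveDirectory

# Hypothetical group name - replace with the group that is allowed
# to retrieve the gMSA password in your environment.
$groupName = 'gMSA-MID-Hosts'

# Add the replacement DC's computer object to the group.
Add-ADGroupMember -Identity $groupName -Members (Get-ADComputer -Identity '2019-DC-2')

# Confirm the membership as seen by the DC you are connected to.
Get-ADGroupMember -Identity $groupName | Select-Object Name, objectClass
```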
In order to pick up the new membership, you can either:
- Restart the server
- Purge the Kerberos tickets for the system context
The below is an example of purging the Kerberos tickets from the system context. The logon ID 0x3e7 is the Local System account.
Klist.exe -Li 0x3e7 Purge
Note that the tickets were purged.
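Before starting the service, it can be worth confirming that the server is now able to use the gMSA. One way to do that, assuming the ActiveDirectory module is installed on the DC, is Test-ADServiceAccount. Treat this as a sanity check rather than a guarantee, and run it on the DC hosting the sensor.

```powershell
Import-Module ActiveDirectory

# Run locally on the replacement DC.
# Returns True when this computer is able to use the gMSA.
Test-ADServiceAccount -Identity 'gMSA-MID'
```

If this still returns False, check that the group change has replicated to the DC being queried, and that the tickets were purged or the server restarted afterwards.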
Now we can start the service, and be back in business.
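The service can be started from services.msc or from PowerShell. The sketch below assumes the default sensor service names, AATPSensor and AATPSensorUpdater, which are not shown earlier in this post. Verify them with Get-Service if your installation differs.

```powershell
# Assumption: default service names for the sensor and its updater.
Start-Service -Name 'AATPSensor'

# Check that both sensor services report Running.
Get-Service -Name 'AATPSensor', 'AATPSensorUpdater' | Select-Object Name, Status
```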
Bonus Issue
In addition to the above gMSA issue, there was one more item to fix. This one was not caused by Defender for Identity itself. In the ATP logs, for example Microsoft.Tri.Sensor-Errors.log, the below error was observed.
2021-04-01 21:31:37.1946 Error ExceptionDispatchInfo System.Net.Http.HttpRequestException: An error occurred while sending the request. ---> System.Net.WebException: The remote name could not be resolved: 'wingtiptoyscanadasensorapi.atp.azure.com'
at Stream System.Net.HttpWebRequest.EndGetRequestStream(IAsyncResult asyncResult, out TransportContext context)
This was due to an issue previously observed with Azure DNS - please see that post for the relevant details.
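To check whether name resolution is the problem on a given server, the endpoint from the error can be tested directly. This is a simple sketch using the hostname from the log above. Your workspace will have a different sensor API name, so take it from your own error message.

```powershell
# Hostname taken from the error above - substitute your own sensor API endpoint.
$endpoint = 'wingtiptoyscanadasensorapi.atp.azure.com'

# Does the name resolve from this server?
Resolve-DnsName -Name $endpoint

# If it resolves, can the server reach it over HTTPS?
Test-NetConnection -ComputerName $endpoint -Port 443
```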
Cheers,
Rhoderick