The error message "Sorry, that didn't work. Please go back to office.com and try again" is probably one of the vaguest I've seen. It's up there with "please contact your administrator", which is fine unless you are the administrator...
Below is a repro of a case where all users were unable to sign in to Office 365. They would receive the aforementioned "Sorry, that didn't work" message and get no further.
Let's take a peek at the error, and also jump to some of the individual Office 365 services to see if they are able to provide any more details.
Environment & Symptoms
This environment is federated to Azure AD using AD FS 2019. Both AD FS and the Web Application Proxy (WAP) are present, deployed on the corporate network and in the DMZ respectively. Everything was apparently working the day before, and then it suddenly failed.
Using Edge Chromium (not IE11 or Legacy Edge, for these reasons), let's try to access portal.office.com and see what happens.
We immediately get the aforementioned error: sorry! How Canadian is that? It's apologising for something it did not cause, as we will see further below.
For readability the error shown is "Sorry, that didn't work. Please go back to Office.com and try again".
The URL has changed to https://www.office.com/landing and we do not have access to the tenant. Bummer.
What about IE though? Did someone tweak the supported browser strings or the integrated auth settings? Let's just try IE11 to see if that works.
Nope. Same issue.
Switching back to Edge, let's go directly to our mailbox at https://outlook.office365.com. Does that work any better?
Nope. But we start to get our first meaningful clue.
Note the error:
X-Auth-Error OpenIdConnect Microsoft.Exchange.Security.OpenIdConnect.OpenIdConnectIdpException
That should start to make you think about the underlying authentication process. We will come back to that.
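Again, there appear to be issues with the sign-in process. We get some detailed error codes and the statement "Unable to verify token signature".
To recap, we have seen the following snippets:
- A generic "Sorry, that didn't work" error on Office.com
- An OpenIdConnectIdpException from Exchange Online
- Issues with tokens ("Unable to verify token signature")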
That really does look like the underlying authentication process is failing, and we need to look into that further.
Review Azure AD Sign-In Logs
To do this, we head over to the Azure AD portal to check the sign-in logs. For some background on the logs, please review the docs.
In this case, all federated authentication attempts failed; only cloud accounts were able to authenticate. Below you can see that the administrator accounts are cloud-only, as this is the recommended security configuration. "Protecting Microsoft 365 from on-premises attacks" is a must-read and covers this and many other security considerations.
Drilling into the details of a failed sign-in, we see the below error code.
Sign-in error code: 5000811
Failure reason: Unable to verify token signature. The signing key identifier does not match any valid registered keys.
On the Authentication Details tab there are no further details; the sign-in simply failed outright.
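Error 5000811 says Azure AD could not match the key that signed the token with any certificate it has registered for the domain. One way to confirm that is to compare the thumbprint of the primary AD FS token-signing certificate with the certificates Azure AD holds. This is a sketch of our own, not an official tool: Test-SigningCertificateRegistered is a hypothetical name, and the commented example assumes the ADFS and MSOnline modules are available, that Connect-MsolService has already been run, and that tailspintoys.ca is the federated domain.

```powershell
# Hypothetical helper (our own name, not part of any module): is the thumbprint AD FS
# currently signs with among the certificates Azure AD has registered for the domain?
function Test-SigningCertificateRegistered {
    param(
        [Parameter(Mandatory)][string]$AdfsThumbprint,
        # Base64 DER certificates as stored by Azure AD (SigningCertificate, NextSigningCertificate)
        [string[]]$RegisteredCertificateBase64 = @()
    )
    $registered = [System.Collections.Generic.List[string]]::new()
    foreach ($b64 in $RegisteredCertificateBase64) {
        # NextSigningCertificate may be empty; skip blanks
        if ([string]::IsNullOrWhiteSpace($b64)) { continue }
        $bytes = [Convert]::FromBase64String($b64)
        $cert = [System.Security.Cryptography.X509Certificates.X509Certificate2]::new($bytes)
        $registered.Add($cert.Thumbprint)
    }
    # Thumbprint properties are upper-case hex without spaces; normalise the input to match
    $wanted = ($AdfsThumbprint -replace '\s', '').ToUpperInvariant()
    [pscustomobject]@{
        AdfsThumbprint        = $wanted
        RegisteredThumbprints = $registered.ToArray()
        IsRegistered          = $registered.Contains($wanted)
    }
}

# Example (assumed: ADFS and MSOnline modules, Connect-MsolService already run,
# tailspintoys.ca as the federated domain):
# $adfs  = Get-AdfsCertificate -CertificateType Token-Signing | Where-Object IsPrimary
# $azure = Get-MsolDomainFederationSettings -DomainName tailspintoys.ca
# Test-SigningCertificateRegistered -AdfsThumbprint $adfs.Thumbprint `
#     -RegisteredCertificateBase64 $azure.SigningCertificate, $azure.NextSigningCertificate
```

An IsRegistered value of False would be consistent with the 5000811 errors in the sign-in logs.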
Verify AD FS Configuration and Health
In the federated scenario, we rely on AD FS. Let's verify that the expected user agent strings are present and that Windows Integrated Auth (WIA) is enabled.
To do this, we can run the following in PowerShell.
Get-AdfsProperties | Select-Object -ExpandProperty WIASupportedUserAgents
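That one-liner only lists the configured strings. To check a specific browser against them, a small helper like the sketch below can be used. It approximates AD FS matching under our own assumptions: plain entries are treated as case-insensitive substrings, and entries prefixed with =~ as regular expressions (a form AD FS 2016 and later accept). Test-WiaUserAgent is a hypothetical name. The commented lines also show where to confirm that WIA is the primary intranet authentication method.

```powershell
# Hypothetical helper approximating how AD FS matches a browser's user agent against
# WIASupportedUserAgents. Assumptions: plain entries are case-insensitive substring
# matches; entries starting with '=~' are regular expressions.
function Test-WiaUserAgent {
    param(
        [Parameter(Mandatory)][string]$UserAgent,
        [Parameter(Mandatory)][string[]]$SupportedUserAgents
    )
    foreach ($entry in $SupportedUserAgents) {
        if ($entry.StartsWith('=~')) {
            # Regex entry: drop the '=~' prefix and match the rest
            if ($UserAgent -match $entry.Substring(2)) { return $true }
        }
        elseif ($UserAgent.IndexOf($entry, [StringComparison]::OrdinalIgnoreCase) -ge 0) {
            return $true
        }
    }
    return $false
}

# Example on an AD FS server (ADFS module assumed):
# $agents = Get-AdfsProperties | Select-Object -ExpandProperty WIASupportedUserAgents
# Test-WiaUserAgent -UserAgent '<paste the browser user agent here>' -SupportedUserAgents $agents
# WIA should appear in the intranet providers, e.g. 'WindowsAuthentication':
# (Get-AdfsGlobalAuthenticationPolicy).PrimaryIntranetAuthenticationProvider
```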
Did the TLS certificate expire?
The answer is both yes and no. The main TLS certificate is valid; it does not expire until 28 February 2022, well into the future.
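Certificate expiry can also be checked from PowerShell rather than from screenshots. The sketch below reports the days remaining for any certificates piped to it; the function name Get-CertificateExpiry and the 30-day warning window are our own assumptions, not AD FS defaults. Without -CertificateType, Get-AdfsCertificate returns the service communications, token-signing and token-decrypting certificates.

```powershell
# Hypothetical helper: classify certificates as OK, ExpiringSoon or Expired.
# The 30-day default warning window is an assumption; tune it to your alerting needs.
function Get-CertificateExpiry {
    param(
        [Parameter(Mandatory, ValueFromPipeline)]
        [System.Security.Cryptography.X509Certificates.X509Certificate2]$Certificate,
        [int]$WarningDays = 30,
        [datetime]$AsOf = (Get-Date)
    )
    process {
        $daysLeft = [math]::Floor(($Certificate.NotAfter - $AsOf).TotalDays)
        $status = 'OK'
        if ($daysLeft -lt 0) { $status = 'Expired' }
        elseif ($daysLeft -le $WarningDays) { $status = 'ExpiringSoon' }
        [pscustomobject]@{
            Subject    = $Certificate.Subject
            Thumbprint = $Certificate.Thumbprint
            NotAfter   = $Certificate.NotAfter
            DaysLeft   = $daysLeft
            Status     = $status
        }
    }
}

# Example on an AD FS server (ADFS module assumed):
# Get-AdfsCertificate | ForEach-Object Certificate | Get-CertificateExpiry
```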
But note the highlighted line, and the fact that there are two certificates listed below.
This is the default AD FS certificate rollover for the token-signing and token-decryption certificates. They are self-signed certificates with a 12-month validity.
Before they expire, AD FS will generate new certificates automatically and publish them to the federation metadata so that other systems can consume the updated information and be prepared for the upcoming switch.
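The rollover timing is controlled by AD FS properties. With AutoCertificateRollover enabled, a new secondary certificate is generated CertificateGenerationThreshold days before the primary expires (20 by default) and promoted to primary CertificatePromotionThreshold days after it was generated (5 by default). The arithmetic is sketched below; Get-RolloverSchedule is a hypothetical helper, not an AD FS cmdlet.

```powershell
# Hypothetical helper: when will AD FS generate and promote the next certificate?
# Defaults mirror the AD FS defaults of 20 (generation) and 5 (promotion) days.
function Get-RolloverSchedule {
    param(
        [Parameter(Mandatory)][datetime]$PrimaryNotAfter,
        [int]$GenerationThresholdDays = 20,
        [int]$PromotionThresholdDays = 5
    )
    $generated = $PrimaryNotAfter.AddDays(-$GenerationThresholdDays)
    [pscustomobject]@{
        NewCertificateGenerated = $generated
        NewCertificatePromoted  = $generated.AddDays($PromotionThresholdDays)
        OldCertificateExpires   = $PrimaryNotAfter
    }
}

# Example against a live farm (ADFS module assumed):
# $p = Get-AdfsProperties
# $primary = Get-AdfsCertificate -CertificateType Token-Signing | Where-Object IsPrimary
# Get-RolloverSchedule -PrimaryNotAfter $primary.Certificate.NotAfter `
#     -GenerationThresholdDays $p.CertificateGenerationThreshold `
#     -PromotionThresholdDays $p.CertificatePromotionThreshold
```

With the defaults, the new certificate becomes primary about 15 days before the old one expires. That promotion, not the generation, is the moment by which Azure AD must already trust the new certificate.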
AD FS servers must be domain-joined because they rely on your domain controllers, which perform the actual authentication. Publishing the federation metadata to the Internet directly from an AD FS server is not advisable: these are Tier 0 servers and must be afforded the same level of protection as domain controllers. That is why we use WAP, a stand-alone machine in the DMZ that provides federation services externally.
Next up, we need to check on WAP.
Verifying Web Application Proxy Health
Is WAP healthy?
The screenshot below will give you an idea.
It is not. Because WAP was broken, the updated federation metadata XML was not available on the Internet. So when AD FS rolled over to the new certificate, Azure AD was unable to obtain the updated information.
This is what led to the disconnect between AD FS and Azure AD.
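A quick way to see this from the outside is to download the federation metadata as the Internet sees it and list the token-signing certificates it publishes. The sketch below parses a metadata document and returns the unique signing thumbprints. Get-MetadataSigningThumbprint is a hypothetical name, and sts.tailspintoys.ca in the commented example is an assumed federation service name; /FederationMetadata/2007-06/FederationMetadata.xml is the standard AD FS metadata path. With WAP broken as it was here, the download itself would be expected to fail.

```powershell
# Hypothetical helper: list the signing certificate thumbprints published in
# AD FS federation metadata (KeyDescriptor elements with use="signing").
function Get-MetadataSigningThumbprint {
    param([Parameter(Mandatory)][string]$MetadataXml)
    # Strip a leading byte-order mark, if any, before parsing
    [xml]$doc = $MetadataXml.TrimStart([char]0xFEFF)
    # local-name() avoids registering the metadata and xmldsig namespace prefixes
    $xpath = "//*[local-name()='KeyDescriptor' and @use='signing']//*[local-name()='X509Certificate']"
    $thumbprints = [System.Collections.Generic.List[string]]::new()
    foreach ($node in $doc.SelectNodes($xpath)) {
        # Base64 may be wrapped across lines; remove whitespace before decoding
        $bytes = [Convert]::FromBase64String(($node.InnerText -replace '\s', ''))
        $thumb = [System.Security.Cryptography.X509Certificates.X509Certificate2]::new($bytes).Thumbprint
        # The same certificate repeats under several role descriptors; keep one copy
        if (-not $thumbprints.Contains($thumb)) { $thumbprints.Add($thumb) }
    }
    $thumbprints.ToArray()
}

# Example from a machine outside the corporate network (service name assumed):
# $url = 'https://sts.tailspintoys.ca/FederationMetadata/2007-06/FederationMetadata.xml'
# Get-MetadataSigningThumbprint -MetadataXml (Invoke-WebRequest -Uri $url -UseBasicParsing).Content
```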
We need to fix this in two steps: fixing WAP, and then updating Azure AD.
The WAP servers lost connectivity to the internal AD FS farm. They were unable to open a connection to TCP 443 on AD FS. This was due to a firewall change where the wrong object was modified.
The firewall rules were updated to allow the WAP servers to communicate with AD FS on TCP 443.
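To confirm the firewall fix, run a port check from each WAP server towards AD FS. On Windows, Test-NetConnection -ComputerName <AD FS server> -Port 443 does this out of the box. If you want something that returns a plain true/false and can be scheduled, a TcpClient-based sketch like the one below would also work; Test-TcpPort, the 3-second timeout and the server name in the example are our own assumptions.

```powershell
# Hypothetical helper: can we open a TCP connection to ComputerName:Port within the timeout?
function Test-TcpPort {
    param(
        [Parameter(Mandatory)][string]$ComputerName,
        [int]$Port = 443,
        [int]$TimeoutMilliseconds = 3000
    )
    $client = [System.Net.Sockets.TcpClient]::new()
    try {
        # ConnectAsync + Wait gives a timeout; refused or unresolvable hosts throw
        $task = $client.ConnectAsync($ComputerName, $Port)
        return $task.Wait($TimeoutMilliseconds) -and $client.Connected
    }
    catch {
        return $false
    }
    finally {
        $client.Dispose()
    }
}

# Example, run on each WAP server (AD FS server name taken from this environment):
# Test-TcpPort -ComputerName Tail-CA-ADFS-19.tailspintoys.ca -Port 443
```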
Then we had to re-establish trust using this process on each WAP server. WAP was now back in business, and was able to authenticate to AD FS.
Time to update Azure AD.
Update Azure AD
WAP was now functioning and available to the Internet. Since the token-signing and token-decryption certificates had rolled to new versions, we need to make Azure AD aware of this ASAP.
To do this, we can use Azure AD PowerShell (the MSOnline module) to update the federation metadata.
The following commands were run from a management server, so the primary AD FS server had to be specified manually using the Set-MsolADFSContext cmdlet.
Set-MsolADFSContext -Computer Tail-CA-ADFS-19.tailspintoys.ca
Then we update the federated domain.
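The update itself uses Update-MsolFederatedDomain from the same MSOnline module. The sketch below wraps the context and update steps, then reports the signing certificates Azure AD now holds so they can be compared with AD FS. It assumes Connect-MsolService has already been run; Update-FederationTrust is a hypothetical name, and tailspintoys.ca is an assumed domain name.

```powershell
# Hypothetical wrapper (our own name): refresh Azure AD's copy of the AD FS federation
# settings, then show the certificates Azure AD holds. Assumes MSOnline is loaded and
# Connect-MsolService has already been run.
function Update-FederationTrust {
    param(
        [Parameter(Mandatory)][string]$DomainName,   # e.g. 'tailspintoys.ca' (assumed)
        [Parameter(Mandatory)][string]$AdfsServer,   # primary AD FS server FQDN
        [switch]$SupportMultipleDomain               # only when several domains share the farm
    )
    # We are on a management server, so point the MSOnline cmdlets at the primary AD FS server
    Set-MsolADFSContext -Computer $AdfsServer

    # Push the current AD FS settings, including the rolled-over certificates, to Azure AD
    $update = @{ DomainName = $DomainName }
    if ($SupportMultipleDomain) { $update['SupportMultipleDomain'] = $true }
    Update-MsolFederatedDomain @update

    # Report what Azure AD now holds (certificates are stored as base64 DER strings)
    $settings = Get-MsolDomainFederationSettings -DomainName $DomainName
    foreach ($name in 'SigningCertificate', 'NextSigningCertificate') {
        $b64 = $settings.$name
        if ([string]::IsNullOrWhiteSpace($b64)) { continue }
        $cert = [System.Security.Cryptography.X509Certificates.X509Certificate2]::new([Convert]::FromBase64String($b64))
        [pscustomobject]@{ Property = $name; Thumbprint = $cert.Thumbprint; NotAfter = $cert.NotAfter }
    }
}

# Example:
# Update-FederationTrust -DomainName tailspintoys.ca -AdfsServer Tail-CA-ADFS-19.tailspintoys.ca
```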
Note that this post assumes you are federating a single top level domain with this AD FS farm, thus the -SupportMultipleDomain switch was not used.
We wait a minute for Azure AD to process the change, and are then able to log in.
Let's recap the issue and the resolution.
AD FS generated new token-signing and token-decryption certificates when the old ones reached the CertificateGenerationThreshold. At that point the old certificates were still primary and still in use, so there were no issues. The trouble started when AD FS switched to the new certificates. Normally this is not a problem, but because Azure AD could not retrieve the federation metadata containing the new certificates, it did not accept them; this is the signing key mismatch we saw in the errors above. The metadata was unavailable because the WAP servers had lost their connection to the internal AD FS farm, so WAP had stopped functioning.
Looking back over the logs, the disconnection had actually happened months earlier. It went unnoticed because the customer had not configured monitoring for any of the AD FS or WAP servers. After the move to Modern Authentication, users on the corporate network obtained their tokens by contacting the AD FS servers directly rather than via WAP, because Modern Authentication uses the passive flow, in which the user's own browser or client talks to AD FS. Legacy authentication (basic auth) to Exchange Online used the active flow, in which Exchange Online contacts AD FS from the Internet on the user's behalf, so that traffic would have gone through WAP.
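Monitoring would have surfaced this months earlier. One inexpensive check, sketched here under our own assumptions, is to compare the token-signing thumbprints AD FS holds with those the externally published metadata exposes, and alert when AD FS has a certificate the Internet cannot see. Compare-SigningThumbprint is a hypothetical name; collecting the two lists and raising the alert are left to your monitoring tooling.

```powershell
# Hypothetical monitoring helper: which AD FS signing thumbprints are missing from the
# externally published metadata? Thumbprints are normalised (no spaces, upper case).
function Compare-SigningThumbprint {
    param(
        [Parameter(Mandatory)][AllowEmptyCollection()][string[]]$AdfsThumbprints,
        [Parameter(Mandatory)][AllowEmptyCollection()][string[]]$PublishedThumbprints
    )
    $normalize = { param($t) ($t -replace '\s', '').ToUpperInvariant() }
    $published = @($PublishedThumbprints | ForEach-Object { & $normalize $_ })
    $missing = @($AdfsThumbprints |
        ForEach-Object { & $normalize $_ } |
        Where-Object { $published -notcontains $_ })
    [pscustomobject]@{
        InSync  = $missing.Count -eq 0
        Missing = $missing   # certificates AD FS holds that the public metadata does not show
    }
}

# Example on an AD FS server (ADFS module assumed):
# $adfs = (Get-AdfsCertificate -CertificateType Token-Signing).Thumbprint
# $published = <thumbprints parsed from the public FederationMetadata.xml>
# $r = Compare-SigningThumbprint -AdfsThumbprints $adfs -PublishedThumbprints $published
# if (-not $r.InSync) { <raise an alert> }
```

A failed download of the public metadata, as would have happened here while WAP was disconnected, should be treated as an alert in its own right.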