The error message "Sorry, that didn't work. Please go back to office.com and try again" is probably one of the vaguest I've seen. It's up there with "please contact your administrator", which is fine unless you are the administrator...
Below is a repro of a case where all users were unable to sign in to Office 365. They would receive the aforementioned "Sorry, that didn't work" message and proceed no further.
Let's take a peek at the error, and also jump to some of the individual Office 365 services to see if they are able to provide any more details.
Environment & Symptoms
This environment is federated to Azure AD using AD FS 2019. Both AD FS and WAP are present, and are deployed on the corporate network and DMZ respectively. Everything was apparently working the day before, and then it suddenly failed.
Using Edge Chromium, not IE11 or Legacy Edge for these reasons, let's try to access portal.office.com and see what happens.
We immediately get the aforementioned error. Sorry! How Canadian is that? It's apologising for something it did not cause, as we will see further below.
For readability, the error shown is "Sorry, that didn't work. Please go back to Office.com and try again".
The URL has changed to https://www.office.com/landing and we do not have access to the tenant. Bummer.
What about IE though? Did someone tweak the supported browser strings or the integrated auth settings? Let's just try IE11 to see if that works.
Nope. Same issue.
Back in Edge, let's go directly to our mailbox using https://outlook.office365.com. Does that work any better?
Nope. But we start to get our first meaningful clue.
Note the error:
X-Auth-Error OpenIdConnect Microsoft.Exchange.Security.OpenIdConnect.OpenIdConnectIdpException
That should start to make you think about the underlying authentication process. We will come back to that.
What about trying Outlook from Microsoft 365 Apps for Enterprise (formerly Office 365 ProPlus)?
Again, there appear to be issues with the sign-in process. We get some detailed error codes and the statement "Unable to verify token signature".
To recap, we have seen the following snippets:
- OpenIDConnect
- X-Auth-Error
- Issues with tokens
That really does look like the underlying authentication process is failing, and we need to look into that further.
Review Azure AD Sign-In Logs
To do this, we head over to the Azure AD portal to check the sign-in logs. For background on the logs, please review the docs.
You can access the Azure AD Sign-in report directly with this link.
In this case, all federated authentication attempts failed; only cloud accounts were able to authenticate. Below you can see that the administrator accounts are cloud-only, as this is the recommended security configuration. Protecting Microsoft 365 from on-premises attacks is a must-read and covers this and many other security considerations.
Drilling into the details of a failed sign-in, we see the below error code.
For reference, the full error text is:
Sign-in error code 5000811 Unable to verify token signature. The signing key identifier does not match any valid registered keys.
On the Authentication Details tab there are no further details; the sign-in simply failed outright.
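To make the error concrete: Azure AD keeps a record of the token-signing certificate(s) registered for a federated domain and rejects tokens signed with anything else. The Python sketch below models that check under a simplifying assumption, namely that certificates are identified by their SHA-1 thumbprint (the form PowerShell displays). The certificate bytes are hypothetical placeholders, and this illustrates the idea rather than how Azure AD is actually implemented.

```python
import hashlib


def thumbprint(der_bytes: bytes) -> str:
    """Return the SHA-1 thumbprint of a DER-encoded certificate as uppercase hex."""
    return hashlib.sha1(der_bytes).hexdigest().upper()


def is_registered(token_cert_der: bytes, registered_thumbprints: set[str]) -> bool:
    """Simplified model: a token is accepted only if its signing cert is registered."""
    return thumbprint(token_cert_der) in {t.upper() for t in registered_thumbprints}


# Hypothetical placeholder bytes standing in for real certificates.
old_cert = b"old-token-signing-cert"
new_cert = b"new-token-signing-cert"

# Azure AD only knows the old certificate because the metadata was never refreshed.
registered = {thumbprint(old_cert)}

print(is_registered(old_cert, registered))  # True: tokens signed before the rollover
print(is_registered(new_cert, registered))  # False: the 5000811 situation
```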
Verify AD FS Configuration and Health
In the federated scenario, we rely on AD FS. Let's verify that the expected user agent strings are present and that Windows Integrated Auth (WIA) is enabled.
To do this, we can run the following in PowerShell:
Get-AdfsGlobalAuthenticationPolicy
Get-AdfsProperties | Select-Object -ExpandProperty WIASupportedUserAgents
The output is the same as previously configured; no unexpected changes had been made. Note that you may have different browser strings in there, so please review your own documentation.
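Comparing the current list against your documented baseline is easy to script. A minimal Python sketch, assuming you paste in the output of the Get-AdfsProperties command above and the list from your build documentation; the strings shown are illustrative only, not a complete default list.

```python
def diff_user_agents(current: list[str], baseline: list[str]) -> dict[str, list[str]]:
    """Report WIA user agent strings added or removed relative to a documented baseline."""
    cur, base = set(current), set(baseline)
    return {
        "added": sorted(cur - base),    # present now, not in the documentation
        "removed": sorted(base - cur),  # documented, but missing now
    }


# Illustrative values only: replace with your documentation and the real
# output of Get-AdfsProperties | Select-Object -ExpandProperty WIASupportedUserAgents
documented = ["MSAuthHost/1.0/In-Domain", "MSIE 6.0", "MSIE 7.0", "Trident/7.0"]
current = ["MSAuthHost/1.0/In-Domain", "MSIE 6.0", "MSIE 7.0", "Trident/7.0"]

changes = diff_user_agents(current, documented)
print(changes)  # {'added': [], 'removed': []} means no drift
```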
Did the TLS certificate expire?
The answer is both yes and no. The main TLS certificate is valid and does not expire until 28-2-2022, well into the future.
But note the highlighted line, and the fact that there are two certificates listed below.
This is the default AD FS certificate rollover for the Token-Signing and Token-Decryption certificates. They are self-signed certificates with a 12-month validity.
Before they expire, AD FS will generate new certificates automatically and publish them to the federation metadata so that other systems can consume the updated information and be prepared for the upcoming switch.
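To see how much warning everyone else gets, here is a Python sketch of the rollover timeline. It assumes AutoCertificateRollover is enabled and that CertificateGenerationThreshold (days before expiry that the new cert is generated) and CertificatePromotionThreshold (days after generation that it becomes primary) are at their usual defaults of 20 and 5; check your own values with Get-AdfsProperties. The expiry date is hypothetical.

```python
from datetime import date, timedelta


def rollover_timeline(current_expiry: date,
                      generation_threshold_days: int = 20,
                      promotion_threshold_days: int = 5) -> dict[str, date]:
    """Model AD FS automatic rollover for the token-signing/decryption certificates."""
    generated = current_expiry - timedelta(days=generation_threshold_days)
    promoted = generated + timedelta(days=promotion_threshold_days)
    return {
        # New cert is published in the metadata as the secondary certificate.
        "new_cert_generated": generated,
        # New cert becomes primary; tokens are now signed with it.
        "new_cert_promoted": promoted,
        # Old cert reaches the end of its validity.
        "old_cert_expires": current_expiry,
    }


# Hypothetical expiry date for the current token-signing certificate.
timeline = rollover_timeline(date(2021, 3, 1))
for event, when in timeline.items():
    print(f"{event:20} {when.isoformat()}")
```

With these inputs the new certificate appears on 9 February and becomes primary on 14 February, so anything that consumes the metadata has roughly five days to pick it up before tokens start being signed with it.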
AD FS servers must be domain-joined because they need to communicate with your domain controllers, which perform the authentication. Publishing the federation metadata to the Internet directly from an AD FS server is not advisable. These are Tier 0 servers, and must be afforded the same level of protection as domain controllers. That is why we use WAP: it is a standalone machine in the DMZ that can be used to provide federation services externally.
Next up, we need to check on WAP.
Verifying Web Application Proxy Health
Is WAP healthy?
The screenshot below will give you an idea.
It is not. Since WAP was broken, the updated federation metadata XML was not available on the Internet. So when AD FS rolled over to the new certificate, Azure AD was unable to obtain the updated information.
This is what led to the disconnect between AD FS and Azure AD.
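Whether Azure AD can see the new certificate comes down to what is published at the federation metadata endpoint, which for AD FS is /FederationMetadata/2007-06/FederationMetadata.xml on the federation service name. As a Python sketch, the code below pulls every signing certificate out of a metadata document and prints its SHA-1 thumbprint, which you could compare against Get-AdfsCertificate -CertificateType Token-Signing. The sample XML is a heavily trimmed, hypothetical document with placeholder certificate bytes; in practice you would download the real one from outside the network, through WAP.

```python
import base64
import hashlib
import xml.etree.ElementTree as ET

MD_NS = "urn:oasis:names:tc:SAML:2.0:metadata"
DS_NS = "http://www.w3.org/2000/09/xmldsig#"


def signing_thumbprints(metadata_xml: str) -> list[str]:
    """Return unique SHA-1 thumbprints of every signing certificate in the metadata."""
    root = ET.fromstring(metadata_xml)
    found: list[str] = []
    for key_descriptor in root.iter(f"{{{MD_NS}}}KeyDescriptor"):
        if key_descriptor.get("use") != "signing":
            continue
        for cert in key_descriptor.iter(f"{{{DS_NS}}}X509Certificate"):
            # The element holds base64 DER; strip any whitespace before decoding.
            der = base64.b64decode("".join((cert.text or "").split()))
            tp = hashlib.sha1(der).hexdigest().upper()
            if tp not in found:  # a cert can appear under several role descriptors
                found.append(tp)
    return found


def _signing_key(der: bytes) -> str:
    """Build a KeyDescriptor element for the hypothetical sample below."""
    b64 = base64.b64encode(der).decode()
    return (
        '<KeyDescriptor use="signing"><ds:KeyInfo><ds:X509Data>'
        f"<ds:X509Certificate>{b64}</ds:X509Certificate>"
        "</ds:X509Data></ds:KeyInfo></KeyDescriptor>"
    )


# Hypothetical metadata: during a rollover both the current (primary) and the
# new (secondary) token-signing certificates are published.
sample = (
    f'<EntityDescriptor xmlns="{MD_NS}" xmlns:ds="{DS_NS}" '
    'entityID="http://sts.example.com/adfs/services/trust">'
    '<IDPSSODescriptor protocolSupportEnumeration="urn:oasis:names:tc:SAML:2.0:protocol">'
    + _signing_key(b"current-cert") + _signing_key(b"next-cert")
    + "</IDPSSODescriptor></EntityDescriptor>"
)

for tp in signing_thumbprints(sample):
    print(tp)
```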
We need to fix this in two steps: fix WAP, then update Azure AD.
Fixing WAP
The WAP servers lost connectivity to the internal AD FS farm. They were unable to open a connection to TCP 443 on AD FS. This was due to a firewall change where the wrong object was modified.
The firewall rules were updated to allow the WAP servers to communicate with AD FS on TCP 443.
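Before re-establishing the trust, it is worth confirming from each WAP server that AD FS answers on TCP 443; in PowerShell, Test-NetConnection -ComputerName <AD FS> -Port 443 does this. The same check as a small Python sketch:

```python
import socket


def can_reach(host: str, port: int = 443, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS failures.
        return False
```

For example, can_reach("adfs.tailspintoys.ca") should return True from every WAP server once the firewall rule is corrected; that host name is a hypothetical stand-in for your federation service name.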
Then we had to re-establish trust using this process on each WAP server. WAP was now back in business, and was able to authenticate to AD FS.
Time to update Azure AD.
Update Azure AD
WAP was now functioning and available to the Internet. Since the token-signing and token-decryption certificates had rolled to new versions, we need to make Azure AD aware of this ASAP.
To do this we can use Azure AD PowerShell to update the federation metadata.
The commands below were run from a management server, so the primary AD FS server had to be specified manually using the Set-MsolADFSContext cmdlet.
Then we update the federated domain.
Connect-MsolService
Set-MsolADFSContext -Computer Tail-CA-ADFS-19.tailspintoys.ca
Update-MsolFederatedDomain -DomainName Tailspintoys.ca
Note that this post assumes you are federating a single top-level domain with this AD FS farm, so the -SupportMultipleDomain switch was not used.
After waiting a minute for Azure AD to process the change, we are able to sign in.
Alternatively, you could also use Azure AD Connect to update the AD FS trust to Azure AD.
Let's recap the issue and the resolution.
Root Cause
AD FS generated new token-signing and token-decryption certificates when the old ones hit the CertificateGenerationThreshold. At that point the old certificates were still in use, and there were no issues until AD FS switched to the new certificates. Normally this would not be a problem, but since Azure AD was unable to access the metadata for the new certificates, it could not accept tokens signed with them. That is what the errors above about an unrecognised signing key were reporting. The metadata was not updated because the WAP servers had been disconnected from the internal AD FS farm, so WAP ceased to function.
Looking back over the logs, the disconnection had actually happened months earlier. It was never noticed because the customer had not configured monitoring for any of the AD FS or WAP servers. With the change to Modern Authentication, users were contacting the AD FS servers directly rather than via WAP to obtain a token, because Modern Authentication uses the passive flow. Legacy authentication (basic auth) to Exchange Online used the active flow, which would go through WAP.
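Even a basic external probe would have caught the WAP failure months earlier. Here is a minimal Python sketch of one, assuming it runs on a schedule from outside the corporate network and that WAP publishes the metadata at the standard AD FS path; it treats "healthy" as HTTP 200 plus a parseable SAML EntityDescriptor, which is a deliberately simple definition.

```python
import urllib.request
import xml.etree.ElementTree as ET

METADATA_PATH = "/FederationMetadata/2007-06/FederationMetadata.xml"
MD_NS = "urn:oasis:names:tc:SAML:2.0:metadata"


def metadata_looks_valid(status: int, body: bytes) -> bool:
    """Healthy means HTTP 200 and a parseable SAML EntityDescriptor document."""
    if status != 200:
        return False
    try:
        root = ET.fromstring(body)
    except ET.ParseError:
        return False
    return root.tag == f"{{{MD_NS}}}EntityDescriptor"


def check_metadata(federation_url: str, timeout: float = 10.0) -> bool:
    """Fetch the published metadata (through WAP, from outside) and validate it."""
    try:
        with urllib.request.urlopen(federation_url + METADATA_PATH, timeout=timeout) as resp:
            return metadata_looks_valid(resp.status, resp.read())
    except OSError:
        # URLError/HTTPError, TLS errors and timeouts all derive from OSError.
        return False
```

For example, alert whenever check_metadata("https://sts.tailspintoys.ca") returns False; that federation service name is hypothetical, and the result would normally feed whatever monitoring system you already run.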
Cheers,
Rhoderick
So our AD FS certs failed to auto-renew and expired. I managed to resolve that after a couple of hours. But now I'm stuck in a catch-22 situation as I need to sync via Update-MsolFederatedDomain to the cloud, but the auth fails since the cloud is using the expired certs :/
Hi Darren,
Admin accounts should be cloud-only to help mitigate movement from on-premises to the tenant. Even going back 10 years, before that guidance was explicit, there should always be a "Break Glass" cloud-only account.
If you don't have that, please contact support, as they will have to help out.
Cheers,
Rhoderick
PS Unless you have a concrete requirement for AD FS, time to look at other auth options please.
thanks, Rhoderick. Inherited system I didn't setup :/
I'll have a crack at creating a cloud only admin acc.
Groovy! Hope you get that fixed and can crack on with your weekend.
This is just one of the things to look at WRT AD FS and security.
https://docs.microsoft.com/en-us/azure/active-directory/fundamentals/protect-m365-from-on-premises-attacks
Cheers,
Rhoderick
just want to leave a note and say this saved my life 🙂
thanks for the detailed explanation
we had the same issue yesterday and it got fixed with the solution provided.
How do i run the fix if i can't log in as an admin to sort?
We are completely in cloud. I need to log into cloud 365 but i get the token error when logging in as admin so now what i'm screwed?
That's not the scenario above as this post is for on-premises AD FS.
If you are 100% sure that you have zero on-premises infrastructure, and that network gear or endpoint security is NOT intercepting TLS, then the best thing will be to open an MS CSS support case.
Cheers,
Rhoderick