IAM Cloud Periodic Authentication Performance Issues – Friday 6th September
Issue & Impact: EU & US customers experienced temporary delays to authentication, SSO and access to our portal this morning. Users at a number of our customers who attempted to log in during the period of service outage/slow performance had to wait abnormally long to access their federated resources, or received non-response errors.
Root Cause Analysis: With the additional load from this week’s new starters at our education customers, IAM Cloud has seen both a higher rate of authentication activity and a substantially higher rate of password reset requests. This caused our platform’s Azure auto-scaling to activate a couple of times. However, we’ve noticed a new behaviour in Azure auto-scaling that we had not encountered before. Historically (until a few days ago), when our service scaled from, for example, 4 instances to 6, the 2 new instances warmed up alongside the existing 4. Azure auto-scaling now ‘replaces’ the 4 active instances with 6 fresh instances, all of which take about 5-10 minutes to warm up. In a live environment with a large amount of traffic, this meant that each scale-up first reduced the throughput of our service for about 10 minutes, which coincided with the much greater load, and the service only caught up once the new instances had warmed up properly. We’ve raised an escalated ticket with Microsoft Azure’s product team to understand whether this is a bug or an intended change in policy; there has been no communication about it, and it is a strange and not particularly helpful change to the auto-scaling feature.
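To illustrate why the ‘replace’ behaviour hurts during a scale-up, here is a minimal Python sketch comparing the two behaviours for the 4-to-6 example above. It is a toy model, not a description of how Azure works internally: the function name, the fixed 10-minute warm-up, and the assumption that a cold instance delivers 25% of a warm instance’s throughput (`cold_factor`) are all hypothetical values chosen for illustration.

```python
def effective_capacity(minute, old_count, new_count, warmup_minutes,
                       cold_factor, replace_all):
    """Capacity in 'warm-instance equivalents' at `minute` after a scale-up starts.

    cold_factor is an assumed fraction of normal throughput that an
    instance delivers while it is still warming up (hypothetical value).
    """
    if minute >= warmup_minutes:
        # Warm-up finished: every instance serves at full speed.
        return float(new_count)
    if replace_all:
        # New behaviour: all instances are fresh, so all of them are cold.
        return new_count * cold_factor
    # Previous behaviour: existing instances stay warm, only the added ones are cold.
    return old_count + (new_count - old_count) * cold_factor


# Compare both behaviours for a 4 -> 6 scale-up with an assumed 10-minute warm-up.
for minute in (0, 5, 10):
    additive = effective_capacity(minute, 4, 6, 10, 0.25, replace_all=False)
    replaced = effective_capacity(minute, 4, 6, 10, 0.25, replace_all=True)
    print(f"minute {minute:2d}: additive={additive} replace={replaced}")
```

Under these assumed numbers, the additive behaviour never drops below the original 4 instances’ worth of capacity, whereas the replace behaviour falls well below it until the warm-up completes. This matches the temporary loss of throughput we observed.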
Resolution: We’ve switched off auto-scaling and left our service running in high-performance mode until we get further guidance from Microsoft. Auto-scaling is not an essential feature of our platform; it is only used to help the platform run more efficiently.
We’ll provide further updates here on our status page, as well as on our changelog at www.iamcloud.com/changelog
We’re sorry for the inconvenience experienced.