On Friday, May 18th, Teamleader was unavailable for about 30 minutes. We are very sorry this happened, and we know it may have caused you trouble. That’s why we’d like to explain exactly what happened and what we will do to prevent it in the future.
What happened?
A setting in our Amazon Web Services configuration rejects new connections once the number of open connections reaches a certain limit. When that limit is reached, the server stops responding. When this happened, Amazon could no longer perform health checks on our machines. As a result, the machines were considered unhealthy and no more traffic could reach them.
As a result, all requests (a click, an action, a new page loading…) piled up and overloaded the queue, which made Teamleader unavailable for about 30 minutes.
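For readers who want to see the mechanism, here is a minimal sketch in Python. It is a simplified model and not our actual infrastructure or Amazon’s implementation: the `Server` class, the `max_connections` value and the `health_check` function are hypothetical names chosen for illustration. It shows how a hard connection limit can make a health check fail even though the machine itself is still running.

```python
# Illustrative model only: all names and numbers here are hypothetical.
class Server:
    def __init__(self, max_connections):
        self.max_connections = max_connections  # assumed hard limit
        self.open_connections = 0

    def connect(self):
        """Accept a connection, or return False once the hard limit is reached."""
        if self.open_connections >= self.max_connections:
            return False
        self.open_connections += 1
        return True

    def close(self):
        """Release one connection slot."""
        self.open_connections = max(0, self.open_connections - 1)


def health_check(server):
    """A health check needs a connection too, so it fails when the server is full."""
    if not server.connect():
        return False
    server.close()
    return True


server = Server(max_connections=3)
healthy_before = health_check(server)  # a slot is free, so the check passes

# Long-lived connections fill every available slot.
for _ in range(3):
    server.connect()

healthy_after = health_check(server)   # no slot left for the health check
print(healthy_before, healthy_after)   # prints: True False
```

In this model the server is not broken, only full. The health check cannot tell the difference, which is why the machine ends up being treated as unhealthy.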
How did this happen?
We were performing a migration at the time of the downtime. The connections to our database remained open longer than normal. At the same time, one of our endpoints received an unusually high number of requests, which caused us to reach the connection limit described above. This was an error on our side: we were careless with this network setting, which is unacceptable.
How did we solve this problem and how will we prevent it?
As soon as we noticed something was wrong, our team contacted the Amazon Web Services support team to fix the issue as quickly as possible.
We have switched off the setting that caused the trouble. Requests are now subject to a soft limit instead of a hard limit, which copes better when we receive a higher number of requests. On the advice of Amazon’s support team, we also reset two settings to their defaults. Finally, we have added extra monitoring to help us understand this problem better and catch it before it affects our customers.
Thank you for your continued trust in Teamleader. We know you often count on us for your daily business, so we’re very sorry for any trouble this may have caused you.
Kind regards,
Teamleader