On August 31st, between 19:00 UTC and 21:16 UTC, we experienced a degradation in the US Service Region during which connections to the PagerDuty website and APIs failed intermittently, and some outbound notifications were delayed. Our EU Service Region was not affected.
At around 19:00 UTC on August 31st, Amazon Web Services began to experience networking issues in a single Availability Zone in the us-west-2 region. This led to intermittent availability of our load balancers in the affected Availability Zone. At 19:23 UTC, we began manually shifting traffic away from the affected Availability Zone, and by 21:16 UTC we were confident that the issue had been mitigated.
Because the outage in the affected Availability Zone was partial, failures were intermittent rather than total. As a result, it took longer than anticipated to fully understand the customer impact.
We are updating our alerts and monitoring to give us better visibility when these kinds of problems occur. This will leave us better placed to understand the impact should a similar problem occur in the future.
We understand how critical our platform is for our customers. We apologize for any impact this incident had on you and your teams. As always, we stand by our commitment to provide the most reliable and resilient platform in the industry. If you have any questions, please reach out to firstname.lastname@example.org.