On June 9th, 2022, from 22:22 UTC through 23:40 UTC, an infrastructure failure delayed a quarter of Incident Notifications across all channels (Voice, SMS, Email) for customers sending notifications via our EU region.
Other notifications in the EU region, such as stakeholder notifications, on-call handoffs, and responder requests, were not impacted. Notifications and other communications in all other regions were not impacted.
On June 9th, 2022, at 22:22 UTC, the service responsible for grouping and scheduling outbound notifications for Voice, SMS, and Email failed because it referenced an outdated load-balancer sidecar container that was no longer available. Consequently, a quarter of Incident Notifications for customers using our EU region were delayed.
PagerDuty’s internal continuous deployment system alerted the engineering team, who swiftly updated the service to reference an active container and redeployed the affected portion of the infrastructure.
After the redeployment at 23:40 UTC, the service was fully online, and all the backlogged notifications were successfully processed and delivered.
Following this incident, our teams completed a review of our infrastructure. We will be updating our monitoring and tooling to detect references to outdated containers in all our services, to prevent this issue from recurring.
We apologize for the inconvenience this has caused. For any questions, comments, or concerns, please reach out to email@example.com.