Delayed Incident Notifications
Incident Report for PagerDuty
Postmortem

Summary

On June 21st, between 06:25 UTC and 07:08 UTC, PagerDuty incident notifications and status updates were delayed. All delayed notifications and updates were delivered by 07:08 UTC.

What Happened

On June 21st, at approximately 06:25 UTC, the systems responsible for scheduling and delivering incident notifications and status updates experienced a sudden increase in traffic. This was due to a widespread internet event that caused PagerDuty customers’ monitoring systems to generate a significantly higher-than-expected volume of alerts. The elevated traffic pushed these services past their expected surge levels. As a result, 1.33% of incident notifications and 5.73% of status updates were delayed.

Shortly after the surge decreased, our system was able to catch up to its expected processing levels. All queued incident notifications and status updates were processed and sent. New incident notifications and status updates were sent on time as of June 21st at 07:08 UTC.

What Are We Doing About This

To prevent internet-wide events from affecting the delivery of incident notifications and status updates in the future, we are actively reviewing database resource utilization to ensure proper surge capacity. We are also implementing additional monitoring to detect burst events such as this one. Both measures will allow us to better protect our customers from future surge events.

We apologize for any inconvenience this has caused. For any questions, comments, or concerns, please reach out to support@pagerduty.com.

Posted Jul 18, 2022 - 06:05 UTC

Resolved
We have resolved an incident where PagerDuty customers in the US service region experienced issues with delayed incident notification deliveries. The issue is now resolved, and there is no ongoing impact to customers. Please reach out to support@pagerduty.com if you have any concerns.
Posted Jun 21, 2022 - 07:24 UTC
Monitoring
We have confirmed that incident notification delivery has recovered from the delays. Functionality has returned to normal for customers. We are continuing to monitor.
Posted Jun 21, 2022 - 07:18 UTC
Investigating
We are investigating potential issues with incident notification deliverability. Once confirmed, we will provide an update on impact and severity within 15 minutes.
Posted Jun 21, 2022 - 07:01 UTC
This incident affected: Notification Delivery (Notification Delivery (US)).