On June 21st, between 06:25 UTC and 07:08 UTC, PagerDuty incident notifications and status updates were delayed. All delayed notifications and updates were delivered by 07:08 UTC.
At approximately 06:25 UTC on June 21st, the systems responsible for scheduling and delivering incident notifications and status updates experienced a sudden increase in traffic. A widespread internet event caused PagerDuty customers' monitoring systems to generate significantly more alerts than expected. The resulting traffic pushed the services that schedule incident notifications and status updates beyond their expected surge levels. As a result, 1.33% of incident notifications and 5.73% of status updates were delayed.
Shortly after the surge subsided, our system caught up to its expected processing levels. All queued incident notifications and status updates were processed and sent, and as of 07:08 UTC on June 21st, new incident notifications and status updates were again being sent on time.
To prevent internet-wide events from affecting the delivery of incident notifications and status updates in the future, we are actively reviewing database resource utilization to ensure adequate surge capacity. We are also implementing additional monitoring to detect burst events like this one. Together, these measures will help us better protect our customers from future surge events.
We apologize for any inconvenience this has caused. If you have any questions, comments, or concerns, please reach out to email@example.com.