Notification delays
Incident Report for PagerDuty
Postmortem

Summary

On September 21st, 2021, from 19:10 UTC to 19:51 UTC, PagerDuty experienced a service degradation that delayed notifications for a small subset of users. Only incident assignment notifications were affected. Other notification types, such as Responder requests, On-Call Handoff notifications, and Status Update notifications, were delivered normally during this time.

What Happened

On September 21st, 2021, at 19:10 UTC, we noticed that an infrastructure change had left some connections between the notification service and an internal data streaming service outdated. As a result, a small backlog of notifications built up in the service that handles outbound notifications. Service was restored at 19:51 UTC by re-establishing these connections, at which point all remaining unsent notifications were processed and sent.

What Are We Doing About This

All PagerDuty services are required to automatically re-establish lost connections, with retries. This incident presented an edge case that this mechanism did not handle, and we are improving both the code and our service architecture requirements to address it.
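The reconnection requirement described above can be illustrated with a short sketch. This is a minimal, hypothetical Python example, not PagerDuty's implementation: `connect_with_retries` retries a connection factory with capped exponential backoff and random jitter, and `ManagedConnection` additionally treats a connection that has gone unused for longer than `max_idle_seconds` as stale and replaces it before use. All names, default values, and the idle-time heuristic for spotting an outdated connection are assumptions made for illustration.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def connect_with_retries(
    connect: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Hypothetical helper: call `connect` until it succeeds, backing off between attempts."""
    last_error: Optional[Exception] = None
    for attempt in range(max_attempts):
        try:
            return connect()
        except ConnectionError as exc:
            last_error = exc
            if attempt == max_attempts - 1:
                break
            # Full jitter: wait a random time up to base_delay * 2**attempt, capped at max_delay.
            delay = min(max_delay, base_delay * (2 ** attempt))
            sleep(random.uniform(0, delay))
    raise ConnectionError(f"gave up after {max_attempts} attempts") from last_error


class ManagedConnection:
    """Hypothetical wrapper that reconnects when a connection is lost or looks stale."""

    def __init__(self, connect, max_idle_seconds=60.0, clock=time.monotonic, sleep=time.sleep):
        self._connect = connect
        self._max_idle = max_idle_seconds
        self._clock = clock
        self._sleep = sleep
        self._conn = None
        self._last_ok = 0.0

    def _reconnect(self):
        self._conn = connect_with_retries(self._connect, sleep=self._sleep)
        self._last_ok = self._clock()

    def send(self, message):
        # Assumption: a connection idle for too long is treated as outdated and replaced up front.
        if self._conn is None or self._clock() - self._last_ok > self._max_idle:
            self._reconnect()
        try:
            self._conn.send(message)
        except ConnectionError:
            # The connection was lost mid-send: reconnect (with retries) and try once more.
            self._reconnect()
            self._conn.send(message)
        self._last_ok = self._clock()
```

Injecting `clock` and `sleep` keeps the sketch easy to test. A production service would more likely detect outdated connections with heartbeats or broker-side health checks rather than idle time alone.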

We sincerely apologize for any inconvenience this may have caused. For any questions, comments, or concerns, please reach out to support@pagerduty.com.

Posted Sep 28, 2021 - 19:45 UTC

Resolved
We have resolved the issue and notifications are now being sent normally.
Posted Sep 21, 2021 - 19:58 UTC
Identified
We've identified the cause of the delayed notifications and are currently taking steps to remediate the issue.
Posted Sep 21, 2021 - 19:41 UTC
Investigating
We have noticed a problem where some notifications may be delayed. We are currently investigating.
Posted Sep 21, 2021 - 19:27 UTC
This incident affected: Notification Delivery (Notification Delivery (US)).