On September 21st, 2021, from 19:10 UTC to 19:51 UTC, PagerDuty experienced a service degradation that delayed notifications for a small subset of users. Only incident assignment notifications were affected. Other notifications, such as Responder requests, On-Call Handoff notifications, and Status Update notifications, were unaffected during this time.
At 19:10 UTC, we noticed that an infrastructure change had left some outdated connections between the notification service and an internal data streaming service. As a result, a small backlog of notifications built up in the service that handles outbound notifications. Service was restored at 19:51 UTC by re-establishing these connections, at which point all remaining unsent notifications were processed and sent.
All PagerDuty services are required to automatically re-establish lost connections, with retries. This incident exposed an edge case that this mechanism did not handle, and we are improving both the code and our service architecture requirements to address it.
We sincerely apologize for any inconvenience this may have caused. If you have any questions, comments, or concerns, please reach out to firstname.lastname@example.org.