Issue with stakeholder status updates
Incident Report for PagerDuty
Postmortem

Summary

On Monday, September 20th, from 21:44 UTC until 22:00 UTC, PagerDuty experienced an incident that delayed the delivery of status update notifications and prevented users from viewing or modifying incident subscribers. During this time, status update notifications were delayed by up to 15 minutes, and status update notification subscriptions could not be viewed or modified through PagerDuty’s web and mobile applications. All other types of notifications (on-call handoff notifications, assignment notifications, and responder requests) were unaffected.

What Happened?

We began removing an unused database cluster. That cluster was linked to another cluster that the system was actively using, and the removal also decommissioned active instances in the in-use cluster. Restarting the database instances that serve the application restored the service. No data was lost.

What Are We Doing About This?

Process improvements: We are reviewing our operations documentation to ensure that operational tasks are performed with the correct configuration. We are also instituting stricter review policies for such operations.

Tooling: Teams across PagerDuty are collaborating on automation that will prevent such operations from being executed under certain conditions.

We’d like to apologize for the impact this had on our customers. If you have any further questions, please reach out to support@pagerduty.com.

Posted Sep 30, 2021 - 22:03 UTC

Resolved
We have fully recovered.
Posted Sep 20, 2021 - 22:44 UTC
Monitoring
We have applied a fix, and stakeholder updates are once again being sent normally; we are continuing to monitor.
Posted Sep 20, 2021 - 22:18 UTC
Identified
We are currently experiencing an issue sending stakeholder status updates; all other incident functionality is unaffected. We have identified the cause of the issue and are working on a fix.
Posted Sep 20, 2021 - 22:02 UTC
This incident affected: Notification Delivery (Notification Delivery (US)).