EU Service Region is not Processing Events
Incident Report for PagerDuty
Postmortem

Summary

On Wednesday, April 19, 2023, between 14:16 UTC and 14:56 UTC, PagerDuty experienced an incident in which event processing stopped completely in the EU service region. During this time, users could still send events to our Events API, but alert and incident creation for those events was delayed. Other service regions were not impacted by this incident.

 

What Happened

On April 19, 2023, at 14:16 UTC, our teams noticed that event processing was failing at a high rate in the EU service region. Upon further investigation, we found a misconfiguration that prevented network communication between our event processing service and other internal services. Affected users would have noticed delays in alert and incident creation, as well as in receiving notifications. Our teams addressed this issue, and event processing resumed at 14:46 UTC. Users would have started to see normal event and incident creation at that time. At 14:56 UTC, the accumulated backlog of events finished processing.

 

What Are We Doing About This

We will increase monitoring so that misconfigurations of this kind are detected before they can cause similar failures. We are also reviewing our other internal services for similar misconfigurations to ensure they don’t cause future incidents. We apologize for the inconvenience that this has caused. For any questions, comments, or concerns, please reach out to support@pagerduty.com.

Posted Apr 27, 2023 - 16:12 UTC

Resolved
We have resolved an incident where all PagerDuty customers in the EU service region experienced issues with processing global and service event rules. The incident is now resolved, all impacted events have been reprocessed, and there is no ongoing impact to customers. Please reach out to support@pagerduty.com if you have any concerns.
Posted Apr 19, 2023 - 15:04 UTC
Update
We are continuing to investigate an incident where all PagerDuty customers in the EU service region are experiencing issues with processing global and service event rules. We have mitigated the issue and are seeing signs of recovery, and we will continue to monitor it. We will provide further updates within 20 minutes.
Posted Apr 19, 2023 - 14:50 UTC
Identified
We are investigating an incident where all PagerDuty customers in the EU service region are experiencing issues where global and service events are not being processed. We will provide further updates within 20 minutes.
Posted Apr 19, 2023 - 14:33 UTC
This incident affected: Events API (Events API (EU)).