On Wednesday, April 19, 2023, between 14:16 UTC and 14:56 UTC, PagerDuty experienced an incident in which event processing stopped completely in the EU service region. During this time, users could still send events to our Events API; however, they would have experienced a delay in alert and incident creation for those events. Other service regions were not impacted by this incident.
On April 19, 2023, at 14:16 UTC, our teams noticed that event processing was failing at a high rate in the EU service region. Upon further investigation, we found a misconfiguration that prevented network communication between our event processing service and other internal services. Affected users would have noticed a delay in alert and incident creation, as well as delays in receiving notifications. Our teams addressed the issue, and event processing resumed at 14:46 UTC. Users would have started to see normal event and incident creation at that time. At 14:56 UTC, the accumulated backlog of events finished processing.
We will increase monitoring to detect misconfigurations and prevent failures of this kind from recurring. We are also reviewing our other internal services for similar misconfigurations to ensure they don’t cause future incidents. We apologize for the inconvenience this has caused. For any questions, comments, or concerns, please reach out to support@pagerduty.com.