On Monday, May 2nd, between 15:25 UTC and 16:50 UTC, PagerDuty experienced an incident that disrupted the processing of global events.
During this time period, all global events – events sent to Global Rulesets and Event Orchestrations – were accepted by our system, but most were dropped before reaching the intended customers.
A deployment caused traffic sent to Global Rulesets and Event Orchestrations to be treated as test data; this in turn caused these global events to be dropped before being processed further in the pipeline. The deployment was intended to copy a specific subset of events as test data, but because of a separate recent change to our service's configuration structure, global events were affected instead.
During the time of impact, 75% of global events were dropped before reaching our internal upstream services. These events failed silently: systems sending the events received a successful response, but the events did not trigger alerts, incidents, or notifications, and were therefore not visible in the web UI, mobile UI, or REST API.
Our engineers remediated the failure by redeploying a previous version of the codebase. This action restored the processing path of global events and returned our system to a healthy state. However, we could not recover all of the global events that were impacted. We were able to reprocess 26% of the dropped global events as suppressed events: no notifications were sent for them, but they are visible for historical purposes.
Following this incident, our teams conducted a postmortem investigation, which identified the events that contributed to this incident, as well as what we can do to ensure similar incidents do not happen again. The corrective actions included the following:
We sincerely apologize for these failed events, the consequently unsent notifications, and the impact this had on you and your teams. As always, we stand by our commitment to providing the most reliable and resilient platform in the industry. If you have any questions, please reach out to support@pagerduty.com.