Email deliverability outbound is impacted right now
Incident Report for PagerDuty
Postmortem

Summary

On June 9th, 2022, from 12:15 UTC through 18:25 UTC, a temporary failure at a downstream communications provider delayed delivery of specific PagerDuty emails. In the US and EU service regions, administrative emails for password resets, user invitations, and responder readiness reports were affected. In the EU service region, status update emails were also affected.

All other email notification types (incident notifications, responder requests, on-call handoffs) and all other channels, including Push, SMS, and Voice, were unaffected by this incident.

What Happened

On June 9th, 2022, at 13:24 UTC, PagerDuty internal monitoring systems alerted that a service operated by one of our downstream communications providers was failing to accept and send outbound administrative emails and status update emails to a handful of recipients. The provider re-established our connections with them at 13:48 UTC, and our systems automatically retried the requests that had previously failed. The provider then needed additional time to work through the backlog of email requests that had built up during the earlier downtime. Newer email notifications were prioritized for delivery and were being sent in real time as of 15:30 UTC. Full recovery of all systems, including the backlog of requests, completed at 18:25 UTC.
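For readers who want a concrete picture of the recovery behavior described above (automatic retries of failed requests, with newer emails delivered ahead of the older backlog), the following is a minimal illustrative sketch. It is not PagerDuty's or the provider's implementation. The names `RetryQueue`, `QueuedEmail`, `send`, and `max_attempts` are hypothetical, and a production system would also wait (for example, with exponential backoff) between passes.

```python
import heapq
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass(order=True)
class QueuedEmail:
    # Negated creation time: heapq is a min-heap, so the newest email pops first.
    sort_key: float
    email_id: str = field(compare=False)
    attempts: int = field(default=0, compare=False)


class RetryQueue:
    """Hypothetical queue that retries failed sends, newest email first."""

    def __init__(self, send: Callable[[str], bool], max_attempts: int = 5):
        self._send = send                  # assumed to return True on success
        self._max_attempts = max_attempts  # assumed per-email retry budget
        self._heap: List[QueuedEmail] = []

    def enqueue(self, email_id: str, created_at: float) -> None:
        heapq.heappush(self._heap, QueuedEmail(-created_at, email_id))

    def drain(self) -> Tuple[List[str], List[str]]:
        """Make one pass over the queue; return (delivered, dropped) ids."""
        delivered, dropped, retry_later = [], [], []
        while self._heap:
            item = heapq.heappop(self._heap)
            item.attempts += 1
            if self._send(item.email_id):
                delivered.append(item.email_id)
            elif item.attempts >= self._max_attempts:
                dropped.append(item.email_id)
            else:
                retry_later.append(item)
        # Failed items go back on the queue for the next pass.
        for item in retry_later:
            heapq.heappush(self._heap, item)
        return delivered, dropped
```

Calling `drain()` repeatedly while the provider is unavailable keeps failed emails queued. Once sends succeed again, each pass attempts the most recently created emails first, so fresh requests are not stuck behind the backlog.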

What We Are Doing About This

Following this incident, our teams conducted a thorough investigation to find ways of mitigating this issue at both the technical and organizational levels. Our monitoring caught this issue, but we have identified gaps and potential areas of improvement and are actively addressing them. We will be revisiting our retry and availability strategy for all outbound notifications to ensure that you get your message when you need it most.

We sincerely apologize for the delayed notifications you or your teams experienced. We understand how vital our platform is for our customers. As always, we stand by our commitment to providing the most reliable and resilient platform in the industry. If you have any questions, please reach out to support@pagerduty.com.

Posted Jun 17, 2022 - 21:01 UTC

Resolved
We have now fully recovered, and outbound emails are being delivered as normal. Please reach out to support@pagerduty.com if you have any questions.
Posted Jun 09, 2022 - 16:14 UTC
Monitoring
We are seeing improvement in outbound email deliverability. We will provide a further update once we know more.
Posted Jun 09, 2022 - 15:05 UTC
Update
We are continuing to investigate outbound email deliverability issues. The emails affected are administrative emails such as password resets (both regions), invitation emails (both regions), on-call readiness report emails, and email status updates (EU only). PagerDuty incident notification emails are not affected.

We are seeing intermittent success with outbound emails. We will update as we learn more.
Posted Jun 09, 2022 - 14:47 UTC
Update
We are continuing to investigate email deliverability issues and will update in due course. Incident email notifications are not affected.
Posted Jun 09, 2022 - 14:05 UTC
Investigating
Email deliverability is currently impacted. We are investigating and will update again soon. Incident notifications are not affected at this time.
Posted Jun 09, 2022 - 13:40 UTC
This incident affected: Notification Delivery (Notification Delivery (US), Notification Delivery (EU)).