Email Events Delayed
Incident Report for PagerDuty
Postmortem

Summary

On February 24th from 01:35 to 06:26 UTC and again from 21:10 to 22:25 UTC, PagerDuty experienced an incident in which a portion of email events was rejected. The largest impact occurred from 02:45 UTC to 05:15 UTC and from 20:20 UTC to 21:30 UTC.

The incident impacted only the US region and was caused by a storage issue affecting a subset of our mail servers. Email events routed to the unaffected mail servers were ingested normally. This includes emails from customers whose mail-sending servers were configured to retry failed deliveries.
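
For senders who want similar resilience, the retry behavior mentioned above can be sketched roughly as follows. This is a minimal illustration, not a description of any particular mail server: `TransientDeliveryError`, `send_with_retry`, and the delay values are hypothetical names and numbers chosen for the example, and it assumes that rejections surface as temporary failures (for example an SMTP 4xx reply), which this report does not state.

```python
import time
from typing import Callable


class TransientDeliveryError(Exception):
    """Hypothetical error for a temporary rejection, e.g. an SMTP 4xx reply."""


def send_with_retry(
    send: Callable[[], None],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call send() until it succeeds and return the number of attempts used.

    Waits base_delay, 2 * base_delay, 4 * base_delay, ... seconds between
    attempts. Re-raises the last error once max_attempts is exhausted.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            send()
            return attempt
        except TransientDeliveryError:
            if attempt == max_attempts:
                raise
            # Exponential backoff before the next attempt.
            sleep(base_delay * 2 ** (attempt - 1))
    raise ValueError("max_attempts must be at least 1")
```

In practice, `send` would wrap whatever call actually hands the message to the mail server. Real mail transfer agents typically retry over much longer intervals than the values shown here.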

As soon as we identified the storage issue, we resolved it and email event processing was fully restored, concluding the incident. As an immediate action item, we reviewed and changed our mail server monitoring and configuration to prevent the issue from recurring. At this time, we have no further concerns about the availability and health of the mail servers.

What Are We Doing About This

As part of the incident investigation, we identified the following action items to help prevent such issues in the future:

  • Further extending our mail server monitoring to better detect and respond to email processing issues
  • Improving logging for rejected emails to be able to identify affected accounts and routing keys
  • Auditing other similarly-configured PagerDuty servers to proactively ensure the storage issue won’t impact them
Posted Mar 04, 2022 - 00:39 UTC

Resolved
We have been monitoring this issue and have seen signs of recovery. This incident is now resolved.
Posted Feb 24, 2022 - 22:38 UTC
Update
We are continuing to investigate delayed email events and will follow up within 30 minutes with an update.
Posted Feb 24, 2022 - 22:28 UTC
Update
We are continuing to investigate this issue.
Posted Feb 24, 2022 - 22:07 UTC
Investigating
We are investigating reports of delayed email events and will follow up when we have more information.
Posted Feb 24, 2022 - 22:07 UTC