From 22:47 UTC on April 24th, 2018 until 00:04 UTC on April 25th, PagerDuty’s web UI was intermittently unavailable and incident notification delivery was delayed, affecting all customers. Notification delivery and the web UI then returned to full availability, but from 00:04 until 01:22 UTC, event processing was delayed and the events API returned status 500 responses for some customers.
On April 24th, 2018, a change to the IPSec configuration in PagerDuty’s infrastructure was automatically deployed. The change prevented IPSec tunnels from being renewed, so connectivity between multiple components of the PagerDuty platform degraded gradually as existing tunnels expired. An incident response team identified the issue and deployed a corrected configuration, restoring full connectivity.
We will expand our automated pre-deploy testing of infrastructure configuration changes. For any questions, comments, or concerns, please reach out to firstname.lastname@example.org.