Web Application increase in errors
Incident Report for PagerDuty
Postmortem

Summary

On August 9, 2021, from roughly 14:35 UTC until 16:42 UTC, PagerDuty’s web application returned a higher-than-average number of HTTP 500 error responses on some pages in the application. Event ingestion, notification delivery, and webhook delivery were not affected by this incident.

What Happened

We deployed a change to the web application to provide better telemetry and logging. The change added new calls to an operation that appeared to be read-only but occasionally resulted in a database write. When that call was made from within a read-only database context, it produced an error that surfaced to the user as an HTTP 500 response.
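
As an illustration of this failure mode only, the following is a minimal sketch in Python using SQLite. It is not the code involved in the incident: the database, the `settings` table, and the `get_or_create_setting` and `handle_request` functions are all hypothetical, chosen to show how a lookup that writes on a miss behaves when the connection is read-only.

```python
import sqlite3


def get_or_create_setting(conn, key, default):
    """Hypothetical helper: looks like a read, but inserts a row on a miss."""
    row = conn.execute(
        "SELECT value FROM settings WHERE key = ?", (key,)
    ).fetchone()
    if row is not None:
        return row[0]  # common path: a pure read
    # Rare path: the "read" silently becomes a write.
    conn.execute(
        "INSERT INTO settings (key, value) VALUES (?, ?)", (key, default)
    )
    return default


def handle_request(conn, key):
    """Hypothetical handler that maps a database error to an HTTP status."""
    try:
        get_or_create_setting(conn, key, "default")
        return 200
    except sqlite3.Error:
        return 500


# Set up an in-memory database with one existing row.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE settings (key TEXT PRIMARY KEY, value TEXT)")
conn.execute("INSERT INTO settings VALUES ('theme', 'dark')")
conn.commit()

# Stand-in for a read-only database context: SQLite rejects writes from here on.
conn.execute("PRAGMA query_only = ON")

status_existing = handle_request(conn, "theme")   # row exists, read only
status_missing = handle_request(conn, "locale")   # row missing, attempts INSERT
```

In this sketch the request for an existing key succeeds, while the request for a missing key fails only because of the hidden write. That is why a bug of this shape can pass tests that exercise just the common path.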

What Are We Doing About This

We are investigating our testing procedures and tooling to understand why this bug escaped notice during testing. We are also confirming that our monitoring thresholds are set appropriately to detect similar occurrences sooner, and reviewing our canary-deploy systems to reduce the likelihood of a recurrence.

We sincerely apologize for this degradation in performance. For any questions, comments, or concerns, please contact us at support@pagerduty.com.

Posted Aug 16, 2021 - 17:40 UTC

Resolved
The issue with increased error rates has been resolved. All systems are operating as usual.
Posted Aug 09, 2021 - 16:43 UTC
Investigating
We are currently investigating an issue in the web application that may cause an increase in errors for requests.
Posted Aug 09, 2021 - 16:33 UTC
This incident affected: Web Application (Web Application (US), Web Application (EU)).