On August 9, 2021, from roughly 14:35 UTC until 16:42 UTC, PagerDuty’s web application returned a higher-than-normal number of HTTP 500 error responses on some pages. Event ingestion, notification delivery, and webhook delivery were not affected by this incident.
We deployed a change to the web application to provide better telemetry and logging. The change included new calls to an operation that appeared to be read-only but occasionally resulted in a database write. When that call was made from within a read-only database context, the write failed, and the resulting error was returned to the user as an HTTP 500 response.
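The failure mode can be illustrated with a minimal sketch. It is not the code involved in the incident: the function names, the `sessions` table, and the use of SQLite as a stand-in for a read-only database context are all assumptions made for illustration.

```python
import os
import sqlite3
import tempfile
from pathlib import Path


def get_or_create_session(conn, user_id):
    """Hypothetical helper: reads a row, but inserts one when none exists."""
    row = conn.execute(
        "SELECT id FROM sessions WHERE user_id = ?", (user_id,)
    ).fetchone()
    if row is not None:
        return row[0]  # common path: a pure read
    # Rare path: a write hidden behind a call that looks like a lookup.
    cur = conn.execute("INSERT INTO sessions (user_id) VALUES (?)", (user_id,))
    conn.commit()
    return cur.lastrowid


def handle_request(conn, user_id):
    """Hypothetical request handler that maps a database error to a 500."""
    try:
        get_or_create_session(conn, user_id)
        return 200
    except sqlite3.OperationalError:
        return 500


# Set up a small database with one existing row, using a writable connection.
db_path = Path(tempfile.mkdtemp()) / "demo.db"
rw = sqlite3.connect(os.fspath(db_path))
rw.execute("CREATE TABLE sessions (id INTEGER PRIMARY KEY, user_id INTEGER UNIQUE)")
rw.execute("INSERT INTO sessions (user_id) VALUES (1)")
rw.commit()
rw.close()

# Reopen the same database read-only (assumed stand-in for a read-only context).
ro = sqlite3.connect(db_path.as_uri() + "?mode=ro", uri=True)
status_hit = handle_request(ro, 1)   # row exists, so only a SELECT runs
status_miss = handle_request(ro, 2)  # row missing, so an INSERT is attempted
ro.close()
```

In this sketch the first request succeeds because the row already exists, while the second fails only because the lookup falls through to an insert. A test that exercises only the common path would not surface the error.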
We are investigating our testing procedures and tooling to understand why this bug escaped notice during testing. We are also confirming that our monitoring thresholds are set appropriately to detect similar occurrences sooner, and reviewing our canary-deploy systems to reduce the likelihood of a recurrence. We sincerely apologize for this degradation in performance. For any questions, comments, or concerns, please contact us at firstname.lastname@example.org.