4XX Responses in Web UI
Incident Report for PagerDuty
Postmortem

Summary

On January 16, 2019, at 22:31 UTC, we received reports of 404 responses from users trying to access configuration pages. We found that the issue was isolated to Global Admins on accounts with advanced permissions. Other user roles, and accounts without this feature, were not affected.

What Happened?

A deployment introduced a logic error into a permissions check method. The method runs on a code path that is reached only when a Global Admin on an account using advanced permissions requests certain configuration pages. On that code path, the logic error caused 4XX responses to be returned. The logic error, coupled with how we were handling these types of error messages, delayed our identification of the root cause.

Once we confirmed that the failures were tied to a specific combination of user role and feature, we were able to identify the source of the issue. We then rolled the deployment back to a healthy state.

What Are We Doing About This?

First, we will implement a set of tests to catch these types of logic errors going forward. Second, we will implement better logging so that we can identify these types of issues faster. Lastly, we will work to improve the messaging on our 4XX error pages.

We would like to apologize for this service interruption. For any questions, comments, or concerns, please reach out to support@pagerduty.com.

Posted 8 months ago. Jan 30, 2019 - 00:23 UTC

Resolved
We have fully recovered and have completed the mitigation actions necessary to ensure the incident does not recur.
Posted 8 months ago. Jan 17, 2019 - 01:33 UTC
Monitoring
We are beginning to see signs of recovery. Our engineers are still investigating and monitoring this issue.
Posted 8 months ago. Jan 17, 2019 - 01:20 UTC
Update
We are still working to identify a root cause; however, we have further isolated this issue to Global Admins on accounts with a specific feature added. All other user roles and accounts without this feature are not affected.
Posted 8 months ago. Jan 17, 2019 - 00:38 UTC
Update
Our engineers are still actively investigating this issue to determine a root cause; however, we have isolated the problem to Global Admin users specifically. All other user roles appear to be unaffected.
Posted 8 months ago. Jan 16, 2019 - 23:55 UTC
Investigating
We are receiving reports of 4XX responses from our Web UI. Our engineers are actively investigating to determine a root cause.
Posted 8 months ago. Jan 16, 2019 - 23:31 UTC
This incident affected: Web Application.