iOS Mobile App Functionality Impacted
Incident Report for PagerDuty
Postmortem

Summary

On August 2nd at 20:06 UTC, we suffered a partial disruption of our iOS mobile application that lasted 6 hours and 35 minutes. Push notification delivery was not affected. During this time, the PagerDuty iOS mobile app displayed a red error banner to any user who viewed the Open Incidents screen when there were no open incidents for the selected filter (Mine, My Teams, All).

What Happened?

A deployment to our API codebase accidentally introduced a bug. The bug caused our API to return an unexpected value (null) for one of the pagination-related properties in the response schema whenever it returned an empty list. The iOS mobile app was not able to handle this case, so it showed an error message on any view that would normally list data but had no data to display.
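The failure mode can be illustrated with a minimal sketch. It is written in Python for brevity and is not the iOS app's code; the field names `incidents` and `more` are assumptions chosen for illustration and are not taken from the actual response schema. A strict parser rejects a null pagination value, while a tolerant parser treats null on an empty list as "no further pages".

```python
import json
from dataclasses import dataclass
from typing import Any, List


@dataclass
class Page:
    items: List[Any]
    more: bool


def parse_strict(payload: str) -> Page:
    """Reject any response whose pagination flag is not a boolean."""
    data = json.loads(payload)
    more = data["more"]  # hypothetical pagination property
    if not isinstance(more, bool):
        raise ValueError("unexpected pagination value: %r" % (more,))
    return Page(items=data["incidents"], more=more)


def parse_tolerant(payload: str) -> Page:
    """Treat a null or missing pagination flag as 'no further pages'."""
    data = json.loads(payload)
    more = data.get("more")
    return Page(items=data.get("incidents") or [], more=bool(more))


# Hypothetical response for a filter with no open incidents.
EMPTY_WITH_NULL = '{"incidents": [], "more": null}'
```

With this payload, `parse_strict` raises `ValueError` (analogous to the error banner), while `parse_tolerant` returns an empty page with `more` set to `False`.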

Our monitoring solutions did not automatically detect this issue, so the investigation did not begin until 01:24 UTC, when PagerDuty engineers became aware of the error. Shortly thereafter, the cause was identified and a fix for the API bug was deployed. Full functionality was restored at 02:41 UTC.

What Are We Doing About This?

Our resolution time was delayed by two factors. Firstly, the issue was not automatically identified by our mobile app monitoring tool. Secondly, it took a long time for PagerDuty engineers to recognize that the customer experience was disrupted, so there was a long delay before we initiated our incident response process. We will be improving the coverage of our automatic monitoring and testing to detect these cases in the future. We have also put in place safeguards against making such API changes in the future.
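As one illustration of the kind of automated check that could catch this class of regression, here is a minimal sketch of a response contract check. It does not describe the safeguards we actually put in place; the field names `offset`, `limit`, and `more` and their types are assumptions made for the example.

```python
from typing import Any, Dict, List

# Hypothetical pagination fields and the types a client would expect.
REQUIRED_PAGINATION = {"offset": int, "limit": int, "more": bool}


def pagination_violations(body: Dict[str, Any]) -> List[str]:
    """Describe each pagination field that is missing, null, or mistyped."""
    problems = []
    for name, expected in REQUIRED_PAGINATION.items():
        value = body.get(name)
        # bool is a subclass of int in Python, so reject it for integer fields.
        wrong_type = not isinstance(value, expected) or (
            expected is int and isinstance(value, bool)
        )
        if wrong_type:
            problems.append(f"{name}: expected {expected.__name__}, got {value!r}")
    return problems
```

Running such a check against an empty-list response in a pre-deployment test suite would flag a null pagination value before it reached clients.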

We would like to again apologize for any inconvenience this issue caused. If you have any questions, please do not hesitate to contact us at support@pagerduty.com.

Posted Aug 11, 2017 - 22:25 UTC

Resolved
This incident has been resolved.
Posted Aug 03, 2017 - 03:03 UTC
Monitoring
We have recovered from the issue impacting the iOS mobile app, and are continuing to monitor.
Posted Aug 03, 2017 - 02:51 UTC
Identified
Our engineering team has identified the cause of the aforementioned issue with the iOS app and is taking action to correct it.
Posted Aug 03, 2017 - 01:50 UTC
Investigating
We are experiencing an issue affecting functionality of our iOS mobile application. The Android app is unaffected; all other systems are functioning. Our engineering team is currently investigating.
Posted Aug 03, 2017 - 01:39 UTC