On October 19, 2022, our marketing website (https://www.pagerduty.com) was unreachable from 01:31 UTC to 02:16 UTC, a period of 45 minutes. The outage was limited to the marketing website and didn’t affect the PagerDuty application. During this time, visitors weren’t able to log in to the PagerDuty application through the marketing website or create new trials.
While we were deploying our marketing website, a server configuration update caused our main site to return 500 responses to all incoming requests. The problem originated from a settings update on one of our providers' caching services. Enabling the newly required setting stopped the 500 responses. However, because of this setting, our deployment had not completed successfully, and users were temporarily redirected to one of our staging instances. After we updated the route to point to the correct endpoint, traffic began to flow correctly to our production instance.
Following this incident, our teams conducted a thorough post-mortem investigation. It found that the issue was caused by an update made by our web host provider and was triggered by our normal deployment process. We have raised this issue with our web host provider, and they are working to provide us with feedback to address the problem. We have also identified internal opportunities to refine our current deployment pipeline, including the creation of developer runbooks and updates to the configuration of our caching settings.
We would like to express our sincere apologies for the service degradation. For any questions, comments, or concerns, please contact us at support@pagerduty.com.