Identified - Document Processing issues
Incident Report for Signable Status Page
Postmortem

On April 28, 2024, a scheduled database maintenance window led to a severe service outage that impacted the Signable API and document processing services for over 19 hours. Signers were still able to sign envelopes and send documents using a template. The root cause was a misconfiguration during the database cluster cloning process, which prevented the public API and other internal services from connecting to the new database cluster.

Incident Summary

During the maintenance window on April 28, the production database cluster was cloned with a subnet misconfiguration. This meant the public API and other internal services attempted to connect to the cluster using a private IP. As they resided in a separate VPC, they lost connectivity to the new database cluster.
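The failure mode described above can be illustrated with a small pre-flight check. This is a minimal sketch, not Signable's actual tooling: the function name, the example addresses and the CIDR range are hypothetical, and it assumes there is no VPC peering or other private routing between the two networks.

```python
import ipaddress


def is_reachable_from_vpc(db_address: str, client_vpc_cidr: str) -> bool:
    """Rough check of whether a client in client_vpc_cidr can route to db_address.

    Assumption for illustration: a private address is only reachable from
    inside the same VPC CIDR block (no peering, no transit gateway).
    """
    addr = ipaddress.ip_address(db_address)
    client_network = ipaddress.ip_network(client_vpc_cidr)
    if not addr.is_private:
        # A public address is routable from any VPC with internet egress.
        return True
    # A private address is only useful if it sits inside the client's own range.
    return addr in client_network


# Hypothetical values: the API runs in 10.0.0.0/16, the cloned cluster
# resolves to a private address in a different VPC.
print(is_reachable_from_vpc("10.1.2.3", "10.0.0.0/16"))
```

A check along these lines, run from each dependent service's network immediately after a clone, would flag a cluster endpoint that resolves to an address the service cannot route to.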

While the services that run the Signable Web app and Signing page were largely unaffected, the issue was initially raised because documents were not being processed through the Signable App.

Issues with error logging and monitoring in the affected services made it difficult to identify the root cause promptly.

Customer Impact

The outage severely impacted Signable's ability to process customer documents and accept calls from the public API, causing significant disruption for customers, many of whom rely on the platform for time-sensitive transactions.

Resolution and Recovery

The incident was declared at 8:09 AM on April 29, and the product team immediately began working towards a resolution.

The issue was resolved at 1:30 PM on April 29 by cloning the database cluster again with instances in public subnets, restoring connectivity for the public API and document processing services.

Lessons Learned and Improvements

In order of criticality, Signable will be implementing improvements in the following areas: 

  • Implement more robust QA processes and regression testing for maintenance deployments that happen out of hours.
  • Improve error logging and monitoring for the services affected to enable proactive issue detection. 
  • Enhance alerting mechanisms based on customer report volume and service request patterns.
  • Streamline incident communication channels and processes for better coordination and clarity for customer-facing updates during the incident.
  • Review maintenance scheduling to minimise the impact on customer-critical periods.
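As one illustration of alerting on service request patterns, the sketch below flags a sharp drop in request volume relative to a baseline. It is an assumption-laden example rather than a description of Signable's monitoring: the function name, the 50% threshold and the sample counts are all hypothetical.

```python
from statistics import mean


def request_volume_dropped(baseline_counts, recent_counts, max_drop=0.5):
    """Return True when the recent average falls below (1 - max_drop) of the baseline average.

    baseline_counts and recent_counts are per-interval request counts,
    e.g. successful API calls per five minutes (hypothetical metric).
    """
    if not baseline_counts or not recent_counts:
        raise ValueError("both baseline_counts and recent_counts must be non-empty")
    baseline = mean(baseline_counts)
    if baseline == 0:
        # No traffic is normally expected, so a drop cannot be detected.
        return False
    return mean(recent_counts) < baseline * (1 - max_drop)


# Hypothetical counts: normal traffic before maintenance, near zero afterwards.
if request_volume_dropped([100, 120, 110], [5, 0, 3]):
    print("ALERT: request volume is well below baseline")
```

A rule like this would fire when processed requests fall away after a maintenance window, even if the affected services are not logging explicit errors.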

Signable is committed to learning from this incident and implementing the necessary improvements to prevent similar outages and provide a more reliable service to our customers.

If you have further questions or require more information, please get in touch with us at help@signable.co.uk.

Posted May 02, 2024 - 10:50 BST

Resolved
We can confirm that our monitoring shows the issues impacting Signable services have been resolved. If you continue to experience problems, please contact our support team at hello@signable.co.uk. A full post-mortem will be published as soon as we have finished investigating the root cause. Thank you for your patience.
Posted Apr 29, 2024 - 13:28 BST
Monitoring
We have implemented a fix for the issues with API-v1 and the Signable App and are monitoring the situation. We will share an update as soon as we can confirm it has been fully resolved. Thank you for your patience. A full report will be published as soon as we have finished investigating the root cause.
Posted Apr 29, 2024 - 13:20 BST
Update
As part of the remediation efforts, emergency maintenance is taking place on Signable services. All services have been taken down while these fixes are applied. We will update you once they are back up again.
Posted Apr 29, 2024 - 12:36 BST
Update
We are continuing to work on a fix for this issue. We apologise for the inconvenience and appreciate your patience. Once the issue is resolved, a detailed report will follow. Thank you for your understanding.
Posted Apr 29, 2024 - 12:18 BST
Update
We are continuing to work on a fix for this issue.
Posted Apr 29, 2024 - 09:56 BST
Identified
We have identified a fix for the issues affecting document processing. We’re working to implement it and will let you know as soon as it is deployed.
Posted Apr 29, 2024 - 09:13 BST
Investigating
We have identified a fix for the issues affecting document uploading and processing. We’re working to implement it and will let you know as soon as it is deployed.
Posted Apr 29, 2024 - 08:13 BST
This incident affected: Web App, API, Document Processing, Email, and New platform API.