On Thursday, January 20, we performed a release to add Data Residency support (hosting in the European Union, using the AWS Frankfurt, Germany region) and to change the AWS instance class of our production instances.
The release process was estimated to take ~20 minutes, with the majority of that time spent on shutting down the AWS instance, changing the type, and restarting all services.
After resizing the instance, we noticed that our EBS data volumes were performing slowly, to the point of being unusable.
Our initial diagnosis was that the volumes were behaving like volumes recently restored from a backup snapshot, where AWS lazy-loads the data behind the scenes.
We could not find an explanation for this behavior, as we had not made any changes to the data volumes. Working with AWS Support, we determined that changing the instance class from r5 to r5b caused the volumes to go through an optimization step behind the scenes. During this optimization, some of the data volumes ran into an issue that resulted in increased latency.
AWS Support investigated the issue but could not identify the cause or a possible fix, so the issue had to be escalated to the EBS service team. At that point, it was unknown how long the optimization would take to finish or how long the EBS team would need for their investigation, and AWS Support could not provide an estimate.
Because AWS could not provide a timeline, we decided to restore all data from snapshots created prior to the release. We regularly create backup snapshots, including during each release, so we had a full data set to restore.
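The decision above depended on every data volume having a recent, completed snapshot. As a minimal sketch of how that precondition could be checked before a release (an illustration only, not our actual tooling), the hypothetical helper below takes snapshot records shaped like the `Snapshots` entries returned by the EC2 DescribeSnapshots API (`VolumeId`, `State`, `StartTime`) and reports any volume without a fresh completed snapshot. The function name, the volume IDs, and the one-hour freshness window are assumptions.

```python
from datetime import datetime, timedelta, timezone


def volumes_missing_fresh_snapshot(volume_ids, snapshots, now, max_age):
    """Return the sorted volume IDs that lack a completed snapshot newer than max_age.

    `snapshots` is a list of dicts with the keys VolumeId, State, and StartTime,
    mirroring the fields of an EC2 DescribeSnapshots response.
    """
    cutoff = now - max_age
    covered = {
        snap["VolumeId"]
        for snap in snapshots
        # Only finished snapshots taken inside the freshness window count.
        if snap["State"] == "completed" and snap["StartTime"] >= cutoff
    }
    return sorted(set(volume_ids) - covered)


# Hypothetical data: two data volumes, one of which only has a pending snapshot.
now = datetime(2022, 1, 20, 12, 0, tzinfo=timezone.utc)
sample_snapshots = [
    {"VolumeId": "vol-aaa", "State": "completed", "StartTime": now - timedelta(minutes=10)},
    {"VolumeId": "vol-bbb", "State": "pending", "StartTime": now - timedelta(minutes=5)},
]
missing = volumes_missing_fresh_snapshot(
    ["vol-aaa", "vol-bbb"], sample_snapshots, now, max_age=timedelta(hours=1)
)
print(missing)
```

A release script could refuse to proceed while the returned list is non-empty.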
At this stage, we rolled back our instance changes and used EBS Fast Snapshot Restore (FSR) to recover all production data from backup snapshots. This process took about 4 hours in total, as we waited for FSR to finish its optimization and for enough credits to accrue to recreate the volumes. Once that was done, service returned to normal.
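To show why accruing credits adds waiting time, here is a rough sketch that models FSR volume-creation credits as a simple token bucket: each volume created consumes one credit, and credits refill at a steady rate up to a cap. The function name and all numbers are placeholders chosen for illustration, not actual AWS values or figures from this incident; the real capacity and refill rate depend on the snapshot and are described in the AWS documentation.

```python
def hours_until_all_created(volumes_needed, max_credits, refill_per_hour, starting_credits=0.0):
    """Estimate the hours until `volumes_needed` volumes can all be created.

    Simplified token-bucket model: one credit per volume, credits refill
    continuously at `refill_per_hour`, and each volume is created as soon
    as a credit is available. The cap only limits the starting balance,
    because credits are spent as fast as they arrive.
    """
    if refill_per_hour <= 0:
        raise ValueError("refill_per_hour must be positive")
    available = min(starting_credits, max_credits)
    deficit = max(0.0, volumes_needed - available)
    return deficit / refill_per_hour


# Placeholder numbers: 5 volumes to recreate, 2 credits already available,
# and an assumed refill rate of 1 credit per hour.
wait_hours = hours_until_all_created(5, max_credits=10, refill_per_hour=1.0, starting_credits=2)
print(wait_hours)
```

Under this model, the wait grows linearly with the number of volumes that exceed the credits on hand, which is why restoring many volumes from the same snapshot can take hours.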
We apologize for the interruption. We are reviewing our release and operations processes to prevent similar issues in the future.
Git Integration for Jira Cloud Operations Team