Multiple internal server errors detected on BSC, opBNB, ETH, Aptos, Polygon, ETH beacon chain
Incident Report for NodeReal
Postmortem

Root Cause:
Amazon will end support for EKS version 1.23 on 11 October 2023. A routine check revealed that some network-layer components were still running on EKS 1.23. After the EKS upgrade, our services that rely on the kube-api service all experienced connection timeouts: thruster, which uses it for leader election, and coordinator, which uses it to watch for new service endpoints.

Mitigation:
Migrating the Kubernetes pods to new, consolidated instances will strengthen our monitoring going forward and help ensure that connections to service endpoints remain healthy.

Forward Work:
On the new instances, our monitoring system has been reinforced across all Dev, QA, and PROD environments. It will ensure that infrastructure component versions remain consistent across all EKS clusters in the future.
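As an illustration of the kind of version-consistency check described above, here is a minimal Python sketch. It assumes the monitoring system has already collected, for each cluster, the version reported by each infrastructure component. The function name `find_version_drift`, the inventory shape, and the cluster and component names are all hypothetical and do not describe NodeReal's actual tooling.

```python
from collections import defaultdict


def find_version_drift(inventory):
    """Return the components whose versions differ across clusters.

    inventory: hypothetical mapping of cluster name -> {component: version}.
    Result: {component: {version: sorted list of clusters on that version}},
    containing only components that run more than one version.
    """
    # component -> version -> set of clusters reporting that version
    seen = defaultdict(lambda: defaultdict(set))
    for cluster, components in inventory.items():
        for component, version in components.items():
            seen[component][version].add(cluster)

    drift = {}
    for component, versions in seen.items():
        if len(versions) > 1:  # more than one version in use means drift
            drift[component] = {v: sorted(c) for v, c in versions.items()}
    return drift


if __name__ == "__main__":
    # Hypothetical inventory in which one cluster still runs EKS 1.23.
    inventory = {
        "dev": {"eks": "1.24", "coredns": "v1.9.3"},
        "qa": {"eks": "1.24", "coredns": "v1.9.3"},
        "prod": {"eks": "1.23", "coredns": "v1.9.3"},
    }
    for component, versions in find_version_drift(inventory).items():
        print(f"version drift in {component}: {versions}")
```

This sketch only compares versions that clusters actually report. A component that is missing from a cluster entirely is not flagged, so a real check would likely also compare each cluster against an expected baseline.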

Posted Sep 22, 2023 - 02:22 UTC

Resolved
The EKS endpoint could not be reached after the upgrade, causing a major outage on BSC, opBNB, ETH, Aptos, Polygon, and ETH beacon chain.
Posted Sep 07, 2023 - 05:30 UTC