Possible cluster metrics/logs delay

Incident Report for ESS (Public)

Resolved

This incident is now resolved. Logging delays are no longer present in either us-east-1 or us-west-2. Thank you for your patience as we investigated and resolved this issue.

Posted Mar 15, 2019 - 13:57 UTC

Update

We have deployed the configuration fix to production. If the delays continue to decrease, we will mark this incident as resolved in our next update. We appreciate your patience as we work to restore everything to normal.

Posted Mar 15, 2019 - 13:27 UTC

Update

We have successfully tested a configuration fix and are in the process of deploying it. Meanwhile, delays have been further reduced: both us-east-1 and us-west-2 are now close to 5 minutes, and we continue to resolve the delays across all deployments. If the delays continue to decrease, we will mark this incident as resolved in our next update. We appreciate your patience as we work to restore everything to normal.

Posted Mar 15, 2019 - 12:35 UTC

Update

We have found a misconfigured component that is preventing the delays in us-east-1 and us-west-2 from being fully cleared, and we are in the process of fixing it. Some log delays remain in these regions: us-east-1 is still close to 5 minutes and us-west-2 close to 10 minutes. We appreciate your patience as we work to restore everything to normal.

Posted Mar 15, 2019 - 11:40 UTC

Update

We are continuing to monitor the situation. Some log delays remain in two regions: us-east-1 is still close to 5 minutes and us-west-2 close to 10 minutes. We continue to resolve the delays across all deployments. We appreciate your patience as we work to restore everything to normal.

Posted Mar 15, 2019 - 10:45 UTC

Update

We are continuing to monitor the situation. Some log delays remain in two regions: us-east-1 is close to 5 minutes and us-west-2 close to 10 minutes. We continue to resolve the delays across all deployments. We appreciate your patience as we work to restore everything to normal.

Posted Mar 15, 2019 - 10:02 UTC

Update

We are continuing to monitor the situation. Some log delays remain in two regions, but both have improved: us-east-1 is currently around 6 minutes and us-west-2 close to 10 minutes. We continue to resolve the delays across all deployments. We appreciate your patience as we work to restore everything to normal.

Posted Mar 15, 2019 - 09:22 UTC

Update

We are continuing to monitor the situation. Some log delays remain in two regions: us-east-1 is under 10 minutes and us-west-2 close to 10 minutes. We continue to resolve the delays across all deployments. We appreciate your patience as we work to restore everything to normal.

Posted Mar 15, 2019 - 08:36 UTC

Update

Posted Mar 15, 2019 - 07:57 UTC

Update

We are continuing to monitor the situation. Some regions are still experiencing log delays, and we continue to resolve the delays across all deployments. We appreciate your patience as we work to restore everything to normal.

Posted Mar 15, 2019 - 07:16 UTC

Update

Posted Mar 15, 2019 - 06:44 UTC

Monitoring

We have identified a cause and are investigating. Logs and performance metrics are returning to near real-time in more regions, and we continue to resolve the delays across all deployments. We appreciate your patience as we work to restore everything to normal.

Posted Mar 15, 2019 - 06:12 UTC

Identified

We have identified a cause and are investigating. We suspect an issue with monitoring that is causing a global metering cluster outage. We are currently working on rectifying this. We appreciate your patience as we work to restore everything to normal.

Posted Mar 15, 2019 - 05:36 UTC

Investigating

We are currently investigating this issue.

Posted Mar 15, 2019 - 04:54 UTC