Outage of actors, API and storage

Incident Report for Apify

Resolved

This incident has been resolved.

Posted Dec 08, 2021 - 09:26 CET

Monitoring

A fix has been implemented and we are monitoring the results.

Posted Dec 08, 2021 - 01:17 CET

Update

After about 8 hours, the whole system is finally recovering, as we found a way to bypass the AWS services that are still broken.

Current state:
- All Apify components are now fully operational, including Actors, Scheduler, API, ...
- Performance may still be degraded, but it is recovering
- We are monitoring the situation and will keep you informed of any changes

Summary of the impact:
- Since the start of the incident, both actors and the API had a higher error rate and degraded performance, which escalated to an almost complete outage during the past 1-2 hours
- Scheduler was down for the past 1-2 hours
- Many operations, such as abort, did not work for many runs
- No data loss

Root cause:
- Outage of our cloud provider Amazon Web Services (AWS).
- AWS services have still not fully recovered.
- For more information see: https://status.aws.amazon.com/#ME_block

Posted Dec 08, 2021 - 01:15 CET

Update

Amazon Web Services are finally recovering, but some of the services we critically depend on are still not fully operational. Actors and the API are still degraded, and we are experiencing a complete outage of the Scheduler.

Posted Dec 08, 2021 - 00:21 CET

Update

Current state:
- There are finally signs of improvement
- Actors are partially functional: runs are starting with a delay and finishing, but the error rate is still high
- The API has a higher error rate than usual, but performance is back to normal

For more information on the Amazon Web Services outage, see https://status.aws.amazon.com/

Posted Dec 07, 2021 - 20:44 CET

Identified

The issue has been identified.

Posted Dec 07, 2021 - 18:05 CET

Investigating

We are experiencing an outage of Apify actors and the API due to a regional outage of our cloud provider (Amazon Web Services). We are trying to minimize the impact on our customers.

Some actor runs may not start at all or might be delayed. There is no data loss.

Posted Dec 07, 2021 - 17:00 CET

This incident affected: External services (AWS EC2, AWS S3, AWS SQS, AWS elb-us-east-1, AWS elasticache-us-east-1, AWS dynamodb-us-east-1, Braintree Payment Gateway API, Braintree Canadian Processing, Braintree APAC Processing, Braintree European Processing, Braintree United States Processing, npm, Inc. Registry Reads, Braintree 3D Secure, AWS ecr-us-east-1, AWS eks-us-east-1), Storage (Dataset, Request queue, Key-value store), API (api.apify.com), Actors, and Scheduler.