We have implemented a fix, and response times have been recovering since 09:30 UTC. As of 10:30 UTC, response times have returned to normal. We will continue to monitor them. We apologize for any inconvenience caused.
Summary: We experienced a partial outage of the scheduler and degraded performance of the Apify API.
Timeframe: The issue started on a small scale at 11:00 UTC and escalated around 15:00 UTC. It was resolved by 16:15 UTC.
Impact:
- During the incident, some scheduled Actors did not start or were run twice.
- In response, we aborted some of the runs that started after 15:00 UTC and turned off the Schedule component for approximately 45 minutes.
- If your runs were aborted, they can now be safely resurrected; see: https://docs.apify.com/platform/actors/running/runs-and-builds#resurrection-of-finished-run
- There was no data loss for runs in progress, but some runs were not triggered.
- API performance was degraded to varying degrees throughout the incident, and some runs started via the API may have remained in the READY state for longer than usual.
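If you have many affected runs, the sketch below shows one possible way to pick out resurrection candidates from a list of run records you have already fetched. It is only an illustration under assumptions: the helper names, the sample window, and the record shape (`id`, `status`, `startedAt`) are ours, and the incident date must be filled in by you. The code builds the URL of the run resurrect endpoint but does not send any request; an actual call would be a POST authenticated with your API token.

```python
from datetime import datetime

# Assumed base URL of the Apify API v2.
API_BASE = "https://api.apify.com/v2"


def parse_started_at(value: str) -> datetime:
    """Parse an ISO 8601 timestamp such as '2024-01-01T15:05:00.000Z'."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def runs_to_resurrect(runs, window_start, window_end):
    """Return IDs of ABORTED runs that started inside the given window.

    `runs` is a list of dicts with 'id', 'status' and 'startedAt' keys
    (a hypothetical shape for already-fetched run records).
    """
    return [
        run["id"]
        for run in runs
        if run["status"] == "ABORTED"
        and window_start <= parse_started_at(run["startedAt"]) <= window_end
    ]


def resurrect_url(run_id: str) -> str:
    """Build the URL of the resurrect endpoint for a single run."""
    return f"{API_BASE}/actor-runs/{run_id}/resurrect"
```

Review the selected runs before resurrecting them, since a run aborted for an unrelated reason could also fall inside the window.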
We're sorry for the inconvenience. We take this issue seriously and will implement measures to mitigate similar issues in the future.
We will provide a post-mortem summary here with more details later this week.