AIWatch — GitHub Copilot Incidents

🟢 GitHub Copilot: Resolved — Disruption with Copilot next edit suggestions

Wed, 17 Jun 2026 19:28:16 GMT

🟢 Resolved · lasted 1h 31m

On June 17, 2026, between 16:57 UTC and 19:14 UTC, Copilot code completions were degraded and users were unable to receive Next Edit Suggestions. Standard ghost text code completions were not affected. This was due to a configuration change that caused the service's routing layer to incorrectly discard all Next Edit Suggestion model endpoints as invalid.

We mitigated the incident by deploying a corrected configuration change at 18:55 UTC, with full recovery observed at 19:14 UTC.

We are working to improve the resilience of our routing layer so that a subset of invalid configurations has limited impact, and to improve our alerting so that it detects sudden traffic changes that standard error-rate monitors do not capture.
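
One way a routing layer could tolerate this kind of failure is to treat "every endpoint was rejected" as a likely configuration error rather than a real outage. The sketch below is illustrative only and is not GitHub's implementation: `Endpoint`, `is_valid`, `select_endpoints`, and the validation rule are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Endpoint:
    name: str
    url: str


def is_valid(endpoint: Endpoint) -> bool:
    # Hypothetical validation rule: an endpoint needs a name and an https URL.
    return bool(endpoint.name) and endpoint.url.startswith("https://")


def select_endpoints(candidates, last_known_good):
    """Return (endpoints to route to, whether an alert should fire)."""
    valid = [e for e in candidates if is_valid(e)]
    if candidates and not valid:
        # Every endpoint was rejected at once. That is more likely a bad
        # configuration than a real outage, so keep the previous set and alert.
        return list(last_known_good), True
    return valid, False
```

With this shape, a partially invalid configuration drops only the bad endpoints, while a configuration that invalidates everything keeps serving from the last known good set and raises a signal that does not depend on error rates.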

🟢 GitHub Copilot: Resolved — Incident with Copilot Availability

Wed, 17 Jun 2026 04:44:07 GMT

🟢 Resolved · lasted 54m

This incident has been resolved. Thank you for your patience and understanding as we addressed this issue. A detailed root cause analysis will be shared as soon as it is available.

🟢 GitHub Copilot: Resolved — Disruption with some GitHub services

Tue, 16 Jun 2026 18:15:24 GMT

🟢 Resolved · lasted 30m

This incident has been resolved. Thank you for your patience and understanding as we addressed this issue. A detailed root cause analysis will be shared as soon as it is available.

🟢 GitHub Copilot: Resolved — Disruption with Claude Opus 4.7

Mon, 08 Jun 2026 10:03:32 GMT

🟢 Resolved · lasted 58m

On June 8, 2026, between 08:40 UTC and 09:30 UTC, the Claude Opus 4.7 model experienced degraded availability with error rates peaking at 8.4% and averaging 1.9%. This was due to an upstream provider issue that caused temporary unavailability and rate limiting on secondary failover systems. Users selecting Auto or alternative models were unaffected. We are improving provider failover mechanisms and monitoring to prevent similar issues.

🟢 GitHub Copilot: Resolved — Copilot Code Review Failing

Thu, 04 Jun 2026 19:59:27 GMT

🟢 Resolved · lasted 1h 57m

On June 4, 2026, from 17:30 UTC to 18:55 UTC, Copilot Code Review experienced elevated failures for review requests on GitHub.com. Affected users saw “Copilot ran into an error” on pull requests when requesting a code review.

During the incident window, an average of 81.6% of Copilot Code Review requests failed, with a peak failure rate of 93.9%. Approximately 36,800 code review requests failed. GitHub Enterprise Cloud with data residency was not impacted.

The issue was caused by a newly released dependency used by the Copilot Code Review processing workflow. The release introduced an incompatibility with the runtime environment. Because the workflow automatically consumed the latest release, the incompatible version was picked up without sufficient compatibility validation and caused review processing to fail.

We mitigated the incident by removing the problematic dependency version and redeploying the affected processing service. New code reviews began recovering at 18:44 UTC, and the failure rate returned to baseline by 18:55 UTC. Remaining timed-out work drained by 19:59 UTC.

To reduce the risk of recurrence, we are pinning the dependency version instead of automatically consuming the latest release, adding compatibility checks for future releases, improving fast-failure behavior when the review processor cannot start, adding shorter timeout controls for review workflows, and improving monitoring for review completion failures.
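
As a rough illustration of how pinning and fast failure fit together, a processor could compare installed versions against its pins at startup and refuse to start on a mismatch, instead of accepting work that later times out. This is a sketch under assumptions: the package name `review-engine`, the version `2.3.1`, and the `check_pins` helper are hypothetical and do not come from the incident report.

```python
# Hypothetical package name and version, used only for illustration.
PINNED = {"review-engine": "2.3.1"}


class IncompatibleDependency(RuntimeError):
    """Raised at startup when an installed version does not match its pin."""


def check_pins(installed: dict, pinned: dict = PINNED) -> None:
    """Raise immediately if any pinned package is missing or at another version."""
    mismatches = {
        name: {"pinned": want, "installed": installed.get(name)}
        for name, want in pinned.items()
        if installed.get(name) != want
    }
    if mismatches:
        raise IncompatibleDependency(f"pinned versions not satisfied: {mismatches}")
```

Calling `check_pins` before the worker accepts any review requests turns an incompatible release into an immediate, visible startup failure rather than a stream of timed-out jobs.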

🟢 GitHub Copilot: Resolved — Disruption with OpenAI Models

Thu, 28 May 2026 20:41:58 GMT

🟢 Resolved · lasted 1h 41m

On May 28, 2026, between approximately 18:27 and 20:41 UTC, the GitHub Copilot service was degraded due to an issue with the Responses API of an upstream provider affecting the GPT-5.2, GPT-5.3-Codex, GPT-5.4, and GPT-5.5 models. Requests routed to these models via the Responses API returned elevated error rates, which also affected Copilot coding agent and Copilot code review. No other models were impacted.

We mitigated the incident by shifting traffic away from the affected models while the upstream provider deployed a fix.

GitHub is working to improve automated failover for the affected models and strengthen monitoring to prevent similar incidents in the future.
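
Automated failover of the kind described can be pictured as trying an ordered list of backends and moving on when one reports an upstream error. The following is a minimal sketch, not GitHub's mechanism: `UpstreamError`, `call_with_failover`, and the backend names in the usage note are assumptions for the example.

```python
class UpstreamError(Exception):
    """Stand-in for an error returned by an upstream model provider."""


def call_with_failover(prompt, backends):
    """Try each (name, callable) pair in order; return (name, result) on first success."""
    errors = []
    for name, call in backends:
        try:
            return name, call(prompt)
        except UpstreamError as exc:
            # Record the failure and fall through to the next backend.
            errors.append((name, str(exc)))
    raise UpstreamError(f"all backends failed: {errors}")
```

For example, passing `[("primary", primary_call), ("fallback", fallback_call)]` routes a request to `fallback_call` only when `primary_call` raises `UpstreamError`; errors of any other type are deliberately not swallowed.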

🟢 GitHub Copilot: Resolved — Disruption with some GitHub services

Tue, 26 May 2026 16:35:57 GMT

🟢 Resolved · lasted 52m

On May 26, 2026, between 15:10 UTC and 16:35 UTC, the Copilot service was degraded and many models were unavailable. The error rate averaged ~5% and peaked at 11% of requests to the service. This was due to a change that introduced a configuration mismatch in HMAC signing credentials, which caused the list of available models to be truncated. We mitigated the incident by rolling back the change. The rollback was complete by 15:34 UTC, though users continued to see impact until cache TTLs expired.

We are working to improve our monitoring and error handling to reduce time to detection and provide a better experience during issues like this in the future.

🟢 GitHub Copilot: Resolved — Incident with Copilot

Tue, 19 May 2026 05:30:00 GMT

🟢 Resolved · lasted 1m

On May 19, 2026, between 05:30 UTC and 14:50 UTC, some Copilot users experienced failures when using code completions, chat sessions, and cloud agent sessions. At peak impact, approximately 13% of Copilot API requests failed, and approximately 24% of remote sessions failed to initialize. A partial mitigation at 08:16 UTC reduced the Copilot API error rate to approximately 0.3%, but intermittent failures persisted until a full fix was deployed at 14:15 UTC and recovery was verified by 14:50 UTC.

The incident was caused by rate limits being exceeded on a shared infrastructure component. A recently enabled feature increased call volume to this component, and the combined load exceeded capacity limits as traffic increased during business hours.

We mitigated the incident by deploying a caching layer to reduce load on shared infrastructure. To prevent recurrence, we are separating rate limit scopes between services, adding monitoring for internal dependency rate limiting, and reducing redundant calls.
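
Separating rate limit scopes can be sketched as giving each calling service its own token bucket, so that a surge from one caller cannot exhaust the budget of another. This is an illustrative sketch only: `TokenBucket`, `ScopedLimiter`, the service names, and the rates are assumptions, not details from the report.

```python
import time


class TokenBucket:
    def __init__(self, rate_per_sec, capacity, clock=time.monotonic):
        self.rate = rate_per_sec
        self.capacity = capacity
        self.tokens = capacity
        self.clock = clock
        self.updated = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class ScopedLimiter:
    """One bucket per calling service instead of one shared bucket."""

    def __init__(self, rate_per_sec, capacity, clock=time.monotonic):
        self._args = (rate_per_sec, capacity, clock)
        self._buckets = {}

    def allow(self, service: str) -> bool:
        if service not in self._buckets:
            self._buckets[service] = TokenBucket(*self._args)
        return self._buckets[service].allow()
```

With a single shared bucket, a newly enabled feature that increases call volume consumes tokens that every other caller depends on. With scoped buckets, only the service that exceeds its own budget is throttled.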

🟢 GitHub Copilot: Resolved — Incident with multiple GitHub services

Thu, 23 Apr 2026 17:30:49 GMT

🟢 Resolved · lasted 1h 19m

On April 23, 2026, between 16:03 UTC and 17:27 UTC, multiple GitHub services experienced elevated error rates and degraded performance due to DNS resolution failures originating from our DNS infrastructure in our VA3 datacenter. Approximately 5–7% of overall traffic was affected during the impact window:

- Webhooks: ~0.35% of API requests returned 5xx (peak ~0.39%). ~0.88% of requests exceeded 3s latency; at peak, >3s responses represented ~10% of Webhooks API traffic.

- Copilot Metrics: ~9% of Copilot Insights dashboard requests returned 5xx.

- Copilot cloud agents: ~10% of cloud agent sessions were affected and failing.

- Octoshift: 0.88% of active repo migrations failed and 79% saw elevated durations (avg. 5.2 min) during this period.

- Git Operations: averaged 1.25% errors over the duration of the incident, with a peak of 2.07% errors.

- Actions: Workflow run status updates experienced delays of up to ~8s over the duration of the incident window.

Our DNS infrastructure in VA3 entered a degraded state and began intermittently returning NXDOMAIN responses and timing out on lookups for both internal service discovery and external endpoints. This caused a cascading impact across the dependent services listed above.

We identified a specific load pattern under which our DNS resolvers began failing. The evidence points to a recently introduced traffic-balancing mechanism, rolled out progressively to support our growth, as the root cause. We have since reverted this change.

We are immediately prioritizing investments in a more controlled rollout and validation process, including a dedicated environment to safely shadow production DNS traffic and detect these failure modes before they can affect production.

🟢 GitHub Copilot: Resolved — Investigating errors on GitHub

Thu, 23 Apr 2026 15:18:41 GMT

🟢 Resolved · lasted 39m

On April 23, 2026, between 14:30 UTC and 15:18 UTC, multiple services were degraded on github.com. During this time, approximately 1.5% of all web requests resulted in a 5xx status and unicorn pages for github.com users. We also saw elevated error rates across Actions workflow runs, Copilot, Codespaces, and Packages, leading to degraded experiences during this timeframe. Codespaces impact peaked at 45% failures for create requests and 65% failures for resume requests. Packages impact was mainly Maven-related, with 50% failure rates in downloads and 70% failure rates in uploads. Actions experienced a peak of 8% of failed jobs, and up to 85% of jobs were impacted by run start delays of more than 5 minutes.

This was due to a configuration change to an internal billing service that led to a cache being overwhelmed and causing requests to time out. These timeouts cascaded across multiple services and eventually caused requests to queue up and exhaust web request workers.

This configuration change was reverted at 14:42 UTC and following this, all services began to see recovery immediately.

To prevent this situation in the future, we are taking steps to ensure that failures and timeouts in the billing service don’t cascade to other services causing impact. This includes implementing more aggressive timeouts on callers of these billing services, adding circuit breaker configurations for cache timeouts and using more resilient cache options. We have also decreased max request timeouts within the billing service that caused impact and added more capacity to our cache to prevent traffic spikes from having the same impact.
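
A circuit breaker of the kind mentioned above stops callers from repeatedly waiting on a dependency that is already timing out. The sketch below covers only the breaker (not the caller-side timeouts) and is not GitHub's implementation: the class names, thresholds, and injectable `clock` parameter are assumptions for illustration.

```python
import time


class CircuitOpen(Exception):
    """Raised instead of calling the dependency while the circuit is open."""


class CircuitBreaker:
    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at = None

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise CircuitOpen("skipping call while circuit is open")
            # Cool-down elapsed: allow one trial call (half-open state).
            # A single further failure re-opens the circuit.
            self.opened_at = None
            self.failures = self.max_failures - 1
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            raise
        self.failures = 0
        return result
```

Wrapping calls to a slow cache or billing lookup in `breaker.call(...)` means that after a few consecutive failures, callers fail immediately instead of queueing behind timeouts and exhausting request workers.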