Source: ai-watch.dev — Real-time AI service status monitoring Period: April 1–30, 2026 Published: May 2026 Services monitored: 31 — 23 API services, 5 coding agents, 3 AI apps

Summary

  • Most reliable: Pinecone (100/100 — zero incidents, 99.84% uptime), Groq Cloud (93/100 — zero incidents, 100% uptime)
  • Riskiest this month: Gemini API (single 242h API key incident dominated April; Google publishes no comparable 30-day uptime metric, so the official figure is unavailable — see Notable Incidents #1), Deepgram (Score 55, 74h 20m longest, 16h 15m avg resolution)
  • High incident count, fast recovery: Together AI (139 incidents, up from 20 in March; avg 42m recovery), Mistral (97 incidents, up from 7 in March; avg 8m). The two services run their status pages on different platforms (Together AI on BetterStack, Mistral on Instatus), and the month-over-month jump is partly platform-reporting style: BetterStack's recovery-period mechanism tends to register short state changes as separate down/resolved pairs (AIWatch deduplicates those, and the counts above are already after that filtering), while AIWatch's probe-corroboration filter for Mistral was retuned in late April (#372), surfacing micro-incidents that earlier filtering had absorbed. Whether the residual reflects platform-reporting style or genuine micro-instability isn't determinable from counts alone. What is observable: fast recovery kept user-facing impact at the level of client-side retries rather than extended unavailability
  • Watch out: Codex landed mid-month — partial 9-day window only (see Official Uptime). Anthropic per-model counts (Claude API 40 + claude.ai 37 + Claude Code 31) often track the same root event — see Incident Summary methodology before comparing across providers
Summary in Korean
  • Most stable: Pinecone (Score 100, zero incidents, 99.84% uptime), Groq Cloud (Score 93, zero incidents, 100% uptime)
  • Riskiest this month: Gemini API (a single 242-hour API key incident dominated April; Google publishes no comparable 30-day uptime metric, so no official figure is available; see Notable Incidents #1), Deepgram (Score 55, longest 74h 20m, 16h 15m average resolution)
  • Frequent incidents, fast recovery: Together AI (139 incidents, 42m average recovery, up from 20 in March), Mistral (97 incidents, 8m average, up from 7 in March). The two services use different status page platforms: Together AI on BetterStack, Mistral on Instatus. Part of the month-over-month increase reflects measurement changes. Together AI's BetterStack tends to record short state changes as separate down/resolved pairs, which AIWatch deduplicates; the 139 figure already includes that correction. Mistral's probe corroboration filter was retuned in late April (#372), so micro-incidents that earlier filtering had absorbed began surfacing. Whether the remaining counts reflect reporting-style differences or genuine micro-instability is hard to settle from counts alone. What is clear is that fast recovery kept the user-facing impact at the level of client-side retries.
  • Watch out: Codex was added mid-month, so only a 9-day partial window exists (see the Official Uptime section). Anthropic's counts of 40 for Claude API, 37 for claude.ai, and 31 for Claude Code come from tallying Opus, Sonnet, and Haiku separately, so the same event is captured multiple times; check the Incident Summary methodology before comparing against other providers.

Recommendations

Use Case | Recommended | Why
Production-critical | Cohere API (LLM), Pinecone (vector DB) | Cohere 99.85% uptime / Score 85 / 3 incidents avg 36m — the strongest April reliability among general-LLM APIs. Pinecone 99.84% / Score 100 / zero incidents — a solid choice for the vector DB / RAG layer that production AI apps often depend on alongside their LLM.
Low latency / cost | Groq Cloud, Fireworks AI | Groq 100% uptime / zero incidents; Fireworks 99.40% uptime / 7m avg recovery; p75 RTT 213ms / 210ms
Coding Agents | Cursor, Windsurf | Cursor 99.76% / Windsurf 99.84% full-month uptime; Score 88 / 89 (Good).
Voice / audio | AssemblyAI (with fallback) | AssemblyAI longest 48m vs ElevenLabs 19h 30m vs Deepgram 74h 20m — by far the shortest worst case in the category. 22m avg recovery; the other two had multi-hour outages.
General purpose | OpenAI API, OpenRouter | OpenAI 97.44% uptime / Score 84 — the only major-LLM general-purpose API that finished April in the Good tier (Claude API 96.46% / Score 61 Fair, Gemini API Score 62 Fair with no published uptime, both struggled). OpenRouter 99.84% uptime / Score 82 routes to many of the same model families, useful as a fallback layer when a single upstream wobbles (see the sketch after this table).
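
The fallback-layer idea in the General purpose row reduces to a small routing wrapper. A minimal sketch follows, assuming two hypothetical callables (`call_primary`, `call_fallback`) standing in for whatever SDK clients you actually use; nothing here is an OpenAI or OpenRouter API.

```python
# Minimal sketch of a primary-with-fallback call pattern. `call_primary` and
# `call_fallback` are hypothetical callables wrapping your real SDK clients;
# UpstreamError stands in for whatever exception your SDK raises on 5xx/timeouts.
from typing import Callable

class UpstreamError(Exception):
    pass

def call_with_fallback(prompt: str,
                       call_primary: Callable[[str], str],
                       call_fallback: Callable[[str], str]) -> str:
    try:
        return call_primary(prompt)    # e.g. the provider you standardize on
    except UpstreamError:
        return call_fallback(prompt)   # e.g. the same model family via an aggregator
```

The same shape works for any of the category pairs above; the only provider-specific part is which exceptions you treat as an upstream wobble.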

Key Insight

Three patterns from the April 2026 reliability data, reading beyond the Summary table's service-by-service view: a tooling-shift caveat for month-over-month comparisons, wide within-category spread, and Major-LLM tier concentration risk that is now two months running.

  • Single-month deltas reflect tooling changes too — not just vendor changes: April-vs-March score comparisons this period are confounded by AIWatch-side changes that landed in April: (1) Score gained a new Responsiveness component (20% weight, sourced from probe p50 + stability CV) — previously-100-scoring services are now bounded by API speed even when uptime and incidents looked perfect, (2) grade thresholds tightened (Excellent 85 → 90, Good 70 → 75) to absorb the upward shift, (3) affected-days weighting moved to Atlassian-style impact (MAJOR=1.0, MINOR=0.3), and (4) Gemini gained aistudio.google.com/status as a second monitoring source on Apr 22, catching incidents the gcloud Vertex feed had missed. Most services moved 10–17 points downward as a result — Cohere 100 → 85, OpenRouter 99 → 82, Hugging Face 100 → 87, DeepSeek 92 → 82 — even those with no real reliability change. Gemini's full 86 → 62 drop layers a real event (the 242h API key issue) on top of those formula shifts. May-vs-April will be more comparable, but a few small May changes (the codex/chatgpt uptime aggregate fix, the incident grouping rework) still introduce friction; expect fully apples-to-apples month-over-month comparisons from June onward.
  • Within categories, the spread is wide — vendor choice matters more than you'd expect: April's Voice category split sharply — Deepgram (Score 55, 74h 20m longest), ElevenLabs (65, 19h 30m longest), AssemblyAI (82, 48m longest, 22m avg recovery). Same use case, nearly a 100× spread in worst-case downtime (48m vs 74h 20m). Coding agents told a similar story — Windsurf (Score 89) and Cursor (88) carried the top tier; GitHub Copilot (69, 84h 32m total) and Claude Code (66, 37h 56m) sat at the bottom; Codex (partial window). For production setups picking a single vendor in either category, the reliability cost of the wrong choice is significant.
  • Major-LLM concentration risk has shown up two months running, not just April: March had two Major-LLMs at Excellent — OpenAI (88, official uptime) and Gemini (86, no published 30-day uptime) — while Claude API sat at Fair (59) due to per-model component inflation. April widened the gap: Gemini joined Claude in Fair (62 / 61), leaving OpenAI alone in Good at 84. The pattern isn’t “two of three slipped this month” — it’s “the same single provider has consistently been the most reliable Major-LLM for at least two consecutive months (Excellent 88 in March, Good 84 in April).” Over two months of data, Major-LLM vendors aren’t equally reliable as failover candidates — single-month rankings make the gap easy to miss. (For concrete cross-tier and aggregator picks, see Recommendations.)
Key Insight in Korean

Three patterns visible beyond the Summary table's service-level results: caveats for month-over-month score comparisons (the effect of measurement-tooling changes), within-category spread, and two consecutive months of single-vendor concentration risk in the Major-LLM tier.

  • One-month score deltas reflect measurement-tooling changes, not just vendor changes: the April-vs-March score differences do not reflect vendor reliability changes alone, because four changes landed in AIWatch's measurement infrastructure over the same period: (1) a new Responsiveness component was added to the score formula (20% weight, based on probe p50 + stability CV), so even a service with perfect uptime and incidents now has its ceiling set by API speed and stability, (2) grade thresholds were tightened (Excellent 85 → 90, Good 70 → 75) to absorb the overall upward shift from the formula change, (3) affected-days weighting moved to the Atlassian-style impact model (MAJOR=1.0, MINOR=0.3), and (4) aistudio.google.com/status was added as a second monitoring source for Gemini on Apr 22, which began catching incidents the gcloud Vertex feed alone had missed. As a result, most services dropped 10–17 points (Cohere 100 → 85, OpenRouter 99 → 82, Hugging Face 100 → 87, DeepSeek 92 → 82), including services with no real reliability change. Gemini's 86 → 62 drop is April's real event (the 242-hour API key incident) layered on top of those formula changes. The May-vs-April comparison will be closer in conditions, but small May changes (the codex/chatgpt uptime aggregation fix, the incident grouping rework) still remain, so fully like-for-like month-over-month comparisons become possible from June.
  • Within categories the spread is large; vendor choice matters more than you'd think: April's Voice category showed sharp polarization: Deepgram (Score 55, longest 74h 20m), ElevenLabs (65, longest 19h 30m), AssemblyAI (82, 22m average recovery). Worst-case downtime within the same use case differed by roughly a hundredfold. Coding agents showed a similar pattern: Windsurf (Score 89) and Cursor (88) at the top; GitHub Copilot (69, 84h 32m total) and Claude Code (66, 37h 56m) at the bottom; Codex (partial window). In either category, building production on a single vendor carries a significant reliability cost if the wrong one is chosen.
  • The Major-LLM single-vendor concentration risk is not an April-only phenomenon but a two-month pattern: in March, two services in the Major-LLM tier scored Excellent, OpenAI (88, official uptime) and Gemini (86, no official 30-day uptime published), while Claude API was already Fair (59) due to per-model component inflation. In April the gap widened: Gemini dropped to Fair (62 / 61) alongside Claude, leaving only OpenAI in Good at 84. The pattern is not "two providers slipped this month" but "the same single vendor has been the most stable Major-LLM for two consecutive months (Excellent 88 in March, Good 84 in April)." Viewed over two consecutive months, Major-LLM vendors are not equally reliable as failover candidates, and a single month's ranking alone does not reveal this gap. (See Recommendations for concrete cross-tier / aggregator options.)

Daily Service Status


AIWatch Score — April 2026 Reliability Rankings

AIWatch Score (0–100) is designed to answer one question:

“Which AI service is safest to rely on in production?”

It combines four components: Uptime (40%), Incident affected days (25%), Recovery speed (15%), and Responsiveness (20%, derived from p75 probe RTT). The per-service p75 RTT figures feeding Responsiveness are listed in the API Response Time — Monthly p75 section below; the full breakdown of weights, fallbacks, and penalties is in About This Report.
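
As a rough illustration of how those weights combine, the sketch below assumes each component has already been normalized to a 0–100 sub-score; the normalization rules, and the exact meaning of the no-probe redistribution described in About This Report, are AIWatch internals not published here, so the inputs and the fallback branch are assumptions.

```python
# Minimal sketch of the AIWatch Score weighting, assuming each component is
# already normalized to a 0-100 sub-score. The normalization of raw uptime %,
# affected days, recovery times, and p75 RTT into sub-scores is not published,
# so the inputs here are hypothetical.

WEIGHTS = {
    "uptime": 0.40,          # Uptime (40%)
    "affected_days": 0.25,   # Incident affected days (25%)
    "recovery": 0.15,        # Recovery speed (15%)
    "responsiveness": 0.20,  # Responsiveness (20%, from probe RTT)
}

def aiwatch_score(subscores: dict[str, float], has_probe: bool = True) -> float:
    """Combine 0-100 sub-scores into a single 0-100 score."""
    weights = dict(WEIGHTS)
    if not has_probe:
        # Per About This Report: without probe data the remaining 80% of weight
        # is redistributed to 100%, then a 5% penalty is applied. (This reading
        # of "80 -> 100 score redistribution" is an assumption.)
        del weights["responsiveness"]
        total = sum(weights.values())
        weights = {k: v / total for k, v in weights.items()}
    score = sum(weights[k] * subscores[k] for k in weights)
    return score * 0.95 if not has_probe else score

# Example: hypothetical sub-scores for a service with full probe coverage.
print(round(aiwatch_score({"uptime": 97, "affected_days": 90,
                           "recovery": 85, "responsiveness": 80})))
```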

29 of 31 services ranked. Amazon Bedrock and Azure OpenAI are excluded from this ranking because neither publishes an accessible uptime metric — their Score would otherwise inherit an industry-average assumption rather than a measured value. Both finished April with zero observed incidents (see the “Zero incidents recorded” note under Incident Summary).

Rank Service Score Grade Uptime Source Why
1 Pinecone 100 Excellent Official Zero incidents, 99.84% uptime
2 Modal 94 Excellent Official 8 incidents, avg 4h 7m
3 Groq Cloud 93 Excellent Official Zero incidents, 100.00% uptime
4 Windsurf 89 Good Official 3 incidents, avg 6h 38m
5= Cursor 88 Good Official 20 incidents, avg 1h 11m
5= Fireworks AI 88 Good Official 30 incidents, fast recovery (avg 7m)
7 Hugging Face 87 Good Official 6 incidents, fast recovery (avg 9m)
8= Voyage AI 86 Good Official 1 incident, 11m
8= Codex † 86 Good Partial (9-day) 7 incidents, avg 1h 23m
10 Cohere API 85 Good Official 3 incidents, avg 36m
11 OpenAI API 84 Good Official 6 incidents, avg 6h 57m
12 Together AI 83 Good Official 139 incidents, avg 42m
13= DeepSeek API 82 Good Official 1 incident, 1h 4m
13= OpenRouter 82 Good Official 2 incidents, avg 1h 5m
13= AssemblyAI 82 Good Official 3 incidents, fast recovery (avg 22m)
16= xAI (Grok) 77 Good Estimate Zero incidents (no published 30-day uptime)
16= Stability AI 77 Good Official Zero incidents, 100.00% uptime
18= Mistral API 76 Good Estimate 97 incidents, fast recovery (avg 8m)
18= Perplexity 76 Good Estimate Zero incidents (no published 30-day uptime)
20 Character.AI 73 Fair Official 22 incidents, fast recovery (avg 24m)
21= Replicate 71 Fair Official 2 incidents, avg 38m
21= ChatGPT 71 Fair Official 15 incidents, avg 2h 28m
23 GitHub Copilot 69 Fair Official 26 incidents, avg 3h 15m
24 Claude Code 66 Fair Official 31 incidents, avg 1h 13m
25 ElevenLabs 65 Fair Official 5 incidents, avg 4h 26m
26 Gemini API 62 Fair Estimate 3 incidents, avg 117h 13m (dominated by 242h API key issue)
27= Claude API 61 Fair Official 40 incidents, avg 1h
27= claude.ai 61 Fair Official 37 incidents, avg 1h 6m
29 Deepgram 55 Fair Estimate 5 incidents, avg 16h 15m

Grade scale: Excellent (90+) · Good (75+) · Fair (55+) · Degrading (40+) · Unstable (<40)

† Codex was added to monitoring on 22 Apr 2026; only 9 days of data are available for this month, not directly comparable to full-month peers — see the Official Uptime note below.


Uptime Source column: Official (read directly from the service’s status page) · Estimate (no official metric; only the Score input is computed — the % itself is not surfaced) · Partial (9-day) (Codex was added on 22 Apr, mid-month). Full definitions: About This Report → Uptime Source.


Official Uptime (Primary Component)

Reference table. Official 30-day uptime metrics from each service’s status page (where published). The narrative-driven sections below (Incident Summary / Notable Incidents / Observations) cover what these numbers mean for vendor selection.

Amazon Bedrock, Azure OpenAI, ChatGPT, Deepgram, Gemini, Mistral, Perplexity, and xAI are excluded from this table. Bedrock, Azure OpenAI, Deepgram, Gemini, Mistral, Perplexity, and xAI do not publish a rolling-30-day uptime percentage on their status pages; ChatGPT's group-aggregate uptime calculation is being reworked (the 30-day figure on the live AIWatch dashboard is currently null pending that fix). xAI's status page does expose per-endpoint live success rates measured since its monitoring system's last restart, but those numbers are not directly comparable to the 30-day figures in the table below.

Codex was added to monitoring on 22 Apr 2026; only 9 days of data exist for this month and the resulting aggregate is excluded from the table to avoid misleading comparison with full-month services. OpenAI’s own status.openai.com reported the Codex group at ~99.98% uptime during the same window — for context, not as an AIWatch-measured value.

Service | Uptime
Groq Cloud | 100.00%
Stability AI | 100.00%
Hugging Face | 99.97%
Modal | 99.95%
Cohere API | 99.85%
OpenRouter | 99.84%
Pinecone | 99.84%
Windsurf | 99.84%
AssemblyAI | 99.77%
Voyage AI | 99.77%
Replicate | 99.76%
Cursor | 99.76%
GitHub Copilot | 99.73%
DeepSeek API | 99.54%
Fireworks AI | 99.40%
Character.AI | 98.86%
OpenAI API | 97.44%
ElevenLabs | 97.27%
Claude Code | 96.85%
Claude API | 96.46%
Together AI | 96.22%
claude.ai | 95.66%

API Response Time — Monthly p75

These p75 figures are the input to the Responsiveness component (20% weight) of AIWatch Score. Lower is better. The two tables answer different questions: Score Rankings sorts by which service is safest to rely on (combining uptime, incidents, recovery, and responsiveness); this table sorts by which service is fastest at the network layer.

Rank Service p75 (ms)
1 Gemini API 140
2 Claude API 173
3 Fireworks AI 210
4 Groq Cloud 213
5 OpenAI API 223
6= Mistral API 234
6= Cohere API 234
8 Together AI 261
9 Perplexity 398
10 Hugging Face 414
11 OpenRouter 442
12 Replicate 480
13 xAI (Grok) 490
14 ElevenLabs 492
15 DeepSeek API 569
16 Voyage AI 699
17 Stability AI 741
18 AssemblyAI 885
19 Deepgram 2193

Note: Probe RTT measures direct API endpoint response time from Cloudflare Workers edge (5-min intervals). Values reflect network round-trip time, not inference latency. Services without probe coverage (Bedrock, Azure OpenAI, Pinecone) are excluded from rankings. p95 / spike-count / month-over-month columns will return once the underlying archive schema carries those fields.
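
For readers who want the percentile pinned down: p75 is the value below which 75% of a month's probe samples fall. A minimal nearest-rank computation over hypothetical RTT samples (the values below are made up, not AIWatch data):

```python
# Minimal sketch: p75 of a set of probe RTT samples, nearest-rank style.
def percentile(samples: list[float], p: float) -> float:
    """Return the value below which roughly p% of samples fall."""
    ordered = sorted(samples)
    rank = max(0, min(len(ordered) - 1, round(p / 100 * (len(ordered) - 1))))
    return ordered[rank]

rtt_ms = [180, 195, 210, 230, 480, 205, 190, 2100, 215, 225]  # hypothetical probe RTTs
print(percentile(rtt_ms, 75))  # the p75 figure that would feed Responsiveness
```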


Incident Summary

Note on methodology: Incident counts reflect all affected components per service. Anthropic in particular counts Opus / Sonnet / Haiku as separate components, so a single root event can appear three times across Claude API / claude.ai / Claude Code; other providers report at the service level. Higher incident count does not on its own mean lower reliability — adjust for granularity before comparing across providers. Official uptime % is based on a single primary component, so it isn’t directly comparable to the count column.

Live dashboard vs report counts: The numbers below are the unconsolidated monthly totals. The live ai-watch.dev dashboard groups same-title incidents on the same calendar day into a single cluster row, so what users see day-to-day is a smaller list — e.g., Mistral’s 97 monthly entries render as ~6 cluster rows on a recent snapshot. The report intentionally exposes the raw count so monthly comparisons stay consistent across services.
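
The clustering rule is easy to reproduce. The sketch below assumes incidents arrive as simple (title, UTC timestamp) pairs, which is an illustrative shape rather than AIWatch's actual schema:

```python
# Minimal sketch of the dashboard's clustering rule: incidents sharing a title
# on the same UTC calendar day collapse into one cluster row.
from collections import defaultdict
from datetime import datetime

incidents = [  # hypothetical raw status-page entries
    ("Elevated error rates", "2026-04-03T08:05:00Z"),
    ("Elevated error rates", "2026-04-03T09:40:00Z"),
    ("Elevated error rates", "2026-04-11T14:02:00Z"),
]

clusters: dict[tuple[str, str], int] = defaultdict(int)
for title, started_at in incidents:
    day = datetime.fromisoformat(started_at.replace("Z", "+00:00")).date().isoformat()
    clusters[(title, day)] += 1

print(len(clusters))  # 2 cluster rows from 3 raw incidents
```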

Service | Incidents | Total Downtime | Longest | Avg Resolution
Together AI | 139 | 97h 49m | 15h 16m | 42m
Mistral API | 97 | 12h 15m | 1h 14m | 8m
Claude API | 40 | 39h 40m | 5h 57m | 1h
claude.ai | 37 | 40h 40m | 5h 57m | 1h 6m
Claude Code | 31 | 37h 56m | 5h 57m | 1h 13m
Fireworks AI | 30 | 3h 19m | 17m | 7m
GitHub Copilot | 26 | 84h 32m | 15h 37m | 3h 15m
Character.AI | 22 | 8h 47m | 4h 10m | 24m
Cursor | 20 | 23h 39m | 6h 23m | 1h 11m
ChatGPT | 15 | 36h 59m | 12h 20m | 2h 28m
Modal | 8 | 32h 53m | 23h 2m | 4h 7m
Codex (9-day window) | 7 | 9h 38m | 4h 13m | 1h 23m
OpenAI API | 6 | 41h 42m | 36h 2m | 6h 57m
Hugging Face | 6 | 53m | 15m | 9m
ElevenLabs | 5 | 22h 10m | 19h 30m | 4h 26m
Deepgram | 5 | 81h 14m | 74h 20m | 16h 15m
Gemini API | 3 | 351h 39m | 242h | 117h 13m
Cohere API | 3 | 1h 47m | 1h 25m | 36m
AssemblyAI | 3 | 1h 5m | 48m | 22m
Windsurf | 3 | 19h 53m | 14h 47m | 6h 38m
OpenRouter | 2 | 2h 10m | 1h 5m | 1h 5m
Replicate | 2 | 1h 15m | 48m | 38m
DeepSeek API | 1 | 1h 4m | 1h 4m | 1h 4m
Voyage AI | 1 | 11m | 11m | 11m

Zero incidents recorded (7 services): Groq Cloud, Pinecone, Stability AI — confirmed via published 30-day uptime metrics. Amazon Bedrock, Azure OpenAI, Perplexity, xAI (Grok) — AIWatch recorded no incidents, but these services don’t expose a comparable rolling 30-day uptime metric, so the zero count reflects AIWatch’s monitoring coverage as much as actual incident-free operation.


Notable Incidents

1. Gemini API — 10-Day API Key Issue (Apr 17–28)

Affected: Gemini API (newly-created keys) Duration: 242h

A single status page entry — “Gemini API is having some issues serving recently created keys” — remained open for ten days. This was the longest single incident across all 31 monitored services in April. (Google does not publish a comparable 30-day uptime metric for Gemini on either gcloud or aistudio.google.com/status, so a percentage cannot be cited.) New customer onboarding and key-rotation flows would have been the most affected paths; existing keys were not the documented scope. Two further incidents in the same month (65h 17m batch API issue, 44h 22m postpay upgrade disruption) compounded the impact.

This incident also prompted a mid-month change to AIWatch’s monitoring setup. The gcloud Vertex feed AIWatch had been polling does not surface direct outages on Google’s developer-facing Gemini API surface — exactly the surface affected here — so the issue showed up in our data only as the page-level indicator drifted, days into the event. On Apr 22, AIWatch added aistudio.google.com/status as a second monitoring source; incidents from either Google feed are now merged in the dashboard. Future Gemini-API-direct incidents of this shape should appear within minutes rather than days.


2. Deepgram — 74h Voice Agent Degradation

Affected: Voice Agent component Duration: 74h 20m

Deepgram’s longest April incident lasted just over three days, mirroring the same pattern as March 2026. Per the prior month’s writeup, Deepgram’s Voice Agent depends on upstream LLM providers — when one degrades, this surface degrades with it. Core STT/TTS APIs remained available. Multi-LLM fallback at the application layer is the documented mitigation.


3. OpenAI — Apr 20 Cluster (ChatGPT 12h 20m + API 36h 2m, separate incidents on independent components)

Affected: ChatGPT (15 incidents · 36h 59m total) and OpenAI API (6 incidents · 41h 42m total) Longest single events: ChatGPT 12h 20m · OpenAI API 36h 2m

ChatGPT’s longest incident this month was a 12h 20m window during the April 20 cluster. Across the full month ChatGPT recorded 15 incidents totaling 36h 59m (avg 2h 28m), while OpenAI API recorded 6 incidents totaling 41h 42m (avg 6h 57m, longest 36h 2m). Total impact was roughly comparable, but the shape differed — the consumer surface saw frequent shorter outages, the developer API saw fewer but much longer ones. The 36h 2m API-side and 12h 20m ChatGPT-side figures are separate incidents on independent components, not two views of the same event — OpenAI’s status page tracks ChatGPT and the developer API as distinct surfaces. (ChatGPT’s own 30-day uptime % is excluded from the table — see Official Uptime caveat above.)


4. GitHub Copilot — 26 Incidents, 84h 32m Total Downtime

Affected: Copilot Chat, Webhooks, Codespaces, Actions Longest: 15h 37m

Copilot continued its March 2026 pattern of frequent multi-component incidents. CI/CD pipelines and developer workflows that depend on full GitHub integration (not just AI completion) bore the brunt — Webhooks and Codespaces disruptions are the recurring failure mode. Average resolution was 3h 15m.


Observations

Actionable takeaways per service. Descriptive context for each event lives in earlier sections — Summary, Incident Summary, and Notable Incidents. This section is what to do with that data.

  • If you build on Gemini: prefer long-lived keys with rotation cadences ≥ monthly — newly-created keys were the affected scope of the 242h April incident. Monitor both gcloud Vertex and aistudio.google.com/status; they don’t always agree and direct-API outages often surface on AI Studio first.
  • If you build on Anthropic: monitor the per-model components (Opus / Sonnet / Haiku) individually rather than the aggregated count across them — single-model traffic isn’t well represented by the combined incident total, and your retry / failover decisions need per-model granularity.
  • If you build on Deepgram: configure multiple LLM providers in the Voice Agent for failover. The longest incident this month traced back to upstream LLM dependency on that surface, so a single-LLM Voice Agent setup carries the full upstream blast radius.
  • If you build on Together AI or Mistral: standard exponential backoff with a sub-minute initial retry absorbs the flap pattern; set client-side timeouts to cover the Longest column in Incident Summary so your retry budget survives the worst case rather than the average (a minimal sketch follows this list).
  • Quietly reliable picks within their own role (not interchangeable — each fits a different use case): Hugging Face (6 incidents · 9m avg) for OSS model hosting; Modal (8 · 4h 7m avg / 99.95% uptime) for serverless GPU compute; Cohere API (3 · 36m avg) as a fallback within the LLM tier. Low incident count combined with fast recovery makes each resilient within its category — Hugging Face and Modal are not LLM-API substitutes, and Cohere doesn’t replace inference infrastructure.
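
A minimal backoff sketch matching the Together AI / Mistral advice above; the base delay, cap, and overall deadline are placeholders to replace with your own numbers (sized from the Longest column of the provider you depend on), not AIWatch recommendations.

```python
# Minimal sketch: exponential backoff with a sub-minute initial retry and an
# overall deadline. All three timing values are placeholders.
import random
import time

def call_with_backoff(call, initial_delay=2.0, max_delay=60.0, deadline_s=900.0):
    """Retry `call` with exponential backoff until it succeeds or the deadline passes."""
    start = time.monotonic()
    delay = initial_delay
    while True:
        try:
            return call()
        except Exception:
            if time.monotonic() - start + delay > deadline_s:
                raise  # retry budget exhausted: surface the error to the caller
            time.sleep(delay + random.uniform(0, delay / 2))  # jitter to avoid synced retries
            delay = min(delay * 2, max_delay)
```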

Security Alerts

Note: Security alerts captured during the month from OSV.dev (AI SDK package vulnerabilities) and Hacker News (security posts mentioning monitored services). Section omitted for months without detections.
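
For reproducing the OSV.dev side of this collection, the public query endpoint can be polled per package; the sketch below assumes the standard api.osv.dev/v1/query API, and the package and version chosen are illustrative, not AIWatch's actual watch list.

```python
# Minimal sketch: ask OSV.dev which known vulnerabilities affect a PyPI package
# version. Assumes the public https://api.osv.dev/v1/query endpoint; the package
# name and version below are illustrative.
import json
import urllib.request

def osv_vulns(package: str, version: str, ecosystem: str = "PyPI") -> list[dict]:
    body = json.dumps({
        "package": {"name": package, "ecosystem": ecosystem},
        "version": version,
    }).encode()
    req = urllib.request.Request(
        "https://api.osv.dev/v1/query",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp).get("vulns", [])

for vuln in osv_vulns("langchain-core", "0.3.0"):
    print(vuln["id"], vuln.get("summary", ""))
```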

Total alerts: 5

By source

Source Count
OSV.dev 5

By severity

Critical High Medium Low
0 1 4 0

Most affected services

Service Count
Anthropic (Claude) 2
LangChain 2
Hugging Face 1

Top Findings

1. LangChain Core has Path Traversal vulnerabilities in legacy load_prompt functions · high

  • Source: OSV.dev
  • Affected: LangChain
  • Detected: 2026-04-24
  • Fix Version: 1.2.22

2. Claude SDK for Python has Insecure Default File Permissions in Local Filesystem Memory Tool · medium

  • Source: OSV.dev
  • Affected: Anthropic (Claude)
  • Detected: 2026-04-24
  • Fix Version: 0.87.0

3. Claude SDK for Python: Memory Tool Path Validation Race Condition Allows Sandbox Escape · medium

  • Source: OSV.dev
  • Affected: Anthropic (Claude)
  • Detected: 2026-04-24
  • Fix Version: 0.87.0

4. LangChain has incomplete f-string validation in prompt templates · medium

  • Source: OSV.dev
  • Affected: LangChain
  • Detected: 2026-04-24
  • Fix Version: 0.3.84

5. HuggingFace Transformers allows for arbitrary code execution in the Trainer class · medium

  • Source: OSV.dev (also published as GHSA-69w3-r845-3855)
  • Affected: Hugging Face
  • Detected: 2026-04-24
  • Fix Version: 5.0.0rc3

About This Report

  • Data Sources: Real-time data is aggregated from official status pages via multiple frameworks, including Atlassian Statuspage, incident.io, Google Cloud Status, Better Stack, Instatus, OnlineOrNot, and RSS feeds (Source: ai-watch.dev).
  • Monitoring Frequency: All 31 services are polled every 5 minutes via Cloudflare Workers. Health check probes measure direct API response times (RTT) at the same interval.
  • AIWatch Score (0–100): Calculated from four components — Uptime (40%), Incident affected days (25%), Recovery speed (15%), and Responsiveness (20%). Services without probe data use 80→100 score redistribution plus a 5% penalty to reflect the missing responsiveness signal. Services with fewer than 7 days of probe samples receive an additional insufficient-data penalty (Codex’s 9-day window this period satisfies the 7-day minimum, so no extra penalty applied). Full methodology: ai-watch.dev/#about-score
  • Uptime Source: Official = service publishes a rolling 30-day uptime metric AIWatch reads directly. Estimate = no official metric; AIWatch substitutes an industry-average assumption (99.5%) or its own poll-derived figure for the Score’s Uptime input. The estimated % itself is not surfaced as a percentage in this report — only its contribution to Score is shown — to stay consistent with the live AIWatch dashboard. Partial (Nd) = an official source exists but AIWatch’s measurement window is shorter than the full month (e.g. service newly tracked mid-month). The label only describes the Uptime input quality — the Score itself is computed identically across all services.
  • Incident Counting: Incident counts reflect all affected components per service. Providers differ in reporting granularity — Anthropic reports per-model incidents (Opus/Sonnet/Haiku each counted separately), while others report at the service level.
  • Uptime Metrics: Uptime percentages reflect official single-component figures provided by the status pages. Services marked with “—” do not provide a publicly accessible uptime metric.
  • Timezone Standard: All timestamps are recorded in UTC.

Next report: May 2026