Live AI inference benchmarks — auto-updated daily

GPU inference performance benchmarks

Independent AI inference benchmarks across NVIDIA, AMD and Intel GPUs. Real data from OpenAI, Anthropic, Azure OpenAI and open-source models, automatically refreshed every 24 hours from live sources.

Models tracked
AI providers
GPUs benchmarked: 5
MLPerf round: v4.1
Data sources & methodology
OpenRouter API ● LIVE
Real-time model pricing, availability, context lengths, and provider metadata across 360+ models.
Data type: Pricing, specs
Refresh: Daily · 06:00 UTC
Coverage: 360+ models
Method: REST API
openrouter.ai/api/v1/models
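As a rough illustration of how a daily refresh might read this endpoint, here is a minimal Python sketch. It assumes the endpoint is served over HTTPS and returns a JSON object with a `data` list whose entries carry `id`, `context_length`, and a `pricing.prompt` string in USD per token. Those field names, and the helper names `fetch_models` and `summarize`, are assumptions made for this example; this is not the dashboard's actual ingestion code.

```python
import json
import urllib.request

# Assumed full URL for the endpoint listed above.
MODELS_URL = "https://openrouter.ai/api/v1/models"


def fetch_models(url=MODELS_URL, timeout=30):
    """Download the model list and return the parsed JSON document."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def summarize(payload):
    """Reduce the raw payload to one row per model.

    Missing or malformed fields become None instead of raising,
    so a single odd entry does not break the whole refresh.
    """
    rows = []
    for model in payload.get("data", []):
        pricing = model.get("pricing") or {}
        try:
            # Assumed: price is a string in USD per token.
            prompt_usd_per_million = float(pricing.get("prompt")) * 1_000_000
        except (TypeError, ValueError):
            prompt_usd_per_million = None
        rows.append({
            "id": model.get("id"),
            "context_length": model.get("context_length"),
            "prompt_usd_per_million": prompt_usd_per_million,
        })
    return rows


if __name__ == "__main__":
    for row in summarize(fetch_models())[:5]:
        print(row)
```

Keeping the network call separate from the parsing step means the parsing logic can be checked against a saved payload without hitting the live API.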
Artificial Analysis ● LIVE
Independent throughput, latency, and TTFT benchmarks measured from real API calls across providers.
Data type: Tokens/s, TTFT, latency
Refresh: Daily · 06:00 UTC
Methodology: Live API calls
Independence: 3rd party verified
artificialanalysis.ai
MLPerf v4.1 PERIODIC
Official MLCommons inference benchmark results for H100, H200, A100, and MI300X GPU hardware.
Data type: GPU throughput
Round: v4.1 (latest)
GPU coverage: H200, H100, A100, MI300X
Verified by: MLCommons
mlcommons.org/benchmarks/inference
Together AI ● LIVE
Open-source model performance data, including Llama, Mistral, and other community models, on cloud GPU infrastructure.
Data type: OSS model benchmarks
Refresh: Daily · 06:00 UTC
Focus: Open-source models
Method: REST API
api.together.xyz
Methodology note: Throughput figures (tokens/sec) are measured under controlled load conditions and may vary in production with concurrent requests, prompt complexity, and provider infrastructure. Pricing data reflects list rates and may not include enterprise discounts. Marked rows contain internally generated or synthetic benchmark data used to fill gaps where third-party measurements are unavailable; treat them as estimates. Context degradation curves are modelled from available data points. GPU specs are sourced from vendor datasheets. For the full methodology, contact seeth.gudetee@gmail.com.
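To make the throughput and TTFT terms concrete, the sketch below shows one common way such figures can be derived from request timestamps. It assumes throughput is counted over the generation window only (first token to last token) and that per-run results are summarized by their median. Both choices are assumptions for illustration, and the `Run` type and function names are hypothetical; the sources listed above may define these metrics differently.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Run:
    """Timestamps (in seconds) and token count for one benchmark request."""
    request_start: float
    first_token: float
    last_token: float
    output_tokens: int


def ttft(run):
    """Time to first token: delay between sending the request and the first token."""
    return run.first_token - run.request_start


def tokens_per_sec(run):
    """Output tokens divided by the generation window; None if the window is empty."""
    window = run.last_token - run.first_token
    return run.output_tokens / window if window > 0 else None


def summarize_runs(runs):
    """Median TTFT and median throughput across repeated runs."""
    if not runs:
        return {"median_ttft_s": None, "median_tokens_per_sec": None}
    rates = [r for r in map(tokens_per_sec, runs) if r is not None]
    return {
        "median_ttft_s": median(ttft(r) for r in runs),
        "median_tokens_per_sec": median(rates) if rates else None,
    }


if __name__ == "__main__":
    runs = [Run(0.0, 0.5, 2.5, 100), Run(0.0, 0.25, 4.25, 100)]
    print(summarize_runs(runs))
```

Using the median rather than the mean keeps a single slow request from dominating the summary, which matters when production load varies as the note describes.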