Live AI inference benchmarks — auto-updated daily

GPU inference performance benchmarks

Independent AI inference benchmarks across NVIDIA, AMD, and Intel GPUs. Real data from OpenAI, Anthropic, Azure OpenAI, and open source models, automatically refreshed every 24 hours from live sources.

MLPerf round: v4.1

OpenRouter API ● LIVE

Real-time model pricing, availability, context lengths, and provider metadata across 360+ models.

Data type: Pricing, specs
Refresh: Daily · 06:00 UTC
Coverage: 360+ models
Method: REST API

openrouter.ai/api/v1/models
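
A minimal sketch of how the model listing might be pulled and reduced to pricing and context length. It assumes the endpoint returns JSON with a top-level `data` list whose entries carry `id`, `context_length`, and a `pricing` object holding per-token `prompt` and `completion` rates as strings. Those field names, and the helper names `fetch_models` and `summarize_models`, are assumptions for illustration and are not specified on this page.

```python
import json
import urllib.request

MODELS_URL = "https://openrouter.ai/api/v1/models"


def fetch_models(url=MODELS_URL, timeout=30):
    """Download the model listing and return the parsed JSON payload."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def _per_million(per_token):
    """Convert a per-token price (string or number) to a price per 1M tokens."""
    if per_token is None:
        return None
    return float(per_token) * 1_000_000


def summarize_models(payload):
    """Reduce a payload to rows of id, context length, and USD per 1M tokens.

    The field names read here are assumed, not confirmed by this page.
    """
    rows = []
    for model in payload.get("data", []):
        pricing = model.get("pricing") or {}
        rows.append({
            "id": model.get("id"),
            "context_length": model.get("context_length"),
            "prompt_usd_per_mtok": _per_million(pricing.get("prompt")),
            "completion_usd_per_mtok": _per_million(pricing.get("completion")),
        })
    return rows
```

Keeping the download and the reduction in separate functions means the parsing step can be checked against a saved payload without touching the network, for example `summarize_models(fetch_models())` in a daily job and `summarize_models(saved_payload)` in tests.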

Artificial Analysis ● LIVE

Independent throughput, latency, and TTFT benchmarks measured from real API calls across providers.

Data type: Tokens/s, TTFT, latency
Refresh: Daily · 06:00 UTC
Methodology: Live API calls
Independence: 3rd party verified

artificialanalysis.ai
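
To make the three metrics concrete, here is a rough sketch of how TTFT, total latency, and output throughput could be derived from the arrival times of streamed tokens. The `StreamTiming` class, the `measure_stream` helper, and the choice to compute throughput over the decode window (tokens after the first, divided by the time between first and last token) are assumptions for illustration. The exact definitions used by the upstream source are not stated on this page.

```python
from dataclasses import dataclass


@dataclass
class StreamTiming:
    ttft_s: float           # time to first token, in seconds
    total_latency_s: float  # request start to last token, in seconds
    tokens_per_s: float     # output tokens per second after the first token


def measure_stream(request_start, token_times):
    """Derive TTFT, latency, and throughput from timestamps in seconds.

    `request_start` is when the request was sent; `token_times` holds the
    arrival time of each output token, in order, on the same clock.
    """
    if not token_times:
        raise ValueError("no tokens received")
    first, last = token_times[0], token_times[-1]
    ttft = first - request_start
    total = last - request_start
    decode_time = last - first
    # Throughput counts the tokens after the first over the decode window,
    # so time spent waiting for the first token does not dilute the rate.
    tokens_per_s = (len(token_times) - 1) / decode_time if decode_time > 0 else 0.0
    return StreamTiming(ttft, total, tokens_per_s)
```

In a real measurement the timestamps would come from a monotonic clock such as `time.monotonic()`, recorded as each streamed chunk arrives.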

MLPerf v4.1 PERIODIC

Official MLCommons inference benchmark results for H100, H200, A100, and MI300X GPU hardware.

Data type: GPU throughput
Round: v4.1 (latest)
GPU coverage: H200, H100, A100, MI300X
Verified by: MLCommons

mlcommons.org/benchmarks/inference

Together AI ● LIVE

Open source model performance data including Llama, Mistral, and other community models on cloud GPU infrastructure.

Data type: OSS model benchmarks
Refresh: Daily · 06:00 UTC
Focus: Open source models
Method: REST API

api.together.xyz

⚠

Methodology note: Throughput figures (tokens/sec) are measured under controlled load and may differ in production depending on concurrent requests, prompt complexity, and provider infrastructure. Pricing data reflects list rates and may not include enterprise discounts. Rows marked ✱ contain internally generated or synthetic benchmark data used to fill gaps where third-party measurements are unavailable; treat these as estimates. Context degradation curves are modelled from the available data points. GPU specs are sourced from vendor datasheets. For the full methodology, contact seeth.gudetee@gmail.com.
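
Anyone consuming an export of these tables may want to keep the ✱ rows apart from third-party measurements. The sketch below shows one way to do that. It assumes, hypothetically, that each row is a dictionary and that the marker is embedded in the row's label under a `model` key; the page does not describe its actual row layout, and `split_by_provenance` is an illustrative name.

```python
ESTIMATE_MARK = "✱"  # marker this page uses for synthetic or internal rows


def split_by_provenance(rows, label_key="model"):
    """Separate measured rows from rows flagged as estimates.

    Assumes (hypothetically) that the marker is embedded in the row's label.
    Returns (measured, estimated). Every returned row is a copy with the
    marker stripped from the label and an explicit `estimated` flag.
    """
    measured, estimated = [], []
    for row in rows:
        label = str(row.get(label_key, ""))
        is_estimate = ESTIMATE_MARK in label
        cleaned = dict(row)  # copy so the caller's rows are left untouched
        cleaned[label_key] = label.replace(ESTIMATE_MARK, "").strip()
        cleaned["estimated"] = is_estimate
        (estimated if is_estimate else measured).append(cleaned)
    return measured, estimated
```

Carrying the flag as its own field, rather than leaving it in the label, keeps the distinction intact if the rows are later sorted, joined, or plotted.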
