NVIDIA published AgentPerf results from Artificial Analysis on June 12, 2026. The takeaway is not only that newer GPUs are faster. It is that agentic AI is beginning to get infrastructure benchmarks that resemble real workflows more closely than single-response LLM tests do.

NVIDIA draws a clear distinction between conversational AI and agentic AI. A chat completion is a short sprint: one model call and one response. An agent behaves more like a long relay. It breaks a goal into steps, observes, reasons, uses tools, and continues until the task is done. Real tasks may include dozens or hundreds of LLM calls plus tool delays from code execution, database search, compilation, and web browsing.
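The sprint-versus-relay contrast can be made concrete with a little arithmetic. The sketch below is illustrative only: the `Step` type, the function names, and every timing value are hypothetical placeholders, not figures from AgentPerf. It simply sums model-call time and tool-wait time across the steps of one task.

```python
from dataclasses import dataclass


@dataclass
class Step:
    """One step of a task. All timings here are made-up placeholders."""
    llm_seconds: float   # time spent in one model call
    tool_seconds: float  # time spent waiting on a tool after that call (0 if none)


def task_wall_time(steps):
    """End-to-end time for one task: every model call plus every tool wait."""
    return sum(s.llm_seconds + s.tool_seconds for s in steps)


def llm_share(steps):
    """Fraction of the task's wall time spent inside model calls."""
    total = task_wall_time(steps)
    return sum(s.llm_seconds for s in steps) / total if total else 0.0


# A chat completion: one model call, no tools.
chat = [Step(llm_seconds=2.0, tool_seconds=0.0)]

# A hypothetical agent task: 50 model calls, each followed by a 1-second tool wait.
agent = [Step(llm_seconds=2.0, tool_seconds=1.0) for _ in range(50)]

print(task_wall_time(chat))   # 2.0
print(task_wall_time(agent))  # 150.0
```

Even with identical per-call latency, the agent task holds its slot on the system far longer, which is why a single-response latency figure says little about how many such tasks can run side by side.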

That means traditional inference benchmarks may not capture the stress of production agents. Enterprises need to know how responsive agents are, how many agentic tasks can run at once, and how much useful work is delivered per dollar and per watt. AgentPerf is designed to measure that layer.

In the first published round, NVIDIA GB300 NVL72 led the benchmark on a DeepSeek V4 Pro agentic workload. NVIDIA says GB300 NVL72 can run up to 20 times more agents per megawatt than an HGX H200 system. The company attributes the advantage to rack-scale design, 72 connected GPUs, CUDA kernels, TensorRT LLM, and full-stack optimization for mixture-of-experts model execution.

The methodology is also notable. AgentPerf is built from real coding-agent trajectories: an agent receives a task, reads files, writes and edits code, runs commands, and iterates based on results. Tool calls are simulated with representative CPU processing time, so the results focus on accelerated computing performance.
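One way to picture this kind of replay is a loop that sends recorded model calls to the system under test and substitutes a fixed delay for each tool call. This is a minimal sketch under stated assumptions, not AgentPerf's actual harness: the event format, the `replay` function, and the stand-in model are all hypothetical, and a plain sleep is used here as a stand-in for representative CPU processing time.

```python
import time


def replay(trajectory, call_model, sleep=time.sleep, clock=time.perf_counter):
    """Replay one recorded trajectory (hypothetical event format).

    trajectory: iterable of ("llm", prompt) or ("tool", seconds) events.
    call_model: callable that sends a prompt to the system under test.
    sleep:      stand-in for the tool's processing time (injectable for tests).
    """
    llm_calls = 0
    llm_seconds = 0.0
    tool_seconds = 0.0
    for kind, payload in trajectory:
        if kind == "llm":
            start = clock()
            call_model(payload)              # only this part exercises the accelerator
            llm_seconds += clock() - start
            llm_calls += 1
        elif kind == "tool":
            sleep(payload)                   # simulated tool call: wait, do no real work
            tool_seconds += payload
        else:
            raise ValueError(f"unknown event kind: {kind!r}")
    return {
        "llm_calls": llm_calls,
        "llm_seconds": llm_seconds,
        "tool_seconds": tool_seconds,
    }


# Hypothetical trajectory: read a file, edit code, run a command, iterate.
trajectory = [
    ("llm", "plan the change"),
    ("tool", 0.5),    # e.g. reading files
    ("llm", "write the patch"),
    ("tool", 0.25),   # e.g. running a command
]

# Stand-in model and a no-op sleep so the example finishes instantly.
result = replay(trajectory, call_model=lambda prompt: "ok", sleep=lambda s: None)
print(result["llm_calls"], result["tool_seconds"])  # 2 0.75
```

Keeping tool time fixed means that differences between systems show up only in the model-call portion, which is the design choice the article describes.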

For enterprises buying AI infrastructure, this matters. When agents move from demos into everyday workflows, the limiting cost factor is not only model pricing. It is also whether the infrastructure can handle long-context, multi-tool, multi-step work at high concurrency. The number of concurrent agents per accelerator, rack, and megawatt directly affects the economics of AI automation.
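The agents-per-megawatt framing reduces to simple arithmetic. The sketch below uses invented placeholder numbers for two unnamed systems; none of them are AgentPerf results, and the function names are hypothetical. It shows how a concurrency figure and a power draw combine into a per-megawatt metric, and what that implies for a fixed power budget.

```python
def agents_per_megawatt(concurrent_agents: float, power_kw: float) -> float:
    """Normalize a system's concurrent agent count to one megawatt of power."""
    if power_kw <= 0:
        raise ValueError("power_kw must be positive")
    return concurrent_agents * 1000.0 / power_kw


def relative_efficiency(candidate_per_mw: float, baseline_per_mw: float) -> float:
    """How many times more agents per megawatt the candidate sustains."""
    return candidate_per_mw / baseline_per_mw


def agents_within_budget(per_mw: float, budget_mw: float) -> int:
    """Concurrent agents a site could host under a fixed power budget."""
    return int(per_mw * budget_mw)


# Placeholder inputs, chosen only to keep the arithmetic easy to follow.
system_a = agents_per_megawatt(concurrent_agents=100, power_kw=50)   # 2000.0
system_b = agents_per_megawatt(concurrent_agents=40, power_kw=100)   # 400.0

print(relative_efficiency(system_a, system_b))        # 5.0
print(agents_within_budget(system_a, budget_mw=2.5))  # 5000
```

Because data center power is often the fixed constraint, a per-megawatt ratio translates directly into how much agent capacity a given site can offer.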

Overall, NVIDIA's AgentPerf update shows agentic AI entering an infrastructure competition. Model quality remains important, but real deployment also depends on latency, throughput, energy efficiency, tool-call overhead, and system scalability. The more AI agents behave like workflows, the less useful it becomes to measure only a single fast answer.

NVIDIA Blackwell leads AgentPerf as AI agents get a more realistic infrastructure benchmark
