NPU Performance Calculator

Calculate TOPS from MAC units and clock frequency, estimate inference latency for models like YOLOv8 and LLaMA, and compare more than 8 NPUs with interactive SVG bar charts.



What is an NPU Performance Calculator?

An NPU (Neural Processing Unit) Performance Calculator estimates the theoretical and real-world performance of dedicated AI accelerator chips. It calculates TOPS (Tera Operations Per Second) from the chip's MAC (Multiply-Accumulate) array size and clock frequency, considering different numerical precisions like INT8, FP16, and FP32. As AI PCs and edge AI devices become mainstream — with market penetration projected to reach 59% by 2026 — understanding NPU capabilities is essential for developers, hardware engineers, and system architects who need to evaluate whether a given NPU can run their AI models at acceptable latency and power budgets.

How to Use the NPU Performance Calculator

  1. Open the TOPS Calculator tab to compute raw NPU performance from MAC units, clock frequency, and precision (INT8/FP16/FP32)
  2. Adjust the utilization slider (typically 50-80%) to estimate real-world effective TOPS
  3. Switch to the Inference Estimator tab to select an NPU preset and AI model preset
  4. View estimated inference latency (ms), FPS for vision models, or tokens/s for LLMs
  5. Use the NPU Comparison tab to select multiple NPUs and generate side-by-side comparison charts
  6. Compare TOPS and TOPS/W (power efficiency) across different NPU architectures
  7. Use custom inputs to enter specifications for NPUs not in the preset database

Frequently Asked Questions

What does TOPS mean and how is it calculated?

TOPS stands for Tera Operations Per Second — a measure of how many trillion operations an AI accelerator can perform per second. It is calculated as: TOPS = MAC Units × Clock Frequency (GHz) × 2 ÷ 1000. The ×2 factor accounts for each MAC unit performing one multiplication and one addition per clock cycle, and the ÷1000 converts giga-operations to tera-operations. For example, an NPU with 2048 MACs running at 1 GHz delivers 4.096 TOPS at INT8 precision.
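The formula above can be checked with a few lines of arithmetic (a minimal sketch; the variable names are illustrative, not part of the tool):

```python
# Verifying the worked example from the text: 2048 MACs at 1 GHz, INT8.
mac_units = 2048
clock_ghz = 1.0
ops_per_mac_per_cycle = 2  # one multiply + one add per clock cycle

# MACs x GHz gives giga-MACs/s; x2 gives GOPS; /1000 converts to TOPS.
tops = mac_units * clock_ghz * ops_per_mac_per_cycle / 1000
print(tops)  # → 4.096
```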

Why does precision (INT8 vs FP16 vs FP32) affect NPU performance?

NPUs achieve maximum TOPS at INT8 precision because 8-bit integers require fewer transistors per operation, allowing more parallel computations. FP16 (half-precision float) typically delivers half the TOPS of INT8, while FP32 delivers one-quarter. Most inference workloads use INT8 or FP16 quantized models with minimal accuracy loss, making INT8 TOPS the most commonly cited spec.

How accurate are the inference time estimates?

The estimates are theoretical, derived from peak TOPS and model FLOPs. Real-world performance depends on memory bandwidth, data transfer overhead, model optimization (quantization, pruning), and software framework efficiency. Typical real-world utilization is 50-80% of peak TOPS. The utilization slider lets you adjust for these factors to get more realistic estimates.
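The compute-bound estimate described above amounts to dividing the model's operation count by effective throughput (a hedged sketch; the YOLOv8n figure of ~8.7 GFLOPs per 640px inference is the published model-card value, and the 70% utilization is just an example):

```python
# Compute-bound latency model: ignores memory and framework overhead,
# which is exactly why the answer above recommends a utilization factor.
def estimated_latency_ms(model_gflops: float, peak_tops: float,
                         utilization: float = 0.7) -> float:
    effective_gops = peak_tops * 1000 * utilization  # TOPS -> GOPS
    return model_gflops / effective_gops * 1000      # seconds -> ms

# YOLOv8n (~8.7 GFLOPs) on a 4.096-TOPS NPU at 70% utilization:
latency = estimated_latency_ms(8.7, 4.096, 0.7)  # ≈ 3.03 ms → ~330 FPS
```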

Which NPU is best for running LLMs locally?

For local LLM inference, you need high TOPS and large memory bandwidth. As of 2024, AMD XDNA 2 (50 TOPS), Intel NPU 4 (48 TOPS), and Qualcomm Hexagon (45 TOPS) lead the PC NPU market. However, TOPS alone doesn't determine LLM performance — memory bandwidth and software optimization are equally important. Use the Inference Estimator tab to compare specific models across NPUs.
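The point that TOPS alone doesn't determine LLM performance can be made concrete with a roofline-style bound (a sketch under common assumptions: decode streams the full weight set per token, and FLOPs per token ≈ 2 × parameter count; all numbers are illustrative, not vendor specs):

```python
# Decode-phase tokens/s is limited by whichever is slower:
# reading the weights from memory, or doing the arithmetic.
def decode_tokens_per_s(weight_gb: float, bandwidth_gbs: float,
                        compute_tops: float, gflops_per_token: float) -> float:
    bandwidth_bound = bandwidth_gbs / weight_gb          # tokens/s
    compute_bound = compute_tops * 1000 / gflops_per_token
    return min(bandwidth_bound, compute_bound)

# A 7B model at 4-bit (~3.5 GB weights) on 100 GB/s memory with a
# 50-TOPS NPU (~14 GFLOPs/token): bandwidth caps it near 28.6 tokens/s,
# even though compute alone could sustain thousands.
rate = decode_tokens_per_s(3.5, 100.0, 50.0, 14.0)
```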

Related Tools