Real-Time GPU Utilization Monitoring: An In-Depth Overview
Introduction
To monitor GPU utilization in real time on Linux, the quickest method is executing nvidia-smi --loop=1. This command updates GPU statistics every second, displaying core utilization, VRAM usage, temperature, and power draw. Real-time GPU monitoring starts with nvidia-smi and extends to process-specific views, container metrics, and alerts for lengthy tasks. This guide outlines command-level workflows applicable on Ubuntu, GPU droplets, Docker hosts, and Kubernetes clusters. For those developing deep learning systems, use this guide alongside setting up a deep learning environment on Ubuntu.

Key Takeaways
- Use nvidia-smi --loop=1 for rapid host-level GPU checks on Linux.
- Use nvidia-smi pmon -s um to detect which PID is utilizing GPU cores and memory bandwidth.
- For terminal dashboards, nvtop offers interactive drill-downs, while gpustat provides lightweight snapshots.
- In containers and Kubernetes, expose metrics via NVIDIA runtime support and DCGM Exporter.
- Persistent alerting should be configured in monitoring platforms like Datadog Agent or Zabbix templates.
- GPU memory and core utilization are distinct signals, with high memory but low core usage common in input-stalled jobs.
- On Windows, Unified GPU Usage Monitoring consolidates engine activity, viewable in Task Manager and WMI.

Understanding GPU Utilization Metrics
GPU utilization metrics indicate whether a job is compute-bound, memory-bound, input-bound, or idle. Track core utilization, memory usage, memory controller load, temperature, and power draw collectively rather than individually.
GPU Core Utilization vs. Memory Utilization
GPU core utilization reflects the percentage of time kernels actively execute on streaming multiprocessors (SMs) during the sampling window. GPU memory utilization often refers to memory controller activity, while memory usage indicates allocated VRAM in MiB. Low core utilization with high VRAM typically suggests the model is present but waiting on data or synchronization.
SM Utilization, Memory Bandwidth, and Power Draw
SM utilization reveals CUDA core activity, memory bandwidth shows how intensively memory channels are used, and power draw indicates electrical load relative to the card limit. These metrics together elucidate why workloads with similar utilization rates can perform differently.
Importance for Deep Learning Workloads
These metrics are crucial because training throughput is limited by the slowest pipeline stage. If GPU cores stay idle while CPU or storage is saturated, adding more GPUs won't enhance throughput.
GPU Bottlenecks and Out of Memory Errors
Most GPU-related issues in ML pipelines arise from input bottlenecks or VRAM pressure. Diagnose both by sampling GPU, CPU, and process-level memory during a real training job.
CPU Preprocessing Bottlenecks
If CPU preprocessing is the bottleneck, GPU utilization decreases between mini-batches even when VRAM is allocated. This pattern occurs when operations like image decoding, augmentation, or tokenization are slower than kernel execution.
Resolving Out of Memory (OOM) Errors
OOM errors occur when requested allocations surpass available VRAM, often due to large batch sizes or concurrent processes. Solutions include reducing batch size, using gradient accumulation, enabling mixed precision, terminating stale processes, and optimizing transform stages.
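One of the remedies above, terminating stale processes, can be done directly from the shell. A minimal sketch, assuming the NVIDIA driver is installed; the PID placeholder must be replaced with a value you have verified belongs to a stale job:

```shell
# List compute processes still holding VRAM, with their names and
# allocated memory, so stale allocations are easy to spot.
nvidia-smi --query-compute-apps=pid,process_name,used_memory --format=csv

# Terminate a stale process by PID. Verify the PID first: this
# kills the process. <PID> is a placeholder, not a real value.
kill <PID>
```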
Monitoring GPU Utilization with nvidia-smi
nvidia-smi is the fastest tool for real-time GPU telemetry on Linux servers and ships with the NVIDIA driver. The fields it reports are the same ones most higher-level integrations consume.
Basic nvidia-smi Output
Running nvidia-smi without flags gives a comprehensive snapshot of GPU and process state, focusing on GPU-Util, Memory-Usage, Temp, and Pwr:Usage/Cap.
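The plain invocation looks like this; the columns named above appear in the table header:

```shell
# One-shot snapshot of all GPUs: driver and CUDA versions, per-GPU
# utilization, memory usage, temperature, power draw, and the
# process list at the bottom of the table.
nvidia-smi
```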
Running nvidia-smi in Continuous Loop Mode
Use loop mode for live updates without scripts. --loop=1 refreshes every second.
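In practice:

```shell
# Reprint the full nvidia-smi table every second; Ctrl+C to stop.
nvidia-smi --loop=1
```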
Logging nvidia-smi Output to a File
Redirect sampled output to a file for later inspection.
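A sketch combining loop mode with a query and shell redirection; the 5-second interval and the field list are illustrative choices:

```shell
# Append a timestamped CSV sample every 5 seconds to gpu.log.
# Follow it live from another terminal with: tail -f gpu.log
nvidia-smi --query-gpu=timestamp,utilization.gpu,memory.used,temperature.gpu,power.draw \
           --format=csv --loop=5 >> gpu.log
```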
Querying Specific Metrics with nvidia-smi --query-gpu
Use --query-gpu with --format=csv for parseable output in scripts, ideal for cron jobs and custom exporters.
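For example, the noheader and nounits modifiers make the output trivial to split on commas:

```shell
# Machine-readable snapshot: one CSV row per GPU, no header row and
# no unit suffixes, which is the easiest form to parse in scripts.
nvidia-smi --query-gpu=index,name,utilization.gpu,utilization.memory,memory.used,memory.total \
           --format=csv,noheader,nounits
```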
Per-Process GPU Monitoring
Per-process monitoring identifies which application is consuming GPU time. Use nvidia-smi pmon for utilization by PID.
Using nvidia-smi pmon for Process-Level Metrics
Run pmon in loop mode to monitor active compute processes. -s um displays utilization and memory throughput per process.
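For example, with a one-second sampling delay:

```shell
# Per-process view refreshed every second: sm% (core activity) and
# mem% (memory controller activity) per PID. -s um selects the
# utilization and framebuffer-memory counters; -d sets the delay.
nvidia-smi pmon -s um -d 1
```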
Correlating Process IDs to Application Names
Map PIDs to full command lines to identify notebook kernels, training scripts, and inference workers.
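A standard way to do this is with ps. The sketch below demonstrates on the current shell's own PID; in practice, substitute the pid column reported by pmon:

```shell
# Resolve a PID to its full command line. $$ (this shell's PID) is
# used here only so the example is self-contained; replace it with
# a PID taken from nvidia-smi pmon output.
PID=$$
ps -p "$PID" -o pid,user,etime,cmd
```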
Interactive GPU Monitoring with nvtop and gpustat
nvtop provides interactive process control, while gpustat offers compact snapshots in scripts. Both complement nvidia-smi.
Installing and Running nvtop
Install nvtop and start it in the terminal for live bars and per-process views.
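A sketch assuming a Debian/Ubuntu host where nvtop is available from the distribution repositories:

```shell
# Install nvtop from the distro package archive (Debian/Ubuntu).
sudo apt install -y nvtop

# Launch the interactive dashboard: live utilization bars plus a
# per-process list; press q to quit.
nvtop
```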
Installing and Running gpustat
Install gpustat with pip and use watch mode for updates.
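For example:

```shell
# Install gpustat from PyPI.
pip install gpustat

# Refresh every second; -c shows the owning command name and
# -p shows the PID for each GPU process.
gpustat -i 1 -c -p
```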
Choosing Between nvtop, gpustat, and nvidia-smi
Use nvidia-smi for core data, gpustat for terminal snapshots, and nvtop for interactive debugging.
GPU Monitoring with Glances
Install Glances with the GPU extra for a single terminal dashboard covering GPU, CPU, memory, disk, and network.
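A sketch assuming pip is available; the gpu extra pulls in the NVIDIA bindings Glances needs:

```shell
# Install Glances with its GPU extra, then start the dashboard.
# GPU, CPU, memory, disk, and network panels share one terminal view.
pip install 'glances[gpu]'
glances
```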
GPU Monitoring Inside Docker and Kubernetes
Containerized GPU monitoring requires host runtime support and workload-level metric collection. Use NVIDIA Container Toolkit for Docker and DCGM Exporter for Kubernetes.
Exposing GPU Metrics in Docker
Install the NVIDIA Container Toolkit on the host, then run containers with --gpus all.
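A sketch assuming the NVIDIA apt repository is already configured on the host; the CUDA image tag is only an example:

```shell
# Install the toolkit, point Docker at the NVIDIA runtime, restart.
sudo apt install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker

# Verify GPU passthrough by running nvidia-smi inside a container;
# --gpus all exposes every host GPU.
docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi
```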
Monitoring GPU Utilization in Kubernetes
Deploy DCGM Exporter as a DaemonSet on GPU nodes to expose Prometheus metrics.
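A sketch using the exporter's Helm chart, assuming Helm is installed and the cluster's GPU nodes already run the NVIDIA device plugin:

```shell
# Add NVIDIA's DCGM Exporter chart repo and deploy the DaemonSet.
helm repo add gpu-helm-charts https://nvidia.github.io/dcgm-exporter/helm-charts
helm repo update
helm install dcgm-exporter gpu-helm-charts/dcgm-exporter

# Each GPU node then serves Prometheus metrics (for example
# DCGM_FI_DEV_GPU_UTIL) on port 9400 at /metrics.
```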
Setting Up Persistent GPU Monitoring
With Datadog
Install Datadog Agent on each GPU node and enable the NVIDIA integration for long-term retention and alerting.
With Zabbix
Install the Zabbix agent on GPU hosts and attach an NVIDIA GPU template, configuring trigger thresholds for utilization and temperature.
Unified GPU Usage Monitoring on Windows
Unified monitoring combines multiple engine activities into a single utilization view, configurable via NVIDIA Control Panel and registry settings.
Comparing GPU Monitoring Tools
Choose a tool based on data depth, overhead, and alerting needs. Start with CLI tools for diagnostics, then progress to Datadog, Zabbix, or DCGM for persistent monitoring.
Conclusion
Real-time GPU monitoring is vital for optimizing deep learning performance, troubleshooting bottlenecks, and ensuring efficient resource usage. Choose the right tool based on your specific needs to monitor, troubleshoot, and maximize GPU workload performance effectively.