The `top` command, part 1: interpreting high-level CPU information
After running company software on some virtual machines and observing erratic resource usage behaviour, I thought it would be wise to understand what sort of data structures underpin a server and its computations. The `top` command is readily available in many environments to visualize process usage statistics, and has provided insights to countless people already.
Quick disclaimer: I'm aware that there are many tools that improve on the visualizations of `top` (`htop`, `atop`, and more). I found that `top` was more than adequate to prompt a deep dive into the fundamentals of process statistics and task scheduling. Let's jump right in!
Typical `top` output looks more or less like the following:
top - 13:21:37 up 103 days, 8:11, 1 user, load average: 0.28, 0.74, 0.86
Tasks: 127 total, 1 running, 126 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.2 us, 0.2 sy, 0.0 ni, 99.7 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem : 3881240 total, 976772 free, 316568 used, 2587900 buff/cache
KiB Swap: 4194300 total, 4177276 free, 17024 used. 2430508 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
28186 root 20 0 162000 2272 1556 R 0.3 0.1 0:00.09 top
1 root 20 0 193748 6352 2204 S 0.0 0.2 53:16.17 systemd
2 root 20 0 0 0 0 S 0.0 0.0 0:03.54 kthreadd
# ... (124 more tasks)
In this article, I want to focus on discovering and explaining the high-level summary statistics (lines 1-3).
I intend to skip over the following items to limit the scope of the article:
- Memory information
- Deeper understanding of input/output
- Deeper understanding of individual tasks
Last update time
top - 13:21:37
`top` will always print the system time at the moment of updating the on-screen statistics.
Strangely enough, without specifying a different delay (and without finding any configuration overrides), I observed a delay of 3 seconds per update, while the `man` pages advertise a default delay of 1.5 seconds. Feel free to adjust this delay by pressing `d` and entering a new delay value in seconds.
System uptime
up 103 days, 8:11
The system uptime is the amount of time the system has been running since its last restart.
It's worth clarifying that the `8:11` is part of the `103 days` (even if two spaces separate the information). We're meant to read this value as "103 days, 8 hours, and 11 minutes".
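If you want the raw number behind this field, it can be read from `/proc/uptime`, which (as far as I can tell) is where `top` gets it. A minimal Python sketch, assuming a Linux system; the formatting is mine:

```python
# /proc/uptime holds two values: seconds since boot and cumulative idle
# seconds (summed across CPUs). Format the first one like top's "up" field.
with open("/proc/uptime") as f:
    uptime_seconds, _idle_seconds = (float(x) for x in f.read().split())

days, rem = divmod(int(uptime_seconds), 86400)
hours, rem = divmod(rem, 3600)
print(f"up {days} days, {hours}:{rem // 60:02d}")
```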
Logged-in users
1 user
This number is described as the number of users currently logged on.
There was no mention of exactly what `1 user` means in `top`'s `man` pages. However, scanning the end of the `man` pages revealed related commands. In particular, `w` appeared to also be a system metrics summary tool like `top`. Here's the output of `w`:
14:13:29 up 103 days, 9:03, 1 user, load average: 0.00, 0.01, 0.05
USER TTY FROM LOGIN@ IDLE JCPU PCPU WHAT
root pts/0 10.3.60.116 13:32 1.00s 0.10s 0.00s w
After hopping over to another terminal window and logging in as root again (without logging out of the first terminal window), I was able to observe the mention of 2 users in `top`, `w`, and `who` as well, and watch the number drop back to 1 after logging out from the second terminal window.
Load average
load average: 0.00, 0.01, 0.05
These numbers represent the average number of running and waiting threads (tasks), measured across 1 minute, 5 minutes, and 15 minutes respectively. That is a very terse explanation, so let's unpack it.
The conclusion of Brendan Gregg - Linux Load Averages explains these numbers and their suggested interpretation:
These system load averages count the number of threads working and waiting to work, and are summarized as a triplet of exponentially-damped moving sum averages that use 1, 5, and 15 minutes as constants in an equation. This triplet of numbers lets you see if load is increasing or decreasing, and their greatest value may be for relative comparisons with themselves.
Explaining exponentially-damped moving sum averages is difficult. The key idea (shown by a graph in the article) is that recent measurements have a large weight while old measurements (even the measurements older than 1/5/15 minutes) have a small weight.
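To make the damping idea concrete, here is a small Python sketch of the recurrence behind the load averages: every 5 seconds the current count of active tasks is blended into each average, with older samples decaying exponentially. The decay constants mirror the 1/5/15-minute time constants; the scenario below is my own toy simulation (the kernel does this in fixed-point arithmetic).

```python
import math

SAMPLE_PERIOD = 5.0  # the kernel samples the run queue every 5 seconds
DECAY = {m: math.exp(-SAMPLE_PERIOD / (m * 60)) for m in (1, 5, 15)}

def update(averages, active_tasks):
    """One 5-second tick: blend the new task count into each average."""
    return {m: averages[m] * d + active_tasks * (1 - d)
            for m, d in DECAY.items()}

# Toy scenario: two busy tasks for five minutes, then an idle system.
averages = {1: 0.0, 5: 0.0, 15: 0.0}
for tick in range(120):                  # 120 ticks = 10 minutes
    active = 2 if tick < 60 else 0
    averages = update(averages, active)
print({m: round(v, 2) for m, v in averages.items()})
```

Running it shows the 1-minute average falling back toward zero much faster than the 15-minute one, which is exactly the "is load increasing or decreasing" signal described in the quote above.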
The interpretation of the system load average is different depending on the underlying operating system (quoted from the same document):
On Linux, load averages are (or try to be) "system load averages", for the system as a whole, measuring the number of threads that are working and waiting to work (CPU, disk, uninterruptible locks). Put differently, it measures the number of threads that aren't completely idle. Advantage: includes demand for different resources.
On other OSes, load averages are "CPU load averages", measuring the number of CPU running + CPU runnable threads. Advantage: can be easier to understand and reason about (for CPUs only).
I noticed that the load average numbers update every 5 seconds even if `top` is set to a different delay (statistics refresh rate). Ray Walker - Examining Load Average explains that the Linux kernel produces the values in `/proc/loadavg`, and the refresh rate of 5 seconds is defined within the function that produces the values.
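You can poll that file directly to see the same numbers that `top`, `w`, and `uptime` report. A quick sketch (Linux-only; field meanings per `proc(5)`):

```python
import time

# /proc/loadavg: three load averages, runnable/total scheduling entities,
# and the PID of the most recently created task.
for _ in range(3):
    with open("/proc/loadavg") as f:
        one, five, fifteen, entities, last_pid = f.read().split()
    print(f"1m={one} 5m={five} 15m={fifteen} tasks={entities} last_pid={last_pid}")
    time.sleep(5)  # matches the kernel's own refresh interval
```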
Task overview
Tasks: 127 total, 1 running, 126 sleeping, 0 stopped, 0 zombie
We're given the total number of tasks and their states at the moment of measurement by `top`.
I immediately got confused by the mention of "tasks" due to my preconceived notion of processes (lines of execution isolated from each other) and threads (lines of execution that share resources such as filesystem info, memory space, signal handlers, and the set of open files). It turns out that in Linux this type of thinking breaks down: every newly created "line of execution" can be described as a new task or process (the terms appear interchangeable), and each one gets a new PID (process ID) regardless of whether any resources (or which specific resources) are shared.
The `clone` system call documentation for the CLONE_THREAD flag mentions what can be considered a "thread" in Linux:
To make the remainder of the discussion of CLONE_THREAD more readable, the term "thread" is used to refer to the processes within a thread group.
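As a quick illustration of thread groups (my own toy example, not part of the original experiment): in the Python sketch below, `os.getpid()` returns the shared thread-group ID, which is what `top` shows as the PID by default, while `threading.get_native_id()` (Python 3.8+, Linux) returns each thread's own kernel task ID. Those per-task IDs are the ones that show up in the per-thread view of `top` introduced a bit further below.

```python
import os
import threading
import time

def worker():
    # getpid() reports the thread-group ID (the PID top shows by default);
    # get_native_id() reports this thread's own kernel task ID.
    print(f"TGID={os.getpid()}  TID={threading.get_native_id()}")
    time.sleep(60)  # linger so the tasks can be inspected in top

threads = [threading.Thread(target=worker) for _ in range(3)]
for t in threads:
    t.start()
print(f"main thread: TGID={os.getpid()}  TID={threading.get_native_id()}")
for t in threads:
    t.join()
```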
From observing the fields, it appears that `top`'s traditional view groups tasks belonging to the same thread group.
- Refer to `man top` for a description of each field.
- Notice how the %CPU value is a summation over the threads in the thread group.
USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND PID PPID GID PGRP TGID
root 20 0 102784 4388 768 R 199.4 0.4 0:13.08 `- ackermann_multi 4016 3923 0 4016 4016
The `H` key allows us to toggle to a view of all tasks (now, threads are shown separately).
USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND PID PPID GID PGRP TGID
root 20 0 102784 4388 768 R 99.2 0.3 0:13.08 `- ackermann_multi 4016 3923 0 4016 4016
root 20 0 102784 4388 768 S 0.2 0.3 0:00.01 `- ackermann_multi 4017 3923 0 4016 4016
root 20 0 102784 4388 768 S 0.0 0.3 0:00.00 `- ackermann_multi 4018 3923 0 4016 4016
root 20 0 102784 4388 768 S 99.4 0.3 0:00.00 `- ackermann_multi 4019 3923 0 4016 4016
This matches the description in `top`'s `man` page: "It can display system summary information as well as a list of processes or threads currently being managed by the Linux kernel."
Process states
Running: The process is currently using the CPU.
Sleeping: The process is NOT using the CPU (not directly controlled by users).
- I believe (?) this encompasses the various waiting states such as interruptible sleep and uninterruptible sleep; runnable tasks waiting for a CPU appear to be counted under "running" (further research and experimentation needed).
- The scheduler will put the running process aside in favor of another process for various reasons. Exact reasons would be a topic of further research and experimentation.
- We typically observe many processes sleeping in the OS until events (input-output, or other) wake them.
Stopped: The process is NOT using the CPU (directly controlled by users).
- The process has been suspended by Ctrl-z or `kill -STOP <pid>`
- Consult Dave McKay - How to Run and Control Background Processes on Linux for job control strategies (stop/resume processes, foreground/background processes, and related signals)
Zombie: The process is no longer executing, but its process descriptor is still in memory (a minimal demonstration follows this list).
- Under normal circumstances, this period of time is incredibly small for a given child process and the cleanup initiated by the parent process succeeds.
- The process descriptor (zombie process) lingers if the parent process is unable to obtain the terminated process's status.
- A real problem occurs when too many process descriptors (zombie or not) prevent other processes from being created.
- More details in the `wait` system call documentation under "Notes".
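To see a zombie for yourself, it's enough to let a child exit without reaping it. A minimal sketch, assuming a Linux system (my own example, not taken from the linked documentation):

```python
import os
import sys
import time

pid = os.fork()
if pid == 0:
    sys.exit(0)      # child exits immediately

# Parent: until wait() is called, the child's process descriptor lingers
# and top/ps report it in state Z (zombie).
print(f"child {pid} has exited but not been reaped; look for state Z")
time.sleep(30)       # window in which the zombie is visible
os.waitpid(pid, 0)   # reap the child; the zombie entry disappears
```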
Aggregated CPU percentages
%Cpu(s): 0.2 us, 0.2 sy, 0.0 ni, 99.7 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
From the `man` pages for `top`...
us, user : time running un-niced user processes
An un-niced process is a process running at unaltered or raised priority (a nice value of zero or below).
This relates to the concept of niceness, which is explained in more detail in the experimentation post following this article.
sy, system : time running kernel processes
Kernel time can largely be thought of as execution time spent performing system calls on behalf of processes. From Wikipedia - System Calls, we get a glimpse at how system calls work:
[System calls are] the programmatic way in which a computer program requests a service from the kernel of the operating system on which it is executed.
The library's wrapper functions expose an ordinary function calling convention (a subroutine call on the assembly level) for using the system call, as well as making the system call more modular. Here, the primary function of the wrapper is to place all the arguments to be passed to the system call in the appropriate processor registers (and maybe on the call stack as well), and also setting a unique system call number for the kernel to call. In this way the library, which exists between the OS and the application, increases portability.
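To make the wrapper idea a bit more tangible, here is a small sketch that issues the same system call twice: once through the usual library wrapper (`os.getpid()`) and once through libc's generic `syscall()` wrapper with an explicit system call number. The number 39 is `getpid` on x86-64 Linux only; treat it as an illustrative assumption, not something portable.

```python
import ctypes
import os

libc = ctypes.CDLL(None)   # the C library that provides the wrappers
SYS_getpid = 39            # x86-64 specific; see /usr/include/asm/unistd_64.h

# Both paths end up executing the same getpid system call in the kernel.
print("via os.getpid()     :", os.getpid())
print("via libc.syscall(39):", libc.syscall(SYS_getpid))
```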
ni, nice : time running niced user processes
A niced process is a process run with lower priority (higher nice value).
This relates to the concept of niceness, which is explained in more detail in the experimentation post following this article.
id, idle : time spent in the kernel idle handler
Even if the CPU has no work to do, it is still doing something useful: the idle handler runs when no other work can be scheduled, listens for timer or peripheral interrupts, and applies strategies to reduce system power consumption.
More details in Stack Exchange - Idle CPU Process
wa, IO-wait : time waiting for I/O completion
To summarize it in one sentence, 'iowait' is the percentage of time the CPU is idle AND there is at least one I/O in progress.
If the CPU is idle, the kernel then determines if there is at least one I/O currently in progress to either a local disk or a remotely mounted disk (NFS) which had been initiated from that CPU. If there is, then the 'iowait' counter is incremented by one. If there is no I/O in progress that was initiated from that CPU, the 'idle' counter is incremented by one.
hi : time spent servicing hardware interrupts
A hardware interrupt is a condition related to the state of the hardware that may be signaled by an external hardware device [...] to communicate that the device needs attention from the operating system (OS)
si : time spent servicing software interrupts
A software interrupt is requested by the processor itself upon executing particular instructions or when certain conditions are met.
st : time stolen from this vm by the hypervisor
From Stack Exchange - CPU Usage
It represents time when the real CPU was not available to the current virtual machine — it was "stolen" from that VM by the hypervisor (either to run another VM, or for its own needs).
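As far as I can tell, all of these percentages are derived from the cumulative counters the kernel exposes on the first line of `/proc/stat`: `top` samples them and reports each state's share of the delta between two samples. A rough reconstruction of that calculation (Linux-only sketch; field order per `proc(5)`, which lists `nice` before `system`, unlike `top`'s display order):

```python
import time

def cpu_jiffies():
    # First line of /proc/stat:
    # "cpu  user nice system idle iowait irq softirq steal ..."
    with open("/proc/stat") as f:
        return [int(x) for x in f.readline().split()[1:9]]

before = cpu_jiffies()
time.sleep(3)              # comparable to top's refresh delay
after = cpu_jiffies()

deltas = [b - a for a, b in zip(before, after)]
total = sum(deltas) or 1
labels = ["us", "ni", "sy", "id", "wa", "hi", "si", "st"]
print("  ".join(f"{100 * d / total:.1f} {label}" for label, d in zip(labels, deltas)))
```

On a mostly idle machine the `id` share dominates, much like the summary line quoted at the top of the article.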