The `top` command, part 1.5: `top` experimentation
A lot of the content written for "The top command, part 1" was validated through experimentation. This article walks through some of that experimentation to highlight differences in top output when observing different computational loads and task scheduler priorities.
After preparing a CPU-bound program, I experiment with different runs of it (single-core vs multi-core, different priorities) to see how top visualizes its usage.
Preparing a CPU-bound program - ackermann_multi
I wanted to prepare a program that was CPU-intensive and had a predictable runtime in the tens of seconds. I decided to use a program that computes the Ackermann function for reasonably difficult parameters.
The program computes ackermann(3, 0), ackermann(3, 0), ackermann(3, 1), ackermann(3, 1), ..., ackermann(3, 13), ackermann(3, 13). I discuss the execution time and the choice of computing each result twice later in the article.
package main

import (
    "log"
    "sync"
    "time"
)

func main() {
    var wg sync.WaitGroup
    for j := 0; j <= 13; j++ {
        wg.Add(2)
        // Each calculation is launched twice, each on its own goroutine.
        go func(j int) {
            defer wg.Done()
            runTest(3, j)
        }(j)
        go func(j int) {
            defer wg.Done()
            runTest(3, j)
        }(j)
    }
    wg.Wait()
}

// runTest computes ackermann(i, j) and logs the result and the elapsed time.
func runTest(i, j int) {
    start := time.Now()
    ans := ackermann(i, j)
    elapsed := time.Since(start)
    log.Printf("Calculated `ackermann(%d, %d) = %d` in %s", i, j, ans, elapsed)
}

// ackermann implements the two-argument Ackermann function recursively.
func ackermann(m, n int) (ans int) {
    if m == 0 {
        return n + 1
    } else if n == 0 {
        return ackermann(m-1, 1)
    } else {
        return ackermann(m-1, ackermann(m, n-1))
    }
}
Experiment 1 - single core vs multi-core
Single core
The first implementation of my program had a different main function that ran each calculation once (not twice) on the main goroutine. Ex: ackermann(3, 0), ackermann(3, 1), ..., ackermann(3, 13)
func main() {
    for j := 0; j <= 13; j++ {
        runTest(3, j)
    }
}
The work is confined to one of my two CPU cores and ends up taking ~35 seconds on my CentOS 8 virtual machine.
Here is the output from top in a separate window during a run of ./ackermann_single:
top, press H (threads view)
%Cpu(s): 50.5 us, 0.5 sy, 0.0 ni, 49.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
3797 root 20 0 102784 4392 768 R 99.9 0.1 0:14.34 ackermann_singl
Throughout the 35 seconds of execution, overall CPU usage stays constant at 50%. What was most surprising was that the computation was not always running on the same core...
top, press H (threads view), press 1 (single-CPU view)
top - 20:03:48 up 105 days, 14:53, 2 users, load average: 0.65, 0.33, 0.20
Threads: 249 total, 2 running, 247 sleeping, 0 stopped, 0 zombie
%Cpu0 :100.0 us, 0.0 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu1 : 1.0 us, 1.0 sy, 0.0 ni, 98.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem : 3881240 total, 1892088 free, 244080 used, 1745072 buff/cache
KiB Swap: 4194300 total, 3899260 free, 295040 used. 2659276 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
3790 root 20 0 104192 7556 776 R 99.9 0.2 0:24.25 ackermann_singl
That's one CPU core at 100%. Subsequent top refreshes (same run) would show the other CPU core spiking to 100% instead, with overall CPU usage always at 50%; the work would bounce back and forth between the two cores.
top - 19:59:31 up 105 days, 14:49, 2 users, load average: 0.25, 0.17, 0.13
Threads: 248 total, 2 running, 246 sleeping, 0 stopped, 0 zombie
%Cpu0 : 1.0 us, 0.0 sy, 0.0 ni, 99.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu1 :100.0 us, 0.0 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem : 3881240 total, 1895080 free, 241368 used, 1744792 buff/cache
KiB Swap: 4194300 total, 3899260 free, 295040 used. 2662020 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
3750 root 20 0 102784 4924 760 R 99.0 0.1 0:17.15 ackermann_singl
3749 root 20 0 162132 2472 1616 R 1.0 0.1 0:00.34 top
I was not able to explain why the process would switch CPUs throughout the run. Many internet posts hint at the scheduling algorithm adapting to other significant work on the same machine, but as far as I can tell, there was no other significant work for it to yield to.
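One way I could have ruled out anything in the program itself causing the migration is to pin the process to a specific CPU. Below is a minimal, hypothetical sketch of that idea using the golang.org/x/sys/unix package (not something the article's program does); the same effect can be had from the shell with taskset.

package main

import (
    "fmt"

    "golang.org/x/sys/unix"
)

func main() {
    // Build a CPU affinity mask that only allows CPU 0.
    var set unix.CPUSet
    set.Zero()
    set.Set(0)

    // pid 0 means "the calling thread"; threads created afterwards inherit
    // the mask, so CPU-bound work started after this call should stay on %Cpu0.
    if err := unix.SchedSetaffinity(0, &set); err != nil {
        fmt.Println("sched_setaffinity failed:", err)
        return
    }

    // Stand-in CPU-bound work (the real program would call runTest here).
    sum := 0
    for i := 0; i < 1_000_000_000; i++ {
        sum += i % 3
    }
    fmt.Println("done:", sum)
}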
load average: 0.95, 0.61, 0.31
- After 3-4 minutes of repeatedly running my program, I saw a 1-minute load average of up to 0.95 in top.
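As an aside before moving to the multi-core version: the Go scheduler's parallelism can also be capped explicitly, which would force even a heavily concurrent program into this single-core pattern. A minimal sketch of that idea (hypothetical, not the version used in this article):

package main

import (
    "fmt"
    "runtime"
    "sync"
)

func main() {
    // Limit the runtime to one OS thread executing goroutines at a time;
    // top would then show the process at ~100% %CPU overall (one core's worth),
    // regardless of how many goroutines are started.
    runtime.GOMAXPROCS(1)

    var wg sync.WaitGroup
    for i := 0; i < 4; i++ {
        wg.Add(1)
        go func(id int) {
            defer wg.Done()
            // Stand-in CPU-bound work (the real program would call runTest).
            sum := 0
            for k := 0; k < 1_000_000_000; k++ {
                sum += k % 3
            }
            fmt.Println("worker", id, "done:", sum)
        }(i)
    }
    wg.Wait()
}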
Multi-core
To make the program as computationally efficient as possible, I rewrote the main function and introduced a new goroutine for each computation. Note: the programmer can prepare many goroutines, and the Go runtime itself takes care of spreading the goroutines across OS threads.
Also, to keep the runtime the same and make it likely for the scheduler to spread work evenly between my two CPU cores, I duplicated each calculation.
func main() {
    var wg sync.WaitGroup
    for j := 0; j <= 13; j++ {
        wg.Add(2)
        go func(j int) {
            defer wg.Done()
            runTest(3, j)
        }(j)
        go func(j int) {
            defer wg.Done()
            runTest(3, j)
        }(j)
    }
    wg.Wait()
}
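Before looking at top again, a quick sanity check (not part of the article's program) confirms how many logical CPUs the runtime sees and how many it will use by default:

package main

import (
    "fmt"
    "runtime"
)

func main() {
    // NumCPU reports the logical CPUs visible to the process;
    // GOMAXPROCS(0) queries the current parallelism limit without changing it
    // (it defaults to NumCPU).
    fmt.Println("logical CPUs:", runtime.NumCPU())
    fmt.Println("GOMAXPROCS  :", runtime.GOMAXPROCS(0))
}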
top, press H (threads view)
%Cpu(s): 99.0 us, 0.5 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.5 si, 0.0 st
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
3829 root 20 0 103040 8884 772 R 99.0 0.2 0:07.77 ackermann_multi
3826 root 20 0 103040 8884 772 R 97.1 0.2 0:07.67 ackermann_multi
Here, CPU usage is maxed out at 100%, and we also see it in the single-CPU view.
top, press H (threads view), press 1 (single-CPU view)
%Cpu0 :100.0 us, 0.0 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu1 :100.0 us, 0.0 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
3819 root 20 0 103040 10996 776 R 99.9 0.3 0:19.44 ackermann_multi
3822 root 20 0 103040 10996 776 R 99.9 0.3 0:19.23 ackermann_multi
load average: 2.52, 1.57, 0.79
- After 3-4 minutes of repeatedly running my program, I saw a 1-minute load average of 2.52 in top.
- This highlights that load averages can be inexact, should be taken with a grain of salt, and should only be looked at in comparison to a previous measurement on the same machine.
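Load averages can also be sampled outside of top, which makes it easier to log values over time and compare them against earlier measurements on the same machine. A small sketch that reads the Linux /proc/loadavg file:

package main

import (
    "fmt"
    "os"
)

func main() {
    // /proc/loadavg holds the 1-, 5- and 15-minute load averages, followed by
    // a runnable/total task count and the PID of the most recently created task.
    data, err := os.ReadFile("/proc/loadavg")
    if err != nil {
        fmt.Fprintln(os.Stderr, "could not read /proc/loadavg:", err)
        os.Exit(1)
    }
    fmt.Print(string(data)) // e.g. "2.52 1.57 0.79 3/249 3831"
}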
Experiment 2 - niceness and priority
Moving forward, we stick to running the concurrent version of the program.
nice -n -1 ./ackermann_multi
# top cpu usage output
%Cpu(s): 99.8 us, 0.0 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.2 si, 0.0 st
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
2758 root 19 -1 103040 7028 800 R 198.0 0.2 0:05.98 ackermann_multi
./ackermann_multi
# top cpu usage output
%Cpu(s): 99.8 us, 0.0 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.2 si, 0.0 st
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
2765 root 20 0 103040 8608 772 R 200.0 0.2 0:12.75 ackermann_multi
nice -n 1 ./ackermann_multi
# top cpu usage output
%Cpu(s): 0.0 us, 0.2 sy, 99.7 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.2 si, 0.0 st
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
2777 root 21 1 103040 11520 800 R 199.3 0.3 0:26.70 ackermann_multi
When commands run without a changed priority, they run at niceness 0 (priority 20). Possible niceness values range from -20 (least nice, highest priority, gets a bigger share of CPU cycles) to 19 (most nice, lowest priority, gets a smaller share of CPU cycles). The calculated priority value is niceness + 20, which translates to a range of 0 to 39.
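To make the mapping concrete: the PR values seen in the listings above (19, 20, 21) are just the NI values (-1, 0, 1) plus 20. A tiny sketch of the same arithmetic:

package main

import "fmt"

func main() {
    // For normal (non-realtime) tasks, top's PR column is niceness + 20.
    for _, ni := range []int{-20, -1, 0, 1, 19} {
        fmt.Printf("NI %3d -> PR %2d\n", ni, ni+20)
    }
}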
I haven't found a strong reason for the distinction between niceness and priority other than the perceived convenience, for users, of reasoning about a range centered on a default of niceness 0 (-20 to 19), and the perceived convenience, for the scheduler implementation, of reasoning about non-negative priority values (0 to 39).
Other notes:
- In top CPU statistics, CPU usage from processes with niceness -20 to 0 (priority 0 to 20) is recorded under us
- In top CPU statistics, CPU usage from processes with niceness 1 to 19 (priority 21 to 39) is recorded under ni
- Running nice -n 20 <command> (niceness too high) simply sets the niceness to 19 before executing the command
- Running nice -n -21 <command> (niceness too low) simply sets the niceness to -20 before executing the command
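For completeness, a process can also change its own niceness programmatically instead of being launched through nice. A minimal sketch using Go's syscall package on Linux (hypothetical, not something ackermann_multi does); note that lowering niceness below 0 would normally require elevated privileges:

package main

import (
    "fmt"
    "syscall"
    "time"
)

func main() {
    // Roughly equivalent to launching the program under `nice -n 10 ...`:
    // PRIO_PROCESS with pid 0 targets the calling process.
    if err := syscall.Setpriority(syscall.PRIO_PROCESS, 0, 10); err != nil {
        fmt.Println("setpriority failed:", err)
        return
    }
    fmt.Println("niceness set to 10; check the NI column in top")
    time.Sleep(30 * time.Second) // stay alive long enough to be observed in top
}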
Experiment 3 - niceness in action
The previously reported execution time of ~35 seconds (on my machine) applies at any niceness/priority when the program runs alone.
From two terminal windows, I ran the program twice almost simultaneously (start one run, switch to the other window, start the other) so that one run would be at the highest possible niceness and the other at the lowest possible niceness.
time nice -n -20 ./ackermann_multi
...
real 0m35.950s # Similar execution time to a normal run that has the CPU to itself
user 1m11.180s # User-space computation time is nearly twice the wall-clock time (2 CPU cores)
sys 0m0.172s
time nice -n 19 ./ackermann_multi
...
real 1m11.860s # Since the run above finished no slower than usual, this run got essentially no CPU time for those first ~35 seconds
user 1m11.642s # Similar amount of computation time to the run above
sys 0m0.086s
I tried both variations (starting -20 before 19, starting 19 before -20) and observed the same pattern:
- From the moment the niceness -20 program started, the output of the program running at niceness 19 was NOT observed (or stopped being observed), even for the simplest computations, until the niceness -20 program completed
- With both programs running, the CPU summary shows the overwhelming share of usage under us for the first ~35 seconds, then under ni for the next ~35 seconds
Trying the same thing with niceness -1 vs 0, and 0 vs 1...
Niceness 0 and 1 (1 before 0 and 0 before 1)
There was not much difference between the two orderings; this was my observation:
time nice -n 1 ./ackermann_multi
...
real 1m8.074s
user 1m7.557s
sys 0m0.131s
time ./ackermann_multi
...
real 1m2.088s
user 1m7.567s
sys 0m0.093s
This was the output from top throughout the run:
%Cpu(s): 55.2 us, 0.5 sy, 44.3 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
3066 root 20 0 103040 8616 776 R 109.9 0.2 0:12.24 ackermann_multi
3062 root 21 1 103040 8344 788 R 89.1 0.2 0:10.98 ackermann_multi
The top output very clearly shows the split of CPU usage between the run at niceness 0 (considered "unniced", us) and the run at niceness 1 (considered "niced", ni): roughly ~55% vs ~45%.
At the process level (summation of percentages per thread), the run with niceness 0 was shown to take 100-120% CPU (showing 110% most of the time) and, conversely, the run with niceness 1 was shown to take 80-100% CPU (showing 90% most of the time).
Niceness -1 and 0 (0 before -1 and -1 before 0)
As expected, the only difference when testing with -1 and 0 (as opposed to 0 and 1) is that both runs are considered "unniced", so both have their CPU usage registered under us.
%Cpu(s):100.0 us, 0.0 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
3029 root 19 -1 103040 8088 776 R 109.9 0.2 0:09.65 ackermann_multi
3025 root 20 0 103040 8352 800 R 90.1 0.2 0:08.76 ackermann_multi
The per-process split of CPU usage was again ~110% vs ~90% (i.e. ~55% vs ~45%, with the whole ~100% grouped under us).
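A plausible explanation for the ~55%/~45% split seen in both pairings comes from the Linux CFS weight table (sched_prio_to_weight in the kernel), where each niceness step changes a task's weight by roughly 25%: if I read the table correctly, niceness -1, 0 and 1 map to weights 1277, 1024 and 820. A quick sketch of the expected shares under that assumption:

package main

import "fmt"

func main() {
    // Assumed excerpt of the kernel's sched_prio_to_weight table.
    weights := map[int]float64{-1: 1277, 0: 1024, 1: 820}

    // Two competing CPU-bound tasks should split the CPU in proportion
    // to their weights.
    for _, pair := range [][2]int{{0, 1}, {-1, 0}} {
        a, b := weights[pair[0]], weights[pair[1]]
        fmt.Printf("nice %2d vs %2d -> %.1f%% vs %.1f%%\n",
            pair[0], pair[1], 100*a/(a+b), 100*b/(a+b))
    }
    // Both pairs come out to roughly 55.5% vs 44.5%, which lines up with the
    // ~110% vs ~90% per-process figures observed in top on two cores.
}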