June 4, 2013
High CPU usage on DB host?
It's a Unix 101 statement, but I have heard it stated wrong so many times that I decided to write a blog post about it.
If, in an OLTP environment, ‘top’ shows %wa (iowait) as the major contributor to CPU busy, adding CPUs will not help. There is no need for more CPU. Period.
************** %wa IS A PERCENTAGE THAT COUNTS AS CPU IDLE ******************
Simple test.
Push some I/O:
user1@myhost:~$ dd if=/dev/zero of=/tmp/file1 conv=notrunc bs=1000 count=3000000 &
[1] 31240
user1@myhost:~$ dd if=/dev/zero of=/tmp/file2 conv=notrunc bs=1000 count=3000000 &
[2] 31241
user1@myhost:~$ dd if=/dev/zero of=/tmp/file3 conv=notrunc bs=1000 count=3000000 &
[3] 31242
user1@myhost:~$ dd if=/dev/zero of=/tmp/file4 conv=notrunc bs=1000 count=3000000 &
[4] 31243
top looks like this:
user1@myhost:~$ top -b -i
top - 23:05:42 up 8:37, 12 users, load average: 4.36, 3.91, 6.28
Tasks: 239 total, 5 running, 230 sleeping, 0 stopped, 4 zombie
Cpu(s): 3.1%us, 20.5%sy, 0.0%ni, 12.9%id, 63.3%wa, 0.0%hi, 0.3%si, 0.0%st
Mem: 4080460k total, 3809420k used, 271040k free, 1580k buffers
Swap: 4145148k total, 104240k used, 4040908k free, 1824928k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
31240 amoseyev 20 0 4376 588 496 D 2 0.0 0:12.29 dd
31241 amoseyev 20 0 4376 588 500 D 2 0.0 0:12.32 dd
31242 amoseyev 20 0 4376 592 500 D 2 0.0 0:12.38 dd
31243 amoseyev 20 0 4376 592 500 D 1 0.0 0:11.50 dd
%wa is high. iostat consistently shows write throughput of about 44 MB/sec:
user1@myhost:~$ iostat 1 1000
avg-cpu: %user %nice %system %iowait %steal %idle
1.76 0.00 12.09 50.13 0.00 36.02
Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
sda 2111.00 8352.00 45668.00 8352 45668
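As a side note (an extra check I am adding here, not part of the original output), iostat's extended mode reports per-device latency and utilization, which says more about the disk itself than the host-level %wa:
user1@myhost:~$ iostat -x sda 1
Watch the await column (average I/O latency in ms) and %util (percentage of time the device had requests in flight).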
As shown above, the dd processes that cause the I/O load are almost always in the “D” state, which is uninterruptible sleep.
“Uninterruptible” comes from the fact that they cannot be killed, because the process is in kernel mode (an I/O call has to be done in kernel mode).
It is uninterruptible, but it is still SLEEP. It is an idle process. It does not block the CPU. If any other thread needs the CPU (either for number crunching or for another I/O call), the scheduler will put it on the CPU while dd is sleeping.
But if there is no other CPU load available, top counts those otherwise idle CPU cycles as %wa.
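You can see both halves of this directly (a quick illustrative check, not part of the original test):
user1@myhost:~$ ps -eo pid,stat,comm | awk '$2 ~ /^D/'
user1@myhost:~$ head -1 /proc/stat
The first command lists processes in uninterruptible sleep. The second prints the raw counters top reads: the “cpu” line holds user, nice, system, idle, iowait, irq, softirq and steal jiffies, and the kernel only advances the iowait counter on ticks when the CPU has nothing runnable but some task is blocked on I/O. That is exactly why %wa evaporates the moment real CPU work shows up.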
Now push some real CPU load:
user1@myhost:~$ cat /dev/urandom > /dev/null &
[1] 31224
user1@myhost:~$ cat /dev/urandom > /dev/null &
[2] 31225
user1@myhost:~$ cat /dev/urandom > /dev/null &
[3] 31229
user1@myhost:~$ cat /dev/urandom > /dev/null &
[4] 31231
user1@myhost:~$
user1@myhost:~$ top -b -i
top - 23:19:16 up 8:50, 12 users, load average: 7.84, 7.15, 7.10
Tasks: 239 total, 6 running, 229 sleeping, 0 stopped, 4 zombie
Cpu(s): 0.8%us, 98.1%sy, 0.0%ni, 0.0%id, 0.5%wa, 0.0%hi, 0.6%si, 0.0%st
Mem: 4080460k total, 3838860k used, 241600k free, 2168k buffers
Swap: 4145148k total, 104240k used, 4040908k free, 2264144k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
31279 amoseyev 20 0 4220 544 456 R 96 0.0 0:16.22 cat
31281 amoseyev 20 0 4220 544 456 R 87 0.0 0:13.96 cat
31280 amoseyev 20 0 4220 544 456 R 83 0.0 0:15.20 cat
31278 amoseyev 20 0 4220 540 456 R 80 0.0 0:16.01 cat
31241 amoseyev 20 0 4376 588 500 D 2 0.0 0:14.29 dd
31242 amoseyev 20 0 4376 592 500 D 2 0.0 0:14.26 dd
31240 amoseyev 20 0 4376 588 496 D 1 0.0 0:14.17 dd
31243 amoseyev 20 0 4376 592 500 D 1 0.0 0:13.34 dd
%wa went to almost 0. %sy is close to 100%.
So when the CPU spends its cycles on real load, they do not add up to %wa. And at the same time, I/O throughput did not change with the CPU being 100% busy:
user1@myhost:~$ iostat sda 1 1000
avg-cpu: %user %nice %system %iowait %steal %idle
1.50 0.00 98.50 0.00 0.00 0.00
Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
sda 1415.00 5376.00 44500.00 5376 44500
Which again shows that OLTP I/O does not need that much CPU.
Processes in the “D” state also add to the load average, so load average is likewise not the best metric for judging how busy the CPU is.
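To see that effect, compare the load average with the count of “D”-state tasks (again, just an illustrative check):
user1@myhost:~$ cat /proc/loadavg
user1@myhost:~$ ps -eo stat | grep -c '^D'
In the first test above, four dd processes in “D” pushed the load average past 4 even though most CPU cycles (%id plus %wa) were actually idle.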
Everything said above applies to OLTP: lots of small I/Os, whether over NAS or SAN, on raw/block devices or a file system. All of them.
If we are talking about crazy 1+ GB/sec full table scans in the OLAP/DW world, CPU probably would be affected, especially over NFS (and not Direct NFS). But it will show up mostly in %sy and %si (not %wa), since Ethernet traffic is handled through soft interrupts, and at high throughput that is CPU intensive. Context switches may also add CPU load on some platforms where they are still used for switching between user and kernel mode.
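If you want to see where that kind of load lands, per-CPU statistics will show it (a hypothetical check, assuming the sysstat package is installed):
user1@myhost:~$ mpstat -P ALL 1
On a host pushing heavy NFS/Ethernet traffic you would expect the %soft and %sys columns to climb, not %iowait.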