Testing tools that measure process memory usage
The problem
You want to understand how much memory a process uses so you can allocate resources
appropriately. The tricky part is that when subprocesses are involved, the tracking tool
might not “see” the memory used by all the child processes, depending on how they’re
used, and this isn’t always obvious. In my case, I was using /usr/bin/time -v, which
reports Maximum resident set size (kbytes). I was getting a figure of a few gigabytes,
so it could’ve been the total memory used. It turned out it wasn’t (real usage was much
higher), but I had to use the steps in this blog post to figure that out.
The solution
First we need a program that will use a known amount of memory. Then we can run a memory tracker on it and know if the reported memory usage is what we expect.
Here’s a Node.js script that will allocate 2GB of memory:
// Create a large array to consume memory
const memoryConsumer = []
const targetSize = 2 * 1024 * 1024 * 1024 // 2GB in bytes
const chunkSize = 10 * 1024 * 1024 // 10MB chunks for faster allocation

console.log('Starting memory allocation...')
const startTime = Date.now()
let allocatedBytes = 0

while (allocatedBytes < targetSize) {
  // Create a buffer of 10MB and store it in the array
  memoryConsumer.push(Buffer.alloc(chunkSize, 13))
  allocatedBytes += chunkSize

  // Log progress every ~256MB
  if (allocatedBytes % (256 * 1024 * 1024) < chunkSize) {
    const allocatedGB = allocatedBytes / (1024 * 1024 * 1024)
    const elapsedSeconds = (Date.now() - startTime) / 1000
    console.log(`Allocated: ${allocatedGB.toFixed(2)} GB in ${elapsedSeconds.toFixed(2)} seconds`)
  }
}

const totalTime = (Date.now() - startTime) / 1000
console.log(`Memory allocation complete in ${totalTime.toFixed(2)} seconds!`)

//---------- OPTION 1 ------------------------------
// use this if you want the program to exit after the memory is allocated
process.exit()

//---------- OPTION 2 ------------------------------
// ...or, use this code if you want the program to wait to be killed
// console.log('Press Ctrl+C to exit')
// // Keep the program running
// setInterval(() => {}, 1000)
//--------------------------------------------------
If we save that to mem.js and run it with memory tracking:
$ /usr/bin/time -v node mem.js
...
Allocated: 2.00 GB in 0.40 seconds
Memory allocation complete in 0.40 seconds!
Command being timed: "node mem.js"
...
Maximum resident set size (kbytes): 2141892
…we can see that we used the expected amount of memory. This isn’t surprising, but it’s good to test the obvious stuff first.
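As a quick sanity check on the units, that kbytes figure can be converted to GB (the sample line below is copied verbatim from the output above):

```shell
# Convert the kbytes value reported by /usr/bin/time -v into GB.
line='Maximum resident set size (kbytes): 2141892'
echo "$line" | awk '{ printf "%.2f GB\n", $NF / 1024 / 1024 }'
# prints: 2.04 GB
```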
Now, if we make a script to run 3 instances of mem.js in parallel:
# spawn.sh
echo "start"
( node mem.js ) &
( node mem.js ) &
( node mem.js ) &
echo "waiting"
wait
echo "done"
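It’s worth confirming that the &/wait pattern really runs the three commands concurrently; here’s the same shape with sleep standing in for node mem.js:

```shell
# Same spawn pattern with "sleep 1" standing in for "node mem.js":
# all three children run concurrently, so the whole thing takes ~1s, not 3s.
start=$SECONDS
( sleep 1 ) &
( sleep 1 ) &
( sleep 1 ) &
wait
elapsed=$((SECONDS - start))
echo "all children done after ${elapsed}s"
```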
…and run that with the same approach as above:
$ /usr/bin/time -v bash spawn.sh
start
waiting
Starting memory allocation...
Starting memory allocation...
Starting memory allocation...
Allocated: 0.25 GB in 0.07 seconds
...
Memory allocation complete in 0.81 seconds!
done
Command being timed: "bash spawn.sh"
...
Maximum resident set size (kbytes): 2140492
…we still only see 2GB of max memory usage, even though we ran 3 processes in parallel
and so would expect ~6GB. This is likely because the kernel accounting that
/usr/bin/time reads reports the peak RSS of the largest single child, not the sum
across concurrent children. We need something else to track memory usage, so here’s a
track.sh bash script that can do that:
#!/bin/bash
# watch memory usage of processes attached to this terminal, writing the results to a CSV

logFile="${LOG_FILE:-log.csv}"
echo "totalMemKb,timestamp,separateMemValues" > "$logFile"

awkProg=$(cat << "HEREDOC"
{
  if (NR>1) {
    sum+=$1; values=values $1 " "
  }
}
END {
  print sum "," now "," values
}
HEREDOC
)

while true; do
  ps T -o rss | awk -v now="$(date +%s)" "$awkProg" >> "$logFile"
  sleep 0.1 # TODO use larger values, like "1" for longer running processes
done
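The summing logic in that awk program can be tested on its own by feeding it a fake ps T -o rss output (the RSS header line is what the NR>1 check skips):

```shell
# Feed the awk program from track.sh a fake "ps T -o rss" output:
# the header line is skipped by NR>1 and the three RSS values are summed.
fakePs='RSS
100
200
300'
echo "$fakePs" | awk -v now=1739833328 '{ if (NR>1) { sum+=$1; values=values $1 " " } } END { print sum "," now "," values }'
# prints: 600,1739833328,100 200 300 (plus a trailing space from the last value)
```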
Let’s run track.sh:
bash track.sh & pid=$!; /usr/bin/time -v bash spawn.sh; kill $pid
…and we can see /usr/bin/time still reports 2GB of usage. Let’s look in the log file created by track.sh:
$ cat log.csv
totalMemKb,timestamp,separateMemValues
124960,1739833328,9608 3592 1148 3448 31488 32552 35832 3440 3852
1510676,1739833328,9608 3608 1148 3448 489480 499688 496356 3484 3856
2940312,1739833328,9608 3612 1148 3448 964352 977024 973728 3456 3936
4146308,1739833328,9608 3612 1148 3448 1410600 1355864 1354748 3452 3828
4808540,1739833328,9608 3612 1148 3448 1629192 1579472 1574560 3472 4028
5456184,1739833328,9608 3612 1148 3448 1846464 1798592 1785936 3440 3936
6112552,1739833328,9608 3612 1148 3448 2066640 2018768 2001888 3424 4016
25204,1739833329,9608 3612 1148 3448 0 0 0 3440 3948
…and we can see the max recorded memory usage is 6112552 KB (~5.8GB), which is much
closer to the ~6GB we expected.
The key part of track.sh is ps T, where the T flag means:
$ ps --help s
...
T all processes on this terminal
…that is, we track everything spawned from this terminal, including all child processes. The tool writes a CSV so you can chart your memory usage over time, but if you only care about the max figure recorded, you can get that with:
awk -F ',' 'NR>1 && max < $1 { max = $1 } END { print max }' log.csv
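For example, running that one-liner against a few sample rows copied from the log above:

```shell
# Run the max-extraction one-liner against sample rows from log.csv.
sample='totalMemKb,timestamp,separateMemValues
124960,1739833328,9608 3592
6112552,1739833328,9608 3612
25204,1739833329,9608 3612'
echo "$sample" | awk -F ',' 'NR>1 && max < $1 { max = $1 } END { print max }'
# prints: 6112552
```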
Take note that track.sh isn’t 100% accurate: if the memory usage spikes up then down
in between samples (faster than the sleep time), we won’t record that higher number.
It’s good enough for my purposes though.
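One more note: if the terminal-wide scope of ps T is too broad, ps can also select by parent PID. That only covers direct children (deeper descendants would need recursion, which is exactly what ps T lets you avoid), but it can be handy. A minimal sketch, using the current shell ($$) as the parent:

```shell
# Sum the RSS of the current shell plus its direct children.
# "rss=" suppresses the header; "-p" and "--ppid" select the union of
# the shell itself and its direct children.
pid=$$
ps -o rss= -p "$pid" --ppid "$pid" | awk '{ sum += $1 } END { print sum " KB" }'
```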