Testing tools that measure process memory usage

The problem

You want to understand how much memory a process uses so you can allocate resources appropriately. The tricky part is subprocesses: depending on how they're spawned, the tracking tool might not "see" the memory used by child processes, and this isn't always obvious. In my case I was using /usr/bin/time -v, which reports Maximum resident set size (kbytes). I was getting a figure of a few gigabytes, which could plausibly have been the total memory used. It turned out it wasn't (real usage was much higher), but I had to use the steps in this post to figure that out.

The solution

First we need a program that will use a known amount of memory. Then we can run a memory tracker on it and know if the reported memory usage is what we expect.

Here's a Node.js script that will allocate 2GB of memory:

// Create a large array to consume memory
const memoryConsumer = []
const targetSize = 2 * 1024 * 1024 * 1024 // 2GB in bytes
const chunkSize = 10 * 1024 * 1024        // allocate in 10MB chunks for speed

console.log('Starting memory allocation...')
const startTime = Date.now()

let allocatedBytes = 0

while (allocatedBytes < targetSize) {
  // Allocate a 10MB buffer and keep a reference so it can't be garbage-collected
  memoryConsumer.push(Buffer.alloc(chunkSize, 13))
  allocatedBytes += chunkSize

  // Log progress every ~256MB
  if (allocatedBytes % (256 * 1024 * 1024) < chunkSize) {
    const allocatedGB = allocatedBytes / (1024 * 1024 * 1024)
    const elapsedSeconds = (Date.now() - startTime) / 1000
    console.log(`Allocated: ${allocatedGB.toFixed(2)} GB in ${elapsedSeconds.toFixed(2)} seconds`)
  }
}

const totalTime = (Date.now() - startTime) / 1000
console.log(`Memory allocation complete in ${totalTime.toFixed(2)} seconds!`)
//---------- OPTION 1 ------------------------------
// use this if you want the program to exit after the memory is allocated
process.exit()
//---------- OPTION 2 ------------------------------
// ...or, use this code if you want the program to wait to be killed
// console.log('Press Ctrl+C to exit')
// // Keep the program running
// setInterval(() => {}, 1000)
//--------------------------------------------------

If we save that to mem.js and run it with memory tracking:

$ /usr/bin/time -v node mem.js
...
Allocated: 2.00 GB in 0.40 seconds
Memory allocation complete in 0.40 seconds!
  Command being timed: "node mem.js"
  ...
  Maximum resident set size (kbytes): 2141892

…we can see that we used the expected amount of memory. This isn’t surprising, but it’s good to test the obvious stuff first.
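As a quick sanity check on the units (a throwaway sketch, not part of the workflow): the reported figure is in KiB, so converting it confirms it's the 2GB of buffers plus node's own overhead:

```shell
# Convert /usr/bin/time's "Maximum resident set size" from KiB to GiB
awk 'BEGIN { printf "%.2f GiB\n", 2141892 / 1024 / 1024 }'
# prints 2.04 GiB
```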

Now, if we make a script to run 3 instances of mem.js in parallel:

# spawn.sh
echo "start"
( node mem.js ) &
( node mem.js ) &
( node mem.js ) &
echo "waiting"
wait
echo "done"

…and run that with the same approach as above:

$ /usr/bin/time -v bash spawn.sh
start
waiting
Starting memory allocation...
Starting memory allocation...
Starting memory allocation...
Allocated: 0.25 GB in 0.07 seconds
...
Memory allocation complete in 0.81 seconds!
done
  Command being timed: "bash spawn.sh"
  ...
  Maximum resident set size (kbytes): 2140492

…we still only see ~2GB of max memory usage, even though 3 processes ran in parallel, so we'd expect ~6GB. (On Linux, /usr/bin/time gets its figure from the ru_maxrss field returned by wait4(2), which records the peak resident set size of the single largest descendant, not the sum across concurrent children.) We need something else to track memory usage, so here's a track.sh bash script that can do that:

#!/bin/bash
# watch memory usage of processes attached to this terminal, writing the results to a CSV
logFile="${LOG_FILE:-log.csv}"
echo "totalMemKb,timestamp,separateMemValues" > "$logFile"

awkProg=$(cat << "HEREDOC"
{
  if (NR>1) {
    sum+=$1; values=values $1 " "
  }
}

END {
  print sum "," now "," values
}
HEREDOC
)

while true; do
  ps T -o rss | awk -v now="$(date +%s)" "$awkProg" >> "$logFile"
  sleep 0.1  # use a larger interval (e.g. 1) for longer-running processes
done
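To see what the embedded awk program is doing, here's its core logic run against a fake `ps T -o rss` output (made-up RSS values, independent of any live processes): it skips the `RSS` header row, sums the remaining values, and emits one CSV row.

```shell
# Sum a column of RSS values, skipping the header row, the same way
# the awk program in track.sh does
printf 'RSS\n1000\n2000\n3000\n' | awk -v now=1739833328 '
  NR > 1 { sum += $1; values = values $1 " " }
  END    { print sum "," now "," values }'
# prints 6000,1739833328,1000 2000 3000
```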

Let’s run track.sh:

bash track.sh & pid=$!; /usr/bin/time -v bash spawn.sh; kill $pid

…and we can see /usr/bin/time still reports 2GB of usage. Let’s look in the log file created by track.sh:

$ cat log.csv
totalMemKb,timestamp,separateMemValues
124960,1739833328,9608 3592 1148 3448 31488 32552 35832 3440 3852 
1510676,1739833328,9608 3608 1148 3448 489480 499688 496356 3484 3856 
2940312,1739833328,9608 3612 1148 3448 964352 977024 973728 3456 3936 
4146308,1739833328,9608 3612 1148 3448 1410600 1355864 1354748 3452 3828 
4808540,1739833328,9608 3612 1148 3448 1629192 1579472 1574560 3472 4028 
5456184,1739833328,9608 3612 1148 3448 1846464 1798592 1785936 3440 3936 
6112552,1739833328,9608 3612 1148 3448 2066640 2018768 2001888 3424 4016 
25204,1739833329,9608 3612 1148 3448 0 0 0 3440 3948

…and we can see the peak recorded memory usage is 6112552 KB (~5.8GB), which is in line with the ~6GB we expected.

The key part of track.sh is ps T, where the T flag means:

$ ps --help s
...
  T                   all processes on this terminal

…that is, we track everything spawned from this terminal, including all child processes. The script writes a CSV so you can chart your memory usage over time, but if you only care about the maximum recorded figure, you can get it with:

awk -F ',' 'NR>1 && max < $1 { max = $1 } END { print max }' log.csv
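To check that the one-liner behaves as expected, you can run it against a tiny hand-written CSV (fabricated numbers, but the same column layout as log.csv):

```shell
# Build a three-row sample and extract the largest totalMemKb value
printf 'totalMemKb,timestamp,separateMemValues\n100,1,x\n300,2,x\n200,3,x\n' > sample.csv
awk -F ',' 'NR>1 && max < $1 { max = $1 } END { print max }' sample.csv
# prints 300
```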

Note that track.sh isn't 100% accurate: if memory usage spikes up and back down between samples (faster than the sleep interval), that higher number won't be recorded. It's good enough for my purposes, though.
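If sampling gaps ever become a problem, one Linux-specific complement (a sketch, not something track.sh uses) is the kernel's own peak counter: /proc/<pid>/status exposes a VmHWM line recording a single process's resident-memory high-water mark, so reading it just before the process exits can't miss a spike. The catch is it covers one PID at a time, so for a tree of short-lived children you'd still want the sampling approach. Here's the idea with `sleep` standing in for mem.js:

```shell
# Read the kernel-tracked peak RSS (high-water mark) of a child process
sleep 5 &
pid=$!
grep VmHWM "/proc/$pid/status"   # e.g. "VmHWM:    1024 kB"
kill "$pid"
```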
