How to treat lines of a file as commands and execute in parallel
The problem
I had to copy a lot of files of varying sizes (a few KB up to hundreds of GB). Copying in parallel is faster, so I needed to chunk the large files and copy the chunks in parallel. I solved this by reading the size of each file and dynamically generating the commands for all the required chunks, then executing those commands in parallel with a capped max concurrency using xargs. Each line in the file actually contained a few commands chained together, which makes it slightly harder. An example file:
# commands.txt
echo "starting file A chunk 1" && some-cmd A 1 && echo "successfully copied file A chunk 1" || exit 255
echo "starting file A chunk 2" && some-cmd A 2 && echo "successfully copied file A chunk 2" || exit 255
echo "starting file B chunk 1" && some-cmd B 1 && echo "successfully copied file B chunk 1" || exit 255
...
Note: I’ve got the || exit 255 because an exit status of 255 causes xargs to fail fast instead of finishing all runs and then reporting a failure.
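For reference, a generator along these lines would produce a file like that. This is only a sketch: the flat $SRC directory, the 1 GiB chunk size, and some-cmd are placeholders for illustration, and stat -c%s assumes GNU coreutils:
# Sketch only: write one command line per chunk into commands.txt.
CHUNK_SIZE=$((1024 * 1024 * 1024))   # 1 GiB per chunk (arbitrary)
> commands.txt
for f in "$SRC"/*; do
  size=$(stat -c%s "$f")             # GNU stat; macOS needs stat -f%z
  chunks=$(( (size + CHUNK_SIZE - 1) / CHUNK_SIZE ))
  for ((i = 1; i <= chunks; i++)); do
    echo "echo \"starting $f chunk $i\" && some-cmd \"$f\" $i && echo \"successfully copied $f chunk $i\" || exit 255" >> commands.txt
  done
done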
The easy solution
Using parallel is the easiest solution: it “just works”. You can pipe the commands in, set your concurrency, and you’re done:
parallel -j8 < commands.txt
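If you also want the fail-fast behaviour here, parallel doesn’t (as far as I can tell) give exit code 255 the special meaning that xargs does, but recent versions of GNU parallel have a --halt option that stops everything on the first failure:
parallel --halt now,fail=1 -j8 < commands.txt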
Using xargs
I didn’t have parallel installed, and although I could install it, I was trying to avoid it. I did have xargs installed, and that can handle it. You need to set some params, though:
xargs -n1 --delimiter='\n' -P8 bash -c < commands.txt
-n1: invokes the command once for each arg. Without this, xargs will try to supply multiple args to a single command.
--delimiter='\n': args are separated by newlines. By default xargs splits on whitespace, so we need this to read each entire line as one arg.
-P8: run at most 8 concurrent commands.
bash -c: run the arg with bash. If you don’t set a command, xargs defaults to echo. The arg will be supplied as an argument, not via stdin, so we need to tell bash that it’s getting the command as a parameter: -c.
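To sanity-check the fail-fast behaviour before running the real copy, a throwaway commands file works well. This is just a sketch: the path, sleep durations, and chunk names are arbitrary, and the second “chunk” deliberately fails, so with -P2 the third line should never start:
# Sketch: three fake chunks, the second of which fails immediately.
cat > /tmp/test-commands.txt <<'EOF'
echo "starting chunk 1" && sleep 2 && echo "chunk 1 done" || exit 255
echo "starting chunk 2" && false && echo "chunk 2 done" || exit 255
echo "starting chunk 3" && sleep 2 && echo "chunk 3 done" || exit 255
EOF
xargs -n1 --delimiter='\n' -P2 bash -c < /tmp/test-commands.txt
echo "xargs exit status: $?"   # non-zero because one command exited with 255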