Getting file md5 checksums in different OSs
Generating MD5 file checksums varies from one OS to another, and because I like developing in varying environments, I need solid equivalents.
For cache busting I like to use MD5 hashes of files. At the Westwing magazine we had some JavaScript bundles that changed very seldom, so a timestamp on each build was too aggressive a cache breaker. We also used file checksums for font bundles.
For this we have a bash script that creates the hash from the bundled files, so if nothing has changed, our users avoid unnecessary requests.
We used to have the projects inside Vagrant boxes, so all scripts ran in a Linux environment, no problem. But as we moved to Docker (we love it) and started doing frontend development directly on our machines, some things changed. For instance the md5sum command: on mac OSX it is called just md5, and the output is slightly different. Because of the different output, a simple alias for md5sum will not work.
# Linux
md5sum Gruntfile.js
7593278019c7726f7271904cd7dda73a Gruntfile.js
# mac OSX
md5 Gruntfile.js
MD5 (Gruntfile.js) = 7593278019c7726f7271904cd7dda73a
# win 10 with Cmder
md5sum Gruntfile.js
7593278019c7726f7271904cd7dda73a *Gruntfile.js
To deal with this we need 3 things:
1. Check whether the OS is mac
2. Filter the hash out of the output
3. Store it in a variable
To figure out which system the script is running on we can use uname, and with that we can choose the syntax. For detecting the OS I found the following script on Stack Overflow:
#!/usr/bin/env bash
# Note: bash assignments must not have spaces around "="
ENVIRONMENT="unknown"
if [ "$(uname)" == "Darwin" ]; then
    # Do something under Mac OS X platform
    ENVIRONMENT="mac"
elif [ "$(expr substr "$(uname -s)" 1 5)" == "Linux" ]; then
    # Do something under GNU/Linux platform
    ENVIRONMENT="linux"
elif [ "$(expr substr "$(uname -s)" 1 10)" == "MINGW32_NT" ]; then
    # Do something under Windows NT platform
    ENVIRONMENT="win"
fi
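The same detection can also be sketched with a case statement, which avoids expr and matches on prefixes. This is only a rough alternative, not the Stack Overflow script: detect_os is a hypothetical helper name, and the MSYS and CYGWIN patterns are assumptions about what other Windows shells report. Note that 64-bit Git Bash typically reports MINGW64_NT rather than MINGW32_NT, so the pattern here is deliberately looser.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map a kernel name (as printed by `uname -s`)
# to a short label. Takes the name as an argument so it is easy to try out.
detect_os () {
    case "$1" in
        Darwin*)              echo "mac" ;;
        Linux*)               echo "linux" ;;
        MINGW*|MSYS*|CYGWIN*) echo "win" ;;   # assumed Windows shell prefixes
        *)                    echo "unknown" ;;
    esac
}

ENVIRONMENT=$(detect_os "$(uname -s)")
echo "$ENVIRONMENT"
```

Passing the kernel name in as an argument keeps the function independent of the machine it runs on, for example detect_os "MINGW64_NT-10.0" prints win.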
To get the hash we can use a regex with sed or, even simpler, cut, which is a little like String.prototype.split() in JS or explode() in PHP. The main difference is that the delimiter must be a single one-byte (ASCII) character, and that it returns a list of fields rather than an array, though you can select its elements with -f using a 1-based index. It is not an array, but for this purpose it helps to see it that way.
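To make the field idea concrete, here is a small sketch that runs cut and sed on sample lines copied from the outputs shown earlier, so it does not need md5 or md5sum installed. The variable names are made up for the illustration, and the sed expression is just one possible regex, assuming the hash is the only run of 32 lowercase hex characters on the line.

```shell
#!/usr/bin/env bash
# Sample output lines, taken from the examples above
linux_line='7593278019c7726f7271904cd7dda73a Gruntfile.js'
mac_line='MD5 (Gruntfile.js) = 7593278019c7726f7271904cd7dda73a'
win_line='7593278019c7726f7271904cd7dda73a *Gruntfile.js'

# Linux and Windows: split on a space, take field 1
echo "$linux_line" | cut -d ' ' -f 1
echo "$win_line" | cut -d ' ' -f 1

# mac: split on "=" and take field 2 (" <hash>"),
# then split on a space and take field 2 (field 1 is empty)
echo "$mac_line" | cut -d '=' -f 2 | cut -d ' ' -f 2

# sed alternative: keep only the 32 hex characters, whatever the format
echo "$mac_line" | sed -E 's/.*([0-9a-f]{32}).*/\1/'
```

All four commands print the same hash, 7593278019c7726f7271904cd7dda73a.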
To get just the hash from md5sum we pipe ( | ) the output to cut, split on a space character and get the first field.
md5sum Gruntfile.js | cut -d ' ' -f 1
To store this in a variable we use command substitution, which captures the output of the command.
GRUNTHASH=$( md5sum Gruntfile.js | cut -d ' ' -f 1 )
GRUNTHASH now holds: 7593278019c7726f7271904cd7dda73a
Storing the hash
Finally we can store it to a file as a constant.
echo "<?php define('HASH_GRUNT', '$GRUNTHASH'); ?>" > app/path/hashes.php
Putting it all together
We can build a simple function for hashing.
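#!/bin/bash
# Sets THEHASH to the MD5 checksum of the file passed as $1
get_md5_checksum () {
    if [ "$(uname)" == "Darwin" ]; then
        # mac prints: MD5 (file) = hash
        THEHASH=$(md5 "$1" | cut -d '=' -f 2 | cut -d ' ' -f 2)
    else
        # Linux prints: hash file
        THEHASH=$(md5sum "$1" | cut -d ' ' -f 1)
    fi
}
get_md5_checksum Gruntfile.js
echo "$THEHASH"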
This will output the hash of the file on either mac or Linux.
Epilogue
This seemingly simple task taught me a whole bunch of things about bash and bash scripts. I feel way more powerful now with this new set of tools.
How to MD5 a file in PowerShell?
Get-FileHash <filepath> -Algorithm MD5
Yeah, you can add that to the mix, if you are up to the task.