Bank Statement Automation

I've been trying over the last couple of years to get my financial life in order, and part of that has meant finding bank statements from the different providers and ensuring that they're filed away on the off chance that I need them (surprise: I needed them). To do that, I've spent some time building up a small library of Hazel scripts, and I thought I'd share them with the world now.

Hazel is an amazing tool, but I'm not going to re-hash how awesome it is here. You can find a myriad of articles waxing poetic about Hazel. I'm here to share some of the rules and related scripts that I've been generating over the last while, in hopes that maybe someone else will find some use out of them.

These screenshots should help get Hazel set up and monitoring your folders properly.

Screenshot showing Hazel folders

Screenshot showing Hazel rules

Note of course that this doesn't work with every single provider. For example, I have a credit card with Lowe's (they give a pretty good discount) whose statements the script couldn't pull a date out of, so they failed. Even with OCR (tesseract, etc.), the contents of the file are simply too difficult to work with reliably. It would be awesome if providers were required to format these files consistently (at least around provider name, account number, statement date, etc.), but that seems as unlikely as me winning the lottery.

I've made a change to the way dates are handled that should hopefully work to determine the statement date. Essentially, we attempt to parse anything with numbers to see if it looks like a date, and then work from there. It worked on the Lowe's statement from above but also on a Discover statement which had also proven difficult.
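To make the first half of that concrete, here is a minimal sketch of just the filtering step: split the text into tokens and keep anything that contains a digit and a slash or hyphen. The date_candidates name and the sample line are made up for illustration, and the sketch stops before the actual parsing, which the full script hands off to PHP's DateTime.

```shell
#!/bin/bash

# Hypothetical helper: read text on stdin and print the tokens that
# could plausibly be dates (a digit plus a "/" or "-" somewhere).
date_candidates() {
    tr -s ' \t' '\n' | grep '[0-9]' | grep -E '[/-]' | LC_ALL=C sort -u
}

# Made-up sample line, not taken from a real statement.
printf 'Statement Date 03/15/2021 Account 1234-5678 Balance $100.00\n' | date_candidates
# prints:
# 03/15/2021
# 1234-5678
```

Note that the account number slips through the filter too. That is expected: the filter only narrows the field, and it is the date parser's job to throw out whatever doesn't parse as a sensible date in the past.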

There's a provider_map in the code below that will allow you to override the found name of the statement provider; if you find that the provider found in the file is not to your liking, you can add an entry and manually re-run the classifier below to update the path. There are also failsafes: if (for example) either the provider or the statement date cannot be found, the file will still be moved, and a record will be added to a failure log file so that the statement can be researched further.
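As a rough sketch of how an override works, each entry is a "needle:replacement" pair, split with bash parameter expansion and matched case-insensitively against whatever was found in the PDF. The map_provider function name and the discover.com entry are assumptions added for illustration; the first two entries are the ones from the script.

```shell
#!/bin/bash

# Case-insensitive matching for [[ ... == pattern ]]
shopt -s nocasematch

provider_map=(
    "PennyMacUSA.com:Penny Mac"
    "lowes:Lowes"
    "discover.com:Discover"   # hypothetical extra entry
)

# Hypothetical wrapper around the mapping loop: print the friendly name
# for a found provider string, or the string itself when nothing matches.
map_provider() {
    local provider="$1" mapping
    for mapping in "${provider_map[@]}"; do
        # ${mapping%%:*} is the text before the first colon (the needle),
        # ${mapping#*:} is everything after it (the replacement).
        if [[ "$provider" == *"${mapping%%:*}"* ]]; then
            provider="${mapping#*:}"
            break
        fi
    done
    printf '%s\n' "$provider"
}

map_provider "Visit www.Discover.com/help"   # prints: Discover
map_provider "Some Other Bank"               # prints: Some Other Bank
```

The first matching entry wins, so more specific needles should go nearer the top of the list.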

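Finally, this tool requires pdftotext and terminal-notifier. pdftotext does not come installed on macOS by default; it is part of Poppler, so installing that through Homebrew (brew install poppler) will get it. terminal-notifier is a Ruby gem that gives system notifications, and can be installed via gem. The date parsing also shells out to php, so that needs to be on your PATH as well. I'm open to improvements to the system.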
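If you want to check up front that everything is in place, something along these lines should do it. The missing_tools helper is a made-up name for this sketch and is not part of the script; php is on the list because the script calls it for the date parsing.

```shell
#!/bin/bash

# Hypothetical helper: print the name of every tool that is not on the PATH.
missing_tools() {
    local tool
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

missing=$(missing_tools pdftotext terminal-notifier php)
if [ -n "$missing" ]; then
    echo "Missing tools:"
    echo "$missing"
else
    echo "All required tools found"
fi
```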

So, without further ado...

#!/bin/bash

shopt -s nocasematch
set -e
set -x

dest=~/Documents/Personal/Statements
error_log="$dest/fail.log"

## Bash 3 hack for not having associative arrays
provider_map=(
    "PennyMacUSA.com:Penny Mac"
    "lowes:Lowes"
)

dryrun=0
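# Bail out with a usage message instead of dying silently on `shift`
# (under `set -e`) when no file is given
if [[ $# -lt 1 ]]; then
    echo "Usage: $0 <statement.pdf> [-d|--dryrun] [-v|--verbose]" >&2
    exit 1
fi

provided_file="$1"
shift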

while [[ $# -gt 0 ]]; do
    key="$1"

    case $key in
        -d|--dryrun)
            dryrun=1
            shift # past argument
        ;;
        -v|--verbose)
            set -v
            shift
        ;;
        *)    # unknown option
            echo "Unsupported argument ${key}"
            exit 1
        ;;
    esac
done

# Try to get the provider name from the contents of the file
pdftextcontents=$( pdftotext -f 1 "$provided_file" - 2>/dev/null )

function log_and_exit()
{
    message=$1
    provided_file=$2
    dest=$3

    if [ $dryrun -ne 1 ]; then
        mv "$provided_file" "$dest"
    fi

    # Quote the log path so a destination containing spaces doesn't break tee
    echo "[ $( date +"%F %H:%M:%S" ) ] Couldn't determine ${message} in ${provided_file}; moving to ${dest} and bailing" | tee -a "$error_log" | terminal-notifier -title "Error handling bank statement ${provided_file}" -timeout 60
    echo "$pdftextcontents" | tee -a "$error_log"
    echo "" | tee -a "$error_log"
    echo "------------------------------------------------------------------------------------------" | tee -a "$error_log"

    exit 1
}

function provider_exit()
{
    provided_file=$1
    dest=$2

    log_and_exit "provider" "$provided_file" "$dest"
}

function statement_exit()
{
    provided_file=$1
    dest=$2
    found_date=$3

    if [ -n "$found_date" ]; then
        echo "Found date ${found_date}, which is invalid"
    fi
    log_and_exit "statement date" "$provided_file" "$dest"
}

provider=$( echo "$pdftextcontents" | grep 'www\.' | head -n 1 )

# No provider found; sometimes, the first line of the pdf
# contains a provider that we can use
if [ -z "$provider" ]; then
    provider=$( echo "$pdftextcontents" | head -n 1 )
fi

if [ -z "$provider" ]; then
    provider_exit "$provided_file" "$dest"
fi

# This allows us to change the mapping of a provider from
# what may have been found to something more helpful to us
for mapping in "${provider_map[@]}"; do
    _found_provider=${mapping%%:*}
    _mapped_provider=${mapping#*:}

    if [[ "$provider" == *"${_found_provider}"* ]]; then
        provider="${_mapped_provider}"
        break
    fi
done

dest="$dest/${provider}"
mkdir -p "$dest"/

## This code attempts to parse any token that looks like it could be a
## date. Anything that fails to parse generates no output, and any date
## in the future generates no output; only dates in the past (or today)
## are printed. '[0-9]' is used rather than '\d', which plain grep does
## not treat as a digit class everywhere.
statement_dt=$( echo "${pdftextcontents}" | tr ' ' '\n' | sort | uniq | grep '[0-9]' | grep -E '[/-]' | xargs -I{} php -r 'try { $f = (new \DateTime("{}")); $d = $f->diff((new \DateTime())); if ($d->days > 0 && $d->invert == 1) { } else { echo $f->format("Y-m\n"); } } catch (\Exception $e) { }' | sort -hr | head -n 1 );

if [ -z "$statement_dt" ]; then
    statement_exit "$provided_file" "$dest"
fi

formatted_dt=$( echo "${statement_dt}" | php -r '$dt=fgets(STDIN); echo date("Y-m", strtotime($dt));' )

if [ $dryrun -ne 1 ]; then
    mkdir -p "$dest"/

    mv "$provided_file" "$dest"/"$formatted_dt".pdf
else
    echo "Would mv $provided_file $dest/$formatted_dt.pdf"
fi
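
Before pointing Hazel at a real folder, it can be worth previewing what the script would do. The sketch below loops over the PDFs in a folder and runs each one with --dryrun. The preview_folder name and the script path are assumptions (save the script wherever you like and adjust classifier to match). Keep in mind that even in a dry run, a file that can't be classified still gets an entry in the failure log and a notification; only the move is skipped.

```shell
#!/bin/bash

# Hypothetical location of the script above; change it to wherever you saved it.
classifier="${CLASSIFIER:-$HOME/bin/classify-statement.sh}"

# Hypothetical helper: dry-run the classifier over every PDF in a folder.
preview_folder() {
    local folder="$1" pdf
    for pdf in "$folder"/*.pdf; do
        # Skip the literal "*.pdf" left behind when the glob matches nothing
        [ -e "$pdf" ] || continue
        "$classifier" "$pdf" --dryrun || echo "Could not classify $pdf"
    done
}

# Example: preview_folder ~/Downloads
```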