Rule-based automation of gnucash transaction source detection

May 29, 2011 at 8:57 pm

I’ve decided to start using gnucash to track my accounts (as an alternative to nothing).

An issue that arises when using any piece of accounting software is the input of data. This is somewhat simplified by companies providing you with electronic records, but one still has the problem of assigning each transaction to a category.

Gnucash has some inbuilt approaches for this. For ofx files it has a bayesian filter that can guess which account a transaction is likely to involve (e.g. income, interest, auto expenses) based on your previous assignments. However, I'm afraid that such an approach will be wrong often enough that I would have to check every result. Therefore I've hacked up an alternative approach that allows one to define a set of rules and match based on them.
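To make the idea of a rule set concrete, here is a minimal sketch of a table-driven matcher. The names `RULES` and `match_by_rules` are my own for illustration and do not appear in the script; the prefixes and account names are examples only and would need to match the accounts in your own file.

```python
# Hypothetical rule table: (description prefix, target account name).
# Account names are examples; they must exist in your own gnucash file.
RULES = [
    ('INT EARNED', 'Interest Income'),
    ('SAVE THE CHILDREN', 'Salary'),
]

def match_by_rules(description, rules=RULES):
    """Return the account of the first rule whose prefix matches, else None."""
    for prefix, account in rules:
        if description.startswith(prefix):
            return account
    # No rule applies: leave the transaction unassigned rather than guess.
    return None
```

Unlike a bayesian guess, a description that no rule covers yields `None`, so the transaction is simply left alone for manual assignment.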

Here is some code:


# This program is distributed under a GPL3 license.
# The author should be "Arg and gah and ap and pa"

"""Open a gnu cash file then find transactions which come from or go to
an unknown source and apply a set of user defined rules to guess this
source.

Caveats:
 * This approach is intrinsically prey to tweaks in gnucash's xml format.
   This is probably unlikely to happen, and the program is simple enough that
   it would be easy to repair. A better approach would be to implement this
   in gnucash, but this has too large a fixed cost for me to consider at the
   moment.
 
 * There are already inbuilt features of gnucash to do this but 
   these don't do quite what I want
    * A bayesian mapper. I don't feel like using a bayesian mapper here 
      because it is liable to mismap requiring me to check results. I'm 
      sure for some people this works fine.
 
    * Qif support for account mapping:
       My understanding is that in qif you map each source once and 
       it then remembers these sources. However my bank won't send me qif.
"""

# CONFIGURE BY HAND 
GNUCASH_FILE = 'play'
def match_account(transaction):
    """Matching function to be tweaked. 

    This takes the details of an unmapped transaction and decides 
    which account the transaction came from or should go to."""

    if transaction.description.startswith('INT EARNED'):
        return 'Interest Income'
    elif transaction.description.startswith('SAVE THE CHILDREN'):
        return 'Salary'
    else:
        return None
# END CONFIGURE BY HAND 

from contextlib import closing
import datetime
import logging
import gzip
import shutil
import os
import lxml.etree as E
from lxml.etree import tostring

from logging import debug, info

run_time = datetime.datetime.now().isoformat()

logging.basicConfig(
    filename='%s-unbalanced-remap.log%s' % (GNUCASH_FILE, run_time),
    level=logging.DEBUG)

NAMESPACES = {
     'gnc': "http://www.gnucash.org/XML/gnc",
     'act': "http://www.gnucash.org/XML/act",
     'slot': "http://www.gnucash.org/XML/slot",
     'split': "http://www.gnucash.org/XML/split",
     'trn': "http://www.gnucash.org/XML/trn"
}

def xpath(el, xpath):
    return el.xpath(xpath, namespaces=NAMESPACES)

def fetch_account_ids(tree):
    accounts = {}
    for xml_acc in xpath(tree, '//gnc:account'):
        name, = xpath(xml_acc, 'act:name/text()')
        id, = xpath(xml_acc, 'act:id/text()') 
        if name in accounts:
            raise Exception('%r is duplicated' % (name,))
        accounts[name] = id
    return accounts

shutil.copy(GNUCASH_FILE, '%s-backup-%s' % (GNUCASH_FILE, run_time))
with closing(gzip.open(GNUCASH_FILE)) as stream:
    tree = E.XML(stream.read())

account_ids = fetch_account_ids(tree)
error_id = account_ids['Imbalance-GBP']

class XmlTransaction(object):
    def dump_xml(f):
        def patched(self, *args, **kwargs):
            try:
                return f(self, *args, **kwargs)
            except Exception:
                # Dump the offending transaction before re-raising.
                print(tostring(self.xml, pretty_print=True))
                raise
        return patched
    
    def __init__(self, xml):
        self.xml = xml

    @dump_xml
    def is_imbalance(self):
        if len(xpath(self.xml, './/trn:split')) != 2:
            return False
        else:
            return len(xpath(self.xml, './/trn:split[split:account/text()'
                ' = "%s"]/split:value/text()' % error_id)) == 1

    @dump_xml
    def get_amount(self):
        amount, = xpath(self.xml, './/trn:split[split:account/text() = "%s"]/split:value/text()' % error_id)
        return amount

    @dump_xml
    def get_notes(self):
        # The note text lives in the slot's value child, not in the slot itself.
        return xpath(self.xml, './/slot[slot:key/text()="notes"]/slot:value/text()')

    @dump_xml
    def set_account_id(self, id):
        account_xml, = xpath(self.xml, './/trn:split[split:account/text() = "%s"]/split:account' % error_id)
        account_xml.text = id

    @dump_xml
    def get_description(self):
        description, = list(xpath(self.xml, 'trn:description/text()')) or ['']
        return description

class TransactionDetails(object):
    """Parse those details of a transaction that are relevant to us"""
    def __init__(self, xml):
           
        self.xml = xml
        info = XmlTransaction(xml)
        self.description = info.get_description()
        self.is_imbalance = info.is_imbalance()
        self.notes = info.get_notes()
        if self.is_imbalance:
            self.amount = info.get_amount()
        else:
            self.amount = None

    def __repr__(self):
        return ('<TransactionDetails description=%r amount=%r notes=%r>' %
            (self.description, self.amount, self.notes))
        
# Main loop
for xml_transaction in xpath(tree, '//gnc:transaction'):
    details = TransactionDetails(xml_transaction)
    if not details.is_imbalance:
        debug('Ignoring %s. Not imbalance.' % details)
        continue
    else:
        account_name = match_account(details)
        if account_name is None:
            info("Could not match: %r" % details)
        else:
            try:
                account_id = account_ids[account_name]
            except KeyError:
                print('Valid account names: %r ' % (sorted(account_ids.keys()),))
                raise
            XmlTransaction(xml_transaction).set_account_id(account_id)
            debug('Remapping transaction %s to %s(%s)' % (details, account_name, account_id))
        
with closing(gzip.open(GNUCASH_FILE, 'w')) as f:
    # XML declaration expected at the top of the file.
    f.write(b'<?xml version="1.0" encoding="utf-8" ?>\n')
    f.write(tostring(tree, pretty_print=True))
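Since the script rewrites the file in place, it can be reassuring to preview which transactions it would touch first. The following is a rough read-only sketch using only the standard library. The `SAMPLE` fragment is hand-written and heavily simplified (a real file has many more fields), and `imbalance_descriptions` and the two GUID strings are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace URIs as used in the script above.
NS = {
    'gnc': 'http://www.gnucash.org/XML/gnc',
    'trn': 'http://www.gnucash.org/XML/trn',
    'split': 'http://www.gnucash.org/XML/split',
}

# Hypothetical, simplified fragment for demonstration only.
SAMPLE = """
<root xmlns:gnc="http://www.gnucash.org/XML/gnc"
      xmlns:trn="http://www.gnucash.org/XML/trn"
      xmlns:split="http://www.gnucash.org/XML/split">
  <gnc:transaction>
    <trn:description>INT EARNED</trn:description>
    <trn:splits>
      <trn:split>
        <split:value>120/100</split:value>
        <split:account>bank-guid</split:account>
      </trn:split>
      <trn:split>
        <split:value>-120/100</split:value>
        <split:account>imbalance-guid</split:account>
      </trn:split>
    </trn:splits>
  </gnc:transaction>
</root>
"""

def imbalance_descriptions(xml_text, imbalance_id):
    """List descriptions of two-split transactions with one imbalance split."""
    root = ET.fromstring(xml_text)
    found = []
    for trn in root.iterfind('.//gnc:transaction', NS):
        accounts = [el.text
                    for el in trn.iterfind('.//trn:split/split:account', NS)]
        # Same test as is_imbalance: exactly two splits, one on the
        # imbalance account.
        if len(accounts) == 2 and accounts.count(imbalance_id) == 1:
            found.append(trn.findtext('trn:description', default='',
                                      namespaces=NS))
    return found
```

Calling `imbalance_descriptions(SAMPLE, 'imbalance-guid')` lists the descriptions that the matching rules would be asked about, without modifying anything.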

