Everybody Loves CSVs

Clients love CSV files. Well, clients love Excel, and if you give them the option of importing or exporting data in a format Excel can handle, they will take it. So I end up fighting with CSV files quite a bit. On the output side, I put together a Django package that makes creating CSV views easy. On the input side, I was playing around with importing a bunch of data from CSV files. Inspired by a couple of talks I recently watched on pyvideo.org, I realized I could put together some handy reusable helper functions for importing CSV data. I included the result in my Django helper collection because it should be useful in a lot of projects.

The basic idea is to write a generator that reads the first row of a CSV as its headers and uses them to create a Row namedtuple with matching field names. It then yields a new Row instance for each remaining row in the CSV. The nice thing about this is that you can access items in your row by name (row.first) instead of by index (row[1]), which makes life much less confusing.
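Here is the namedtuple mechanism on its own, before any file handling. The header list and values below are illustrative data, not taken from a real import.

```python
from collections import namedtuple

# A header row as csv.reader might return it (illustrative data)
header = ['last', 'first']

# namedtuple builds a small class whose field names come from the header
Row = namedtuple('Row', header)

row = Row('Smajda', 'Jon')
print(row[1])       # positional access: Jon
print(row.first)    # named access: Jon
print(Row._fields)  # ('last', 'first')
```

Both lines print the same value, but only the second one still makes sense if someone reorders the columns in the spreadsheet.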

I broke it into two functions. The first turns an iterator of rows (each a list of strings) into Row instances:

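import re
from collections import namedtuple

def generate_rows(f):
    # Remove all whitespace from the header cells ("First Name" -> "FirstName")
    spaces = re.compile(r'\s')
    headers = tuple(spaces.sub('', i) for i in next(f, tuple()))
    Row = namedtuple('Row', headers)
    # Yield one Row per remaining line, trimming whitespace around each cell
    for line in f:
        yield Row(*(i.strip() for i in line))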
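One thing to watch for: stripping whitespace does not guarantee a valid field name. namedtuple raises ValueError for headers such as "e-mail", Python keywords such as "class", or duplicate columns. A minimal sketch of one way around that, assuming you are fine with positional names for the offending columns, is to pass rename=True. The function name generate_rows_lenient is made up for this example and is not part of the helper collection.

```python
import re
from collections import namedtuple

def generate_rows_lenient(f):
    """Hypothetical variant of generate_rows that tolerates messy headers.

    With rename=True, namedtuple replaces invalid, keyword, or duplicate
    field names with positional names such as '_1' instead of raising.
    """
    spaces = re.compile(r'\s')
    headers = tuple(spaces.sub('', h) for h in next(f, ()))
    Row = namedtuple('Row', headers, rename=True)
    for line in f:
        yield Row(*(cell.strip() for cell in line))

# Illustrative rows, shaped like csv.reader output
rows = iter([
    ['Last Name', 'e-mail', 'class'],
    ['Smajda ', ' jon@example.com', 'A'],
])
for row in generate_rows_lenient(rows):
    print(row._fields)   # ('LastName', '_1', '_2')
    print(row.LastName)  # Smajda
```

The trade-off is that renamed columns are only reachable by their positional names, so it may be better to fail loudly when you control the spreadsheet format.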

The second is a wrapper around it that takes care of opening the CSV file:

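import csv

def generate_csv_rows(csv_path):
    # newline='' lets the csv module handle line endings itself
    # (the old 'rU' mode was removed in Python 3.11)
    with open(csv_path, newline='') as f:
        reader = csv.reader(f)
        yield from generate_rows(reader)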
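Keeping the file handling in the wrapper means generate_rows only needs an iterator of lists, so it can be exercised without a file on disk, for instance in a unit test. A sketch, assuming Python 3: io.StringIO stands in for an opened file, and generate_rows is repeated so the snippet runs on its own. The CSV text is illustrative.

```python
import csv
import io
import re
from collections import namedtuple

# Copy of generate_rows so this snippet is self-contained
def generate_rows(f):
    spaces = re.compile(r'\s')
    headers = tuple(spaces.sub('', h) for h in next(f, ()))
    Row = namedtuple('Row', headers)
    for line in f:
        yield Row(*(cell.strip() for cell in line))

# Illustrative CSV text with stray spaces after the commas
text = "last, first\nSmajda, Jon\nDoe, Jane\n"
reader = csv.reader(io.StringIO(text))
rows = list(generate_rows(reader))
print(rows[1].first)  # Jane
```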

Say you have a file, data.csv, that looks something like this:

last,first
Smajda,Jon

You can use generate_csv_rows like this:

for row in generate_csv_rows('data.csv'):
    print("Hello {first} {last}!".format(**row._asdict()))