Using data from spreadsheets in Fedora with Python

Posted by Paul W. Frields 

Python is one of the most popular and powerful programming languages available. Because it’s free and open source, it’s available to everyone — and most Fedora systems come with the language already installed. Python is useful for a wide variety of tasks, but among them is processing comma-separated value (CSV) data. CSV files often start off life as tables or spreadsheets. This article shows how to get started working with CSV data in Python 3.

CSV data is precisely what it sounds like. A CSV file includes one row of data at a time, with data values separated by commas. Each row is defined by the same fields. Short CSV files are often easily read and understood. But longer data files, or those with more fields, may be harder to parse with the naked eye, so computers work better in those cases.

Here’s a simple example where the fields are NameEmail, and Country. In this example, the CSV data includes a field definition as the first row, although that is not always the case.

John Q. Smith,,USA
Petr Novak,,CZ
Bernard Jones,,UK

Reading CSV from spreadsheets

Python helpfully includes a csv module that has functions for reading and writing CSV data. Most spreadsheet applications, both native like Excel or Numbers, and web-based such as Google Sheets, can export CSV data. In fact, many other services that can publish tabular reports will also export as CSV (PayPal for instance).

The Python csv module has a built in reader method called DictReader that can deal with each data row as an ordered dictionary (OrderedDict). It expects a file object to access the CSV data. So if our file above is called example.csv in the current directory, this code snippet is one way to get at this data:

f = open('example.csv', 'r')
from csv import DictReader
d = DictReader(f)
data = []
for row in d:

Now the data object in memory is a list of OrderedDict objects :

[OrderedDict([('Name', 'John Q. Smith'),
               ('Email', ''),
               ('Country', 'USA')]),
  OrderedDict([('Name', 'Petr Novak'),
               ('Email', ''),
               ('Country', 'CZ')]),
  OrderedDict([('Name', 'Bernard Jones'),
               ('Email', ''),
               ('Country', 'UK')])]

Referencing each of these objects is easy:

>>> print(data[0]['Country'])
>>> print(data[2]['Email'])

By the way, if you have to deal with a CSV file with no header row of field names, the DictReader class lets you define them. In the example above, add the fieldnames argument and pass a sequence of the names:

d = DictReader(f, fieldnames=['Name', 'Email', 'Country'])

A real world example

I recently wanted to pick a random winner from a long list of individuals. The CSV data I pulled from spreadsheets was a simple list of names and email addresses.

Fortunately, Python also has a helpful random module good for generating random values. The randrange function in the Random class from that module was just what I needed. You can give it a regular range of numbers — like integers — and a step value between them. The function then generates a random result, meaning I could get a random integer (or row number!) back within the total number of rows in my data.

So this small program worked well:

from csv import DictReader
from random import Random

d = DictReader(open('mydata.csv'))
data = []
for row in d:

r = Random()
winner = data[r.randrange(0, len(data), 1)]
print('The winner is:', winner['Name'])
print('Email address:', winner['Email'])

Obviously this example is extremely simple. Spreadsheets themselves include sophisticated ways to analyze data. However, if you want to do something outside the realm of your spreadsheet app, Python may be just the trick!


Leave a Reply

Your email address will not be published. Required fields are marked *