This is a detailed tutorial on reading excel files in python. Learn to extract the data from an Excel Workbook or spreadsheet using the xlrd module.
Reading an Excel File in Python
Spreadsheets are used worldwide to store tabular data. You can easily read and write these files in python. Recently, I published an article on reading and writing CSV files. The spreadsheets are quite different from the CSV files. Therefore, I found another good module for specifically parsing the excel files.
In this tutorial, we’ll make use of the xlrd
module to read spreadsheets in python. So, you need to install this module first. Simply run the following command to install this module.
pip install xlrd
Example Excel File.
I’ll use the Excel File shown in the screenshot given below for an illustration of performing different reading operations on the file using the xlrd module in python.
This excel sheet is named demo.xlsx
Reading A Particular Cell Value
The following python code illustrates the use of the xlrd
module to read a particular cell value from the example excel sheet.
#Importing xlrd Module import xlrd #Open Workbook, Provide file path as argument to the function open_workbook() workBook = xlrd.open_workbook("demo.xlsx") #Select the desired sheet from the workbook #sheet_by_index method fetches the sheets using numerical index values, starting from 0 spreadSheet = workBook.sheet_by_index(0) #Fetching Cell Value at Location 0, 0 cell = spreadSheet.cell_value(0, 0) #Printing the Cell Value print(cell)
This way, you can fetch the value of any particular cell from any particular excel sheet of an Excel Workbook.
Extracting the Number of Rows & Columns
This module also allows you to find the total number of rows and columns of the spreadsheet in an excel workbook. The following example illustrates this concept.
#Importing xlrd Module import xlrd #Open Workbook, Provide file path as argument to the function open_workbook() workBook = xlrd.open_workbook("demo.xlsx") #Select the desired sheet from the workbook #sheet_by_index method fetches the sheets using numerical index values, starting from 0 spreadSheet = workBook.sheet_by_index(0) #Fetching No. of Rows rows_count = spreadSheet.nrows #Fetching No. of Columns columns_count = spreadSheet.ncols #Printing the Values print("No. of Rows: " + str(rows_count)) print("No. of Columns: " + str(columns_count))
Extracting Column Names
You can easily extract the name of all columns using the cell_value()
function used in the first example by doing iteration over the number of columns. The following example illustrates the same.
#Importing xlrd Module import xlrd #Open Workbook, Provide file path as argument to the function open_workbook() workBook = xlrd.open_workbook("demo.xlsx") #Select the desired sheet from the workbook #sheet_by_index method fetches the sheets using numerical index values, starting from 0 spreadSheet = workBook.sheet_by_index(0) #Fetching No. of Columns columns_count = spreadSheet.ncols #Iterating all column name values for i in range(columns_count): col_name = spreadSheet.cell_value(0,i) print(col_name)
Extracting Particular Column & Row Values
The following example illustrates how you can fetch all of the values for a particular column. Again, we’ll do this using iteration over the range of the number of rows. Here we’re reading the values for the second column, Full Name. As the column and row index starts from 0, so the index for the second column here will be 1.
#Importing xlrd Module import xlrd #Open Workbook, Provide file path as argument to the function open_workbook() workBook = xlrd.open_workbook("demo.xlsx") #Select the desired sheet from the workbook #sheet_by_index method fetches the sheets using numerical index values, starting from 0 spreadSheet = workBook.sheet_by_index(0) #Fetching No. of Rows row_count = spreadSheet.ncols #Fetching the Value For Second Column #Iterating all column name values for i in range(row_count): value = spreadSheet.cell_value(i,1) print(value)
The example given below illustrates how you can fetch the values for an entire row. We need not do iteration here, rather we’ll make use of the row_values()
function here. This function takes the row index as the parameter and returns the list of values of that particular row.
#Importing xlrd Module import xlrd #Open Workbook, Provide file path as argument to the function open_workbook() workBook = xlrd.open_workbook("demo.xlsx") #Select the desired sheet from the workbook #sheet_by_index method fetches the sheets using numerical index values, starting from 0 spreadSheet = workBook.sheet_by_index(0) #Fetching Row Values secondRow = spreadSheet.row_values(1) #Printing the Row Values print(secondRow)
Apart from the simple reading of the Excel files, you can do much more with the spreadsheets using the xlrd
module. You can view the full documentation of the xlrd module here.