Can't save excel file using openpyxl - python

I'm having trouble saving an Excel file with openpyxl.
I'm trying to write a processing script that grabs data from one Excel file and dumps it into a second "dump" Excel file. After some tweaking with formulas in Excel, the dump file will hold all of the processed data. My current code is as follows.
from openpyxl import load_workbook
import os
import datetime
from openpyxl.cell import get_column_letter, Cell, column_index_from_string, coordinate_from_string
dump = dumplocation
desktop = desktoplocation
date ="%Y-%m-%d")
excel = load_workbook(dump+date+ ".xlsx", use_iterators = True)
sheet = excel.get_sheet_by_name("Sheet1")
query = raw_input('How many rows of data is there?\n')
except ValueError:
print 'Not a number'
#sheetname = raw_input('What is the name of the worksheet in the data?\n')
for filename in os.listdir(desktop):
if filename.endswith(".xlsx"):
print filename
data = load_workbook(filename, use_iterators = True)
ws = data.get_sheet_by_name(name = '17270115')
#copying data from excel to data excel
for row in sheet.iter_rows():
for cell in row:
for rows in ws.iter_rows():
for cells in row:
if (n>=17) and (n<=32):
cell.internal_value = cells.internal_value
#adding column between time in UTC and the data
column_index = 1
new_cells = {}
sheet.column_dimensions = {}
for coordinate, cell in sheet._cells.iteritems():
column_letter, row = coordinate_from_string(coordinate)
column = column_index_from_string(column_letter)
# shifting columns
if column >= column_index:
column += 1
column_letter = get_column_letter(column)
coordinate = '%s%s' % (column_letter, row)
# it's important to create new Cell object
new_cells[coordinate] = Cell(sheet, column_letter, row, cell.value)
sheet.cells = new_cells
#setting columns to be hidden
for coordinate, cell in sheet._cells.iteritems():
column_letter, row = coordinate_from_string(coordinate)
column = column_index_from_string(column_letter)
if (column<=3) and (column>=18):
column.set_column(column, options={'hidden': True})
I know a lot of my code is messy, since I only started Python two or three weeks ago. I also have a few outstanding issues which I can deal with later on.
It doesn't seem like many people are using openpyxl for my purposes.
I tried using the normal Workbook mode, but that didn't seem to work because you can't iterate over the cells (which I need in order to copy the relevant data from one Excel file to another).
UPDATE: I realised that openpyxl can only create workbooks but can't edit current ones. So I have decided to change tack and edit the new workbook after I have transferred the data into it. I have gone back to using Workbook to transfer the data:
from openpyxl import Workbook
from openpyxl import worksheet
from openpyxl import load_workbook
import os
from openpyxl.cell import get_column_letter, Cell, column_index_from_string, coordinate_from_string
dump = "c:/users/y.lai/desktop/data/201501.xlsx"
desktop = "c:/users/y.lai/desktop/"
excel = Workbook()
sheet = excel.add_sheet
query = raw_input('How many rows of data is there?\n')
except ValueError:
print 'Not a number'
#sheetname = raw_input('What is the name of the worksheet in the data?\n')
for filename in os.listdir(desktop):
if filename.endswith(".xlsx"):
print filename
data = load_workbook(filename, use_iterators = True)
ws = data.get_sheet_by_name(name = '17270115')
#copying data from excel to data excel
for x in range(6,int(query)):
for s in range(65,90):
for cell in Cell(sheet,chr(s),x):
for rows in ws.iter_rows():
for cells in rows:
if q>=5:
if (n>=17) and (n<=32):
cell.value = cells.internal_value
But this still doesn't seem to work:
Traceback (most recent call last):
File "xxx\Desktop\", line 40, in <module>
for cell in Cell(sheet,chr(s),x):
File "xxx\AppData\Local\Continuum\Anaconda\lib\site-packages\openpyxl\", line 181, in __init__
self._shared_date = SharedDate(base_date=worksheet.parent.excel_base_date)
AttributeError: 'function' object has no attribute 'parent'
I went through the API docs, but I'm overwhelmed by the code in there and couldn't make much sense of it. To me it looks like I have used the Cell class wrongly. I read the definition of Cell and its attributes, hence the chr(s) to generate the column letters.

You can iterate using the standard Workbook mode. use_iterators=True has been renamed read_only=True to emphasise what this mode is used for (on demand reading of parts).
Your code as it stands cannot work with this method as the workbook is read-only and cell.internal_value is always a read only property.
However, it looks like you're not getting that far because there is a problem with your Excel files. You might want to submit a bug with one of the files. Also the mailing list might be a better place for discussion.

You could try using xlrd and xlwt instead of openpyxl, but you might find that exactly what you are looking to do is already available in xlutils; all are from python-excel.


How to merge multiple .xls files with hyperlinks in python?

I am trying to merge multiple .xls files that have many columns, but 1 column with hyperlinks. I try to do this with Python but keep running into unsolvable errors.
Just to be concise, the hyperlinks are hidden under a text section. The following ctrl-click hyperlink is an example of what I encounter in the .xls files: ES2866911 (T3).
In order to improve reproducibility, I have added .xls1 and .xls2 samples below.
ES2866911 (T3)
EP3887362 (A1)
AR118706 (A2)
ES2867600 (T3)
Desired outcome:
ES2866911 (T3)
EP3887362 (A1)
AR118706 (A2)
ES2867600 (T3)
I am unable to get .xls file into Python without losing formatting or losing hyperlinks. In addition I am unable to convert .xls files to .xlsx. I have no possibility to acquire the .xls files in .xlsx format. Below I briefly summarize what I have tried:
1.) Reading with pandas was my first attempt. Easy to do, but all hyperlinks are lost in pandas, and all formatting from the original file is lost as well.
2.) Reading .xls files with openpyxl.load
InvalidFileException: openpyxl does not support the old .xls file format, please use xlrd to read this file, or convert it to the more recent .xlsx file format.
3.) Converting .xls files to .xlsx
from xls2xlsx import XLS2XLSX
x2x = XLS2XLSX(input.file.xls)
wb = x2x.to_xlsx()
TypeError: got invalid input value of type <class 'xml.etree.ElementTree.Element'>, expected string or Element
import pyexcel as p
p.save_book_as(file_name=input_file.xls, dest_file_name=export_file.xlsx)
TypeError: got invalid input value of type <class 'xml.etree.ElementTree.Element'>, expected string or Element
During handling of the above exception, another exception occurred:
4.) Even if we are able to read the .xls file with xlrd, for example (meaning we will never be able to save the file as .xlsx), I can't even see the hyperlink:
import xlrd
wb = xlrd.open_workbook(file) # where vis.xls is your test file
ws = wb.sheet_by_name('Sheet1')
ws.cell(5, 1).value
'AR118706 (A2)' #Which is the name, not hyperlink
5.) I tried installing an older version, openpyxl==3.0.1, to overcome the TypeError, without success. I tried to open the .xls file with openpyxl with the xlrd engine, and a similar TypeError ('xml.etree.ElementTree.Element') occurred. I tried many ways to batch convert .xls files to .xlsx, all with similar errors.
Obviously I can just open the files with Excel and save them as .xlsx, but this defeats the entire purpose, and I can't do that for hundreds of files.
You need to use xlrd library to read the hyperlinks properly, pandas to merge all data together and xlsxwriter to write the data properly.
Assuming all input files have same format, you can use below code.
# imports
import os
import xlrd
import xlsxwriter
import pandas as pd
# required functions
def load_excel_to_df(filepath, hyperlink_col):
    # the parameter is named filepath, so use that name throughout
    book = xlrd.open_workbook(filepath)
    sheet = book.sheet_by_index(0)
    hyperlink_map = sheet.hyperlink_map
    data = pd.read_excel(filepath)
    hyperlink_col_index = list(data.columns).index(hyperlink_col)
    required_links = [v.url_or_path for k, v in hyperlink_map.items() if k[1] == hyperlink_col_index]
    data['hyperlinks'] = required_links
    return data
# main code
# set required variables
input_data_dir = 'path/to/input/data/'
hyperlink_col = 'Publication_Number'
output_data_dir = 'path/to/output/data/'
output_filename = 'combined_data.xlsx'
# read and combine data
required_files = os.listdir(input_data_dir)
combined_data = pd.DataFrame()
for file in required_files:
    curr_data = load_excel_to_df(input_data_dir + os.sep + file, hyperlink_col)
    # pd.concat replaces DataFrame.append, which was removed in pandas 2.0
    combined_data = pd.concat([combined_data, curr_data], sort=False, ignore_index=True)
cols = list(combined_data.columns)
m, n = combined_data.shape
hyperlink_col_index = cols.index(hyperlink_col)
# writing data
writer = pd.ExcelWriter(output_data_dir + os.sep + output_filename, engine='xlsxwriter')
combined_data[cols[:-1]].to_excel(writer, index=False, startrow=1, header=False) # last column contains hyperlinks
workbook = writer.book
worksheet = writer.sheets[list(workbook.sheetnames.keys())[0]]
for i, col in enumerate(cols[:-1]):
    worksheet.write(0, i, col)
for i in range(m):
    worksheet.write_url(i+1, hyperlink_col_index, combined_data.loc[i, cols[-1]], string=combined_data.loc[i, hyperlink_col])
reading hyperlinks -
pandas to_excel header formatting - Remove default formatting in header when converting pandas DataFrame to excel sheet
writing hyperlinks with xlsxwriter -
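The list comprehension in load_excel_to_df works only if every data row carries exactly one link and the map is iterated in row order. Below is a minimal sketch of a more defensive alignment. It assumes, as in xlrd, that hyperlink_map is keyed by (row, column) tuples and that each value exposes url_or_path. The Link tuple, the links_for_column helper and the example.com URLs are hypothetical stand-ins so the snippet runs without an .xls file.

```python
from collections import namedtuple

# Hypothetical stand-in for xlrd's hyperlink objects; only url_or_path is used.
Link = namedtuple("Link", ["url_or_path"])


def links_for_column(hyperlink_map, col_index, nrows, header_rows=1):
    """Return one URL (or None) per data row for the given column.

    hyperlink_map is assumed to map (row, col) -> object with url_or_path.
    """
    links = []
    for row in range(header_rows, nrows):
        link = hyperlink_map.get((row, col_index))
        links.append(link.url_or_path if link is not None else None)
    return links


# Toy sheet: header in row 0, three data rows, links in column 1,
# and the middle data row has no hyperlink at all.
toy_map = {
    (1, 1): Link("https://example.com/ES2866911"),
    (3, 1): Link("https://example.com/AR118706"),
}
aligned = links_for_column(toy_map, col_index=1, nrows=4)
print(aligned)
```

Looking each row up by its (row, column) key keeps the resulting list the same length as the data, so it can be assigned to a DataFrame column even when some cells have no link.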
Without a clear reproducible example, the problem is not clear. Assume I have two files called tmp.xls and tmp2.xls containing dummy data as in the two screenshots below.
Then pandas can easily load, concatenate, and convert to .xlsx format without loss of hyperlinks. Here is some demo code and the resulting file:
import pandas as pd
f1 = pd.read_excel('tmp.xls')
f2 = pd.read_excel('tmp2.xls')
f3 = pd.concat([f1, f2], ignore_index=True)
Inspired by @Kunal, I managed to write code that avoids using the pandas library. The .xls files are read by xlrd and written to a new Excel file by xlwt. Hyperlinks are maintained, and the output file was saved in .xlsx format:
import os
import xlwt
from xlrd import open_workbook
# read and combine data
directory = "random_directory"
required_files = os.listdir(directory)
#Define new file and sheet to get files into
new_file = xlwt.Workbook(encoding='utf-8', style_compression = 0)
new_sheet = new_file.add_sheet('Sheet1', cell_overwrite_ok = True)
#Initialize header row, can be done with any file
old_file = open_workbook(directory+"/"+required_files[0], formatting_info=True)
old_sheet = old_file.sheet_by_index(0)
for column in range(old_sheet.ncols):
    new_sheet.write(0, column, old_sheet.cell(0, column).value) #To create header row
#Add rows from all files present in folder
for file in required_files:
    old_file = open_workbook(directory+"/"+file, formatting_info=True)
    old_sheet = old_file.sheet_by_index(0) #Define old sheet
    hyperlink_map = old_sheet.hyperlink_map #Create map of all hyperlinks
    for row in range(1, old_sheet.nrows): #We need all rows except header row
        if row-1 < len(hyperlink_map.items()): #Statement to ensure we do not go out of range on the lower side of hyperlink_map.items()
            Row_depth=len(new_sheet._Worksheet__rows) #We need row depth to know where to add new row
            for col in range(old_sheet.ncols): #For every column we need to add row cell
                if col == 1: #We need to make an exception for column 2 being the hyperlinked column
                    click=list(hyperlink_map.items())[row-1][1].url_or_path #define URL
                    new_sheet.write(Row_depth, col, xlwt.Formula('HYPERLINK("{}", "{}")'.format(click, old_sheet.cell(row, 1).value)))
                else: #If not hyperlinked column
                    new_sheet.write(Row_depth, col, old_sheet.cell(row, col).value) #Write cell
#Save the combined workbook (the save call is reconstructed from the truncated last line)
new_file.save("random_directory/output_file.xlsx")
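The HYPERLINK formula above is built with plain string formatting, so a double quote inside the URL or the label would end the string literal early. Here is a minimal sketch of a helper that escapes quotes the way Excel string literals expect (by doubling them). The name hyperlink_formula and the sample values are hypothetical, and the sketch does not check whether xlwt's formula parser accepts every resulting string.

```python
def hyperlink_formula(url, label):
    """Build the text of an Excel HYPERLINK formula.

    Double quotes inside Excel string literals are escaped by doubling them.
    """
    safe_url = url.replace('"', '""')
    safe_label = str(label).replace('"', '""')
    return 'HYPERLINK("{}", "{}")'.format(safe_url, safe_label)


formula = hyperlink_formula("https://example.com/doc", 'AR118706 "A2"')
print(formula)
```

The returned string could then be passed to xlwt.Formula in place of the inline format call.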
I assume the same as daedalus in terms of the excel files. Instead of pandas I use openpyxl to read and create a new excel file.
import openpyxl
wb1 = openpyxl.load_workbook('tmp.xlsx')
ws1 = wb1['Sheet1']
wb2 = openpyxl.load_workbook('tmp2.xlsx')
ws2 = wb2['Sheet1']
hyperlink_dict = {}
# Go through first sheet to find the hyperlinks and keys.
for row in range(1, ws1.max_row + 1):
    hyperlink_dict[ws1.cell(row=row, column=1).value] = [
        ws1.cell(row=row, column=2),
        ws1.cell(row=row, column=2).value]
# Go through second sheet to find hyperlinks and keys.
for row in range(1, ws2.max_row + 1):
    hyperlink_dict[ws2.cell(row=row, column=1).value] = [
        ws2.cell(row=row, column=2),
        ws2.cell(row=row, column=2).value]
Now you have all the data, so you can create a new workbook and save the values from the dict into it via openpyxl.
wb = openpyxl.Workbook(write_only=True)
ws = wb.create_sheet()
for key, (cell, value) in hyperlink_dict.items():
    # use ws.append() to add the data from the dict
    ws.append([key, value])
wb.save('new_big_file.xlsx')

Using Pandas and xlrd together. Ignoring absence/presence of column headers

I am hoping you can help me. I'm sure it's a small thing to fix when one knows how.
In my workshop, neither I nor my colleagues can make 'find and replace all' changes via the front-end of our database. The boss just denies us that level of access. If we need to make changes to dozens or perhaps hundreds of records it must all be done by copy-and-paste or similar means. Craziness.
I am trying to make a workaround to that with Python 2 and in particular libraries such as Pandas, pyautogui and xlrd.
I have researched several StackOverflow threads and have so far managed to write some code that works well at reading a given XL file. In production, this will be a file exported from a found data set in the database GUI front-end and will be just a single column of 'Article Numbers' for the items in the computer workshop. This will always have an Excel column header, e.g.
All the records numbers are 5 digit numbers.
We also have the means of scanning items with an IR scanner to a 'Workflow' app on the iPad we have and automatically making an XL file out of that list of scanned items.
The XL file here could look something similar to this.
It differs in that there is no column header. All XL files have their data 'anchored' at cell A1 on 'Sheet1', and again just a single column will be used. No unnecessary complications here!
Here is the script anyway. When it is fully working, system arguments will be supplied to it. For now, let's pretend that we need to change records so that their 'RAM' value changes from "2GB" to "2 GB".
import xlrd
import string
import re
import pandas as pd
field = "RAM"
value = "2 GB"
myFile = "/Users/me/folder/testArticles.xlsx"
df = pd.read_excel(myFile)
myRegex = "^[0-9]{5}$"
# data collection and putting into lists.
workbook = xlrd.open_workbook(myFile)
sheet = workbook.sheet_by_index(0)
data = [[sheet.cell_value(r, c) for c in range(sheet.ncols)] for r in range(sheet.nrows)]
formatted = []
deDuped = []
# removing any possible XL headers, setting all values to strings
# that look like five-digit ints, apply a regex to be sure.
for i in data:
    cellValue = str(i)
    cellValue = cellValue.translate(None, '\'[u]\'')
    # remove the decimal point
    # Searching for the header will cause a database front-end problem.
    cellValue = cellValue[:-2]
    cellValue = cellValue.translate(None, string.letters)
    # making sure only valid article numbers get through
    # blank rows etc can take a hike
    if len(cellValue) != 0:
        if re.match(myRegex, cellValue):
            formatted.append(cellValue)
# weeding out any possible dupes.
for i in formatted:
    if i not in deDuped:
        deDuped.append(i)
#main code block
for i in deDuped:
    #lots going on here involving pyautogui
    #making sure of no error running searches, checking for warnings, moving/tabbing around DB front-end etc
    #if all goes to plan
    #removing that record number from the excel file and saving the change
    #so that if we run the script again for the same XL file
    #we don't needlessly update an already OK record again.
    df = df[~df['ANR'].astype(str).str.startswith(i)]
    df.to_excel(myFile, index=False)
What I would really like to find out is how I can run the script so that it "doesn't care" about the presence or absence of the column header.
df = df[~df['ANR'].astype(str).str.startswith(i)]
This appears to be the line of code on which it all hangs. I've made several changes to the line in different combinations, but my script always crashes.
If a column header ("ANR" in my case) is essential for this particular pandas method, is there a straightforward way of inserting a column header into an XL file that lacks one in the first place, i.e. the XL files that come from the IR scanner and the 'Workflow' app on the iPad?
Thanks guys!
As suggested by Patrick, I've tried implementing some code to check whether cell "A1" has a header or not. Partial success. I can put "ANR" in cell A1 if it's missing, but I lose whatever was there in the first place.
import xlwt
from openpyxl import Workbook, load_workbook
from xlutils.copy import copy
import openpyxl
# data collection
workbook = xlrd.open_workbook(myFile)
sheet = workbook.sheet_by_index(0)
data = [[sheet.cell_value(r, c) for c in range(sheet.ncols)] for r in range(sheet.nrows)]
cell_a1 = sheet.cell_value(rowx=0, colx=0)
if cell_a1 == "ANR":
print "has header"
wb = openpyxl.load_workbook(filename= myFile)
ws = wb['Sheet1']
ws['A1'] = "ANE"
#re-open XL file again etc etc.
I found this new block of code over at writing to existing workbook using xlwt. In this instance the contributor actually used openpyxl.
I think I got it fixed for myself.
Still a tiny bit messy but seems to be working. Added an 'if/else' clause to check the value of cell A1 and to take action accordingly. Found most of the code for this at how to append data using openpyxl python to excel file from a specified row? - using the suggestion for openpyxl
import pyperclip
import xlrd
import pyautogui
import string
import re
import os
import pandas as pd
import xlwt
from openpyxl import Workbook, load_workbook
from xlutils.copy import copy
field = "RAM"
value = "2 GB"
myFile = "/Users/me/testSerials.xlsx"
df = pd.read_excel(myFile)
myRegex = "^[0-9]{5}$"
# data collection
workbook = xlrd.open_workbook(myFile)
sheet = workbook.sheet_by_index(0)
data = [[sheet.cell_value(r, c) for c in range(sheet.ncols)] for r in range(sheet.nrows)]
cell_a1 = sheet.cell_value(rowx=0, colx=0)
if cell_a1 == "ANR":
print "has header"
headers = ['ANR']
workbook_name = 'myFile'
wb = Workbook()
page = wb.active
# page.title = 'companies'
page.append(headers) # write the headers to the first line
workbook = xlrd.open_workbook(workbook_name)
sheet = workbook.sheet_by_index(0)
data = [[sheet.cell_value(r, c) for c in range(sheet.ncols)] for r in range(sheet.nrows)]
for records in data:
#then load the data all over again, this time with inserted header
workbook = xlrd.open_workbook(myFile)
sheet = workbook.sheet_by_index(0)
data = [[sheet.cell_value(r, c) for c in range(sheet.ncols)] for r in range(sheet.nrows)]
formatted = []
deDuped = []
# removing any possible XL headers, setting all values to strings that look like five-digit ints, apply a regex to be sure.
for i in data:
    cellValue = str(i)
    cellValue = cellValue.translate(None, '\'[u]\'')
    # remove the decimal point
    cellValue = cellValue[:-2]
    # cellValue = cellValue.translate(None, ".0")
    cellValue = cellValue.translate(None, string.letters)
    # making sure any valid ANRs get through
    if len(cellValue) != 0:
        if re.match(myRegex, cellValue):
            formatted.append(cellValue)
# ------------------------------------------
# weeding out any possible dupes.
for i in formatted:
    if i not in deDuped:
        deDuped.append(i)
# ref -
df = pd.read_excel(myFile)
print df
for i in deDuped:
    #pyautogui code is run here...
    #if all goes to plan update the XL file
    df = df[~df['ANR'].astype(str).str.startswith(i)]
    df.to_excel(myFile, index=False)
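The header check can also be avoided altogether by filtering on what a valid article number looks like. Below is a minimal sketch, written for Python 3 and assuming (as in the question) that valid article numbers are exactly five digits and that Excel may hand numbers back as floats such as 12345.0. The helper name clean_article_numbers and the sample lists are hypothetical.

```python
import re

ARTICLE_RE = re.compile(r"^[0-9]{5}$")


def clean_article_numbers(values):
    """Return unique five-digit article numbers, in first-seen order.

    Anything that is not a five-digit number (a header such as 'ANR',
    blank cells, stray text) is dropped, so a header row is optional.
    """
    seen = set()
    cleaned = []
    for value in values:
        # Excel numbers often arrive as floats, e.g. 12345.0
        if isinstance(value, float) and value.is_integer():
            value = int(value)
        text = str(value).strip()
        if ARTICLE_RE.match(text) and text not in seen:
            seen.add(text)
            cleaned.append(text)
    return cleaned


with_header = ["ANR", 12345.0, 23456.0, 12345.0, ""]
without_header = [12345.0, 23456.0, "", 34567.0]
print(clean_article_numbers(with_header))
print(clean_article_numbers(without_header))
```

On the pandas side, reading with header=None and names=['ANR'] gives the column a fixed name either way. A literal "ANR" header then shows up as an ordinary first row that can be dropped before filtering.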

How to write data into existing '.xlsx' file which has multiple sheets

I have to update/append data in an existing xlsx file.
The xlsx file contains multiple sheets.
For example, I want to append some data to the existing sheet 'Sheet1'. How can I do this?
To append a new row of data to an existing spreadsheet, you could use the openpyxl module. This will:
Load the existing workbook from the file.
Determine the last row that is in use with ws.get_highest_row() (ws.max_row in current openpyxl releases).
Add the new row on the next empty row.
Write the updated spreadsheet back to the file.
For example:
import openpyxl
file = 'input.xlsx'
new_row = ['data1', 'data2', 'data3', 'data4']
wb = openpyxl.load_workbook(filename=file)
ws = wb['Sheet1'] # Older method was .get_sheet_by_name('Sheet1')
row = ws.max_row + 1 # Older method was ws.get_highest_row()
for col, entry in enumerate(new_row, start=1):
    ws.cell(row=row, column=col, value=entry)
wb.save(file) # Write the updated spreadsheet back to the file
Note, as can be seen in the docs for XlsxWriter:
XlsxWriter is designed only as a file writer. It cannot read or modify
an existing Excel file.
This approach does not require the use of Windows / Excel to be installed but does have some limitations as to the level of support.
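The four steps in the openpyxl answer can be sketched end to end as follows. This is a minimal sketch, assuming a recent openpyxl release in which ws.max_row and ws.append() are available. The temporary file and the sheet contents are made up for illustration.

```python
import os
import tempfile

import openpyxl

# Build a small two-sheet workbook to stand in for the existing file.
path = os.path.join(tempfile.mkdtemp(), "input.xlsx")
wb = openpyxl.Workbook()
wb.active.title = "Sheet1"
wb["Sheet1"].append(["h1", "h2", "h3", "h4"])
wb.create_sheet("Sheet2").append(["untouched"])
wb.save(path)

# Reopen the file, append one row to Sheet1 and write it back.
wb = openpyxl.load_workbook(path)
ws = wb["Sheet1"]
ws.append(["data1", "data2", "data3", "data4"])  # goes below the last used row
wb.save(path)

# Read back to confirm that both sheets survived the round trip.
check = openpyxl.load_workbook(path)
last_row = [cell.value for cell in check["Sheet1"][check["Sheet1"].max_row]]
print(check.sheetnames, last_row)
```

ws.append() places the values on the row after the last one in use, which saves computing the row number by hand. The other sheets in the workbook are written back unchanged.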
Try xlwings; it is suitable for both reading and writing Excel files.
Everything you need is in the quickstart tutorial. Something like this should be what you want.
import xlwings as xw
# Pass the path directly; opening the file with open(..., "w") would truncate it
wb = xw.Book("FileName.xlsx") # Creates a connection with workbook
xw.Range('A1:D1').value = [1,2,3,4]
Selecting a Sheet
In order to read and write data to a specific sheet, you can activate the sheet and then call Range('cell_ref').
Using Range to select cells
To select a single cell on the current worksheet
a = xw.Range('A1').value;
xw.Range('A1').value = float(a)+5;
To explicitly select a range of cells
xw.Range('A1:E8').value = [new_cell_values_as_list_of_lists];
xw.Range('Named range').value = [new_cell_values_as_list_of_lists];
To automatically select a contiguous range of populated cells that start from 'A1' and go right and down... until empty cell found.
It is also possible to just select a row or column using:
Other methods of creating a range object (from the API doc):
Range('A1') Range('Sheet1', 'A1') Range(1, 'A1')
Range('A1:C3') Range('Sheet1', 'A1:C3') Range(1, 'A1:C3')
Range((1,2)) Range('Sheet1', (1,2)) Range(1, (1,2))
Range((1,1), (3,3)) Range('Sheet1', (1,1), (3,3)) Range(1, (1,1), (3,3))
Range('NamedRange') Range('Sheet1', 'NamedRange') Range(1, 'NamedRange')

Why won't this xlsx file open?

I'm trying to use the openpyxl module to take a spreadsheet, see if there are empty cells in a certain column (in this case, column E), and then copy the rows that contain those empty cells to a new spreadsheet. The code runs without traceback, but the resulting file won't open. What's going on?
Here's my code:
#import the openpyxl module
import openpyxl
#First create a new workbook & sheet
newwb = openpyxl.Workbook()
newwb.save('TESTINGTHISTHING.xlsx')
newsheet = newwb.get_sheet_by_name('Sheet')
#open the original file
wb = openpyxl.load_workbook('OriginalWorkbook.xlsx')
#create a sheet object
sheet = wb.get_sheet_by_name('Sheet1')
#Find out how many cells of a certain column are left blank,
#and what rows they're in
count = 0
listofrows = []
for row in range(2, sheet.get_highest_row() + 1):
company = sheet['E' + str(row)].value
if company == None:
count += 1
print listofrows
print count
#Put the values of the rows with blank company names into the new sheet
for i in range(len(listofrows)):
j = 0
newsheet['A' + str(i+1)] = sheet['A' + str(listofrows[j])].value
Please help!
I just ran your program with a mock document and was able to open my output file without a problem. Your issue probably lies with your Excel or openpyxl version.
Please provide your software versions in addition to your source document so I can look further into the issue.
You can always update openpyxl with:
pip install openpyxl --upgrade
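Separately from the version question, the copy loop in the code as posted resets j to 0 on every pass and never adds anything to listofrows, so it cannot copy a different source row to each output row. Here is a minimal sketch of the intended selection logic on plain lists, with no workbook involved. The helper name rows_with_blank and the toy data are hypothetical.

```python
def rows_with_blank(rows, col_index, first_row=2):
    """Return (row_number, row) pairs whose cell in col_index is empty.

    rows holds the data rows only; first_row is the spreadsheet row
    number of rows[0] (2 when row 1 is a header).
    """
    hits = []
    for offset, row in enumerate(rows):
        if row[col_index] in (None, ""):
            hits.append((first_row + offset, row))
    return hits


# Toy data: columns A..E, where E (index 4) is the company name.
data = [
    ["a2", "b2", "c2", "d2", "Acme"],
    ["a3", "b3", "c3", "d3", None],
    ["a4", "b4", "c4", "d4", ""],
]
blank = rows_with_blank(data, col_index=4)

# Each hit goes to its own output row: 1, 2, 3, ... in the new sheet.
targets = [(out_row, src_row) for out_row, (src_row, _) in enumerate(blank, start=1)]
print(targets)
```

With openpyxl, each (output row, source row) pair would drive one cell assignment per column, and the new workbook would be saved once after the loop rather than before it.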

Changing Excel Sheet every time python script runs

I need to change the sheet in an Excel workbook each time the code runs. Suppose my Python script runs the first time and the data gets saved in sheet A; the next time some application runs my script, the data should be saved in sheet B. Sheet A should remain as it is in that workbook.
Is it possible? If yes, how?
Here is my code:
#!/usr/bin/env python
import subprocess
import xlwt
out,err = process.communicate()
sheet=wb.add_sheet('Sheet_A') #next time it should save in Sheet_B
row = 0
for line in out.split('\n'):
for i,wrd in enumerate(line.split()):
if not wrd.startswith("***"):
print wrd
Any help is appreciated...
I would recommend using openpyxl. It can read and write xlsx files.
If needed, you can always convert them to xls with Excel or Open/LibreOffice,
assuming you have only one big file at the end.
This script creates a new Excel file if none exists and adds a new sheet every time it is run. I use the index + 1 as the sheet name (title) starting with 1. The numerical index starts at 0. You will end up with a file that has sheets named 1, 2, 3 etc. Every time you write your data into the last sheet.
import os
from openpyxl import Workbook
from openpyxl.reader.excel import load_workbook
file_name = 'test.xlsx'
if os.path.exists(file_name):
    # file exists: add a new sheet after the last one
    wb = load_workbook(file_name)
    last_sheet = wb.worksheets[-1]
    index = int(last_sheet.title)
    ws = wb.create_sheet()
    ws.title = str(index + 1)
else:
    # first run: create the file and name its only sheet '1'
    wb = Workbook()
    ws = wb.worksheets[0]
    ws.title = '1'
ws['A2'] = 'new_value'
wb.save(file_name)
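The naming rule in this answer (sheets titled 1, 2, 3, ...) can be isolated into a small function that is easy to check without touching a file. This is a minimal sketch; the helper name next_sheet_title is hypothetical, and ignoring non-numeric titles is an added assumption, not part of the original script.

```python
def next_sheet_title(existing_titles):
    """Return the next numeric sheet title as a string.

    Titles that are not plain integers are ignored, so a default
    sheet such as 'Sheet' does not break the numbering.
    """
    numbers = [int(t) for t in existing_titles if str(t).isdigit()]
    return str(max(numbers) + 1) if numbers else "1"


print(next_sheet_title([]))
print(next_sheet_title(["1", "2", "3"]))
print(next_sheet_title(["Sheet", "2"]))
```

With openpyxl, the list of existing titles would come from wb.sheetnames, and the result would be passed as the title of the newly created sheet.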

