I am working on a Python program that reads an Excel file and, based on the information in that file, writes data to the same file.
This is my code:
import xlrd
import xlwt
from xlutils import copy
location = "C:\\Users\\adarsh\\Desktop\\Python\\Other\\Blah.xls"
readbook = xlrd.open_workbook(location)
workbook = xlutils.copy(readbook)
sheet = workbook.get_sheet(0)
I get this error when I run my code:
workbook = xlutils.copy(readbook)
AttributeError: module 'xlutils' has no attribute 'copy'
The error says there is no attribute copy, even though online tutorials use that feature.
I don't know how to fix this
It looks like you haven't imported the right function from the right place. Try this:
from xlutils.copy import copy
Then you can simply call:
copy(readbook)
Because this imports the copy function itself from the xlutils.copy module, you no longer need the xlutils.copy() prefix; the call is just copy().
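The difference between a module and a function of the same name can be tried out without installing anything, because the standard library happens to have the same layout: a module named copy that contains a function named copy. A minimal sketch of the two import styles, using that stdlib module as a stand-in for xlutils.copy (the variable names are only illustrative):

```python
import copy as copy_module   # binds the module
from copy import copy        # binds the function defined inside that module

original = [1, 2, 3]

# Through the module you need the dotted name ...
via_module = copy_module.copy(original)
# ... while the directly imported function is called on its own.
via_function = copy(original)

print(via_module, via_function)
```

With xlutils, the second style corresponds to from xlutils.copy import copy followed by workbook = copy(readbook).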
I am starting out in the world of data analysis and Python, and my current job is to import large CSV files with tweets and save them as xlsx, with the format Unicode UTF-8. I have been doing it the classic way, one by one, but I have hundreds of them and more will come, so I need to automate it.
The process I need to follow, in order not to lose data, is the following.
I have tried to do it with Python, but so far I have only managed to do it folder by folder (an improvement over file by file), and the code loses some data. I think that is because it only opens the file as CSV and saves it as xlsx (I don't know exactly, because the code is assembled from other people's snippets on the internet, sorry).
import os
currentDirectory = os.getcwd()
os.chdir (currentDirectory)
import os
import glob
import csv
import openpyxl # from https://pythonhosted.org/openpyxl/ or PyPI (e.g. via pip)
for csvfile in glob.glob(os.path.join('.', '*.csv')):
    wb = openpyxl.Workbook()
    ws = wb.active
    with open(csvfile, 'rt', encoding='UTF-8') as f:
        reader = csv.reader(f)
        for r, row in enumerate(reader, start=1):
            for c, val in enumerate(row, start=1):
                ws.cell(row=r, column=c).value = val
    wb.save(csvfile.replace('.csv', '.xlsx'))
I am trying to improve it by adding new things, but if someone knows how to do the exact process in Python, VBA, or another language, I would be very grateful if you could share it.
Edit: To answer the comment: after running some file comparisons, it seems that the only difference is the formatting; there doesn't seem to be any loss of data. However, my client is asking me to automate the conversion while keeping the format of the first file. The first one is the format I want and the second one is the automatically generated file:
Thank you
Instead of using openpyxl directly, I would use pandas, which internally uses openpyxl to do the detailed work. Together with the standard library pathlib, this short script will do the same:
from pathlib import Path
import pandas as pd
p = Path('.')
for csvfile in list(p.glob('**/*.csv')):
    df = pd.read_csv(csvfile)
    excelfile = csvfile.with_suffix('.xlsx')
    df.to_excel(excelfile)
    print(csvfile.parent, csvfile.name, excelfile.name)
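If the concern is losing data (for example, long tweet IDs being converted to floats), a variant of the script above can read every column as text. This is only a sketch: it assumes the CSV files are UTF-8 encoded and that pandas and openpyxl are installed, and the function names convert_csv_to_xlsx and convert_tree are made up for illustration.

```python
from pathlib import Path


def convert_csv_to_xlsx(csv_path: Path) -> Path:
    """Write csv_path next to itself as .xlsx, keeping every value as text."""
    import pandas as pd  # imported lazily; needs pandas and openpyxl installed

    df = pd.read_csv(
        csv_path,
        encoding="utf-8",
        dtype=str,              # no numeric conversion, so long IDs stay intact
        keep_default_na=False,  # empty fields stay empty strings, not NaN
    )
    xlsx_path = csv_path.with_suffix(".xlsx")
    df.to_excel(xlsx_path, index=False)  # index=False: no extra index column
    return xlsx_path


def convert_tree(root: Path) -> list:
    """Convert every CSV below root and return the paths that were written."""
    return [convert_csv_to_xlsx(p) for p in sorted(root.glob("**/*.csv"))]
```

Calling convert_tree(Path('.')) would process the current folder and its subfolders. Note the trade-off: because everything is stored as text, Excel will not treat numeric columns as numbers.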
Is there a way to specify which sheet to open within an excel workbook when using a python command to open the application? (ex: using win32 Dispatch or os.system)?
I think the best way would be to activate the focus on the sheet first, then open the workbook.
from openpyxl import load_workbook
wb = load_workbook('my_workbook.xlsx')
sheet_to_focus = 'my_sheet'
for s in range(len(wb.sheetnames)):
    if wb.sheetnames[s] == sheet_to_focus:
        break
wb.active = s
wb.save('my_workbook.xlsx')
Then you could probably open it (untested code):
import os
os.chdir('C:\\my_folder\\subfolder')
os.system('start excel.exe my_workbook.xlsx')
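One caveat about the search loop in the first snippet: if the name is not found, s is left pointing at the last sheet, so a misspelled name would silently activate the wrong sheet. A small sketch of a stricter lookup follows; it works on any list of names, and the helper name sheet_index is hypothetical.

```python
def sheet_index(sheetnames, target):
    """Return the position of target in sheetnames, or raise if it is missing."""
    names = list(sheetnames)
    try:
        return names.index(target)
    except ValueError:
        raise ValueError(f"no sheet named {target!r}; available: {names}") from None


names = ["Summary", "my_sheet", "Raw data"]   # stand-in for wb.sheetnames
print(sheet_index(names, "my_sheet"))
```

With openpyxl this would be used as wb.active = sheet_index(wb.sheetnames, 'my_sheet') before saving the workbook.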
I find the easiest way to be with pandas:
import pandas as pd
df = pd.read_excel('path/to/sheet.xlsx', 'sheet_name')
You can read the documentation here: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_excel.html
I am starting on some code that loads and edits an Excel sheet (the version I am using is Office 2017) using openpyxl. Right now I am still trying to wrap my head around how this module works. Here's the code:
import openpyxl
from openpyxl import load_workbook
from openpyxl import workbook
from openpyxl.compat import range
from openpyxl.utils import get_column_letter
import os
os.chdir("D:\Scripts\Python\Testing Scripts\My Excel Folder")
wb = load_workbook("MyExcel.xlsx")
names = wb.sheetnames()
print(names)
print(type(wb))
and the error I receive is,
TypeError: 'list' object is not callable
for this line of code:
names = wb.sheetnames()
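wb.get_sheet_names() returns a list of the names of all the sheets in that Excel workbook:
print(wb.get_sheet_names())
In recent openpyxl versions, use the sheetnames property instead to avoid a deprecation warning:
print(wb.sheetnames)
If you want to access a particular sheet (get_sheet_by_name() is deprecated as well, in favor of indexing the workbook by sheet name):
ws = wb['Sheet 1']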
Use: wb.sheetnames
Example -
names = wb.sheetnames
print(names)
Do not use get_sheet_names().
If you do, you will get this warning:
DeprecationWarning: Call to deprecated function get_sheet_names (Use wb.sheetnames).
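The TypeError in the question comes from sheetnames being a property: reading wb.sheetnames already yields the list, and the trailing parentheses then try to call that list. A toy class (not openpyxl's real implementation) that reproduces the behaviour:

```python
class ToyWorkbook:
    """Minimal stand-in for a workbook; only mimics the sheetnames property."""

    def __init__(self, names):
        self._names = list(names)

    @property
    def sheetnames(self):
        # Accessed without parentheses, like an attribute.
        return list(self._names)


wb = ToyWorkbook(["Sheet1", "Sheet2"])
print(wb.sheetnames)          # attribute access works

try:
    wb.sheetnames()           # tries to call the list that the property returned
except TypeError as exc:
    error_message = str(exc)
    print(error_message)
```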
I have a simple excel file:
A1 = 200
A2 = 300
A3 = =SUM(A1:A2)
This file works in Excel and shows the proper value for the SUM, but when using the openpyxl module for Python I cannot get the value in data_only=True mode.
Python code from shell:
wb = openpyxl.load_workbook('writeFormula.xlsx', data_only = True)
sheet = wb.active
sheet['A3']
<Cell Sheet.A3> # python response
print(sheet['A3'].value)
None # python response
while:
wb2 = openpyxl.load_workbook('writeFormula.xlsx')
sheet2 = wb2.active
sheet2['A3'].value
'=SUM(A1:A2)' # python response
Any suggestions as to what I am doing wrong?
It depends upon the provenance of the file. data_only=True depends upon the value of the formula being cached by an application like Excel. If, however, the file was created by openpyxl or a similar library, then it's probable that the formula was never evaluated and, thus, no cached value is available and openpyxl will report None as the value.
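To make "cached value" concrete: in the worksheet XML inside an .xlsx file, a formula cell stores the formula in an <f> element and, if an application has calculated it, the result in a <v> element. The sketch below parses two hand-written XML fragments (not taken from a real file) with the standard library and reports which formula cells lack a stored result; the function name formulas_without_cache is hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {"m": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}


def formulas_without_cache(sheet_xml):
    """Return the references of formula cells that have no stored result."""
    root = ET.fromstring(sheet_xml)
    missing = []
    for cell in root.iterfind(".//m:c", NS):
        if cell.find("m:f", NS) is None:
            continue  # plain value, not a formula
        value = cell.find("m:v", NS)
        if value is None or not (value.text or "").strip():
            missing.append(cell.get("r"))
    return missing


TEMPLATE = (
    '<worksheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">'
    "<sheetData>"
    '<row r="1"><c r="A1"><v>200</v></c></row>'
    '<row r="2"><c r="A2"><v>300</v></c></row>'
    '<row r="3"><c r="A3"><f>SUM(A1:A2)</f>{cached}</c></row>'
    "</sheetData></worksheet>"
)

saved_by_excel = TEMPLATE.format(cached="<v>500</v>")   # result was calculated
never_calculated = TEMPLATE.format(cached="<v></v>")    # no result stored

print(formulas_without_cache(saved_by_excel))
print(formulas_without_cache(never_calculated))
```

In a real file the sheet XML typically sits inside the archive under a path such as xl/worksheets/sheet1.xml and can be read with the zipfile module.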
I have replicated the issue with Openpyxl and Python.
I am currently using openpyxl version 2.6.3 and Python 3.7.4. Also I am assuming that you are trying to complete an exercise from ATBSWP by Al Sweigart.
I tried and tested Charlie Clark's answer, considering that Excel may indeed cache values. I opened the spreadsheet in Excel, copied and pasted the formula into the same exact cell, and finally saved the workbook. Upon reopening the workbook in Python with Openpyxl with the data_only=True option, and reading the value of this cell, I saw the proper value, 500, instead of the wrong value, the None type.
I hope this helps.
I had the same issue. This may not be the most elegant solution, but this is what worked for me:
import xlwings
from openpyxl import load_workbook
excel_app = xlwings.App(visible=False)
excel_book = excel_app.books.open('writeFormula.xlsx')
excel_book.save()
excel_book.close()
excel_app.quit()
workbook = load_workbook(filename='writeFormula.xlsx', data_only=True)
I have a suggestion for this problem: convert the xlsx file to csv :).
You will still have the original xlsx file. The conversion is done by LibreOffice (that is the subprocess call() line). You can also use pandas for this, as a more Pythonic way.
from subprocess import call
from openpyxl import load_workbook
from csv import reader
filename="test"
wb = load_workbook(filename+".xlsx")
spread_range = wb['Sheet1']
#what ever function there is in A1 cell to be evaluated
print(spread_range.cell(row=1,column=1).value)
wb.close()
#this line can be done with subprocess or os.system()
#libreoffice --headless --convert-to csv $filename --outdir $outdir
call("libreoffice --headless --convert-to csv "+filename+".xlsx", shell=True)
with open(filename+".csv", newline='') as f:
    csv_reader = reader(f)  # separate name so the imported reader() is not shadowed
    data = list(csv_reader)
print(data[0][0])
or
# importing pandas as pd
import pandas as pd
# read an excel file and convert
# into a dataframe object
df = pd.DataFrame(pd.read_excel("Test.xlsx"))
# show the dataframe
df
I hope this helps somebody :-)
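The conversion call above builds a shell string, which breaks on file names that contain spaces. A sketch of the same command passed as an argument list instead; the flags are the ones from the comment in the answer, while the function names and the assumption that libreoffice is on the PATH are mine.

```python
import subprocess
from pathlib import Path


def build_convert_command(xlsx_path, outdir):
    """Return the LibreOffice command line as a list of arguments."""
    return [
        "libreoffice", "--headless",
        "--convert-to", "csv",
        str(xlsx_path),
        "--outdir", str(outdir),
    ]


def convert_to_csv(xlsx_path, outdir="."):
    """Run the conversion and return the expected path of the CSV file."""
    subprocess.run(build_convert_command(xlsx_path, outdir), check=True)
    return Path(outdir) / (Path(xlsx_path).stem + ".csv")


print(build_convert_command("my report.xlsx", "out"))
```

Because no shell is involved, "my report.xlsx" reaches LibreOffice as a single argument, and check=True raises an exception if the conversion fails.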
Yes, @Beno is right. If you want to edit the file without touching it, you can make a little "robot" that edits your Excel file.
WARNING: This is a roundabout way to edit the Excel file. The timing depends on your machine, so make sure you set time.sleep properly before continuing with the rest of the code.
For instance, I use time.sleep, subprocess.Popen, and pywinauto.keyboard.send_keys to add a random character to any cell that you choose and then save the file. After that, data_only=True works perfectly.
For more info about pywinauto.keyboard, see the pywinauto.keyboard documentation.
# imports
import subprocess
from pywinauto.keyboard import send_keys
import time
import pygetwindow as gw
import pywinauto
excel_path = r"C:\Program Files\Microsoft Office\root\Office16\EXCEL.EXE"
excel_file_path = r"D:\test.xlsx"
def focus_to_window(window_title=None):  # function to focus on a window. https://stackoverflow.com/a/65623513/8903813
    window = gw.getWindowsWithTitle(window_title)[0]
    if not window.isActive:
        pywinauto.application.Application().connect(handle=window._hWnd).top_window().set_focus()
subprocess.Popen([excel_path, excel_file_path])
time.sleep(1.5) # wait for Excel to open. Depends on your machine, set it properly
focus_to_window("Excel") # focus on the opened file
send_keys('%{F3}') # Excel's name box | ALT+F3
send_keys('AA1{ENTER}') # whatever cell you want to insert something into | Type 'AA1' then press Enter
send_keys('Stackoverflow.com') # put whatever you want | Type 'Stackoverflow.com'
send_keys('^s') # save | CTRL+S
send_keys('%{F4}') # exit | ALT+F4
print("Done")
Sorry for my bad english.
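A fixed time.sleep(1.5) is either too short on a slow machine or wasted time on a fast one. One alternative is to poll until a condition holds. The helper below is a generic sketch using only the standard library; wait_until and its parameters are hypothetical names, and the condition you pass in is up to you.

```python
import time


def wait_until(condition, timeout=10.0, interval=0.1):
    """Poll condition() until it returns a truthy value or timeout seconds pass.

    Returns True if the condition was met, False if the timeout expired.
    """
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Demonstration with a condition that becomes true after a short delay.
start = time.monotonic()
ready = wait_until(lambda: time.monotonic() - start > 0.2, timeout=2.0, interval=0.05)
print(ready)
```

In the script above this could look like wait_until(lambda: gw.getWindowsWithTitle("Excel"), timeout=15) in place of the sleep, since an empty list of windows is falsy.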
As others already mentioned, openpyxl only reads the cached formula value in data_only mode. I have used PyWin32 to open and save each XLSX file before it is processed by openpyxl to read the formula result values. This works well for me, as I don't process large files. This solution will work only if you have MS Excel installed on your PC.
import os
import win32com.client
from openpyxl import load_workbook
# Open and save the XLSX file so that the result of each stored formula is evaluated and cached, and openpyxl can read it.
excel_file = os.path.join(path, file)
excel = win32com.client.gencache.EnsureDispatch('Excel.Application')
excel.DisplayAlerts = False # disabling prompts to overwrite existing file
excel.Workbooks.Open(excel_file )
excel.ActiveWorkbook.SaveAs(excel_file, FileFormat=51, ConflictResolution=2)
excel.DisplayAlerts = True # enabling prompts
excel.ActiveWorkbook.Close()
wb = load_workbook(excel_file, data_only=True)  # data_only=True returns the cached results instead of the formulas
# read your formula values with openpyxl and do other stuff here
I ran into the same issue. After reading through this thread I managed to fix it by simply opening the excel file, making a change then saving the file again. What a weird issue.