I have a script that opens an xlsm file and an xlsx file. It modifies the xlsm with data from the xlsx, then saves the xlsm file. When I open that xlsm file after the script has run, I get the error shown in the image.
The file then works fine but I get an XML error shown below:
The code I am using is:
import openpyxl
destwb = openpyxl.load_workbook(filename="C:\\627 Data\\winphy\\071-000-022-00 627 data.xlsm", read_only=False, keep_vba=True)
.....Code.....
destwb.save(filename="C:\\627 Data\\winphy\\071-000-022-00 627 data2.xlsm")
I ran into something similar and pieced together a fix, largely recycled from this solution by Joost in this question: How to save XLSM file with Macro, using openpyxl
Apparently, openpyxl doesn't read or preserve all of the magic macro parts of an xlsm when opening and saving. Since the files are in a zip format, the solution:
Saves your work as an xlsx
Opens the original xlsm as a zip and extracts the key parts
Creates a new zip with the data from your saved xlsx and the above key parts
Renames that as an xlsm
I took the sample code, turned it into a usable replacement for workbook.save(), added a file that was missing (likely due to an Excel change since the original solution), and added zip compression and creation of a backup file to the mix. Hopefully this does what you need.
def saveXlsm(wb, xlsmname):
    '''Some crazy workaround to fix what openpyxl cannot when recreating an xlsm file.
    Use as replacement for workbook.save()
    '''
    import os
    import zipfile
    from shutil import copyfile
    from shutil import rmtree
    # Unzip original and tmp into separate dirs
    PAD = os.getcwd()
    wb.save('tmp.xlsx')
    with zipfile.ZipFile(xlsmname, 'r') as z:
        z.extractall('./xlsm/')
    with zipfile.ZipFile('tmp.xlsx', 'r') as z:
        z.extractall('./xlsx/')
    # copy pertinent left out macro parts into tmp
    copyfile('./xlsm/[Content_Types].xml','./xlsx/[Content_Types].xml')
    copyfile('./xlsm/xl/_rels/workbook.xml.rels','./xlsx/xl/_rels/workbook.xml.rels')
    copyfile('./xlsm/xl/vbaProject.bin','./xlsx/xl/vbaProject.bin')
    copyfile('./xlsm/xl/sharedStrings.xml','./xlsx/xl/sharedStrings.xml')
    # create a new tmp zip to rebuild the xlsm
    z = zipfile.ZipFile('tmp.zip', 'w', zipfile.ZIP_DEFLATED)
    # put all the parts back into the new Frankenstein
    os.chdir('./xlsx')
    for root, dirs, files in os.walk('./'):
        for file in files:
            z.write(os.path.join(root, file))
    z.close()
    os.chdir(PAD)
    # humanize Frankenstein
    bakname = xlsmname + '.bak'
    if os.access(bakname, os.W_OK):
        os.remove(bakname)
    os.rename(xlsmname, bakname)
    os.rename('tmp.zip', xlsmname)
    # clean
    rmtree('./xlsm/')
    rmtree('./xlsx/')
    os.remove('./tmp.xlsx')
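The same four steps can also be done without unpacking anything to disk or changing the working directory. This is only a sketch, assuming the same list of parts as in saveXlsm above; merge_xlsm_parts and its parameters are names I made up for illustration, and I have only reasoned about it on toy archives, not on real workbooks:

```python
import zipfile

# Parts taken from the original xlsm instead of the freshly saved xlsx
# (same list as in saveXlsm above; adjust if your workbook needs others).
MACRO_PARTS = (
    "[Content_Types].xml",
    "xl/_rels/workbook.xml.rels",
    "xl/vbaProject.bin",
    "xl/sharedStrings.xml",
)


def merge_xlsm_parts(xlsx_path, xlsm_path, out_path, parts=MACRO_PARTS):
    """Build out_path from xlsx_path, overriding `parts` with the xlsm's copies."""
    with zipfile.ZipFile(xlsx_path) as new, \
         zipfile.ZipFile(xlsm_path) as old, \
         zipfile.ZipFile(out_path, "w", zipfile.ZIP_DEFLATED) as out:
        old_names = set(old.namelist())
        written = set()
        # Copy everything from the saved xlsx, except the parts to override
        for name in new.namelist():
            if name in parts and name in old_names:
                continue
            out.writestr(name, new.read(name))
            written.add(name)
        # Add the macro-related parts from the original xlsm
        for name in parts:
            if name in old_names and name not in written:
                out.writestr(name, old.read(name))
```

Note that out_path has to be a different file from both inputs; the backup-and-rename step from saveXlsm would still be needed afterwards.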
Related
I need to convert an xlsx file into csv. After googling, I found this satisfactory answer:
import pandas as pd
read_file = pd.read_excel("./data/myxlsxfiles.xlsx" )
read_file.to_csv("./data/mycsv.csv", index=None, header=True, sep=";")
This works fine. However, something surprising happens, and I could not find a suitable solution on the internet.
The above code is in a script, and each time the script is called, I get a csv from the xlsx file.
Now I correct my Excel file, close it, delete the csv file and start the process again. And there it is: the csv file does not take into account the changes made to the Excel file. It seems that the previous version of the Excel file was cached somewhere in memory and pandas is using it.
For the time being, the only workaround I have found is to rename the xlsx file. I don't find this very convenient.
Does anyone have an idea of what's happening and how to solve this?
Thanks in advance
Maybe you can try to clear the read_file variable first, but I think it's a problem on Windows.
Otherwise, you can just copy the file into a temp folder, read the copy, process it and then delete the copy. Like this (not tested):
import os
import shutil
import time
import pandas as pd

path = "./data/"
temp_path = "./data/temp/"
filename_original = path + "myxlsxfiles.xlsx"
# str() is needed: an int cannot be concatenated to a string
filename_temp = temp_path + "myxlsxfiles_" + str(int(time.time())) + ".xlsx"
# Create the temp folder if it does not exist yet
os.makedirs(temp_path, exist_ok=True)
# Copy file
shutil.copyfile(filename_original, filename_temp)
# Do your stuff
read_file = pd.read_excel(filename_temp)
read_file.to_csv("./data/mycsv.csv", index=None, header=True, sep=";")
# Remove temp file
os.remove(filename_temp)
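If you'd rather not manage the temp folder and the timestamped name yourself, the standard tempfile module can take care of that part. Here is a sketch of the same copy-then-read idea; read_via_temp_copy is a name I made up, and reader stands for whatever function loads the file (pd.read_excel in your case):

```python
import os
import shutil
import tempfile


def read_via_temp_copy(src, reader):
    """Copy src into a throwaway directory, call reader(copy), return its result."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        tmp_copy = os.path.join(tmp_dir, os.path.basename(src))
        shutil.copy2(src, tmp_copy)  # copy2 also keeps the modification time
        return reader(tmp_copy)      # the directory is deleted when the block exits


# Usage (assumes pandas is installed):
# import pandas as pd
# df = read_via_temp_copy("./data/myxlsxfiles.xlsx", pd.read_excel)
# df.to_csv("./data/mycsv.csv", index=False, header=True, sep=";")
```

The reader has to be finished with the file when it returns, because the copy is removed right after.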
I am trying to zip files, using the example from https://thispointer.com/python-how-to-create-a-zip-archive-from-multiple-files-or-directory/
with ZipFile('sample2.zip', 'w') as zipObj2:
    # Add multiple files to the zip
    zipObj2.write('sample_file.csv')
sample2.zip is created, but it is empty. Of course, the csv file exists and is not empty.
I run this code from Jupyter Notebook
edit: I'm using relative paths -
input_dir = "../data/example/"
with zipfile.ZipFile(os.path.join(input_dir, 'f.zip'), 'a') as zipObj2:
    zipObj2.write(os.path.join(input_dir, 'f.tif'))
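Have you tried closing the zip file so that it gets saved?
from zipfile import ZipFile
with ZipFile('sample2.zip', 'w') as zipObj2:
    zipObj2.write('sample_file.csv')
# Explicit close; the with block already closes the archive on exit,
# so this is only a safeguard
zipObj2.close()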
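One way to check whether the archive really is empty is to reopen it and list its members once the with block has finished. A small sketch (zip_one_file is a hypothetical helper, not something from the linked tutorial); it also passes arcname so that a relative path such as ../data/example/f.tif is stored under its bare file name:

```python
import os
import zipfile


def zip_one_file(src, zip_path):
    """Write src into a new zip_path and return the archive's member names."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        # arcname controls the name stored inside the archive
        zf.write(src, arcname=os.path.basename(src))
    # Reopen for reading to see what was actually stored
    with zipfile.ZipFile(zip_path) as zf:
        return zf.namelist()
```

If the returned list contains the file name, the archive itself is fine and the problem is more likely in how or where it is being viewed.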
I'm a little confused by your question, but if I'm correct it sounds like you're trying to place multiple CSV files within a single zipped file? If so, this is what you're looking for:
import os
import pandas as pd

# directory that contains the csv files
directory = "./your_directory"
# collect the csv file names in that directory
files = [f for f in os.listdir(directory) if f.endswith('.csv')]
# initialize empty DataFrame
all_data = pd.DataFrame()
# iterate through the files and concatenate them to all_data
for file in files:
    # os.path.join adds the path separator missing from './your_directory' + file
    df = pd.read_csv(os.path.join(directory, file))
    all_data = pd.concat([all_data, df])
Then display your new DataFrame (all_data) to verify that the contents were transferred.
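If the goal is instead to put the CSV files themselves into one zip archive, as in the question, something along these lines should work. zip_csv_files is a made-up name, and the directory is whatever folder holds your files:

```python
import os
import zipfile


def zip_csv_files(directory, zip_path):
    """Add every .csv file found directly in `directory` to a new zip archive."""
    names = sorted(f for f in os.listdir(directory) if f.endswith(".csv"))
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for name in names:
            # arcname stores just the file name, not the directory prefix
            zf.write(os.path.join(directory, name), arcname=name)
    return names
```

The function returns the names it added, so an empty list means no .csv files were found in that directory.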
I have multiple txt files in a folder. I need to insert the data from the txt files into a MySQL table.
I also need to sort the files by modified date before inserting the data into the SQL table named TAR.
Below is the content of one of the txt files. I also need to remove the first character of every line.
SSerial1234
CCustomer
IDivision
Nat22
nAembly
rA0
PFVT
fchassis1-card-linec
RUnk
TP
Oeka
[06/22/2020 10:11:50
]06/22/2020 10:27:22
My code only reads all the files in the folder and prints their contents. I'm not sure how to sort the files before reading them one by one.
Is there also a way to read only specific files (JPE*.log)?
import os
# raw string so that the backslashes are not treated as escape sequences
for path, dirs, files in os.walk(r"C:\TAR\TARS_Source/"):
    for f in files:
        fileName = os.path.join(path, f)
        with open(fileName, "r") as myFile:
            print(myFile.read())
Use the glob.glob method to get all matching files with a wildcard pattern (not a regex), like the following:
import glob
files=glob.glob('./JPE*.log')
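And you can use the following to sort the files by modification time, oldest first (a plain sorted(files) would sort them by name):
import os
sorted_files = sorted(files, key=os.path.getmtime)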
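Putting the pieces together, here is a rough sketch that selects the JPE*.log files, orders them by modification time and drops the first character of each line. The function names are made up for illustration, and the MySQL insert itself is left out because it depends on which connector you use:

```python
import glob
import os


def files_by_mtime(folder, pattern="JPE*.log"):
    """Return the files in `folder` matching `pattern`, oldest first."""
    matches = glob.glob(os.path.join(folder, pattern))
    return sorted(matches, key=os.path.getmtime)


def read_stripped_lines(path):
    """Read a file and drop the first character of every non-empty line."""
    with open(path, "r") as handle:
        return [line.rstrip("\n")[1:] for line in handle if line.strip()]
```

With the sample file from the question, the line SSerial1234 would come back as Serial1234; each returned list could then be passed to your insert statement.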
I created a function read_Excel_file(path) that reads an Excel sheet and returns a list containing some data from a specific column.
In the main code I search for all the Excel files whose name starts with Design and which are saved in a folder named Design. If I find such a file, I call read_Excel_file.
Please find the code below:
import openpyxl as opx
import os
for r, d, f in os.walk('.'):
    for file in f:
        if '.xlsx' and 'design' in file:
            #print(r)
            if r.endswith('\Design'):
                print(file)
                read_Excel_file(file)
But I get the error:
No such file or directory
even though I am sure that this file is in my directory.
Do you think I have a path problem?
PS: I added print(file) just to check the name of the file; the error occurs when read_Excel_file(file) is called after that.
Can you help me please?
file is just the name of the file. You are missing the complete address.
You need to add the root part of the path, which os.walk gives you as r.
Just do:
filepath = os.path.join(r, file)
read_Excel_file(filepath)
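As a side note, the test '.xlsx' and 'design' in file only checks for 'design', because a non-empty string like '.xlsx' is always true. A sketch of the search with both conditions spelled out and the full path returned; find_design_workbooks is a made-up name, and I am assuming the file names start with "Design" in any letter case:

```python
import os


def find_design_workbooks(top="."):
    """Yield full paths of Design*.xlsx files that sit in a folder named Design."""
    for root, dirs, files in os.walk(top):
        # Compare the folder name itself instead of the end of the path string
        if os.path.basename(root) != "Design":
            continue
        for name in files:
            lowered = name.lower()
            # Both conditions are tested explicitly
            if lowered.startswith("design") and lowered.endswith(".xlsx"):
                yield os.path.join(root, name)
```

You would then loop over it and call read_Excel_file on each yielded path.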
How do you open an xls or csv file in Python without having to spell out the whole path?
Ex.: instead of using c:/user/...filename, how do you open it with just the filename?
Is it possible using pandas? The goal is to move the code from one console to another and have it open the file with ease. From my understanding, if I use the full path and send the code to another computer, the Excel file won't open there. By the way, the code will be sent together with the original Excel sheet.
In this case, I believe you would have to set your working directory to the directory that contains your .py file.
Note, for the code below, your .csv file should be in the same directory as your .py file.
import os.path
import pandas as pd
base_dir = os.path.dirname(os.path.abspath(__file__)) # set directory to location of .py file
os.chdir(base_dir) # change directory
csv_file = pd.read_csv('file.csv',sep=',') # read .csv
Similar to the solution of @Ira H., but instead of changing the working directory you can generate the full path:
import os.path
import pandas as pd
base_dir = os.path.dirname(
os.path.abspath(__file__)
) # set directory to location of .py file
csv_file = pd.read_csv(os.path.join(base_dir, "full_paths.csv"), sep=",")  # read .csv; os.path.join picks the right separator on any OS
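The same idea can also be written with pathlib, which some people find easier to read. A minimal sketch; sibling_path is a name I made up, and it assumes the data file sits in the same folder as the script:

```python
from pathlib import Path


def sibling_path(script_file, name):
    """Return the absolute path of `name` located next to `script_file`."""
    return Path(script_file).resolve().parent / name


# Usage inside a script (assumes pandas is installed):
# import pandas as pd
# csv_file = pd.read_csv(sibling_path(__file__, "file.csv"), sep=",")
```

Since the path is built from the script's own location, it keeps working when the script and the file are copied together to another computer.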