Excel files created cannot be opened - python

I have an Excel source file in a source folder (*.xlsm) and another file (also *.xlsm) that contains some data. I have to create a third file, which must be a *.xls file, that is basically the Excel source file plus some data from the second file. To do that I have written this code:
from openpyxl import load_workbook
file1 = "C:\\Users\Desktop\file1.xlsm"
file2 = "C:\\Users\Desktop\file2.xlsm"
file3 = "C:\\Users\Desktop\file3.xls"
wb1 = load_workbook(file1)
sheet1 = wb1["Sheet1"]
wb2 = load_workbook(file2)
sheet2 = wb2["Sheet1"]
sheet1["A1"].value = sheet2["A1"].value
wb1.save(file3)
The code seems to be OK and doesn't return any error, but I cannot open the created file3.
I don't understand why. I tried changing the extension of the third file, but both *.xlsx and *.xlsm show the same problem. I also tried deleting the line
sheet1["A1"].value = sheet2["A1"].value
to understand whether the problem was linked to writing to the sheet, but the problem remains.

First of all, please note that your code is not creating any new file but just re-saving an existing one.
It is also not clear what you want: do you want to create file3? With what information? Your code does not do any of that.
However, I tried to run a short version of your code and got this error:
openpyxl.utils.exceptions.InvalidFileException: openpyxl does not
support .xlsm' file format, please check you can open it with Excel
first. Supported formats are: .xlsx,.xlsm,.xltx,.xltm
Most likely your file format is unsupported. Try re-saving your files in the xlsx format. I think the problem is the macros: if your files don't contain any, changing the format should not be an issue. If they do, I am not sure openpyxl will work that way (at least not without a workaround).
This answer might help. It proposes extracting the xlsm files (they are zip files), working on the parts that represent the sheet contents (not the macros), and then putting everything back together.
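Since the answer relies on xlsm files being zip archives, here is a minimal sketch of how one might look inside such a file with the standard library. The helper names are made up for illustration, and the sketch assumes that macros live in the part named xl/vbaProject.bin, which is where Excel normally stores them.

```python
import zipfile


def list_workbook_parts(path):
    """Return the names of the parts stored inside a zip-based workbook (.xlsx/.xlsm)."""
    with zipfile.ZipFile(path) as archive:
        return archive.namelist()


def has_vba_project(path):
    """Return True if the archive contains the part that normally holds the macros."""
    # Assumption: Excel stores the VBA project under this part name.
    return "xl/vbaProject.bin" in list_workbook_parts(path)
```

Calling zipfile.is_zipfile() on the saved file3 is also a quick way to check whether what was written is a zip-based workbook at all, regardless of the extension it was given.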

One error might be that the backslashes in the file path variables need to be escaped (for example, the \f in \file1 is otherwise read as a form feed character).
Thus the correct version would be:
file1 = "C:\\Users\\Desktop\\file1.xlsm"
file2 = "C:\\Users\\Desktop\\file2.xlsm"
file3 = "C:\\Users\\Desktop\\file3.xls"
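To see why the escaping matters, the following small sketch compares a path containing an unescaped \f with three equivalent ways of writing the intended path. The variable names are only illustrative; PureWindowsPath is used so the comparison runs on any operating system.

```python
from pathlib import PureWindowsPath

# "\f" is a valid escape sequence (form feed), so this string does NOT
# contain the characters "\file1".
broken = "C:\\Users\\Desktop\file1.xlsm"

# Three equivalent spellings of the intended path.
escaped = "C:\\Users\\Desktop\\file1.xlsm"
raw = r"C:\Users\Desktop\file1.xlsm"
forward = "C:/Users/Desktop/file1.xlsm"

print("\f" in broken)                                     # True
print(escaped == raw)                                     # True
print(PureWindowsPath(forward) == PureWindowsPath(raw))   # True
```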

Related

Python files pdf rename

I have .pdf files in a folder and an .xls file with two columns. The first column holds the file name without the .pdf extension and the second column holds a value.
I need to open the .xls file, match the value in the first column against all file names in the folder, and rename each .pdf file with the value in the second column.
Is it possible?
Thank you for your support
Angelo
You'll want to use the pandas library within python. It has a function called pandas.read_excel that is very useful for reading excel files. This will return a dataframe, which will allow you to use iloc or other methods of accessing the values in the first and second columns. From there, I'd recommend using os.rename(old_name, new_name), where old_name and new_name are the paths to where your .pdf files are kept. A full example of the renaming part looks like this:
import os
# Absolute path of a file
old_name = r"E:\demos\files\reports\details.txt"
new_name = r"E:\demos\files\reports\new_details.txt"
# Renaming the file
os.rename(old_name, new_name)
I've purposely left out a full explanation because you simply asked if it is possible to achieve your task, so hopefully this points you in the right direction! I'd recommend asking questions with specific reproducible code in the future, in accordance with stackoverflow guidelines.
I would encourage you to do this with a .csv file instead of an .xls, as it is a much easier format (it requires no formatting of borders, colors, etc.).
You can use the os.listdir() function to list all files and folders in a directory; check the docs of the built-in os library for that. Then grab the name of each file, remove the .pdf, read your .csv file with the names and values, and then rename the file.
All the utilities needed are built into Python. Most are in the os library; the others are just the csv library and the normal opening of files:
with open(filename) as f:
    # anything you have to do with the file goes here
    # you may need to specify the mode (read/write) in the open() call
    pass
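Putting the steps above together, a minimal sketch could look like the following. It assumes the spreadsheet has been exported to a two-column CSV without a header row (old name, new name); the function name rename_pdfs and that layout are assumptions for illustration, not something stated in the question.

```python
import csv
from pathlib import Path


def rename_pdfs(folder, mapping_csv):
    """Rename <old>.pdf to <new>.pdf for every 'old,new' row in mapping_csv.

    Returns the list of new file names that were actually created.
    """
    folder = Path(folder)
    renamed = []
    with open(mapping_csv, newline="") as f:
        for row in csv.reader(f):
            if len(row) < 2:
                continue  # skip blank or incomplete rows
            old_stem, new_stem = row[0].strip(), row[1].strip()
            source = folder / f"{old_stem}.pdf"
            target = folder / f"{new_stem}.pdf"
            # Only rename when the source exists and the target name is free.
            if source.is_file() and not target.exists():
                source.rename(target)
                renamed.append(target.name)
    return renamed
```

Rows whose PDF is missing, or whose new name is already taken, are skipped silently here; a real script would probably want to report them.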

Python: Excel file (xlsx) export with variable as the file path, using pandas

I defined an .xlsx file path as the variable output:
print(output)
r'C:\Users\Kev\Documents\Python code.xlsx'
I want to export a pandas dataframe as an .xlsx file, but need to pass the file path via the output variable.
I can get the code to work with the literal file path. I've tried about a dozen ways to use the variable (copying and/or piecing code together from documentation, Stack Overflow, blogs, etc.) and got a variety of errors. None worked. Here is a version that worked with the literal file path:
df = pd.DataFrame(file_list)
df.to_excel(r'C:\Users\Kev\Documents\Python code.xlsx', index=False)
I would want something like:
df.to_excel(output, index=False)
In any form or package, as long as it produces the same xlsx file and won’t need to be edited to change the file path and name (that would be done where the variable output is defined).
I've attempted several iterations on the XlsxWriter site, the openpyxl site, the pandas site, etc.
(with the appropriate python packages). Working in Jupyter Notebook, Python 3.8.
Any resources, packages, or code that will help me to use a variable in place of a file path for an xlsx export from a pandas dataframe?
Why I want it like this is a long story, but basically I'll have several places at the top of the code where myself and other (inexperienced) coders can quickly put file paths in and search for keywords (rather than hunt through code to find where to replace paths). The data itself is file paths that I'll iteratively search through (this is the beginning of a larger project).
Try writing the path this way:
output = "C://Users//Kev//Documents//Python code.xlsx"
df.to_excel(output , index=False)
This has always worked for me.
Or you can also do it like this:
output = "C://Users//Kev//Documents//"
df.to_excel(output +"Python code.xlsx" , index=False)
The os module would be the most useful here:
from os import path
output = path.abspath("your_excel_file.xlsx")
print(output)
This returns the current working directory joined with the file name you passed to abspath. Also, for those wondering why some people write file paths with a backslash "\" and others with a forward slash "/", there is a good Stack Overflow answer: So what IS the right direction of the path's slash (/ or \) under Windows?
You can use f-strings in Python 3:
import pandas as pd
df = pd.DataFrame({"a": ["b"], "c": ["d"]})  # small example frame
file_name = "filename.xlsx"
df.to_excel(f"/your/path/to/file/{file_name}", index=False)
Assuming that OP's dataframe is df, that OP is using Windows and wants to store the file in the Desktop, OP's username is cowboykevin05, and the filename that one wants is 0001.xlsx, one can use os.path as follows
from os import path
df.to_excel(path.join('C:\\Users\\cowboykevin05\\Desktop', '0001.xlsx'), index=False)
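As a rough sketch of the "settings at the top of the script" idea from the question, the path can be built once from two variables and reused everywhere. OUTPUT_DIR and OUTPUT_NAME are hypothetical names; PureWindowsPath is used only so the example runs on any operating system, and on Windows pathlib.Path would work the same way.

```python
from pathlib import PureWindowsPath

# Hypothetical settings block at the top of the script.
OUTPUT_DIR = r"C:\Users\Kev\Documents"
OUTPUT_NAME = "Python code.xlsx"

# Join directory and file name once; reuse `output` wherever a path is needed.
output = PureWindowsPath(OUTPUT_DIR) / OUTPUT_NAME

# A correctly defined path prints without an r prefix or quote characters.
print(output)  # C:\Users\Kev\Documents\Python code.xlsx
```

pandas accepts either a string or a path-like object here, so df.to_excel(output, index=False) or df.to_excel(str(output), index=False) should both work. If print(output) really shows a leading r and quote characters, as in the question, the string itself may contain them, which would explain the errors.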

Open a .csv after using Dataframe.to_csv Python

Is there a way to open a .csv file right after using Dataframe.to_csv?
Currently, I am using os.startfile to open the .csv file in a folder (search for .csv file and open it) - but I want to open the specific .csv I just created using df.to_csv.
Here is my current code using os.startfile:
dirName3 = r"\\xx\xx\SourceFolder"
fn2 = [f2 for f2 in os.listdir(dirName3)\
if f2.endswith('.csv') and os.path.isfile(os.path.join(dirName3, f2))][0]
path3 = os.path.join(dirName3, fn2)
open1 = os.startfile(path3)
The above code will open the .csv file I've created but only if it is top of the folder. So if there are others in the folder it may not be at the top and may open a different file.
I also can't specify an absolute path because the .csv name (using df.to_csv) will change day to day based on user input. I also won't be able to search by date because there may be multiple files from the same day in the folder.
Any help appreciated.
Answering my own question after discussion with others in comments above.
Came up with this to solve the problem:
import os
from datetime import datetime
dirName3 = r"\\xx\xx\Source Folder"
# Qname1 is the user-supplied name that was also used when writing the csv
fn2 = [f2 for f2 in os.listdir(dirName3)
       if f2.endswith(datetime.now().strftime('%d_%m_%y_') + Qname1 + '.csv')
       and os.path.isfile(os.path.join(dirName3, f2))][0]
path3 = os.path.join(dirName3, fn2)
open1 = os.startfile(path3)
Instead of matching only '.csv' with endswith, as in my original code above, I match on the same information I used to write the csv (with the to_csv call, which isn't included here). This only works because I include a date stamp in the file names: since Qname1 (user input) can be similar on different days, I need the date to differentiate between files.
Cheers stackoverflow.
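An alternative that avoids searching the folder at all is to build the full path once and use the same variable for both writing and opening. The sketch below assumes the file name pattern from the answer above (date stamp, then the user-supplied name); build_csv_path and open_with_default_app are hypothetical helper names.

```python
import os
from datetime import datetime


def build_csv_path(folder, qname, when=None):
    """Build the dated CSV path, e.g. <folder>/31_12_24_<qname>.csv."""
    when = when or datetime.now()
    file_name = when.strftime("%d_%m_%y_") + qname + ".csv"
    return os.path.join(folder, file_name)


def open_with_default_app(path):
    """Open the file with its associated program; os.startfile exists only on Windows."""
    if hasattr(os, "startfile"):
        os.startfile(path)
        return True
    return False


# Intended use (df, dirName3 and Qname1 come from the surrounding script):
#   csv_path = build_csv_path(dirName3, Qname1)
#   df.to_csv(csv_path)
#   open_with_default_app(csv_path)
```

Because the exact path is known, this does not depend on the order in which os.listdir returns files or on other CSV files in the folder.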

Permission denied when pandas dataframe to tempfile csv

I'm trying to store a pandas dataframe to a tempfile in csv format (in windows), but am being hit by:
[Errno 13] Permission denied: 'C:\Users\Username\AppData\Local\Temp\tmpweymbkye'
import tempfile
import pandas
with tempfile.NamedTemporaryFile() as temp:
    df.to_csv(temp.name)
Where df is the dataframe. I've also tried changing the temp directory to one I am sure I have write permissions:
tempfile.tempdir='D:/Username/Temp/'
This gives me the same error message
Edit:
The tempfile appears to be locked for editing, because when I change the with block to:
with tempfile.NamedTemporaryFile() as temp:
    df.to_csv(temp.name + '.csv')
I can write the file in the temp directory, but then it is not automatically deleted at the end of the loop, as it is no longer a temp file.
However, if I change the code to:
with tempfile.NamedTemporaryFile(suffix='.csv') as temp:
    training_data.to_csv(temp.name)
I get the same error message as before. The file is not open anywhere else.
I encountered the same error message and the issue was resolved after adding "/df.csv" to file_path.
df.to_csv('C:/Users/../df.csv', index = False)
Check your permissions and, according to this post, you can run your program as an administrator by right click and run as administrator.
We can use the to_csv command to export a DataFrame in CSV format. Note that by default the data is saved into the current working directory. We can save it to a different folder by adding the folder name and a slash to the file name:
verticalStack.to_csv('foldername/out.csv')
Check out your working directory to make sure the CSV wrote out properly, and that you can open it! If you want, try to bring it back into python to make sure it imports properly.
newOutput = pd.read_csv('out.csv', keep_default_na=False, na_values=[""])
Unlike TemporaryFile(), the user of mkstemp() is responsible for deleting the temporary file when done with it.
Use of mktemp() may introduce a security hole in your program. By the time you get around to doing anything with the file name it returns, someone else may have beaten you to the punch. mktemp() usage can be replaced easily with NamedTemporaryFile(), passing it the delete=False parameter.
After export to CSV you can close your file with temp.close().
with tempfile.NamedTemporaryFile(delete=False) as temp:
    df.to_csv(temp.name + '.csv')
    temp.close()
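For background: on Windows, a file created by NamedTemporaryFile cannot be opened a second time by name while it is still open, which is what writing to temp.name attempts. Below is a minimal sketch of two ways around that, using the csv module in place of the DataFrame so it runs without pandas; rows is made-up sample data, and in the original setting df.to_csv(csv_path) or df.to_csv(temp.name) would replace the csv.writer calls.

```python
import csv
import os
import tempfile

rows = [["a", "b"], [1, 2]]  # stand-in for the DataFrame contents

# Option 1: create an ordinary file inside a temporary directory.
# The directory and everything in it are removed when the block exits.
with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = os.path.join(tmp_dir, "data.csv")
    with open(csv_path, "w", newline="") as f:
        csv.writer(f).writerows(rows)
    with open(csv_path, newline="") as f:
        read_back = list(csv.reader(f))

# Option 2: keep NamedTemporaryFile, but close it before reopening by name
# and delete it yourself once you are done.
temp = tempfile.NamedTemporaryFile(suffix=".csv", delete=False)
temp.close()
try:
    with open(temp.name, "w", newline="") as f:
        csv.writer(f).writerows(rows)
finally:
    os.remove(temp.name)
```

Option 1 keeps the automatic cleanup the question asks for; option 2 trades it for an explicit os.remove.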
Sometimes you need to check whether you have the right permissions to read and write at the file path, especially when you use a relative path.
xxx.to_csv('%s/file.csv'%(file_path), index = False)
Sometimes this error appears simply because another file with the same name already exists and the program has no permission to delete the earlier file and replace it with the new one.
So either give the file a different name when saving it,
or,
if you are working in Jupyter Notebook or a similar environment, delete the file after executing the cell that reads it into memory, so that when you execute the cell that writes it to disk, no other file with that name exists.
I encountered the same error. I simply had not yet saved my entire python file. Once I saved my python file in VS code as "insertyourfilenamehere".py to documents(which is in my path), I ran my code again and I was able to save my data frame as a csv file.
As far as I know, this error pops up when one attempts to save a file that has already been saved and is currently open in the background.
You may try closing those files first and then rerunning the code.
Just give a valid path and a file name
e.g:
final_df.to_csv(r'D:\Study\Data Science\data sets\MNIST\sample.csv')

Error: Unsupported format, or corrupt file: Expected BOF record

I am trying to open a xlsx file and just print the contents of it. I keep running into this error:
import xlrd
book = xlrd.open_workbook("file.xlsx")
print "The number of worksheets is", book.nsheets
print "Worksheet name(s):", book.sheet_names()
print
sh = book.sheet_by_index(0)
print sh.name, sh.nrows, sh.ncols
print
print "Cell D30 is", sh.cell_value(rowx=29, colx=3)
print
for rx in range(5):
    print sh.row(rx)
print
It prints out this error
raise XLRDError('Unsupported format, or corrupt file: ' + msg)
xlrd.biffh.XLRDError: Unsupported format, or corrupt file: Expected BOF record; found '\xff\xfeT\x00i\x00m\x00'
Thanks
If you use read_excel() to read a .csv you will get the error
XLRDError: Unsupported format, or corrupt file: Expected BOF record;
To read .csv one needs to use read_csv(), like this
df1= pd.read_csv("filename.csv")
There is also a third reason. The case when the file is already open by Excel.
It generates the same error.
The error message relates to the BOF (Beginning of File) record of an XLS file. However, the example shows that you are trying to read an XLSX file.
There are 2 possible reasons for this:
1. Your version of xlrd is old and doesn't support reading xlsx files.
2. The XLSX file is encrypted and thus stored in the OLE Compound Document format, rather than a zip format, making it appear to xlrd as an older format XLS file.
Double check that you are in fact using a recent version of xlrd. Opening a new XLSX file with data in just one cell should verify that.
However, I would guess that you are encountering the second condition and that the file is encrypted, since you state above that you are already using xlrd version 0.9.2.
XLSX files are encrypted if you explicitly apply a workbook password but also if you password protect some of the worksheet elements. As such it is possible to have an encrypted XLSX file even if you don't need a password to open it.
Update: See @BStew's third, more probable, answer: that the file is open in Excel.
You can get this error when the xlsx file is actually html; you can open it with a text editor to verify this. When I got this error I solved it using pandas:
import pandas as pd
df_list = pd.read_html('filename.xlsx')
df = pd.DataFrame(df_list[0])
To anyone reading this post today: the following solution actually helped me.
https://stackoverflow.com/a/46214958/9642876
The XLSX file that I was trying to read was created by reporting software and could not be read by either pandas or xlrd, although it opened in Microsoft Excel. I re-saved the file under a different name, and now both xlrd and pandas can read it.
It may also work if you just re-save with the same name, although I haven't tested this.
In my case, someone gave me an Excel file ending with extension ".xls". I tried parsing it with xlrd, and got this error:
xlrd.biffh.XLRDError: Unsupported format, or corrupt file: Expected BOF record; found "blar blar blar"
After working on it for some time, I found that the .xls file was actually a text file. The sender didn't bother to create a real Excel binary file but just put ".xls" on a text file.
It may be worth opening the file with a text editor to make sure it is an Excel file. This could have saved me an hour.
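Instead of a text editor, the first few bytes of the file can be checked programmatically. The sketch below is a rough heuristic, not a full format detector; the function name and the returned labels are made up for illustration. It relies on zip archives starting with PK\x03\x04 and OLE Compound Documents starting with the eight-byte signature shown.

```python
ZIP_MAGIC = b"PK\x03\x04"                          # .xlsx/.xlsm are zip archives
OLE2_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"   # legacy .xls or encrypted .xlsx


def sniff_spreadsheet(path):
    """Guess what kind of file this really is from its first bytes."""
    with open(path, "rb") as f:
        head = f.read(8)
    if head.startswith(ZIP_MAGIC):
        return "zip"
    if head.startswith(OLE2_MAGIC):
        return "ole2"
    if head.startswith((b"\xff\xfe", b"\xfe\xff")):
        return "utf16-text"   # byte-order mark: a text file, not a workbook
    if head.lstrip().startswith(b"<"):
        return "markup"       # likely HTML or XML saved with an Excel extension
    return "unknown"
```

The bytes in the question's error message, '\xff\xfeT\x00i\x00m\x00', are a UTF-16 little-endian byte-order mark followed by the letters "Tim", so that file would be classified as "utf16-text" here.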
In my case, the issue was with the shared folder itself.
CASE IN POINT: I have a shared folder on WIN2012 Server where the user drops the .xlsx file and then uses my python script to load that xlsx file into a database table.
Even though the user deleted the old file and put in the file that was to be loaded, the BOF error kept mentioning a byte string containing the name of the user, yet nowhere inside the xlsx file, in any worksheet, was there the name of the user. On top of that, when I copied the .xlsx into a newly created folder and ran the script referencing that new folder, it worked.
So in the end, I deleted the shared folder and realized that 5 items got deleted even though only 1 item was visible to me and the user. I think it is down to my lack of windows administration skills but that was the culprit.
I got the same error message. It looks so weird to me because the script works for the xlsx files under another folder and the files are almost the same.
I still don't know why this happened. But finally, I copied all the excel files to another folder and the script worked. An option to try if none of the above suggestions works for you...
This also happens when the file used by the script is open in the background.
