I'm writing a program that shuffles the contents of files. All the files are almost the same, but it doesn't work for some of them, and I can't understand why.
for file in allFiles:
    print(file)
    items = []
    fileName = file
    fileIndex = 1
    directory = os.path.join(path, fileName[:-5].strip())
    if not os.path.exists(directory):
        os.mkdir(directory)
    theFile = openpyxl.load_workbook(file)
    allSheetNames = theFile.sheetnames
After processing a number of files, it shows me this error:
Traceback (most recent call last):
File "D:\staff\Python\NewProject\glow.py", line 25, in <module>
theFile = openpyxl.load_workbook(file)
File "C:\Users\User\AppData\Local\Programs\Python\Python38-32\lib\site-packages\openpyxl\reader\excel.py", line 313, in load_workbook
reader = ExcelReader(filename, read_only, keep_vba,
File "C:\Users\User\AppData\Local\Programs\Python\Python38-32\lib\site-packages\openpyxl\reader\excel.py", line 124, in __init__
self.archive = _validate_archive(fn)
File "C:\Users\User\AppData\Local\Programs\Python\Python38-32\lib\site-packages\openpyxl\reader\excel.py", line 96, in _validate_archive
archive = ZipFile(filename, 'r')
File "C:\Users\User\AppData\Local\Programs\Python\Python38-32\lib\zipfile.py", line 1269, in __init__
self._RealGetContents()
File "C:\Users\User\AppData\Local\Programs\Python\Python38-32\lib\zipfile.py", line 1336, in _RealGetContents
raise BadZipFile("File is not a zip file")
zipfile.BadZipFile: File is not a zip file
Before that, everything worked fine and there was no error. Can anyone guess why? Thanks, everybody.
This is how I find the files:
path = os.getcwd()
sourcePath = os.path.join(os.getcwd(), 'source')
extension = 'xlsx'
os.chdir(sourcePath)
allFiles = glob.glob('*.{}'.format(extension))
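glob.glob('*.xlsx') filters only by name, not by the actual file type: every file whose name ends in .xlsx is handed to openpyxl. Probably you or some process added a file to the directory that is not a real xlsx file, which is why openpyxl fails to read it. A common example is the hidden lock file that Excel creates while a workbook is open, named ~$ followed by the workbook's name.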
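A minimal sketch of that check, assuming the workbooks sit directly in one folder such as sourcePath: it skips names starting with ~$ and uses zipfile.is_zipfile to confirm that each remaining file really is a zip archive (as every .xlsx is) before it would be handed to openpyxl.load_workbook. The function name find_readable_xlsx is hypothetical.

```python
import glob
import os
import zipfile


def find_readable_xlsx(directory):
    """Return paths of .xlsx files in `directory` that are real zip archives.

    Skips Excel lock files (names starting with "~$") and any file whose
    content is not a zip archive, printing each skipped name.
    """
    readable = []
    for path in sorted(glob.glob(os.path.join(directory, "*.xlsx"))):
        name = os.path.basename(path)
        if name.startswith("~$"):
            print("skipping Excel lock file:", name)
        elif not zipfile.is_zipfile(path):
            print("skipping file that is not a zip archive:", name)
        else:
            readable.append(path)
    return readable
```

You could then loop over find_readable_xlsx(sourcePath) instead of allFiles. Because it returns full paths, apply os.path.basename(file) before slicing off the extension if you keep the fileName[:-5] logic for the output directory.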
Related
I have a couple of Excel files that I want to merge into one.
I need the second column of every file to be copied into a separate column in a new Microsoft Excel file.
For this, I am using the openpyxl library in a Python script.
This is my code:
import os
from openpyxl import load_workbook


def mergeDataFiles():
    path = "C:\\Users\\ethan\\Desktop\\Benzoyl Chloride\\Benzoyl Chloride"

    # source excel files
    origin_files = list()
    for path, subdirs, files in os.walk(path):
        for file_index in range(len(files)):
            origin_files.append(files[file_index])

    # destination excel file
    destination_file = path + ".xlsx"
    destination_workbook = load_workbook(destination_file)
    destination_sheet = destination_workbook["Sheet1"]

    # copy data from source files to destination file
    for origin_file_index in range(1, len(origin_files)):
        origin_workbook = load_workbook(path + "\\" + origin_files[origin_file_index - 1])
        origin_sheet = origin_workbook['Data']
        destination_sheet.cell(row=1, column=origin_file_index).value = origin_files[origin_file_index - 1]
        for i in range(1, 500):
            # read cell value from source excel file
            data = origin_sheet.cell(row=i, column=2)
            # write the value to destination excel file
            destination_sheet.cell(row=i + 1, column=origin_file_index).value = data.value

    # saving the destination excel file
    destination_workbook.save(destination_file)


if __name__ == "__main__":
    mergeDataFiles()
When I run the code, I get an error on the last line in the function: OSError: [Errno 9] Bad file descriptor.
Full traceback:
C:\Users\ethan\.venv\Scripts\python.exe "C:/Users/ethan/Coding/Python/Copy Excel Data/main.py"
Traceback (most recent call last):
File "C:\Users\ethan\Coding\Python\Copy Excel Data\main.py", line 32, in <module>
mergeDataFiles()
File "C:\Users\ethan\Coding\Python\Copy Excel Data\main.py", line 28, in mergeDataFiles
destination_workbook.save(destination_file)
File "C:\Users\ethan\.venv\Lib\site-packages\openpyxl\workbook\workbook.py", line 407, in save
save_workbook(self, filename)
File "C:\Users\ethan\.venv\Lib\site-packages\openpyxl\writer\excel.py", line 293, in save_workbook
writer.save()
File "C:\Users\ethan\.venv\Lib\site-packages\openpyxl\writer\excel.py", line 275, in save
self.write_data()
File "C:\Users\ethan\.venv\Lib\site-packages\openpyxl\writer\excel.py", line 67, in write_data
archive.writestr(ARC_APP, tostring(props.to_tree()))
File "C:\Program Files\Python311\Lib\zipfile.py", line 1830, in writestr
with self.open(zinfo, mode='w') as dest:
File "C:\Program Files\Python311\Lib\zipfile.py", line 1204, in close
self._fileobj.seek(self._zinfo.header_offset)
OSError: [Errno 9] Bad file descriptor
Exception ignored in: <function ZipFile.__del__ at 0x000001D101443D80>
Traceback (most recent call last):
File "C:\Program Files\Python311\Lib\zipfile.py", line 1870, in __del__
self.close()
File "C:\Program Files\Python311\Lib\zipfile.py", line 1892, in close
self._fpclose(fp)
File "C:\Program Files\Python311\Lib\zipfile.py", line 1992, in _fpclose
fp.close()
OSError: [Errno 9] Bad file descriptor
Process finished with exit code 1
I have tried changing the file names and locations, running with the destination file both open and closed, and searching the internet for solutions; at this point I'm not sure what else to try.
I am running the code on Windows 10 22H2 with an Intel i5 CPU.
If you know how to solve this issue, please help.
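Separately from the Errno 9 error, which the following does not claim to explain, two things in the loop above are worth checking. `for path, subdirs, files in os.walk(path)` rebinds `path` on every iteration, so after the loop it holds the last directory walked, and files found in subdirectories are then opened relative to that directory. And `range(1, len(origin_files))` combined with `origin_files[origin_file_index - 1]` never reaches the last file. A minimal stdlib-only sketch of collecting the source files without those issues (the function name collect_source_files is hypothetical):

```python
import os


def collect_source_files(root_dir, destination_file):
    """Return full paths of the .xlsx files under root_dir.

    Uses separate names for the os.walk tuple so root_dir is never
    overwritten, joins each name with the directory it was found in, and
    skips the destination workbook and Excel lock files ("~$...").
    """
    destination = os.path.normcase(os.path.abspath(destination_file))
    sources = []
    for dirpath, _subdirs, filenames in os.walk(root_dir):
        for name in sorted(filenames):
            full_path = os.path.abspath(os.path.join(dirpath, name))
            if (name.endswith(".xlsx")
                    and not name.startswith("~$")
                    and os.path.normcase(full_path) != destination):
                sources.append(full_path)
    return sources


# Hypothetical usage: one destination column per source file, numbered from 1,
# so every file gets a column and none is skipped.
# for column, source_path in enumerate(collect_source_files(root, dest), start=1):
#     ...load source_path and copy its second column into `column`...
```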
I'm trying to run the script that downloads Springer's free books [https://towardsdatascience.com/springer-has-released-65-machine-learning-and-data-books-for-free-961f8181f189], but many things go wrong.
I solved some of the problems but now I'm stuck.
C:\Windows\system32>python C:\Users\loren\Desktop\springer_free_books-master\main.py
Traceback (most recent call last):
File "C:\Users\loren\Desktop\springer_free_books-master\main.py", line 42, in <module>
books.to_excel(table_path)
File "C:\Users\loren\AppData\Local\Programs\Python\Python38-32\lib\site-packages\pandas\core\generic.py", line 2175, in to_excel
formatter.write(
File "C:\Users\loren\AppData\Local\Programs\Python\Python38-32\lib\site-packages\pandas\io\formats\excel.py", line 738, in write
writer.save()
File "C:\Users\loren\AppData\Local\Programs\Python\Python38-32\lib\site-packages\pandas\io\excel\_openpyxl.py", line 43, in save
return self.book.save(self.path)
File "C:\Users\loren\AppData\Local\Programs\Python\Python38-32\lib\site-packages\openpyxl\workbook\workbook.py", line 392, in save
save_workbook(self, filename)
File "C:\Users\loren\AppData\Local\Programs\Python\Python38-32\lib\site-packages\openpyxl\writer\excel.py", line 291, in save_workbook
archive = ZipFile(filename, 'w', ZIP_DEFLATED, allowZip64=True)
File "C:\Users\loren\AppData\Local\Programs\Python\Python38-32\lib\zipfile.py", line 1251, in __init__
self.fp = io.open(file, filemode)
FileNotFoundError: [Errno 2] No such file or directory: 'downloads\\table_v4.xlsx'
This is the part of the code where table_path is introduced:
table_url = 'https://resource-cms.springernature.com/springer-cms/rest/v1/content/17858272/data/v4'
table = 'table_' + table_url.split('/')[-1] + '.xlsx'
table_path = os.path.join(folder, table)
if not os.path.exists(table_path):
    books = pd.read_excel(table_url)
    # Save table
    books.to_excel(table_path)
else:
    books = pd.read_excel(table_path, index_col=0, header=0)
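The relative path downloads\table_v4.xlsx in the error is resolved against the current working directory (C:\Windows\system32 in your prompt, unless the script changes directory), and to_excel() cannot create a file inside a folder that does not exist. Create the destination directory before calling .to_excel() so that a valid, writable directory exists, and make sure the os module is imported: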
import os # add to your imports
and replace
books.to_excel(table_path)
with
os.makedirs(folder, exist_ok=True)
books.to_excel(table_path)
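If the script should write next to itself regardless of where it is launched from, one option is to anchor the folder to the script's own location instead of the working directory. A minimal pathlib sketch, assuming a hypothetical helper named table_path_next_to_script; folder.mkdir(parents=True, exist_ok=True) does the same job as os.makedirs(folder, exist_ok=True):

```python
from pathlib import Path


def table_path_next_to_script(script_file, folder_name, table_name):
    """Create folder_name next to the script (if needed) and return the table path.

    A relative path is resolved against the current working directory;
    building the folder from the script's own location avoids depending on it.
    """
    folder = Path(script_file).resolve().parent / folder_name
    folder.mkdir(parents=True, exist_ok=True)  # no error if it already exists
    return folder / table_name


# Hypothetical usage inside main.py, with the names from the traceback:
# table_path = table_path_next_to_script(__file__, "downloads", "table_v4.xlsx")
# books.to_excel(str(table_path))
```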
I get an error when opening an .xlsx file on Windows 8 with the tablib library.
Python version: 2.7.14
The error is as follows:
python suit_simple_sheet_product.py
Traceback (most recent call last):
File "suit_simple_sheet_product.py", line 19, in <module>
data = tablib.Dataset().load(open(BASE_PATH).read())
File "C:\Python27\lib\site-packages\tablib\core.py", line 446, in load
format = detect_format(in_stream)
File "C:\Python27\lib\site-packages\tablib\core.py", line 1157, in detect_format
if fmt.detect(stream):
File "C:\Python27\lib\site-packages\tablib\formats\_xls.py", line 25, in detect
xlrd.open_workbook(file_contents=stream)
File "C:\Python27\lib\site-packages\xlrd\__init__.py", line 120, in open_workbook
zf = zipfile.ZipFile(timemachine.BYTES_IO(file_contents))
File "C:\Python27\lib\zipfile.py", line 770, in __init__
self._RealGetContents()
File "C:\Python27\lib\zipfile.py", line 811, in _RealGetContents
raise BadZipfile, "File is not a zip file"
zipfile.BadZipfile: File is not a zip file
The path is defined as follows:
BASE_PATH = 'C:\Users\anju\Downloads\automate\catalog-5090 fabric detail and price list.xlsx'
Excel .xlsx files are actually zip files. For unzipping to work correctly, the file must be opened in binary mode, so you need to open the file with:
import tablib
BASE_PATH = r'c:\my folder\my_test.xlsx'
with open(BASE_PATH, 'rb') as f:  # binary mode: .xlsx is a zip archive
    data = tablib.Dataset().load(f.read())
print data
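Also add r before your path string (making it a raw string) to stop Python from interpreting the backslashes in your path as escape sequences: in your BASE_PATH, the \a in \anju and \automate is turned into the bell character.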
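Both points can be checked directly. A minimal Python 3 sketch (the question uses 2.7, where the same escape rule applies to \a): reading the first bytes in binary mode shows the zip signature PK\x03\x04 that a normal .xlsx file starts with, and comparing a plain and a raw string shows how \a is swallowed. The helper starts_like_xlsx is hypothetical.

```python
ZIP_SIGNATURE = b"PK\x03\x04"  # local file header that a normal .xlsx starts with


def starts_like_xlsx(path):
    """Read the first four bytes in binary mode and compare them to the zip signature."""
    with open(path, "rb") as f:  # 'rb': raw bytes, no decoding or newline translation
        return f.read(4) == ZIP_SIGNATURE


# Without the r prefix, "\a" is an escape sequence for the bell character.
plain = 'C:\anju'
raw = r'C:\anju'
print(len(plain), len(raw))  # 6 7
print('\x07' in plain)       # True
```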
I've written a program that iterates over all CSV files in a directory and creates a new CSV file based on their contents.
I've written a function, summary(), that performs these tasks and is called by the following code:
cwd = os.getcwd()
csv_list = []
for root, dirs, filenames in os.walk(cwd):
    for f in filenames:
        if f.endswith('.csv'):
            csv_list.append(f)
#for root, dirs, filenames in os.walk(cwd):
summary(csv_list)
Inside the function, the files are loaded into a pandas DataFrame with the following code:
df = pd.concat((pd.read_csv(f, parse_dates=True, sep=';') for f in files))
The function creates an output CSV file called 'combined.csv'.
I delete this file between runs (as I am currently testing the program).
However, I keep running into the following peculiar bug:
FileNotFoundError: File 'combined.csv' does not exist
Even though I deleted the file, the program still tries to parse it (and crashes when it tries to load it). Why? I restart the program after deleting the file, so the file should not appear in the csv_list variable at all.
Is the information cached somehow?
I've added the full traceback below.
Traceback (most recent call last):
File "summary.py", line 112, in <module>
summary(csv_list)
File "summary.py", line 17, in summary
df = pd.concat((pd.read_csv(f, parse_dates=True, sep=';') for f in files))
File "/usr/local/lib/python3.5/dist-packages/pandas/core/reshape/concat.py", line 206, in concat
copy=copy)
File "/usr/local/lib/python3.5/dist-packages/pandas/core/reshape/concat.py", line 236, in __init__
objs = list(objs)
File "summary.py", line 17, in <genexpr>
df = pd.concat((pd.read_csv(f, parse_dates=True, sep=';') for f in files))
File "/usr/local/lib/python3.5/dist-packages/pandas/io/parsers.py", line 655, in parser_f
return _read(filepath_or_buffer, kwds)
File "/usr/local/lib/python3.5/dist-packages/pandas/io/parsers.py", line 405, in _read
parser = TextFileReader(filepath_or_buffer, **kwds)
File "/usr/local/lib/python3.5/dist-packages/pandas/io/parsers.py", line 764, in __init__
self._make_engine(self.engine)
File "/usr/local/lib/python3.5/dist-packages/pandas/io/parsers.py", line 985, in _make_engine
self._engine = CParserWrapper(self.f, **self.options)
File "/usr/local/lib/python3.5/dist-packages/pandas/io/parsers.py", line 1605, in __init__
self._reader = parsers.TextReader(src, **kwds)
File "pandas/_libs/parsers.pyx", line 394, in pandas._libs.parsers.TextReader.__cinit__ (pandas/_libs/parsers.c:4209)
File "pandas/_libs/parsers.pyx", line 710, in pandas._libs.parsers.TextReader._setup_parser_source (pandas/_libs/parsers.c:8873)
FileNotFoundError: File b'combined.csv' does not exist
Edit: I've simplified the program, since the rest of the code wasn't relevant to the problem.
I am running the program from a terminal (Ubuntu 16.04) in that directory:
$ pwd
returns
/home/jasper/PycharmProjects/AHP_Scanner/PVM/true_run/testsum
$ ls -a /home/jasper/PycharmProjects/AHP_Scanner/PVM/true_run/testsum
returns:
. fixed_10.csv fixed_13.csv fixed_16.csv fixed_19.csv fixed_21.csv fixed_4.csv fixed_7.csv goed
.. fixed_11.csv fixed_14.csv fixed_17.csv fixed_1.csv fixed_2.csv fixed_5.csv fixed_8.csv summary.py
fixed_0.csv fixed_12.csv fixed_15.csv fixed_18.csv fixed_20.csv fixed_3.csv fixed_6.csv fixed_9.csv
As we can see, the file 'combined.csv' does not exist in this directory.
Yet when I run the following code (this is all of the code that is run; the rest of summary.py has been commented out):
cwd = os.getcwd()
csv_list = []
for root, dirs, filenames in os.walk(cwd):
    for f in filenames:
        if f.endswith('.csv'):
            print(f)
I get this response:
fixed_8.csv
fixed_10.csv
fixed_4.csv
fixed_11.csv
fixed_9.csv
fixed_7.csv
fixed_0.csv
fixed_12.csv
fixed_2.csv
fixed_5.csv
fixed_20.csv
fixed_18.csv
fixed_14.csv
fixed_6.csv
fixed_15.csv
fixed_3.csv
fixed_1.csv
fixed_17.csv
fixed_13.csv
fixed_19.csv
fixed_16.csv
fixed_21.csv
combined.csv
I am at a loss why this file keeps appearing.
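One thing worth checking: os.walk does not stop at the current directory, it also descends into every subdirectory. The ls -a listing shows an entry named goed; assuming that is a subdirectory, combined.csv may be inside it. Because csv_list stores only the bare filename, pd.read_csv('combined.csv') then looks for it in the current directory and fails. A minimal sketch with two hypothetical helpers: one prints where each CSV really lives, the other lists only the top-level CSVs as full paths.

```python
import os


def csv_files_with_dirs(top):
    """Walk `top` recursively and return (directory, filename) pairs for CSV files."""
    found = []
    for root, _dirs, filenames in os.walk(top):
        for f in filenames:
            if f.endswith(".csv"):
                found.append((root, f))
    return found


def csv_paths_top_level_only(top):
    """Return full paths of CSV files directly in `top`, ignoring subdirectories."""
    return sorted(
        os.path.join(top, f)
        for f in os.listdir(top)
        if f.endswith(".csv") and os.path.isfile(os.path.join(top, f))
    )


# Hypothetical usage:
# for root, f in csv_files_with_dirs(os.getcwd()):
#     print(os.path.join(root, f))   # shows e.g. .../goed/combined.csv
# summary(csv_paths_top_level_only(os.getcwd()))
```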
I know there are similar questions, but none of them provided a solution to my problem. I am using the following code:
import os, glob
import zipfile

root = 'E:\\xx\\fashion\\*'
directory = 'E:\\xx\\fashion\\'
extension = ".zip"
date_file_list = []
for folder in glob.glob(root):
    if folder.endswith(extension):  # check for ".zip" extension
        print(folder)
        zipfile.ZipFile(os.path.join(directory, folder)).extractall(os.path.join(directory, os.path.splitext(folder)[0]))
        os.remove(folder)  # delete zipped file_name
And I get the following error:
Traceback (most recent call last):
File "C:/Users/xx/unzip.py", line 12, in <module>
zipfile.ZipFile(os.path.join(directory, folder)).extractall(os.path.join(directory, os.path.splitext(folder)[0]))
File "C:\Users\xx\AppData\Local\Programs\Python\Python35\lib\zipfile.py", line 1026, in __init__
self._RealGetContents()
File "C:\Users\xx\AppData\Local\Programs\Python\Python35\lib\zipfile.py", line 1094, in _RealGetContents
raise BadZipFile("File is not a zip file")
zipfile.BadZipFile: File is not a zip file
Some of the files were compressed with WinZip and some with 7-Zip, and there are too many files to unzip by hand.
Does anybody know why this error occurs?
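The zipfile module reads only the zip format. 7-Zip can write archives in its own .7z format as well as in zip format, so if some of the files are .7z archives (or another format) that merely carry a .zip name, zipfile.ZipFile raises BadZipFile for them. A minimal sketch, assuming that is the cause, checks each file's content rather than its name, extracts only real zip archives, and reports the rest. The helper names classify_archive and extract_zip_files are hypothetical; the 7z check compares the standard six-byte 7z signature.

```python
import glob
import os
import zipfile

SEVEN_ZIP_SIGNATURE = b"7z\xbc\xaf\x27\x1c"  # first six bytes of a .7z archive


def classify_archive(path):
    """Return 'zip', '7z', or 'unknown' based on the file's content, not its name."""
    if zipfile.is_zipfile(path):
        return "zip"
    with open(path, "rb") as f:
        if f.read(6) == SEVEN_ZIP_SIGNATURE:
            return "7z"
    return "unknown"


def extract_zip_files(directory):
    """Extract every real zip archive named *.zip in `directory`; report the rest."""
    for archive in glob.glob(os.path.join(directory, "*.zip")):
        kind = classify_archive(archive)
        if kind != "zip":
            print("skipping", archive, "(content looks like:", kind + ")")
            continue
        target = os.path.splitext(archive)[0]
        with zipfile.ZipFile(archive) as zf:  # closed before os.remove below
            zf.extractall(target)
        os.remove(archive)  # like the original code, deletes the archive after extracting
```

Files reported as 7z would need a tool that understands that format, such as 7-Zip itself.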