I'm using Python to open an existing Excel file, apply some formatting, and then save and close the file. My code works fine when the file is small, but when the Excel file is large (approx. 40 MB) I get a SerialisationError (IO_WRITE). I suspect it is due to a memory problem or to my code. Kindly help.
System Config:
RAM - 8 GB
32-bit operating system
Windows 7
Code:
import numpy as np
from openpyxl import load_workbook
from openpyxl.styles import colors, Font
dest_loc='/Users/abdulr06/Documents/Python Scripts/'
np.seterr(divide='ignore', invalid='ignore')
SRC='TSYS'
YM1='201707'
dest_file=dest_loc+SRC+'_'+''+YM1+'.xlsx'
sheetname = [SRC+''+' GL-Recon']
# The following code is common to the rest of the source systems
wb=load_workbook(dest_file)
fmtB=Font(color=colors.BLUE)
fmtR=Font(color=colors.RED)
for i in range(len(sheetname)):
    sheet1=wb.get_sheet_by_name(sheetname[i])
    print(sheetname[i])
    last_record=sheet1.max_row+1
    for m in range(2,last_record):
        if -30 <= sheet1.cell(row=m,column=5).value <=30:
            ft=sheet1.cell(row=m,column=5)
            ft.font=fmtB
            ft.number_format = '_(* #,##0.00_);_(* (#,##0.00);_(* "-"??_);_(#_)'
            ft1=sheet1.cell(row=m,column=6)
            ft1.number_format = '0.00%'
        else:
            ft=sheet1.cell(row=m,column=5)
            ft.font=fmtR
            ft.number_format = '_(* #,##0.00_);_(* (#,##0.00);_(* "-"??_);_(#_)'
            ft1=sheet1.cell(row=m,column=6)
            ft1.number_format = '0.00%'
wb.save(filename=dest_file)
Exception:
Traceback (most recent call last):
File "<ipython-input-17-fc16d9a46046>", line 6, in <module>
wb.save(filename=dest_file)
File "C:\Users\abdulr06\AppData\Local\Continuum\Anaconda3\lib\site-packages\openpyxl\workbook\workbook.py", line 263, in save
save_workbook(self, filename)
File "C:\Users\abdulr06\AppData\Local\Continuum\Anaconda3\lib\site-packages\openpyxl\writer\excel.py", line 239, in save_workbook
writer.save(filename, as_template=as_template)
File "C:\Users\abdulr06\AppData\Local\Continuum\Anaconda3\lib\site-packages\openpyxl\writer\excel.py", line 222, in save
self.write_data(archive, as_template=as_template)
File "C:\Users\abdulr06\AppData\Local\Continuum\Anaconda3\lib\site-packages\openpyxl\writer\excel.py", line 80, in write_data
self._write_worksheets(archive)
File "C:\Users\abdulr06\AppData\Local\Continuum\Anaconda3\lib\site-packages\openpyxl\writer\excel.py", line 163, in _write_worksheets
xml = sheet._write(self.workbook.shared_strings)
File "C:\Users\abdulr06\AppData\Local\Continuum\Anaconda3\lib\site-packages\openpyxl\worksheet\worksheet.py", line 776, in _write
return write_worksheet(self, shared_strings)
File "C:\Users\abdulr06\AppData\Local\Continuum\Anaconda3\lib\site-packages\openpyxl\writer\worksheet.py", line 263, in write_worksheet
xf.write(worksheet.page_breaks.to_tree())
File "serializer.pxi", line 1016, in lxml.etree._FileWriterElement.__exit__ (src\lxml\lxml.etree.c:141944)
File "serializer.pxi", line 904, in lxml.etree._IncrementalFileWriter._write_end_element (src\lxml\lxml.etree.c:140137)
File "serializer.pxi", line 999, in lxml.etree._IncrementalFileWriter._handle_error (src\lxml\lxml.etree.c:141630)
File "serializer.pxi", line 195, in lxml.etree._raiseSerialisationError (src\lxml\lxml.etree.c:131006)
SerialisationError: IO_WRITE
Why do you allocate a Font on every loop iteration?
fmt=Font(color=colors.BLUE)
Create the two fonts, red and blue, once and then reuse them; every Font you allocate uses more memory.
Optimise your code first. Less code means fewer errors. For example:
mycell = sheet1.cell(row=m,column=5)
if -30 <= mycell.value <=30:
    mycell.font = redfont
Hopefully this ensures that you do not hit the issue again.
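To make the suggestion above concrete, here is a minimal sketch of the tidied loop. The names format_recon_sheet, BLUE_FONT, RED_FONT and ACCOUNTING_FORMAT are made up for this example, the colours are written as aRGB hex strings (assumed equivalents of colors.BLUE and colors.RED), and skipping non-numeric cells is my own assumption, since the original comparison would raise on an empty cell. Whether this alone removes the IO_WRITE error is not established.

```python
from openpyxl import Workbook
from openpyxl.styles import Font

# Created once and reused for every cell.
# aRGB hex strings, assumed equivalents of colors.BLUE / colors.RED.
BLUE_FONT = Font(color="FF0000FF")
RED_FONT = Font(color="FFFF0000")
ACCOUNTING_FORMAT = '_(* #,##0.00_);_(* (#,##0.00);_(* "-"??_);_(#_)'


def format_recon_sheet(sheet, first_row=2):
    """Colour column E by threshold and set number formats on columns E and F."""
    for row in range(first_row, sheet.max_row + 1):
        amount = sheet.cell(row=row, column=5)   # looked up once per row
        percent = sheet.cell(row=row, column=6)
        value = amount.value
        # Assumption: empty or non-numeric cells are left untouched.
        if not isinstance(value, (int, float)):
            continue
        amount.font = BLUE_FONT if -30 <= value <= 30 else RED_FONT
        amount.number_format = ACCOUNTING_FORMAT
        percent.number_format = "0.00%"
```

With the workbook from the question this would be called as format_recon_sheet(wb[sheetname[i]]) inside the outer loop, followed by a single wb.save(filename=dest_file) at the end.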
Related
I am writing a program that reads all my S3 buckets. As I have a lot of them, I want to run it on an EC2 instance. The program works fine in PyCharm, but as soon as I try to run it on my Ubuntu instance I get this error:
File "/home/ubuntu/DataRecap/main.py", line 72, in <module>
create_table()
File "/home/ubuntu/DataRecap/main.py", line 43, in create_table
small_column = get_column()
File "/home/ubuntu/DataRecap/main.py", line 32, in get_column
df = pd.read_parquet(buffer)
File "/home/ubuntu/.local/lib/python3.10/site-packages/pandas/io/parquet.py", line 493, in read_parquet
return impl.read(
File "/home/ubuntu/.local/lib/python3.10/site-packages/pandas/io/parquet.py", line 347, in read
result = parquet_file.to_pandas(columns=columns, **kwargs)
File "/home/ubuntu/.local/lib/python3.10/site-packages/fastparquet/api.py", line 751, in to_pandas
self.read_row_group_file(rg, columns, categories, index,
File "/home/ubuntu/.local/lib/python3.10/site-packages/fastparquet/api.py", line 361, in read_row_group_file
core.read_row_group(
File "/home/ubuntu/.local/lib/python3.10/site-packages/fastparquet/core.py", line 608, in read_row_group
read_row_group_arrays(file, rg, columns, categories, schema_helper,
File "/home/ubuntu/.local/lib/python3.10/site-packages/fastparquet/core.py", line 580, in read_row_group_arrays
read_col(column, schema_helper, file, use_cat=name+'-catdef' in out,
File "/home/ubuntu/.local/lib/python3.10/site-packages/fastparquet/core.py", line 466, in read_col
dic2 = convert(dic2, se)
File "/home/ubuntu/.local/lib/python3.10/site-packages/fastparquet/converted_types.py", line 249, in convert
parquet_thrift.ConvertedType._VALUES_TO_NAMES[ctype]) # pylint:disable=protected-access
KeyError: 24
I have no idea why it does not work. Here is my code:
buffer = io.BytesIO()
object = s3.Object(bucket, parquet_name)
object.download_fileobj(buffer)
df1 = pd.read_parquet(buffer)
Any ideas?
Thank you very much in advance.
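One thing the traceback does show is that pandas ended up using fastparquet on the instance. With the default engine='auto', pandas tries pyarrow first and falls back to fastparquet, so the two machines may simply have different parquet libraries installed. A minimal sketch for comparing them, using only the standard library (the function name installed_versions is made up for this example):

```python
from importlib import metadata


def installed_versions(packages=("pandas", "pyarrow", "fastparquet")):
    """Return {package: version or None} for each distribution name."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return versions
```

Run print(installed_versions()) in both environments. If the PyCharm interpreter has pyarrow and the instance does not, installing it there and passing engine='pyarrow' to pd.read_parquet would be one thing to try. This is a guess from the traceback, not a confirmed fix.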
I'm trying to download and then open an Excel file (a report generated by a marketplace) with openpyxl.
import requests
import config
import openpyxl
link = 'https://api.telegram.org/file/bot' + config.TOKEN + '/documents/file_66.xlsx'
def save_open(link):
    filename = link.split('/')[-1]
    r = requests.get(link)
    with open(filename, 'wb') as new_file:
        new_file.write(r.content)
    wb = openpyxl.open ('file_66.xlsx')
    ws = wb.active
    cell = ws['B2'].value
    print (cell)
save_open(link)
After running this code I got the following:
Traceback (most recent call last):
File "C:\Python 3.9\lib\site-packages\openpyxl\descriptors\base.py", line 55, in _convert
value = expected_type(value)
TypeError: Fill() takes no arguments
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\Users\Home\Documents\myPython\bot_WB\main.py", line 20, in <module>
save_open(link)
File "C:\Users\Home\Documents\myPython\bot_WB\main.py", line 14, in save_open
wb = openpyxl.open ('file_66.xlsx')
File "C:\Python 3.9\lib\site-packages\openpyxl\reader\excel.py", line 317, in load_workbook
reader.read()
File "C:\Python 3.9\lib\site-packages\openpyxl\reader\excel.py", line 281, in read
apply_stylesheet(self.archive, self.wb)
File "C:\Python 3.9\lib\site-packages\openpyxl\styles\stylesheet.py", line 198, in apply_stylesheet
stylesheet = Stylesheet.from_tree(node)
File "C:\Python 3.9\lib\site-packages\openpyxl\styles\stylesheet.py", line 103, in from_tree
return super(Stylesheet, cls).from_tree(node)
File "C:\Python 3.9\lib\site-packages\openpyxl\descriptors\serialisable.py", line 103, in from_tree
return cls(**attrib)
File "C:\Python 3.9\lib\site-packages\openpyxl\styles\stylesheet.py", line 74, in __init__
self.fills = fills
File "C:\Python 3.9\lib\site-packages\openpyxl\descriptors\sequence.py", line 26, in __set__
seq = [_convert(self.expected_type, value) for value in seq]
File "C:\Python 3.9\lib\site-packages\openpyxl\descriptors\sequence.py", line 26, in <listcomp>
seq = [_convert(self.expected_type, value) for value in seq]
File "C:\Python 3.9\lib\site-packages\openpyxl\descriptors\base.py", line 57, in _convert
raise TypeError('expected ' + str(expected_type))
TypeError: expected <class 'openpyxl.styles.fills.Fill'>
[Finished in 1.6s]
If you look at the file's properties/details, you can see that it was generated by "Go Excelize" (author: xuri). To work with this file I have to split the code into two parts. First, download the file. Then open it manually in MS Excel, save it, and close it (after this, the generating application changes from "Go Excelize" to "Microsoft Excel"). Only after that does the second part of the code run without errors. Can anyone help me handle this problem?
I had the same problem, "TypeError('expected ' + str(expected_type))", using pandas.read_excel, which uses openpyxl. If I open the file in Excel, save it, and close it, it works with both pandas and openpyxl.
After further attempts I could open the file by passing "read_only=True" to openpyxl, but while iterating over the rows I still got the error, though only after all the rows had been read, at the end of the file.
I believe it could be something at the EOF (end of file) that openpyxl has no way of handling.
Here is the code that I tested and that worked for me:
import openpyxl
wb = openpyxl.load_workbook(my_file_name, read_only=True)
ws = wb.worksheets[0]
lis = []
try:
    for row in ws.iter_rows():
        lis.append([cell.value for cell in row])
except TypeError:
    print('Skip error in EOF')
Tested with openpyxl==3.0.10.
I am trying to write a script that will open and look through every Excel workbook inside a folder, pull specific values from each workbook, and then write those values to a new CSV.
My script (see below) and all 49 workbooks are located in the following path: C:\Users\user.name\Desktop\Excel Test.
import pandas
import os
info_headers = ['Production Name', 'Data size (GBs)', 'Billable Data size (GBs)']
info = []
files = [file for file in os.listdir('C:\\Users\\user.name\\Desktop\\Excel Test')]
for file in files:
    df = pandas.read_excel(file)
    size = df['Unnamed: 2'].loc[df['QC Checklist'] == 'Data Size (GB):'].values[0]
    name = df['Unnamed: 2'].loc[df['QC Checklist'] == 'Production Volume Name'].values[0]
    bill_size = df['Unnamed: 2'].loc[df['QC Checklist'] == 'Billable Data Size (GB):'].values[0]
    info.append([name, size, bill_size])
output = pandas.DataFrame(info, columns=info_headers)
output.to_csv('C:\\Users\\user.name\\Excel Test') # This will output a csv in your current directory
I receive the following error when trying to run this:
Traceback (most recent call last):
File "C:/Users/user.name/Desktop/Excel Test/exceltest.py", line 9, in <module>
df = pandas.read_excel(file)
File "C:\Users\user.name\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\util\_decorators.py", line 188, in wrapper
return func(*args, **kwargs)
File "C:\Users\user.name\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\util\_decorators.py", line 188, in wrapper
return func(*args, **kwargs)
File "C:\Users\user.name\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\io\excel.py", line 350, in read_excel
io = ExcelFile(io, engine=engine)
File "C:\Users\user.name\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\io\excel.py", line 653, in __init__
self._reader = self._engines[engine](self._io)
File "C:\Users\user.name\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\io\excel.py", line 424, in __init__
self.book = xlrd.open_workbook(filepath_or_buffer)
File "C:\Users\user.name\AppData\Local\Programs\Python\Python37\lib\site-packages\xlrd\__init__.py", line 157, in open_workbook
ragged_rows=ragged_rows,
File "C:\Users\user.name\AppData\Local\Programs\Python\Python37\lib\site-packages\xlrd\book.py", line 92, in open_workbook_xls
biff_version = bk.getbof(XL_WORKBOOK_GLOBALS)
File "C:\Users\user.name\AppData\Local\Programs\Python\Python37\lib\site-packages\xlrd\book.py", line 1278, in getbof
bof_error('Expected BOF record; found %r' % self.mem[savpos:savpos+8])
File "C:\Users\user.name\AppData\Local\Programs\Python\Python37\lib\site-packages\xlrd\book.py", line 1272, in bof_error
raise XLRDError('Unsupported format, or corrupt file: ' + msg)
xlrd.biffh.XLRDError: Unsupported format, or corrupt file: Expected BOF record; found b'import p'
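There could be a couple of problems. The script may be trying to open non-Excel files in the directory: the error shows that the file began with b'import p', which looks like the start of a Python script such as exceltest.py, which sits in the same folder as the workbooks. After your os.listdir() call, try filtering for Excel files only.
Or your Excel file may be formatted incorrectly.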
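A minimal sketch of the filtering step, assuming the workbooks use the .xlsx, .xlsm or .xls extension. The helper name list_excel_files and the EXCEL_SUFFIXES set are made up for this example. It also returns full paths, so the files can be opened no matter what the current working directory is.

```python
from pathlib import Path

# Assumption: these are the extensions the workbooks in the folder use.
EXCEL_SUFFIXES = {".xlsx", ".xlsm", ".xls"}


def list_excel_files(folder):
    """Return sorted full paths of Excel workbooks directly inside folder."""
    return sorted(
        path
        for path in Path(folder).iterdir()
        if path.is_file()
        and path.suffix.lower() in EXCEL_SUFFIXES
        and not path.name.startswith("~$")  # skip Excel's temporary lock files
    )
```

In the script from the question, the loop would then read for file in list_excel_files('C:\\Users\\user.name\\Desktop\\Excel Test'): and pass each path to pandas.read_excel as before.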
We have a model that uses Pyomo's DataPortal to read parameters from several csv files. On one Windows laptop we run into the following error, which we cannot reproduce on another computer. Any ideas what might be missing in this setup?
Traceback (most recent call last):
File "", line 1, in <module>
runfile('C:/Users/stianbac/OneDrive - NTNU/EMPIRE/EMPIRE in Pyomo/EMPIRE_Pyomo_version_4/Empire_draft4.py', wdir='C:/Users/stianbac/OneDrive - NTNU/EMPIRE/EMPIRE in Pyomo/EMPIRE_Pyomo_version_4')
File "C:\Users\stianbac\AppData\Local\Continuum\anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 710, in runfile
execfile(filename, namespace)
File "C:\Users\stianbac\AppData\Local\Continuum\anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 101, in execfile
exec(compile(f.read(), filename, 'exec'), namespace)
File "C:/Users/stianbac/OneDrive - NTNU/EMPIRE/EMPIRE in Pyomo/EMPIRE_Pyomo_version_4/Empire_draft4.py", line 107, in <module>
instance = model.create_instance(data)
File "C:\Users\stianbac\AppData\Local\Continuum\anaconda3\lib\site-packages\pyomo\core\base\DataPortal.py", line 138, in load
self.connect(**kwds)
File "C:\Users\stianbac\AppData\Local\Continuum\anaconda3\lib\site-packages\pyomo\core\base\DataPortal.py", line 98, in connect
self._data_manager.open()
File "C:\Users\stianbac\AppData\Local\Continuum\anaconda3\lib\site-packages\pyomo\core\plugins\data\sheet.py", line 54, in open
self.sheet = ExcelSpreadsheet(self.filename, ctype=self.ctype)
File "C:\Users\stianbac\AppData\Local\Continuum\anaconda3\lib\site-packages\pyutilib\excel\spreadsheet.py", line 79, in __new__
return ExcelSpreadsheet_win32com(*args, **kwds)
File "C:\Users\stianbac\AppData\Local\Continuum\anaconda3\lib\site-packages\pyutilib\excel\spreadsheet_win32com.py", line 59, in __init__
self.open(filename, worksheets, default_worksheet)
File "C:\Users\stianbac\AppData\Local\Continuum\anaconda3\lib\site-packages\pyutilib\excel\spreadsheet_win32com.py", line 80, in open
self._ws[wsid] = self.wb.Worksheets.Item(wsid)
File "C:\Users\stianbac\AppData\Local\Continuum\anaconda3\lib\site-packages\win32com\client\dynamic.py", line 516, in __getattr__
ret = self._oleobj_.Invoke(retEntry.dispid,0,invoke_type,1)
com_error: (-2147418111, 'Call was rejected by callee.', None, None)
Here is the relevant part of the code:
from __future__ import division
from pyomo.environ import *
#from pyomo.core.expr import current as EXPR
#import numpy as np
import math
import csv
model = AbstractModel()
model.Nodes = Set()
model.Generators = Set() #g
...
data = DataPortal()
data.load(filename='Sets.xlsx',range='B1:B53',using='xlsx',format="set", set=model.Generators)
data.load(filename='Sets.xlsx',range='nodes',using='xlsx',format="set", set=model.Nodes)
...
instance = model.create_instance(data)
...
I am creating and filling a PyTables CArray in the following way:
#a,b = scipy.sparse.csr_matrix
f = tb.open_file('../data/pickle/dot2.h5', 'w')
filters = tb.Filters(complevel=1, complib='blosc')
out = f.create_carray(f.root, 'out', tb.Atom.from_dtype(a.dtype),
                      shape=(l, n), filters=filters)
bl = 2048
l = a.shape[0]
for i in range(0, l, bl):
    out[:,i:min(i+bl, l)] = (a.dot(b[:,i:min(i+bl, l)])).toarray()
The script was running fine for nearly two days (I estimated that it would need at least 4 days more).
However, suddenly I received this error stack trace:
File "prepare_data.py", line 168, in _tables_dot
out[:,i:min(i+bl, l)] = (a.dot(b[:,i:min(i+bl, l)])).toarray()
File "/home/psinger/venv/local/lib/python2.7/site-packages/tables/array.py", line 719, in __setitem__
self._write_slice(startl, stopl, stepl, shape, nparr)
File "/home/psinger/venv/local/lib/python2.7/site-packages/tables/array.py", line 809, in _write_slice
self._g_write_slice(startl, stepl, countl, nparr)
File "hdf5extension.pyx", line 1678, in tables.hdf5extension.Array._g_write_slice (tables/hdf5extension.c:16287)
tables.exceptions.HDF5ExtError: HDF5 error back trace
File "../../../src/H5Dio.c", line 266, in H5Dwrite
can't write data
File "../../../src/H5Dio.c", line 671, in H5D_write
can't write data
File "../../../src/H5Dchunk.c", line 1840, in H5D_chunk_write
error looking up chunk address
File "../../../src/H5Dchunk.c", line 2299, in H5D_chunk_lookup
can't query chunk address
File "../../../src/H5Dbtree.c", line 998, in H5D_btree_idx_get_addr
can't get chunk info
File "../../../src/H5B.c", line 362, in H5B_find
can't lookup key in subtree
File "../../../src/H5B.c", line 340, in H5B_find
unable to load B-tree node
File "../../../src/H5AC.c", line 1322, in H5AC_protect
H5C_protect() failed.
File "../../../src/H5C.c", line 3567, in H5C_protect
can't load entry
File "../../../src/H5C.c", line 7957, in H5C_load_entry
unable to load entry
File "../../../src/H5Bcache.c", line 143, in H5B_load
wrong B-tree signature
End of HDF5 error back trace
Internal error modifying the elements (H5ARRAYwrite_records returned errorcode -6)
I have no idea what the problem is, as the script ran fine for about a quarter of the dataset. Disk space is available.