zipfile.LargeZipFile: Filesize would require ZIP64 extensions - python

I am creating an Excel file and writing some rows to it. Here is what I have written:
import string
import pandas as pd
import xlsxwriter
from nltk.tokenize import word_tokenize

workbook = xlsxwriter.Workbook('DataSet.xlsx')
worksheet = workbook.add_worksheet()
df2 = pd.read_csv('d.csv', low_memory=False)
count = 0
for index, row in df2.iterrows():
    if row['source_id'] == 'EN':
        count += 1
        print(count)
        text = row['text']
        new_string = text.translate(str.maketrans('', '', string.punctuation))
        new_string = word_tokenize(new_string)
        sentence = ''
        tokens = ''
        for word in new_string:
            sample_len = len(new_string)
            count_len = 0
            sentence += word
            sentence += ' '
            tokens += word
            if count_len != sample_len:
                tokens += ', '
        worksheet.write(count, 3, tokens)
        worksheet.write(count, 2, sentence)
        worksheet.write(count, 1, 'Discrimination')
        worksheet.write(count, 0, count)
workbook.close()
However, after row number 94165, it gives me the following error and won't proceed any further:
Traceback (most recent call last):
File "/Users/PycharmProjects/pythonProject/venv/lib/python3.9/site-packages/xlsxwriter/workbook.py", line 323, in close
self._store_workbook()
File "/Users/PycharmProjects/pythonProject/venv/lib/python3.9/site-packages/xlsxwriter/workbook.py", line 745, in _store_workbook
raise e
File "/Users/PycharmProjects/pythonProject/venv/lib/python3.9/site-packages/xlsxwriter/workbook.py", line 739, in _store_workbook
xlsx_file.write(os_filename, xml_filename)
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/zipfile.py", line 1761, in write
with open(filename, "rb") as src, self.open(zinfo, 'w') as dest:
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/zipfile.py", line 1505, in open
return self._open_to_write(zinfo, force_zip64=force_zip64)
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/zipfile.py", line 1597, in _open_to_write
self._writecheck(zinfo)
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/zipfile.py", line 1712, in _writecheck
raise LargeZipFile(requires_zip64 +
zipfile.LargeZipFile: Filesize would require ZIP64 extensions
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Users/PycharmProjects/pythonProject/Python file.py", line 64, in <module>
workbook.close()
File "/Users/PycharmProjects/pythonProject/venv/lib/python3.9/site-packages/xlsxwriter/workbook.py", line 327, in close
raise FileSizeError("Filesize would require ZIP64 extensions. "
xlsxwriter.exceptions.FileSizeError: Filesize would require ZIP64 extensions. Use workbook.use_zip64().
Does anyone know why this has occurred and how it can be solved?

The error occurs because the resulting file, or one of its components, is larger than 4 GB. Beyond that size, xlsxwriter has to pass an additional parameter to the Python standard library zipfile.py to enable support for larger zip files.
The solution is given in the exception message:
xlsxwriter.exceptions.FileSizeError: Filesize would require ZIP64 extensions.
Use workbook.use_zip64().
You can either add it as a constructor option or workbook method:
workbook = xlsxwriter.Workbook(filename, {'use_zip64': True})
# Same as:
workbook = xlsxwriter.Workbook(filename)
workbook.use_zip64()
See the docs on the Workbook Constructor and workbook.use_zip64() including the following Note:
Note:
When using the use_zip64() option the zip file created by the Python standard library zipfile.py may cause Excel to issue a warning about repairing the file. This warning is annoying but harmless. The “repaired” file will contain all of the data written by XlsxWriter, only the zip container will be changed.
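The underlying limit can be reproduced with the standard library alone. This is a minimal sketch, not xlsxwriter's own code: it assumes the entry-count limit (65,535 entries) is an acceptable stand-in for the file-size limit, since producing a multi-gigabyte payload is impractical and both limits raise the same zipfile.LargeZipFile unless ZIP64 is allowed. The helper name write_entries is hypothetical.

```python
import io
import zipfile


def write_entries(count, allow_zip64):
    """Write `count` empty entries to an in-memory zip; return True on success."""
    buffer = io.BytesIO()
    try:
        with zipfile.ZipFile(buffer, "w", allowZip64=allow_zip64) as archive:
            for i in range(count):
                # Unique names avoid duplicate-name warnings.
                archive.writestr(f"entry{i}.txt", b"")
    except zipfile.LargeZipFile:
        # Raised as soon as a limit is exceeded and ZIP64 is not allowed.
        return False
    return True


# One entry more than the classic zip format can index.
print(write_entries(65536, allow_zip64=False))
print(write_entries(65536, allow_zip64=True))
```

The allowZip64 argument of zipfile.ZipFile plays the role here that use_zip64() plays for the workbook.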

Related

read clm chunk from wav file using python wavfile

I am using the enhanced wavfile.py library, and I want to use it to read Serum-style wavetables. I know that these files use a 'clm' chunk to store cue points, but I am having trouble reading it with the library.
Right now I'm just trying to read the file (I'll do something with it later); here is my code:
import wavfile as wf
wf.read('wavetable.wav')
When I run the script, I get this error:
[my dir]/wavfile.py:223: WavFileWarning: Chunk b'clm ' skipped
warnings.warn("Chunk " + str(chunk_id) + " skipped", WavFileWarning)
[my dir]/wavfile.py:223: WavFileWarning: Chunk b'' skipped
warnings.warn("Chunk " + str(chunk_id) + " skipped", WavFileWarning)
Traceback (most recent call last):
File "[my dir]/./test.py", line 5, in <module>
wf.read('wavetable.wav')
File "[my dir]/wavfile.py", line 228, in read
_skip_unknown_chunk(fid)
File "[my dir]/wavfile.py", line 112, in _skip_unknown_chunk
size = struct.unpack('<i', data)[0]
struct.error: unpack requires a buffer of 4 bytes
Is it even possible to do this using the library? If not, how could I modify the library to make this work?
Bear with me, I'm new to working with files and Python in general.
UPDATE:
Here's the output after I add madison courto's code:
Traceback (most recent call last):
File "[my dir]/./test.py", line 5, in <module>
wf.debug('wavetable.wav')
File "[my dir]/wavfile.py", line 419, in debug
format_str = format.decode("utf-8")
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 1: invalid start byte
And here is the wavetable I'm testing; hopefully sndup left it intact.
Adding these conditions to the read function returns a dict of markers. It seems that one of the markers is corrupt, so I wrapped the unpacking in a try/except that just passes. It's a bit janky, but it works.
elif chunk_id == b'':
    break
elif chunk_id == b'clm ':
    str1 = fid.read(8)
    size, numcue = struct.unpack('<ii', str1)
    for c in range(numcue):
        try:
            str1 = fid.read(24)
            idx, position, datachunkid, chunkstart, blockstart, sampleoffset = struct.unpack(
                '<iiiiii', str1)
            # _cue.append(position)
            _markersdict[idx]['position'] = position  # needed to match labels and markers
        except:
            pass
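For inspecting what a file actually contains before patching the library, a standalone chunk walker can help. This is a minimal sketch that relies only on the generic RIFF layout (a 4-byte chunk id, a 4-byte little-endian size, then the payload padded to an even length). It makes no assumption about what is inside the 'clm ' payload, and it stops cleanly at end of file instead of failing in struct.unpack. The names iter_riff_chunks and build_wave are hypothetical, and the sample bytes are made up.

```python
import io
import struct


def iter_riff_chunks(fid):
    """Yield (chunk_id, payload) for each top-level chunk of a RIFF/WAVE stream."""
    preamble = fid.read(12)
    if len(preamble) < 12:
        raise ValueError("stream too short to be a RIFF file")
    riff, _total_size, wave = struct.unpack("<4sI4s", preamble)
    if riff != b"RIFF" or wave != b"WAVE":
        raise ValueError("not a RIFF/WAVE stream")
    while True:
        header = fid.read(8)
        if len(header) < 8:
            return  # end of file or truncated header: stop instead of unpacking
        chunk_id, size = struct.unpack("<4sI", header)
        payload = fid.read(size)
        if size % 2:
            fid.read(1)  # skip the pad byte after an odd-sized chunk
        yield chunk_id, payload


def build_wave(chunks):
    """Assemble a tiny RIFF/WAVE byte string from (id, payload) pairs (test data only)."""
    body = b"WAVE"
    for chunk_id, payload in chunks:
        body += chunk_id + struct.pack("<I", len(payload)) + payload
        if len(payload) % 2:
            body += b"\x00"
    return b"RIFF" + struct.pack("<I", len(body)) + body


sample = build_wave([(b"clm ", b"abc"), (b"data", b"\x00\x01\x02\x03")])
found = dict(iter_riff_chunks(io.BytesIO(sample)))
print(found[b"clm "])
```

Printing the raw payload this way shows whether the chunk really holds 24-byte cue records before any struct format is applied to it.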

Occasional PermissionError when dict-writing to csv

I've written a tkinter app in Python 3.7 that extracts information from a main CSV, pre-filtered by the user, into smaller reports. My issue is that while writing the filtered data from the main CSV into a report, I occasionally get a PermissionError:
Exception in Tkinter callback
Traceback (most recent call last):
File "C:\Users\mdiakuly\AppData\Roaming\Python\Python37\lib\tkinter\__init__.py", line 1705, in __call__
return self.func(*args)
File "D:/PycharmProjects/report_extractor_hil_sil/report_extractor.py", line 286, in report_generation_prep
self.report_generation(names_list, relevant_params, specific_value, extracted_file)
File "D:/PycharmProjects/report_extractor_hil_sil/report_extractor.py", line 344, in report_generation
processing_data(rd,column)
File "D:/PycharmProjects/report_extractor_hil_sil/report_extractor.py", line 336, in processing_data
writing_new_report(gathering_row)
File "D:/PycharmProjects/report_extractor_hil_sil/report_extractor.py", line 299, in writing_new_report
with open(extracted_file, 'a+', newline='') as write_in:
PermissionError: [Errno 13] Permission denied: 'C:/Users/myusername/Desktop/reporter/Partial_report_14_10_2021T15-13-12.csv'
One time it extracted only one row and threw the error, another time it completed the whole extraction with no error, and a few other times it extracted thousands of rows before throwing the error.
The CSV file being written to was never opened during the extraction.
Has anyone faced the same issue, or does anyone have an idea how to fix it?
def writing_new_report(complete_row):
    with open(extracted_file, 'a+', newline='') as write_in:
        wt = csv.DictWriter(write_in, delimiter=';', fieldnames=relevant_params)
        if self.debug:
            print(complete_row)
        wt.writerow(complete_row)

def processing_data(r_d,column='def'):
    for idx, row in enumerate(r_d): #looping through major csv
        self.progress.update()
        if self.debug:
            print(f'{idx:}',end=' / ')
        gathering_row = {}
        if column != 'def':
            if row[column] not in names_list:
                continue
            else:
                names_list.remove(row[column])
                pass
        else:
            pass
        for param, value in zip(relevant_params,specific_value):
            self.progress.update()
            if self.debug:
                print(f'{row[param]:}',end=' / ')
            gathering_row[param] = row[param]
            if value == '---- All ----':
                pass
            elif value != row[param]:
                if self.debug:
                    print(f'{row[param]:} - Skipped')
                break
            if param == relevant_params[len(relevant_params)-1]:
                if self.debug:
                    print(f'{row[param]:} - Written')
                writing_new_report(gathering_row)
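One thing that may be worth trying, offered as an assumption rather than a diagnosis: the code above reopens the report for every row, so any other process that briefly touches the file between two opens could make the next open fail. A minimal sketch that keeps a single handle open for the whole extraction follows. The function name write_report, the field names and the rows are made up for illustration.

```python
import csv
import os
import tempfile


def write_report(path, fieldnames, rows):
    """Append all rows through one open file handle instead of one open per row."""
    with open(path, 'a', newline='') as write_in:
        writer = csv.DictWriter(write_in, delimiter=';', fieldnames=fieldnames)
        for row in rows:
            writer.writerow(row)


# Usage with made-up data in a temporary directory.
report_path = os.path.join(tempfile.mkdtemp(), 'Partial_report.csv')
write_report(report_path, ['name', 'value'],
             [{'name': 'a', 'value': '1'}, {'name': 'b', 'value': '2'}])
```

Because rows can be any iterable, the filtering loop could be turned into a generator and passed in directly, so the rows are still written one at a time.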

Open an image file when the path is made from os.path.join

I use two different scripts. In the first one, there is something like this:
f = open(filename, 'r')
file, file_ext = os.path.splitext(filename)
thumb = open(file + "_thumb.txt", "w")
for line in f:
    array = line.split(',')
    a = str(array[0])
    t = a[11:14] + "\\" + a[15:19] + "\\" + (a[11:]) + ".jpg" + "\n"
    thumb.write(t)
thumb.close()
In the second one:
Dirname = str(self.lneDirIn1.text())
f = open(file + "_thumb.txt", "r")
for line in f:
    line = str(line)
    print(line)
    cl_img_path = os.path.normpath((os.path.join(Dirname, line)))
    print(cl_img_path)
    cl_img = Image.open(str(cl_img_path))
When I run the second one, there is an error because os.path.join joins the trailing "\n" of the line into the path, so cl_img cannot be opened. However, when I print "line" on its own, the '\n' is not visibly displayed.
Here is the error:
Traceback (most recent call last):
File "./midas/mds_central_line_thumbs.py", line 118, in pbtOKClicked
self.process()
File "./midas/mds_central_line_thumbs.py", line 105, in process
cl_img=Image.open(str(cl_img_path))
File "C:\0adtoolsv2\libs\Python27\lib\site-packages\PIL\Image.py", line 1952, in open
fp = __builtin__.open(fp, "rb")
IOError: [Errno 22] invalid mode ('rb') or filename: 'k:\\SBU_3\\USA\\PIO2015\\04-TEST-SAMPLES\\USCASFX1608\\D16MMDD\\B3\\Images\\051\\0151\\051_0151_00021466.jpg\n'
I'd like my second script to ignore the "\n" (which is necessary in the first script) when opening the file.
Thank you very much, Guillaume.
What about stripping the "\n" when reading the line?
line=str(line).strip()
Or when joining the path?
cl_img_path=os.path.normpath((os.path.join(Dirname, line.strip())))
Or when opening the image?
cl_img=Image.open(str(cl_img_path).strip())
You could simply use:
lines = file.read().splitlines()
for line in lines:
    print line  # Wouhou, no \n
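The suggestions above come down to removing the line ending before the path is built. The sketch below does that with a hypothetical helper, image_path, and uses repr() to make the trailing "\n" visible: a plain print shows it only as an extra line break, which is why it seemed to be missing. The directory and file name are made up.

```python
import os


def image_path(dirname, line):
    """Join a directory with one line read from the thumbs file, minus its line ending."""
    return os.path.normpath(os.path.join(dirname, line.rstrip('\r\n')))


raw = '051_0151_00021466.jpg\n'
print(repr(raw))                       # repr() shows the trailing newline explicitly
print(repr(image_path('Images', raw)))
```

Using rstrip('\r\n') rather than strip() removes only the line ending, so a file name with leading or trailing spaces would be left alone.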

Python Openpyxl Append issue

I have hundreds of XML files that I need to extract two values from and output to an Excel or CSV file. This is the code I currently have:
#grabs idRoot and typeId root values from XML files
import glob
from openpyxl import Workbook
from xml.dom import minidom
import os

wb = Workbook()
ws = wb.active
def typeIdRoot (filename):
    f = open(filename, encoding = "utf8")
    for xml in f:
        xmldoc = minidom.parse(f)
        qmd = xmldoc.getElementsByTagName("MainTag")[0]
        typeIdElement = qmd.getElementsByTagName("typeId")[0]
        root = typeIdElement.attributes["root"]
        global rootValue
        rootValue = root.value
        print ('rootValue =' ,rootValue,)
        ws.append([rootValue])
        wb.save("some.xlsx")

wb = Workbook()
ws = wb.active
def idRoot (filename):
    f = open(filename, encoding = "utf8")
    for xml in f:
        xmldoc = minidom.parse(f)
        tcd = xmldoc.getElementsByTagName("MainTag")[0]
        activitiesElement = tcd.getElementsByTagName("id")[0]
        sport = activitiesElement.attributes["root"]
        sportName = sport.value
        print ('idRoot =' ,sportName,)
        ws.append([idRoot])
        wb.save("some.xlsx")

for file in glob.glob("*.xml"):
    typeIdRoot (file)
for file in glob.glob("*.xml"):
    idRoot (file)
The first value follows a 1.11.111.1.111111.1.3 format. The second mixes letters and numbers. I believe this is the reason for the error:
Traceback (most recent call last):
File "C:\Python34\Scripts\xml\good.py", line 64, in <module>
idRoot (file)
File "C:\Python34\Scripts\xml\good.py", line 54, in idRoot
ws.append([idRoot])
File "C:\Python34\lib\site-packages\openpyxl\worksheet\worksheet.py", line 754, in append
cell = self._new_cell(col, row_idx, content)
File "C:\Python34\lib\site-packages\openpyxl\worksheet\worksheet.py", line 376, in _new_cell
cell = Cell(self, column, row, value)
File "C:\Python34\lib\site-packages\openpyxl\cell\cell.py", line 131, in __init__
self.value = value
File "C:\Python34\lib\site-packages\openpyxl\cell\cell.py", line 313, in value
self._bind_value(value)
File "C:\Python34\lib\site-packages\openpyxl\cell\cell.py", line 217, in _bind_value
raise ValueError("Cannot convert {0} to Excel".format(value))
ValueError: Cannot convert <function idRoot at 0x037D24F8> to Excel
I would like both values to be added on the same row, so that I end up with one new row for each file in the directory. The second value needs to go in the second column,
like this:
Value 1                  Value 2
1.11.111.1.111111.1.3    10101011-0d10-0101-010d-0dc1010e0101
idRoot is the name of your FUNCTION.
So when you write
ws.append([idRoot])
you probably mean:
ws.append([sportName])
Of course, you can write something like:
ws.append([rootValue, sportName])
provided both variables are defined with reasonable values.
One last thing: you should save your file only once, after all the rows have been appended.
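Putting these points together, a rough sketch of the whole flow might look like this: read both attributes from one parsed document, collect one row per file, and write the output once at the end. To stay within the standard library it writes CSV (which the question also accepts) instead of using openpyxl. The function name extract_roots and the sample XML are assumptions for illustration; only the tag and attribute names come from the question.

```python
import csv
import io
from xml.dom import minidom


def extract_roots(xml_text):
    """Return the (typeId root, id root) pair from one XML document."""
    doc = minidom.parseString(xml_text)
    main = doc.getElementsByTagName("MainTag")[0]
    type_id_root = main.getElementsByTagName("typeId")[0].getAttribute("root")
    id_root = main.getElementsByTagName("id")[0].getAttribute("root")
    return type_id_root, id_root


# Made-up document standing in for one of the *.xml files.
SAMPLE = (
    '<MainTag>'
    '<typeId root="1.11.111.1.111111.1.3"/>'
    '<id root="10101011-0d10-0101-010d-0dc1010e0101"/>'
    '</MainTag>'
)

rows = [extract_roots(SAMPLE)]   # one row per file

out = io.StringIO()              # stands in for the real output file
writer = csv.writer(out)
writer.writerow(["Value 1", "Value 2"])
writer.writerows(rows)           # written once, after all rows are collected
print(out.getvalue())
```

With openpyxl the same shape would apply: one ws.append([rootValue, sportName]) per file and a single wb.save(...) after the loop.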

Replace given line in files in Python

I have several files, and I need to replace the third line in each of them:
files = ['file1.txt', 'file2.txt']
new_3rd_line = 'new third line'
What is the best way to do this?
The files are fairly big, several hundred MB each.
I used this solution: Search and replace a line in a file in Python
from tempfile import mkstemp
from shutil import move
from os import remove, close

def replace_3_line(file):
    new_3rd_line = 'new_3_line\n'
    #Create temp file
    fh, abs_path = mkstemp()
    new_file = open(abs_path,'w')
    old_file = open(file)
    counter = 0
    for line in old_file:
        counter = counter + 1
        if counter == 3:
            new_file.write(new_3rd_line)
        else:
            new_file.write(line)
    #close temp file
    new_file.close()
    close(fh)
    old_file.close()
    #Remove original file
    remove(file)
    #Move new file
    move(abs_path, file)

replace_3_line('tmp.ann')
But it does not work with files that contain non-English characters.
Traceback (most recent call last):
File "D:\xxx\replace.py", line 27, in <module>
replace_3_line('tmp.ann')
File "D:\xxx\replace.py", line 12, in replace_3_line
for line in old_file:
File "C:\Python31\lib\encodings\cp1251.py", line 23, in decode
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x98 in position 32: character maps to <undefined>
That is bad. Where's python unicode? (file is utf8, python3).
File is:
фвыафыв
sdadf
试试
阿斯达а
阿斯顿飞
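The traceback shows the file being decoded with cp1251, the platform default that open() falls back to when no encoding is given. A minimal sketch of the same temp-file approach with the encoding stated explicitly follows, assuming the file really is UTF-8 as described. The name replace_line and its parameters are hypothetical.

```python
import os
import tempfile


def replace_line(path, line_number, new_line, encoding="utf-8"):
    """Rewrite `path`, replacing the 1-based `line_number` with `new_line`."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file next to the original so the final rename stays on one drive.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w", encoding=encoding, newline="") as new_file, \
                open(path, encoding=encoding, newline="") as old_file:
            # Stream line by line so large files are never held in memory.
            for number, line in enumerate(old_file, start=1):
                new_file.write(new_line if number == line_number else line)
        os.replace(tmp_path, path)  # swap the rewritten file into place
    except Exception:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

A call such as replace_line('tmp.ann', 3, 'new third line\n') would then replace the third line. Passing newline="" on both sides keeps the original line endings untouched.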
