This question already has answers here:
Python: prevent mixed tabs/spaces on module import
(1 answer)
Indentation Error in Python [duplicate]
(7 answers)
Closed 5 years ago.
The following code is confusing the mess out of me. I've got a zip file which I am opening in a context manager. I'm trying to extract the contents of this zip file to a temporary directory. However, when I execute this code block, it tells me that there was an "Attempt to read ZIP archive that was already closed". I find this very strange, as the zip file in question was opened in (with?) a context manager! I've inserted several print statements for calls to methods/properties associated with the object at hand. They return successfully.
Where have I gone wrong? Why does the file believe itself closed?
Any help would be appreciated!
(Edit) Please find the traceback below.
Also, is there a better way to check if a zipfile is in fact open? Other than checking if .fp is True/False?
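(For reference, the .fp check I mention boils down to a small helper like the sketch below; the helper name is my own, and .fp is a private attribute of zipfile.ZipFile, so I don't know how reliable this is.)

def zip_is_open(zip_f):
    # Sketch only: ZipFile.fp is a private attribute that is set to None
    # once the archive has been closed; it is not a documented API.
    return zip_f.fp is not None

And here is the code in question: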
if config.get('settings', 'new_quarter') == "Yes":
    #This gets the latest zip file, by year and quarter
    new_statements_path = os.path.join(config.get('cleaning', 'historic_dir'), 'sql_files')
    for directory, dirnames, filenames in os.walk(new_statements_path):
        zips = [f for f in filenames if ".zip" in f]
        highest_quarter = max([z.split('Q')[1].split('.')[0] for z in zips])
        print 'Targeting this quarter for initial tables: %s' % (highest_quarter)
        for z in zips:
            if 'sql_files' in f:
                if z.split('Q')[1].split('.')[0] == highest_quarter:
                    with zipfile.ZipFile(os.path.join(directory, z), 'r') as zip_f:
                        print zip_f.fp
                        initial_tables = tempfile.mkdtemp()
                        print 'initial tables', initial_tables, os.path.exists(initial_tables)
                        #Ensure the file is read/write by the creator only
                        saved_umask = os.umask(0077)
                        try:
                            print zip_f.namelist()
                            print zip_f.fp
                            zip_f.printdir()
                            zip_f.extractall(path=initial_tables)
                        except:
                            print traceback.format_exc()
                        os.umask(saved_umask)
                        if os.path.exists(initial_tables) == True:
                            shutil.rmtree(initial_tables)
Traceback:
Traceback (most recent call last):
  File "/Users/n/GitHub/s/s/s/extract/extract.py", line 60, in extract_process
    zip_f.extractall(path=initial_tables)
  File "/Users/n/anaconda/lib/python2.7/zipfile.py", line 1043, in extractall
    self.extract(zipinfo, path, pwd)
  File "/Users/n/anaconda/lib/python2.7/zipfile.py", line 1031, in extract
    return self._extract_member(member, path, pwd)
  File "/Users/n/anaconda/lib/python2.7/zipfile.py", line 1085, in _extract_member
    with self.open(member, pwd=pwd) as source, \
  File "/Users/n/anaconda/lib/python2.7/zipfile.py", line 946, in open
    "Attempt to read ZIP archive that was already closed"
RuntimeError: Attempt to read ZIP archive that was already closed
(SECOND EDIT)
Here's the (reasonably) minimal & complete version. In this case, the code runs fine. Which makes sense, since there's nothing fancy going on. What's interesting is that I placed the full example (the one below) immediately above the previous example (above). The code below still executes just fine, but the code above still produces the same error. The only difference, however, is the new_statements_path variable. In the code above, this string comes from a config file. Surely, this isn't the root of the error. But I can't see any other differences.
import traceback
import os
import zipfile
import tempfile
import shutil

new_statements_path = '/Users/n/Official/sql_files'
for directory, dirnames, filenames in os.walk(new_statements_path):
    zips = [f for f in filenames if ".zip" in f]
    highest_quarter = max([z.split('Q')[1].split('.')[0] for z in zips])
    print 'Targeting this Quarter for initial tables: %s' % (highest_quarter)
    for z in zips:
        if 'sql_files' in f:
            if z.split('Q')[1].split('.')[0] == highest_quarter:
                with zipfile.ZipFile(os.path.join(directory, z), 'r') as zip_f:
                    print zip_f.fp
                    initial_tables = tempfile.mkdtemp()
                    print 'initial tables', initial_tables, os.path.exists(initial_tables)
                    #Ensure the file is read/write by the creator only
                    saved_umask = os.umask(0077)
                    try:
                        print zip_f.namelist()
                        print zip_f.fp
                        zip_f.printdir()
                        zip_f.extractall(path=initial_tables)
                    except:
                        print traceback.format_exc()
                    os.umask(saved_umask)
                    if os.path.exists(initial_tables) == True:
                        shutil.rmtree(initial_tables)

if os.path.exists(initial_tables) == True:
    shutil.rmtree(initial_tables)
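(One more thing I can check, I suppose, is whether the two path strings are really identical, e.g. by comparing their repr(); this is just a debugging sketch, not part of the script.)

# Sketch: compare the config-derived path with the hard-coded one to rule out
# invisible differences (trailing whitespace, odd separators, etc.).
print repr(os.path.join(config.get('cleaning', 'historic_dir'), 'sql_files'))
print repr('/Users/n/Official/sql_files')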
This question already has answers here:
Is close() necessary when using iterator on a Python file object [duplicate]
(8 answers)
Closed 1 year ago.
Source
import os, re

directory = os.listdir('C:/foofolder8')
os.chdir('C:/foofolder8')
for file in directory:
    open_file = open(file, 'r')
    read_file = open_file.read()
    regex = re.compile('jersey')
    read_file = regex.sub('york', read_file)
    write_file = open(file, 'w')
    write_file.write(read_file)
The script replaces "jersey" in all the files in C:/foofolder8 with "york". I tried it with three files in the folder and it works. The source notes though that "you may find an error in the last file", which I'm indeed encountering - all text in the last file is simply deleted.
Why does the script break for the last file? The fact that only the last file breaks makes it seem like there's something wrong with the for loop, but I don't see what could be wrong. Calling directory shows there are three files in the directory as well, which is correct.
Try:
import os, re

directory = os.listdir('C:/foofolder8')
os.chdir('C:/foofolder8')
for file in directory:
    with open(file, 'r') as open_file:
        read_file = open_file.read()
    regex = re.compile('jersey')
    read_file = regex.sub('york', read_file)
    with open(file, 'w') as write_file:
        write_file.write(read_file)
This question already has answers here:
How do I check whether a file exists without exceptions?
(40 answers)
Closed 1 year ago.
I am trying to write a block of code which opens a new file every time a Python3 script is run.
I am constructing the filename using an incrementing number.
For example, the following are some examples of valid filenames which should be produced:
output_0.csv
output_1.csv
output_2.csv
output_3.csv
On the next run of the script, the next filename to be used should be output_4.csv.
In C/C++ I would do this in the following way:
Enter an infinite loop
Try to open the first filename, in "read" mode
If the file is open, increment the filename number and repeat
If the file is not open, break out of the loop and re-open the file in "write" mode
This doesn't seem to work in Python 3, as opening a non-existing file in read mode causes an exception to be raised.
One possible solution might be to move the open file code block inside a try-catch block. But this doesn't seem like a particularly elegant solution.
Here is what I tried so far in code
# open a file to store output data
filename_base = "output"
filename_ext = "csv"
filename_number = 0

while True:
    filename_full = f"{filename_base}_{filename_number}.{filename_ext}"
    with open(filename_full, "r") as f:
        if f.closed:
            print(f"Writing data to {filename_full}")
            break
        else:
            print(f"File {filename_full} exists")
            filename_number += 1

with open(filename_full, "w") as f:
    pass
As explained above this code crashes when trying to open a file which does not exist in "read" mode.
Using pathlib you can check with Path.is_file() which returns True when it encounters a file or a symbolic link to a file.
from pathlib import Path

filename_base = "output"
filename_ext = "csv"
filename_number = 0

filename_full = f"{filename_base}_{filename_number}.{filename_ext}"
p = Path(filename_full)
while p.is_file() or p.is_dir():
    filename_number += 1
    p = Path(f"{filename_base}_{filename_number}.{filename_ext}")
This loop should exit when the file isn’t there so you can open it for writing.
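To round this off, a minimal usage sketch (assuming the same variable names as above); the CSV header line is just a placeholder:

# Once the loop exits, p points at a name that is not taken yet.
print(f"Writing data to {p}")
with p.open("w") as f:
    f.write("col_a,col_b\n")  # hypothetical header row; write your real data here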
You can check whether a file already exists before opening it, using
os.path.exists(filename)
You could use the os module to check whether the file path is a file, and then open it:

import os

file_path = './file.csv'
if os.path.isfile(file_path):
    with open(file_path, "r") as f:
        data = f.read()  # placeholder: do whatever you need with the open file here
This should work:
filename_base = "output"
filename_ext = "csv"
filename_number = 0
while True:
filename_full = f"{filename_base}_{filename_number}.{filename_ext}"
try:
with open(filename_full, "r") as f:
print(f"File {filename_full} exists")
filename_number += 1
except FileNotFoundError:
print("Creating new file")
open(filename_full, 'w');
break;
You might use os.path.exists to check whether a file already exists, for example
import os
print(os.path.exists("output_0.csv"))
or, since your names
output_0.csv
output_1.csv
output_2.csv
output_3.csv
are so regular, exploit glob.glob like so:
import glob
existing = glob.glob("output_*.csv")
print(existing) # list of existing files
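If you go the glob route, one way (a sketch, not from the original answer) to turn that list into the next free number is to pull out the numeric part of each name and take the maximum plus one:

import glob
import re

existing = glob.glob("output_*.csv")
# Extract N from each "output_N.csv"; fall back to 0 when no files exist yet.
numbers = [int(m.group(1))
           for m in (re.match(r"output_(\d+)\.csv$", name) for name in existing)
           if m]
next_number = max(numbers) + 1 if numbers else 0
print(f"Next file: output_{next_number}.csv")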
This question already has answers here:
open() gives FileNotFoundError / IOError: '[Errno 2] No such file or directory'
(8 answers)
Closed 7 years ago.
I have a text file named hsp.txt in C:\Python27\Lib\site-packages\visual\examples and used the following code.
def file():
    file = open('hsp.txt', 'r')
    col = []
    data = file.readlines()
    for i in range(1, len(data) - 1):
        col.append(int(float(data[i].split(',')[5])))
    return col

def hist(col):
    handspan = []
    for i in range(11):
        handspan.append(0)
    for i in (col):
        handspan[i] += 1
    return handspan

col = file()
handspan = hist(col)
print(col)
print(handspan)
But when I run it, it says that the file doesn't exist.
Traceback (most recent call last):
  File "Untitled", line 17
    col = file()
  File "Untitled", line 2, in file
    file = open('hsp.txt', 'r')
IOError: [Errno 2] No such file or directory: 'hsp.txt'
How do I fix this?
Also how do I output the mean and variance?
Have you thought about where your relative path actually leads? You need to supply the complete path to the file.
opened_file = open("C:/Python27/Lib/site-packages/visual/examples/hsp.txt")
A couple other things:
Don't use file as a variable name; it shadows the built-in file type.
Use a with statement. It's considered better practice.
with open("C:/Python27/Lib/site-packages/visual/examples/hsp.txt"):
# do something
When the with block ends, the file is automatically closed. In your code, the file object stays open until it is explicitly closed with the .close() method, which your code never calls.
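For instance, the reading part of your code could look roughly like this sketch (renamed to read_col so it doesn't shadow file, and assuming the same comma-separated layout as your hsp.txt):

def read_col(path):
    col = []
    # The with block closes the file automatically, even if an error occurs.
    with open(path, 'r') as f:
        data = f.readlines()
    for line in data[1:-1]:  # same rows as range(1, len(data) - 1)
        col.append(int(float(line.split(',')[5])))
    return col

col = read_col("C:/Python27/Lib/site-packages/visual/examples/hsp.txt")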
When you specify just the following line
file = open('hsp.txt', 'r')
Python tries to resolve it against your current directory, that is, wherever you launched Python from. So if you were at C:\temp in a command prompt and executed python test.py, it would look for hsp.txt in C:\temp\hsp.txt. You should specify the full path when you're not trying to load files from your current directory.
file = open(r'C:\Python27\Lib\site-packages\visual\examples\hsp.txt')
Ok, so I have a zip file that contains gz files (unix gzip).
Here's what I do --
def parseSTS(file):
    import zipfile, re, io, gzip
    with zipfile.ZipFile(file, 'r') as zfile:
        for name in zfile.namelist():
            if re.search(r'\.gz$', name) != None:
                zfiledata = zfile.open(name)
                print("start for file ", name)
                with gzip.open(zfiledata, 'r') as gzfile:
                    print("done opening")
                    filecontent = gzfile.read()
                    print("done reading")
                    print(filecontent)
This gives the following result --
>>>
start for file XXXXXX.gz
done opening
done reading
Then it stays like that forever until it crashes...
What can I do with filecontent?
Edit: this is not a duplicate, since my gzipped files are inside a zip file and I'm trying to avoid extracting that zip file to disk. It works for zip files within a zip file, as per How to read from a zip file within zip file in Python?
I created a zip file containing a gzip'ed PDF file I grabbed from the web.
I ran this code (with two small changes):
1) Fixed indenting of everything under the def statement (which I also corrected in your Question because I'm sure that it's right on your end or it wouldn't get to the problem you have).
2) I changed:
zfiledata = zfile.open(name)
print("start for file ", name)
with gzip.open(zfiledata, 'r') as gzfile:
    print("done opening")
    filecontent = gzfile.read()
    print("done reading")
    print(filecontent)
to:
print("start for file ", name)
with gzip.open(name,'rb') as gzfile:
print("done opening")
filecontent = gzfile.read()
print("done reading")
print(filecontent)
Because you were passing a file object to gzip.open instead of a string. I have no idea how your code is executing without that change, but it was crashing for me until I fixed it.
EDIT: Adding link to GZIP docs from James R's answer --
Also, see here for further documentation:
http://docs.python.org/2/library/gzip.html#examples-of-usage
END EDIT
Now, since my gzip'ed file is small, the behavior I observe is that it pauses for about 3 seconds after printing done reading, then outputs what is in filecontent.
I would suggest adding the following debugging line after your print "done reading" -- print len(filecontent). If this number is very, very large, consider not printing the entire file contents in one shot.
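For example, something along these lines only prints a prefix of the content (the 200-byte cutoff is arbitrary):

print len(filecontent)   # how big is it really?
print filecontent[:200]  # print only the first 200 bytes rather than the whole thing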
I would also suggest reading this for more insight into what I expect is your problem: Why is printing to stdout so slow? Can it be sped up?
EDIT 2 - an alternative if your system does not handle file io on zip files, causing no such file errors in the above:
def parseSTS(afile):
    import zipfile
    import zlib
    import gzip
    import io
    with zipfile.ZipFile(afile, 'r') as archive:
        for name in archive.namelist():
            if name.endswith('.gz'):
                bfn = archive.read(name)
                bfi = io.BytesIO(bfn)
                g = gzip.GzipFile(fileobj=bfi, mode='rb')
                qqq = g.read()
                print qqq

parseSTS('t.zip')
Most likely your problem lies here:
if name.endswith(".gz"): #as goncalopp said in the comments, use endswith
#zfiledata = zfile.open(name) #don't do this
#print("start for file ", name)
with gzip.open(name,'rb') as gzfile: #gz compressed files should be read in binary and gzip opens the files directly
#print("done opening") #trust in your program, luke
filecontent = gzfile.read()
#print("done reading")
print(filecontent)
See here for further documentation:
http://docs.python.org/2/library/gzip.html#examples-of-usage
Just wrote my first Python program! I get zip files as email attachments, which are saved in a local folder. The program checks if there is any new file, and if there is one, it extracts the zip file and, based on the filename, extracts it to a different folder. When I run my code I get the following error:
Traceback (most recent call last):
  File "C:/Zip/zipauto.py", line 28, in <module>
    for file in new_files:
TypeError: 'NoneType' object is not iterable
Can anyone please tell me where I am going wrong?
Thanks a lot for your time,
Navin
Here is my code:
import zipfile
import os

ROOT_DIR = 'C://Zip//Zipped//'
destinationPath1 = "C://Zip//Extracted1//"
destinationPath2 = "C://Zip//Extracted2//"

def check_for_new_files(path=ROOT_DIR):
    new_files = []
    for file in os.listdir(path):
        print "New file found ... ", file

def process_file(file):
    sourceZip = zipfile.ZipFile(file, 'r')
    for filename in sourceZip.namelist():
        if filename.startswith("xx") and filename.endswith(".csv"):
            sourceZip.extract(filename, destinationPath1)
        elif filename.startswith("yy") and filename.endswith(".csv"):
            sourceZip.extract(filename, destinationPath2)
    sourceZip.close()

if __name__ == "__main__":
    while True:
        new_files = check_for_new_files(ROOT_DIR)
        for file in new_files:  # fails here
            print "Unzipping files ... ", file
            process_file(ROOT_DIR + "/" + file)
check_for_new_files has no return statement, and therefore implicitly returns None. Therefore,
new_files=check_for_new_files(ROOT_DIR)
sets new_files to None, and you cannot iterate over None.
Return the read files in check_for_new_files:
def check_for_new_files(path=ROOT_DIR):
    new_files = os.listdir(path)
    for file in new_files:
        print "New file found ... ", file
    return new_files
Here is the answer to your NEXT 2 questions:
(1) Because of while True:, your code will loop forever.
(2) your function check_for_new_files doesn't check for new files, it checks for any files. You need to either move each incoming file to an archive directory after it's been processed, or use some kind of timestamp mechanism.
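A rough sketch of the first option; the archive_dir path and the shutil.move call are my own, not from the original code, and it assumes ROOT_DIR from the question:

import os
import shutil

archive_dir = 'C://Zip//Processed//'   # hypothetical folder for files already handled

def archive_file(file):
    # Move a processed zip out of ROOT_DIR so it is not picked up again.
    if not os.path.exists(archive_dir):
        os.makedirs(archive_dir)
    shutil.move(os.path.join(ROOT_DIR, file), os.path.join(archive_dir, file))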
For example, with student_grade = dict(zip(names, grades)), make sure names and grades are lists and that each has at least one item to iterate over. This has helped me.