I have a Python script that opens a lot of small text files (over 2 million) in a for loop. However, it stops when it reaches approximately 150,000 files, which suggests to me that I have hit the default limit on open files in the Linux kernel.
But I'm closing the files, so I'm not sure why I hit that limit. The interesting part boils down to this:
import os

files = os.listdir('/var/tmp/files')
for file in files:
    fd = open('/var/tmp/files/{}'.format(file), 'r')
    content = fd.readlines()
    # Doing stuff
    fd.close()
The code works, but apparently it doesn't close the files. At first I tried the preferable with open() statement, but that didn't work either.
Why doesn't Python close the files?
Thanks, guys. The problem was that my user had no access to one specific file, so Python did everything as it should.
I assumed it had to do with Linux's maximum number of open files, since the number of processed files was really close to that limit. That turned out to be a coincidence, though.
Thanks for all your help, and sorry for the noise.
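For anyone who lands here with the same symptom: a sketch of the loop with per-file error handling added, so the offending file is reported instead of crashing the run (the except clause is the only part that was not in my original code):
import os

files = os.listdir('/var/tmp/files')
for file in files:
    path = '/var/tmp/files/{}'.format(file)
    try:
        with open(path, 'r') as fd:
            content = fd.readlines()
            # Doing stuff
    except (IOError, OSError) as e:
        # e.g. a permission error on one specific file
        print('Could not read {}: {}'.format(path, e))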
I don't know if this will solve the problem, but give it a try. It may be that the program is opening multiple files in the same variable, or that the loop prevents the program from closing the files.
import os

files = os.listdir('/var/tmp/files')
fd = list()
for i, file in enumerate(files):
    if i > 100000:
        break
    fd.append(open('/var/tmp/files/{}'.format(file), 'r'))
    content = fd[i].readlines()
    # Doing stuff
for i, file in enumerate(files):
    if i > 100000:
        break
    fd[i].close()
I think you are using multiprocessing or something similar in your "Doing stuff" block. You can assume that you will run into file descriptor problems whenever you use multiprocessing, so it needs extra attention.
To solve this problem, simply don't open the file before you start another process. Open the file after the other process has started.
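A minimal sketch of that ordering, assuming the "Doing stuff" can be moved into a worker function (process_one, paths and the pool size are illustrative names and values, not from the question):
import os
from multiprocessing import Pool

def process_one(path):
    # The file is opened inside the worker, i.e. after the process has
    # started, so no open descriptor is shared between parent and children.
    with open(path, 'r') as fd:
        content = fd.readlines()
    # Doing stuff with content
    return len(content)

if __name__ == '__main__':
    paths = [os.path.join('/var/tmp/files', name)
             for name in os.listdir('/var/tmp/files')]
    with Pool(4) as pool:
        results = pool.map(process_one, paths)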
There must be something else happening in your code. Check the status of your file object before you reopen the file:
import os

files = os.listdir('/var/tmp/files')
fileClosed = True
for file in files:
    if fileClosed:
        with open('/var/tmp/files/{}'.format(file), 'r') as fd:
            content = fd.readlines()
            ## DO Stuff
    else:
        print "File not closed properly"
        break
    fileClosed = fd.closed
I want to read a file, but without any lock on it.
with open(source, "rb") as infile:
    data = infile.read()
Can the code above lock the source file?
This source file can be updated at any time with new rows (during my script running for example).
I think not, because it is only opened in reading mode ("rb"). I found that the Windows API can be used to read it without a lock, but I did not find a simple answer to my question.
My script runs locally, but the source file and the script/software that appends changes to it are not local (they are on a network drive).
Opening a file does not put a lock on it. In fact, if you needed to ensure that separate processes did not access a file simultaneously, all of these processes would have to cooperatively take special steps to ensure that only a single process accessed the file at one time (see Locking a file in Python).
This can also be demonstrated by the following small program, which purposely takes its time reading a file in order to give another process (namely me with a text editor) a chance to append some data to the end of the file while the program is running. It reads and outputs the file one byte at a time, pausing 0.1 seconds between reads. While it was running I added some additional text to the end of the file, and the program printed that additional text:
import time

with open('test.txt', "rb") as infile:
    while True:
        data = infile.read(1)
        if data == b'':
            break
        time.sleep(.1)
        print(data.decode('ascii'), end='', flush=True)
You can read your file in pieces and then join these pieces together if you need one single byte string. But this will not be as memory efficient as reading the file with a single read:
BLOCKSIZE = 64 * 1024  # or some other value depending on the file size

with open(source, "rb") as infile:
    blocks = []
    while True:
        data = infile.read(BLOCKSIZE)
        if data == b'':
            break
        blocks.append(data)

# if you need the data in one piece (otherwise the pieces are in blocks):
data = b''.join(blocks)
One alternative is to make a copy of the file temporarily and read the copy.
You can use the shutil package for such a task:
import os
import time
from shutil import copyfile

def read_file_non_blocking(file):
    temp_file = f"{file}-{time.time()}"  # stores the copy in the local directory
    copyfile(file, temp_file)
    with open(temp_file, 'r') as my_file:
        pass  # Do something cool with my_file
    os.remove(temp_file)
Windows is weird in how it handles files if you, like me, are used to POSIX-style file handling. I have run into this issue numerous times and have been lucky enough to avoid solving it. However, if I had to solve it in this case, I would look at the flags that can be passed to os.open and see whether any of them can disable the locking.
https://docs.python.org/3/library/os.html#os.open
I would do a little testing, but I don't have a non-production-critical Windows workstation to test on.
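If someone does want to experiment along those lines, a sketch of the os.open / os.fdopen route follows; note that I have not verified that any of these flags actually changes Windows' sharing behaviour, and the file name is made up.
import os

# Open with low-level flags and wrap the descriptor in a normal file object.
# O_BINARY only exists on Windows, hence the getattr fallback.
flags = os.O_RDONLY | getattr(os, 'O_BINARY', 0)
fd = os.open('test.txt', flags)  # hypothetical file
with os.fdopen(fd, 'rb') as infile:
    data = infile.read()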
I have a noob python question... so bear with me.
Can I open multiple files before closing the previous ones?
So... can I run
import os

files = []
for file in os.listdir(os.curdir):
    files.append(open(file, 'w'))
Then edit each file as I want and finish with
for file in files:
    file.close()
Thanks in advance
Seems legit and works fine.
Doing operations this way would be hard for you: the list files doesn't contain the filenames, so you would not know which file is which.
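If you do need to know which handle belongs to which name, one small sketch is to keep a dict keyed by filename instead of a bare list (the write call is just a placeholder):
import os

# Map filename -> open handle so every file stays identifiable.
handles = {name: open(name, 'w') for name in os.listdir(os.curdir)}

for name, f in handles.items():
    f.write('something for {}\n'.format(name))  # edit each file as needed

for f in handles.values():
    f.close()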
It is perfectly fine to open each file using open and later close all of them. However, you will want to make sure all of your files are closed properly.
Normally, you would do this for one file:
with open(filename, 'w') as f:
    do_something_with_the_file(f)
# the file is closed here, regardless of what happens
# i.e. even in case of an exception
You could do the same with multiple files:
with open(filename1, 'w') as f1, open(filename2, 'w') as f2:
    do_something_with_the_file(f1)
    do_something_with_the_file(f2)
# both files are closed here
Now, if you have N files, you could write your own context manager, but that would probably be overkill. Instead, I would suggest:
open_files = []
try:
    for filename in list_of_filenames:
        open_files.append(open(filename, 'w'))
    # do something with the files here
finally:
    for file in open_files:
        file.close()
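That said, the standard library already ships a context manager for exactly this case: contextlib.ExitStack (Python 3.3+) closes everything registered with it when the block exits, even on an exception. A minimal sketch using the same list_of_filenames:
from contextlib import ExitStack

with ExitStack() as stack:
    open_files = [stack.enter_context(open(filename, 'w'))
                  for filename in list_of_filenames]
    # do something with the files here
# all files are closed here, even if an exception was raised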
BTW, your own code deletes the contents of all files in the current directory. I am not sure you wanted that:
for file in os.listdir(os.curdir):
    files.append(open(file, 'w'))  # open(file, 'w') empties the file!!!
Maybe you wanted open(file, 'r') or open(file, 'a') instead? (see https://docs.python.org/2/library/functions.html#open)
Your solution will certainly work, but the recommended way would be to use a context manager so that the files get handled seamlessly. For example:
for filename in os.listdir(os.curdir):
    with open(filename, 'w') as f:
        pass  # do some actions on the file
The with statement will take care of closing the file for you.
I'm using Python 2.7.8 and I'm working with 30 open docx files simultaneously.
Is there some way, in Python code, to close all the files at once instead of closing each file separately?
UPDATE:
I'm using different files every day, so the names of the files change every time. My code must work generally, without specific file names (if that is possible).
I suggest using the with statement when opening files:
with open('file1.txt', 'w') as file1, open('file2.txt', 'w') as file2:
    file1.write('stuff to write')
    file2.write('stuff to write')
    # ...do other stuff...

print "Both files are closed because I'm out of the with statement"
When you leave the with statement, your files are closed. You can even open all of your files on one line, but that's not recommended unless you are actively using all of them at once.
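Since the file names change every day, a hedged sketch (assuming the .docx files sit in the current directory; adjust the glob pattern to your setup) is to collect whatever is present, open everything, and close it all in one loop:
import glob

paths = glob.glob('*.docx')  # assumed location and pattern
handles = [open(p, 'rb') for p in paths]
try:
    for h in handles:
        pass  # work with each open file here
finally:
    for h in handles:
        h.close()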
You need to find the pid of your Word processes and then use kill to terminate them, e.g.
import os
import signal

os.kill(pid, signal.SIGTERM)
First you have to append all of the opened file objects to a list.
l = []

f1 = open('f1.txt')
# ...do something
l.append(f1)

f2 = open('f2.txt')
# ...do something
l.append(f2)
Now get all the file objects from the list and close them.
for files in l:
    files.close()
The general gist of this question: if there is even a remote possibility that something could go wrong, should I catch the possible error? Specifically:
I have an app that reads and writes the previous history of the program to a .txt file. Upon initialization, the program reads the history file to determine what operations it should and should not do. If no history file yet exists, it creates one. Like so:
global trackList

try:
    # Open history of downloaded MP3s and transfer it to trackList
    with open('trackData.txt', 'r') as f:
        trackList = f.readlines()
except Exception as e:  # if the file does not exist, create a blank new one
    with open('trackData.txt', 'w') as f:
        f.write("")
The program then proceeds to download MP3s based on whether or not they were in the txt file. Once it has downloaded an MP3, it adds it to the txt file. Like so:
mp3File = requests.get(linkURL)

with open('trackData.txt', 'a') as f:
    f.write(linkURL + '\n')
Now, it is almost 100% certain that the txt file will still exist since the time it was created in the first function. We're dealing with downloading a few MP3s here; the program will never run for more than a few minutes. However, there is the remote possibility that the history txt file will have been deleted by the user or otherwise corrupted while the MP3 was being downloaded, in which case the program will crash because there is no error handling.
Would a good programmer wrap that last block of code in a try ... except block that creates the history txt file if it does not exist, or is that just needless paranoia and wasted space? It's trivial to implement, but keep in mind that I have programs like this where there are literally hundreds of opportunities for users to delete/corrupt a previously created txt file in a tiny window of time. My normally flat Python code would turn into a nested try ... except minefield.
A safer solution would be to open the file and keep it open while you are still downloading. The user will then not be able to delete it. After everything is downloaded and logged, close the file. This will also result in better performance.
Why are you creating an empty file on application startup? Just do nothing if the file isn't present on startup - open('trackData.txt', 'a') will still create a new file.
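Putting that together with the code from the question, a sketch of a startup path that neither crashes nor creates an empty file up front (only the except branch differs from the original):
# Read the history if it exists; a missing file simply means no history yet.
try:
    with open('trackData.txt', 'r') as f:
        trackList = f.readlines()
except IOError:  # the file does not exist yet
    trackList = []

# Appending later still creates the file if it has gone missing in the meantime:
with open('trackData.txt', 'a') as f:
    f.write(linkURL + '\n')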
I have written a small program in Python where I need to open many files and close them at a later stage. I have stored all the file handles in a list so that I can refer to them later for closing.
In my program I am storing all the file handles (fout) in the list foutList:
for cnt in range(count):
    fileName = "file" + str(cnt) + ".txt"
    fullFileName = path + fileName
    print "opening file " + fullFileName
    try:
        fout = open(fullFileName, "r")
        foutList.append(fout)
    except IOError as e:
        print "Cannot open file: %s" % e.strerror
        break
Some people suggested that I not store them in a list, but did not give a reason why. Can anyone explain why it is not recommended to store them in a list, and what other way there is to do the same thing?
I can't think of any reasons why this is really evil, but possible objections to doing this might include:
It's hard to guarantee that every single file handle will be closed when you're done. Using the file handle with a context manager (see the with open(filename) as file_handle: syntax) always guarantees the file handle is closed, even if something goes wrong.
Keeping lots of files open at the same time may be impolite if you're going to have them open for a long time, and another program is trying to access the files.
This said - why do you want to keep a whole bunch of files open for writing? If you're writing intermittently to a bunch of files, a better way to do this is to open the file, write to it, and then close it until you're ready to write again.
All you have to do is open the file in append mode: open(filename, 'a'). This lets you write to the end of an existing file without erasing what's already there (unlike the 'w' mode, which truncates the file).
Edit(1) I slightly misread your question - I thought you wanted to open these files for writing, not reading. Keeping a bunch of files open for reading isn't too bad.
If you have the files open because you want to monitor the files for changes, try using your platform's equivalent of Linux's inotify, which will tell you when a file has changed (without you having to look at it repeatedly.)
If you don't store them at all, they will eventually be garbage collected, which will close them.
If you really want to close them manually, use weak references to hold them, which will not prevent garbage collection: http://docs.python.org/library/weakref.html
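A small sketch of that idea (the file name is made up): a weakref.WeakSet tracks the open files without keeping them alive, and whatever is still alive can be closed explicitly at the end.
import weakref

# The tracking collection holds only weak references, so it never keeps a
# file object alive by itself; strong references live where the file is used.
open_files = weakref.WeakSet()

f = open('example.txt', 'w')  # hypothetical file
open_files.add(f)

# ... later, close whatever is still referenced and not yet closed:
for still_open in list(open_files):
    if not still_open.closed:
        still_open.close()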