File closes before async call finishes, causing IO error - python

I wrote a package that includes a function to upload something asynchronously. The intent is that the user can use my package, open a file, and upload it async. The problem is, depending on how the user writes their code, I get an IO error.
# EXAMPLE 1
with open("my_file", "rb") as my_file:
    package.upload(my_file)
# I/O operation on closed file error

# EXAMPLE 2
my_file = open("my_file", "rb")
package.upload(my_file)
# everything works
I understand that in the first example the file is closing immediately because the call is async. I don't know how to fix this though. I can't tell the user they can't open files in the style of example 1. Can I do something in my package.upload() implementation to prevent the file from closing?

You can use os.dup to duplicate the file descriptor and shield the async work from a close() in the caller. The duplicated handle still shares other characteristics of the original, such as the current file position, so you are not completely shielded from bad things the caller can do.
This approach also limits you to objects that have real file descriptors. If you stick to the standard file calls instead, a user can hand in any file-like object, not just a file on disk.
import os

def upload(my_file):
    # duplicate the descriptor so a close() in the caller doesn't affect us;
    # preserve the original mode (e.g. "rb") when re-wrapping it
    my_file = os.fdopen(os.dup(my_file.fileno()), my_file.mode)
    # ...queue for async
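For illustration, a minimal sketch of why the duplication helps: the upload() below just returns the duplicated handle instead of queueing real async work, and the duplicate stays readable after the caller's with block closes the original:
import os

def upload(my_file):
    # stand-in for the real implementation: duplicate the descriptor and hand it back
    return os.fdopen(os.dup(my_file.fileno()), my_file.mode)

with open("my_file", "rb") as my_file:
    pending = upload(my_file)

# the caller's close only affected the original object, not the duplicate
print(pending.read())
pending.close()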

If you open files with with, the file is closed as soon as execution leaves the with block. In your case, just pass the filename and open the file inside the asynchronous function.
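A minimal sketch of that idea, using a plain thread as a stand-in for whatever async machinery the package really uses:
import threading

def upload(path):
    def worker():
        # the worker owns the file for its whole lifetime
        with open(path, "rb") as f:
            data = f.read()
            # ...send data somewhere...
    threading.Thread(target=worker).start()

upload("my_file")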

Related

Is file closing necessary in this situation?

If I have:
fdata = open(pathf, "r").read().splitlines()
Will the file automatically close after getting the data? If not, how can I close it, since fdata is not a handle?
Thank you
Use
with open(pathf, "r") as r:
    fdata = r.read().splitlines()
# as soon as you leave the with-scope, the file is autoclosed, even if exceptions happen.
It's not only about auto-closing, but also about closing correctly in case of exceptions.
Docs: methods of file objects
It is good practice to use the with keyword when dealing with file
objects. The advantage is that the file is properly closed after its
suite finishes, even if an exception is raised at some point. Using
with is also much shorter than writing equivalent try-finally blocks:
If you’re not using the with keyword, then you should call f.close()
to close the file and immediately free up any system resources used by
it.
If you don’t explicitly close a file, Python’s garbage collector
will eventually destroy the object and close the open file for you,
but the file may stay open for a while. Another risk is that different
Python implementations will do this clean-up at different times.
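For comparison, here is a minimal sketch of the try-finally form that the quoted docs say with replaces, using pathf from the question above:
f = open(pathf, "r")
try:
    fdata = f.read().splitlines()
finally:
    f.close()   # runs even if read() raises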
The file will be closed automatically at interpreter exit or when it is garbage collected. But since best practices matter, the better approach is to use a context manager, as below:
with open(pathf, "r") as f:
    fdata = f.read().splitlines()
Thank you.
If you use this:
with open(pathf, 'r') as f:
    fdata = f.read().splitlines()
then you don't have to close your file; it is done automatically. It's always good practice to close files once you are done using them (it reduces the risk of leaking file handles and other resources).
Will the file automatically close after getting the data?
In your example, fdata is actually a list, not a file object. The file object is what is returned by open().
If you had a name bound to the file object, or fdata were a file object, the answer would be: it depends.
If the file object does not have any references left, i.e. its reference count reaches 0, it will be garbage collected and closed in the process.
If not how can I close it since fdata is not a handle?
You can't, as fdata is not a file object (like you mentioned) and you don't have any reference to the file object returned by open() either.
If you had a file object, you could explicitly call close() on it:
f_object.close()
Better yet, since the file object returned by open() is a context manager, use the with ... construct to let it close automatically when the block ends:
with open('file.txt') as f_object:
    ...
One added advantage is that the file will be closed in case of an exception too. If you are interested, check the __enter__ and __exit__ special methods of file objects.
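A tiny illustration of those hooks, using a placeholder 'file.txt': a file object is its own context manager, so __enter__ hands the file back and __exit__ closes it.
f_object = open('file.txt')
print(f_object.__enter__() is f_object)   # True: __enter__ returns the file itself
f_object.__exit__(None, None, None)       # this is what leaving the with block calls
print(f_object.closed)                    # True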

Create temporary file in Python that will be deleted automatically after sometime

Is it possible to create temporary files in Python that will be deleted after some time? I checked the tempfile library, which generates temporary files and directories.
tempfile.TemporaryFile: this function creates a temporary file which will be destroyed as soon as it is closed.
tempfile.NamedTemporaryFile: this function accepts a delete parameter. If delete is set to False when calling the function, the file will not be deleted on close.
What I need is a temp file which I should be able to read by its name for some time even after I close it. However, it should be deleted automatically after some time.
What is the easiest way to do this in Python 2.7?
If you only need to use the file at intervals within your Python program, you can use it as a context manager:
import tempfile

def main(file_object):
    # Can read and write to file_object at any time within this function
    pass

if __name__ == '__main__':
    with tempfile.NamedTemporaryFile() as file_object:
        main(file_object)
        # Can still use file_object
    # file_object is referencing a closed and deleted file here
If you need to be able to open it outside of the Python program, say after main is over, like in a process that runs asynchronously, you should probably look for a more permanent solution (e.g. allowing the user to specify the file name).
If you really need a temporary auto-deleting file, you can start a threading.Timer to delete it after some time, but make sure that your program deletes it even if it is stopped before the timer fires.
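A minimal sketch of that Timer idea, assuming a 60-second lifetime; delete=False keeps the file around after close() so it can still be opened by name, and the timer removes it later. Note this sketch does not handle the program being stopped before the timer fires.
import os
import tempfile
import threading

CLEANUP_DELAY = 60  # seconds; an arbitrary choice for this sketch

tmp = tempfile.NamedTemporaryFile(delete=False)  # survives close()
tmp.write(b"some data")
tmp.close()

def remove_later(path):
    try:
        os.remove(path)
    except OSError:
        pass  # already gone

timer = threading.Timer(CLEANUP_DELAY, remove_later, args=[tmp.name])
timer.start()
# tmp.name can still be opened by name until the timer fires and deletes it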

Concurrent access to a data file in Python

I have a small web server doing some operations on a POST request. It reads a data file, does some checks, and then re-saves the file, adding some information from the POST to it.
The issue is that if two clients send a POST request at almost the same time, both will read the same file; one will then write the file with its new information, and the other will overwrite it with its own new information, but without the first client's data, since that part wasn't in the file when it was read.
f = open("foo.txt", "r+")
tests_data = yaml.safe_load(f)
post_data = json.loads(web.data())
#Some checks
f.write(json.dumps(tests_data))
f.close()
I want the script to "wait" at the open line, without raising an error, if the file is already open in another process running the same code, and then read the file once the other process is done and has closed it.
Or something else if other solutions exist.
Would a standard lock not suit your needs? The lock would need to be at the module level.
from threading import Lock

# this needs to be a module-level variable
lock = Lock()

with lock:
    # do your stuff. only one thread at a time can
    # work in this space...
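A sketch of how the handler's read-modify-write could sit inside that lock. The handle_post name and the raw_post_data argument are illustrative stand-ins (the question uses web.data()), and seek/truncate are added so the rewrite replaces the old contents instead of appending to them:
import json
import yaml
from threading import Lock

file_lock = Lock()   # module-level, shared by all request handlers

def handle_post(raw_post_data):
    with file_lock:                      # only one request touches foo.txt at a time
        with open("foo.txt", "r+") as f:
            tests_data = yaml.safe_load(f)
            post_data = json.loads(raw_post_data)
            # ...some checks, merge post_data into tests_data...
            f.seek(0)                    # rewind before rewriting
            f.truncate()
            f.write(json.dumps(tests_data))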

Make Python wait until a file exists before continuing

In my code, I write a file to my hard disk. After that, I need to import the generated file and then continue processing it.
for i in xrange(10):
    filename = generateFile()
    # takes some time, I wish to freeze the program here
    # and continue once the file is ready in the system
    file = importFile(filename)
    processFile(file)
If I run the code snippet in one go, most likely file = importFile(filename) will complain that the file does not exist, since the generation takes some time.
I used to manually run filename=generateFile() and wait before running file=importFile(filename).
Now that I'm using a for loop, I'm searching for an automatic way.
You could use time.sleep, and I would expect that if you are loading a module this way you would need to reload it rather than import it again after the first import.
However, unless the file is very large, why not just generate the string and then eval or exec it?
Note that since your file-generation function is not invoked in a thread, it should block and only return when it thinks it has finished writing. You can possibly improve things by ensuring that the file writer ends with outfile.flush() and then outfile.close(), but on some OSs there may still be a moment when the file is not actually available.
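A hypothetical sketch of that writer-side advice; the file name and contents here are made up:
def generateFile():
    # the important part is flushing and closing before returning,
    # so the data is really on disk by the time importFile() runs
    filename = "generated_module.py"
    outfile = open(filename, "w")
    outfile.write("value = 42\n")
    outfile.flush()
    outfile.close()
    return filename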
I think you should use a flag to test whether the file has been generated, for example:
import os
from time import sleep

for i in xrange(10):
    (filename, is_finished) = generateFile()
    # poll the flag until the file is actually on disk
    while not is_finished:
        sleep(1)
        is_finished = os.path.exists(filename)
    file = importFile(filename)
    processFile(file)

Python: Reread contents of a file

I have a file that an application updates every few seconds, and I want to extract a single number field in that file, and record it into a list for use later. So, I'd like to make an infinite loop where the script reads a source file, and any time it notices a change in a particular figure, it writes that figure to an output file.
I'm not sure why I can't get Python to notice that the source file is changing:
#!/usr/bin/python
import re
from time import gmtime, strftime, sleep

def write_data(new_datapoint):
    output_path = '/media/USBHDD/PythonStudy/torrent_data_collection/data_one.csv'
    outfile = open(output_path, 'a')
    outfile.write(new_datapoint)
    outfile.close()

forever = 0
previous_data = "0"

while forever < 1:
    input_path = '/var/lib/transmission-daemon/info/stats.json'
    infile = open(input_path, "r")
    infile.seek(0)
    contents = infile.read()
    uploaded_bytes = re.search('"uploaded-bytes":\s(\d+)', contents)
    if uploaded_bytes:
        current_time = strftime("%Y-%m-%d %X", gmtime())
        current_data = uploaded_bytes.group(1)
        if current_data != previous_data:
            write_data("," + current_time + "$" + uploaded_bytes.group(1))
        previous_data = uploaded_bytes.group(1)
        infile.close()
        sleep(5)
    else:
        print "couldn't write" + strftime("%Y-%m-%d %X", gmtime())
        infile.close()
        sleep(60)
As it is now, the (messy) script writes once correctly, but then, even though my source file (stats.json) keeps changing, my script never picks up on any changes. It keeps running, but my output file doesn't grow.
I thought that an open() and a close() would do the trick, and then tried throwing in a .seek(0).
What file method am I missing to ensure that Python re-opens and re-reads my source file (stats.json)?
Unless you implement some synchronization mechanism or can somehow guarantee atomic reads and writes, I think you are asking for race conditions and subtle bugs here.
Imagine the "reader" accessing the file while the "writer" hasn't completed its write cycle. There is a risk of reading incomplete/inconsistent data. On "modern" systems you could also hit the cache and not see file modifications "live" as they happen.
I can think of two possible solutions:
You forgot the parentheses on the close in the else of the infinite loop.
infile.close --> infile.close()
Another possibility: the program that is changing the JSON file is not closing it, and therefore the file is not actually changing on disk.
Two problems I see:
Are you sure your file is really updated on the filesystem? I don't know which operating system you are running this code on, but caching may bite you here if the file is not flushed by the producer.
Your problem might also be worth solving with a pipe instead of a file, although I cannot guarantee what transmission will do if it gets stuck writing to the pipe because your consumer is dead.
To answer your question, consider using one of the following:
pyinotify
watchdog
watcher
These modules are intended to monitor changes on the filesystem and then call the proper actions. The method in your example is primitive, has a big performance penalty, and has a couple of other problems already mentioned in other answers.
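For instance, a minimal watchdog sketch, assuming stats.json lives under /var/lib/transmission-daemon/info as in the question; the handler class name and the processing hook are illustrative:
import time
from watchdog.observers import Observer
from watchdog.events import FileSystemEventHandler

WATCH_DIR = '/var/lib/transmission-daemon/info'

class StatsChangedHandler(FileSystemEventHandler):
    def on_modified(self, event):
        if event.src_path.endswith('stats.json'):
            # re-read stats.json and record the new uploaded-bytes value here
            pass

observer = Observer()
observer.schedule(StatsChangedHandler(), WATCH_DIR, recursive=False)
observer.start()
try:
    while True:
        time.sleep(1)
except KeyboardInterrupt:
    observer.stop()
observer.join()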
Ilya, would it help to check (with os.path.getmtime) whether stats.json changed before you process the file?
Moreover, I'd suggest taking advantage of the fact that it's a JSON file:
import json
import os

dir_name = '/home/klaus/.config/transmission/'
# stats.json of the daemon might be elsewhere
file_name = 'stats.json'
full_path = os.path.join(dir_name, file_name)

with open(full_path) as fp:
    data = json.load(fp)   # load once; a second json.load(fp) would fail on the exhausted stream

print data['uploaded-bytes']
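And a sketch of the os.path.getmtime check suggested above: only re-read stats.json when its modification time changes. The 5-second poll interval is an arbitrary choice:
import json
import os
import time

full_path = '/var/lib/transmission-daemon/info/stats.json'
last_mtime = 0

while True:
    mtime = os.path.getmtime(full_path)
    if mtime != last_mtime:          # only re-read when the file actually changed
        last_mtime = mtime
        with open(full_path) as fp:
            data = json.load(fp)
        print(data['uploaded-bytes'])
    time.sleep(5)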
Thanks for all the answers; unfortunately my error was in the shell, not in the Python script.
The cause of the problem turned out to be the way I was putting the script in the background. I was pressing Ctrl+Z, which I thought would put the task in the background. But it does not: Ctrl+Z only suspends the task and returns you to the shell; a subsequent bg command is necessary for the script to keep running its infinite loop in the background.
