File flush needed after process exit? - python

I'm writing files from one process using open and write (i.e. direct kernel calls). After the write, I simply close the file and exit the application without flushing. The application is started from a Python wrapper which reads the files immediately after the application exits. Sometimes, however, the Python wrapper reads incorrect data, as if it were still seeing an old version of the file (i.e. the wrapper reads stale data).
I thought that no matter whether the file metadata and contents have been written to disk, the user-visible contents would always be valid and consistent (i.e. the buffers get flushed to memory at least, so subsequent reads see the same content, even though it might not yet be committed to disk). What's going on here? Do I need to sync on close in my application, or can I simply issue a sync command after running my application from the Python script to guarantee that everything has been written correctly? This is running on ext4.
On the Python side:
import subprocess

# Called for lots of files
o = subprocess.check_output(['./App.BitPacker', inputFile])  # Writes indices.bin and dict.bin
indices = open('indices.bin', 'rb').read()
dictionary = open('dict.bin', 'rb').read()
with open('output-file', 'wb') as output:
    output.write(dictionary)  # Invalid content in output-file ...
    # output-file is a placeholder, one output file per inputFile of course

I've never had your problem and always found a call to close() to be sufficient. However, from the man entry on close(2):
A successful close does not guarantee that the data has been successfully saved to disk, as the kernel defers writes. It is not common for a file system to flush the buffers when the stream is closed. If you need to be sure that the data is physically stored use fsync(2). (It will depend on the disk hardware at this point.)
As, at the time of writing, you haven't included code for the writing process, I can only suggest adding a call to fsync in that process and seeing if this makes a difference.
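The writer itself isn't shown, but for illustration here is a minimal sketch of that suggestion in Python terms, using the same open/write/close kernel-level calls the question describes (the filename and packed_bytes are placeholders):

import os

# Hypothetical writer sketch: open/write/close as in the question, with an
# fsync added before close as suggested above. packed_bytes is a placeholder
# for the data your application produces.
fd = os.open('indices.bin', os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
try:
    os.write(fd, packed_bytes)
    os.fsync(fd)  # flush the file contents to stable storage before closing
finally:
    os.close(fd)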

Related

How to solve OdbError in Abaqus Python script?

I am running a 3D solid model in an Abaqus Python script, which is supposed to be analyzed 200 times, as the model has been arranged in a for loop (for i in range(0,199):). Sometimes I receive the following error and then the analysis terminates. I can't figure out the reason.
Odb_0 = session.openOdb(name='Job-1' + '.odb')
OdbError: The .lck file for the output database D:/abaqus/Model/Job-1.odb indicates that the analysis Input File Processor is currently modifying the database. The database cannot be opened at this time.
Note that all the variables, including "Odb_0" and so on, are deleted at the end of each iteration of the loop before starting the next one.
I don't believe your problem will be helped by a change in element type.
The message and the .lck file say that there's an access deadlock in the database. The output file lost out and cannot update the .odb database.
I'm not sure what database Abaqus uses. I would have guessed that the input stream would have scanned the input file and written whatever records were necessary to the database before the solution and output processing began.
From the Abaqus documentation
The lock file (job_name.lck) is written whenever an output database file is opened with write access, including when an analysis is running and writing output to an output database file. The lock file prevents you from having simultaneous write permission to the output database from multiple sources. It is deleted automatically when the output database file is closed or when the analysis that creates it ends.
When you are deleting your previous analysis you should be sure that all processes connected with that simulation have been terminated. There are several possibilities to do so:
Launching the simulation through subprocess.Popen could give you much more control over the process (e.g. waiting until it ends, writing a specific log, etc.); see the sketch after this list;
Naming your simulations differently (e.g. 'Job-1', 'Job-2', etc.) and deleting old ones with a delay (e.g. deleting 'Job-1' while 'Job-3' has started);
Less preferable: using the time module to add a fixed delay
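A minimal sketch of the subprocess option, assuming the analysis is launched from the command line; the abaqus command, its arguments and the job name are placeholders for your setup:

import subprocess

# Launch the solver and block until it has finished, so the .lck file is
# released before the output database is opened. Command and job name are
# placeholders.
job = 'Job-1'
proc = subprocess.Popen(['abaqus', 'job=' + job, 'interactive'])
proc.wait()  # do not touch the .odb until the analysis process has ended

Odb_0 = session.openOdb(name=job + '.odb')  # same call as in the question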

py-execute-line sends the whole buffer to the python process

I started to play with python-mode in Emacs (latest Emacs, latest python-mode.el)
When I try to send a line of code to the process via py-execute-line, or send a function definition via py-execute-def-or-class, it grabs the whole buffer, saves it in a temporary file and sends an exec(compile(open(some_temp_file_name).read()...) string to the process for execution.
My question is: why does it have to be that way?
Why can't we just (comint-send-string proc string) to the process where the string is one line of code or a block (or at least avoid saving a temp file every time)?
Can't reproduce with current trunk.
Please file a complete bug-report at:
https://gitlab.com/python-mode-devs/python-mode/issues

What do I need to be concerned with when forking a process and reading files?

I am fairly new to forking and I have over 10,000 files stored in a folder that I am reading by doing the following:
# Loop over all xFiles in a list of files
for xFile in fileList:
    try:
        with open(xFile, 'r', encoding="utf8") as f:
            pass  # search through file for terms
                  # do other stuff
    except FileNotFoundError:
        # Someone removed the file; it cannot be found.
        print("\tFile no longer exists:", xFile)
Because my script takes around 45 minutes to run, and due to the nature of the project I am working on, it is possible and very common that files in the list I am searching through are moved or deleted while the script runs. That is why the reading is wrapped in a try statement.
Where the comment "search through file for terms" appears, I am running an algorithm over thousands of patterns. I wish to fork my process before the loop so that I have two processes, each with a different set of patterns (of different sizes), searching (or reading) through the same list of files.
Some of my concerns:
If one process is reading a file and its child process tries to read that file, the except will execute and the search algorithm will never run for that particular file.
I won't be able to read the same file simultaneously with both processes.
So, having provided the context, here is my question: what do I need to be concerned with when forking a process and reading the same files in both processes?
I'm assuming a *nix type system.
When you fork a process, the file descriptors are accessible by both. That means the same underlying kernel file description (including its offset) is being accessed by two processes, which is bad in your situation. You would want to open the files after the fork so each process has its own access to (and offset into) the file.
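For illustration, a rough sketch of that open-after-fork layout; patterns_a, patterns_b, fileList and search() are placeholders for your two pattern sets, file list and algorithm:

import os

# Fork first, then each process opens the files itself, so each one gets its
# own file description and offset. All names below are placeholders.
pid = os.fork()
patterns = patterns_a if pid == 0 else patterns_b  # child gets one set, parent the other

for xFile in fileList:
    try:
        with open(xFile, 'r', encoding="utf8") as f:
            search(f, patterns)
    except FileNotFoundError:
        print("\tFile no longer exists:", xFile)

if pid != 0:
    os.waitpid(pid, 0)  # parent waits for the child to finish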
You would typically want to open a file descriptor before forking when the child process is going to change permissions, such as in a web server. The parent process opens a socket on port 80 as the root user, but then forks a child process that will listen on the file descriptor pointing to the open port 80. The child process can then drop to a normal user (not root) and continue accepting new connections.
Hope that helps!

Python: Lock a file

I have a Python app running on Linux. It is called every minute from cron. It checks a directory for files and if it finds one it processes it, which can take several minutes. I don't want the next cron job to pick up the file currently being processed, so I lock it using the code below, which calls portalocker. The problem is it doesn't seem to work: the next cron job manages to get a file handle returned for the file already being processed.
import sys
import portalocker

def open_and_lock(full_filename):
    file_handle = open(full_filename, 'r')
    try:
        portalocker.lock(file_handle, portalocker.LOCK_EX
                         | portalocker.LOCK_NB)
        return file_handle
    except IOError:
        sys.exit(-1)
Any ideas what I can do to lock the file so no other process can get it?
UPDATE
Thanks to @Winston Ewert I checked through the code and found the file handle was being closed well before the processing had finished. It seems to be working now, except the second process blocks on portalocker.lock rather than throwing an exception.
After fumbling with many schemes, this works in my case. I have a script that may be executed multiple times simultaneously. I need these instances to wait their turn to read/write to some files. The lockfile does not need to be deleted, so you avoid blocking all access if one script fails before deleting it.
import fcntl

def acquireLock():
    ''' acquire exclusive lock file access '''
    locked_file_descriptor = open('lockfile.LOCK', 'w+')
    fcntl.lockf(locked_file_descriptor, fcntl.LOCK_EX)
    return locked_file_descriptor

def releaseLock(locked_file_descriptor):
    ''' release exclusive lock file access '''
    locked_file_descriptor.close()

lock_fd = acquireLock()

# ... do stuff with exclusive access to your file(s)

releaseLock(lock_fd)
You're using the LOCK_NB flag which means that the call is non-blocking and will just return immediately on failure. That is presumably happening in the second process. The reason why it is still able to read the file is that portalocker ultimately uses flock(2) locks, and, as mentioned in the flock(2) man page:
flock(2) places advisory locks only; given suitable permissions on a file, a process is free to ignore the use of flock(2) and perform I/O on the file.
To fix it you could use the fcntl.flock function directly (portalocker is just a thin wrapper around it on Linux) and handle the error raised when the lock cannot be acquired.
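A rough sketch of that direct approach, reusing the names from the question; with LOCK_NB a failed lock raises an exception instead of blocking:

import sys
import fcntl

def open_and_lock(full_filename):
    file_handle = open(full_filename, 'r')
    try:
        # Non-blocking exclusive lock; raises OSError if another process
        # already holds the lock.
        fcntl.flock(file_handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
        return file_handle
    except OSError:
        file_handle.close()
        sys.exit(-1)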
Don't use cron for this. Linux has inotify, which can notify applications when a filesystem event occurs. There is a Python binding for inotify called pyinotify.
Thus, you don't need to lock the file -- you just need to react to IN_CLOSE_WRITE events (i.e. when a file opened for writing was closed). (You also won't need to spawn a new process every minute.)
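A minimal pyinotify sketch of that idea; the watched directory and process_file() are placeholders:

import pyinotify

class Handler(pyinotify.ProcessEvent):
    def process_IN_CLOSE_WRITE(self, event):
        # The file was opened for writing and has now been closed.
        process_file(event.pathname)  # process_file() is a placeholder

wm = pyinotify.WatchManager()
wm.add_watch('/path/to/incoming', pyinotify.IN_CLOSE_WRITE)
notifier = pyinotify.Notifier(wm, Handler())
notifier.loop()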
An alternative to using pyinotify is incron which allows you to write an incrontab (very much in the same style as a crontab), to interact with the inotify system.
What about manually creating an old-fashioned .lock file next to the file you want to lock?
Just check whether it's there: if not, create it; if it is, exit prematurely. After finishing, delete it.
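For example, a sketch of that scheme using an atomic create-if-absent; the lock file name is a placeholder:

import os
import sys

LOCK = 'data.csv.lock'  # placeholder name, next to the file being processed

try:
    # O_EXCL makes the create atomic: it fails if the lock file already exists.
    fd = os.open(LOCK, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
except FileExistsError:
    sys.exit(0)  # another instance is working on the file; exit prematurely

try:
    pass  # ... process the file ...
finally:
    os.close(fd)
    os.remove(LOCK)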
I think fcntl.lockf is what you are looking for.

What would happen if I abruptly close my script while it's still doing file I/O operations?

Here's my question: I'm writing a script to check if my website is running all right. The basic idea is to get the server response time and similar stats every 5 minutes or so, and the script logs the info each time after checking the server status. I know it's not good to close the script while it's in the middle of checking/writing logs, but I'm curious: if there are lots of servers to check and the script does file I/O fairly frequently, what would happen if I abruptly closed it?
OK, here's an example:
import time

while True:
    DemoFile = open("DemoFile.txt", "a")
    DemoFile.write("This is a test!")
    DemoFile.close()
    time.sleep(30)
If I accidentally close the script while the line DemoFile.write("This is a test!") is running, what would I get in DemoFile.txt? Do I get "This i" (an incomplete line), the complete line, or nothing added at all?
Hopefully somebody knows the answer.
According to the python io documentation, buffering is handled according to the buffering parameter to the open function.
The default behavior in this case would be either the device's block size or io.DEFAULT_BUFFER_SIZE if the block size can't be determined. This is probably something like 4096 bytes.
In short, that example will write nothing. If you were writing something long enough that the buffer was written once or twice, you'd have multiples of the buffer size written. And you can always manually flush the buffer with flush().
(If you specify buffering as 0 and the file mode as binary, you'd get "This i". That's the only way, though)
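A small sketch of that unbuffered case; buffering=0 is only accepted in binary mode:

# Each write() call goes straight to the OS, so killing the script mid-run
# leaves exactly the bytes that had been written so far in the file.
with open("DemoFile.bin", "ab", buffering=0) as demo:
    demo.write(b"This is a test!")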
As @sven pointed out, Python isn't doing the buffering here. When the program is terminated, all open file descriptors are closed and flushed by the operating system.
