I am writing a script that needs to perform safe writes to any given file, i.e. append to the file only if no other process is known to be writing into it. My understanding of the theory was that concurrent writes are prevented by write locks on the file system, but that does not seem to be the case in practice.
Here's how I set up my test case:
I am redirecting the output of a ping command:
ping 127.0.0.1 > fileForSafeWrites.txt
On the other end, I have the following python code attempting to write to the file:
handle = open('fileForSafeWrites.txt', 'w')
handle.write("Probing for opportunity to write")
handle.close()
Running both processes concurrently, both complete gracefully. But I see that fileForSafeWrites.txt has turned into a file with binary content, instead of the first process issuing a write lock that protects it from being written to by the Python code.
How do I force either or both of my concurrent processes not to interfere with each other? I have read people advise that being able to obtain a writable file handle is evidence that the file is safe to write to, such as in https://stackoverflow.com/a/3070749/1309045
Is this behavior specific to my operating system and Python? I use Python 2.7 in an Ubuntu 12.04 environment.
Use the lockfile module as shown in Locking a file in Python
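A minimal sketch of that approach, assuming the third-party lockfile package is installed; note that these locks are advisory, so every process that writes the file has to take the same lock:
from lockfile import FileLock

lock = FileLock('fileForSafeWrites.txt')   # creates fileForSafeWrites.txt.lock alongside the file
with lock:                                 # blocks until no other cooperating process holds the lock
    with open('fileForSafeWrites.txt', 'a') as handle:
        handle.write("Text written while holding the lock\n")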
Inspired by a solution described for concurrency checks, I came up with the following snippet of code. It works if one is able to appropriately predict the frequency at which the file in question is written to. The solution uses file-modification times.
import os
import time

def isFileBeingWrittenInto(filename,
                           writeFrequency=180, overheadTimePercentage=20):
    '''Find if a file was modified in the last x seconds given by writeFrequency.'''
    overhead = 1 + float(overheadTimePercentage) / 100   # Add some buffer time
    maxWriteFrequency = writeFrequency * overhead
    modifiedTimeStart = os.stat(filename).st_mtime       # Time file last modified
    time.sleep(writeFrequency)                            # wait writeFrequency # of secs
    modifiedTimeEnd = os.stat(filename).st_mtime          # File modification time again
    if 0 < (modifiedTimeEnd - modifiedTimeStart) <= maxWriteFrequency:
        return True
    else:
        return False

if not isFileBeingWrittenInto('fileForSafeWrites.txt'):
    handle = open('fileForSafeWrites.txt', 'a')
    handle.write("Text written safely when no one else is writing to the file")
    handle.close()
This does not do true concurrency checks but can be combined with a variety of other methods for practical purposes to safely write into a file without having to worry about garbled text. Hope it helps the next person searching for a way to do this.
EDIT UPDATE:
Upon further testing, I encountered a high-frequency write process that required the conditional logic to be modified from
if 0 < (modifiedTimeEnd - modifiedTimeStart) < maxWriteFrequency
to
if 0 < (modifiedTimeEnd - modifiedTimeStart) <= maxWriteFrequency
That makes a better answer, in theory and in practice.
I'm integrating MicroPython into a microcontroller and I want to add a debug step-by-step execution mode to my product (via a connection to a PC).
Thankfully, MicroPython includes REPL (Python shell) functionality: I can feed it one line at a time and execute it.
I want to use this feature to single-step on the PC-side and send in the lines in the Python script one-by-one.
Is there ANY difference, besides possibly timing, between running a Python script one line at a time vs python my_script.py?
Passing one line of code at a time on stdin is a completely unacceptable alternative to a proper debugger.
Let's say that you want to debug the following:
def foo():                        # 1
    for i in range(10):           # 2
        if i == 5:                # 3
            raise Exception("Argh!")  # 4
                                  # 5
foo()                             # 6
...in a proper step-by-step debugger, the user could use it like so:
break 4
run
Now, how are you going to do that? If you enter the function in a REPL, the function is defined as one operation, and calling it runs it as one operation: it doesn't stop at the breakpoint on line 4, and it doesn't let you proceed line-by-line. The same is true of the for loop: entering the text of the for loop one line at a time doesn't let you step it before the exception is thrown.
If you eliminate the function, and eliminate the loop (generating the code _something = iter(range(10)); i=_something.next(), maybe?), then you need to emulate the effects of scoping. It means you have a hugely different language than the one you're purportedly "debugging".
I don't know whether MicroPython has compile() and exec() built in.
But when the embedded Python has them and the MCU has enough RAM, then I do the following:
Send a line to the embedded shell that starts the creation of a variable holding a multiline string:
_code = """\
Send the code I wish executed (line by line or however).
Close the multiline string with """.
Send an exec command to run the transferred code stored in the variable on the MCU, and pick up the output (a host-side sketch of these steps follows below).
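As a rough sketch of those steps from the PC side, assuming the board is reachable over a pyserial connection (the port name, baud rate and the run_block helper are illustrative, not part of MicroPython):
import serial  # pip install pyserial; assumes a serial link to the board

def run_block(port, source):
    """Send a block of code to the MicroPython REPL and execute it as one unit."""
    port.write(b'_code = """\\\r\n')          # start building the multiline string
    for line in source.splitlines():
        port.write(line.encode() + b'\r\n')    # transfer the code line by line
    port.write(b'"""\r\n')                     # close the multiline string
    port.write(b'exec(_code)\r\n')             # run the transferred block
    return port.read_all()                     # pick up whatever output is waiting (may need a short delay)

with serial.Serial('/dev/ttyUSB0', 115200, timeout=1) as board:
    print(run_block(board, 'for i in range(3):\n    print(i)'))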
If your RAM is small and you cannot transfer the whole code at once, you should transfer it in blocks that will be executed: functions, loops, etc.
If you can compile bytecode for MicroPython on a PC, then you should be able to transfer it and prepare it for execution. This would use a lot less RAM.
But whether you can inject the raw bytecode into the shell and run it depends on how much MicroPython resembles CPython.
And yes, there are differences. As explained in another answer, line-by-line execution can be tricky, so blocks of code are your best bet.
Is there ANY difference ...
Yes.
The code below, for example, works in a .py file but is a SyntaxError in the interactive interpreter:
x = 1
if x == 1:
    pass
x = 2
There are many other differences, but this alone should be enough to scare you away from the idea.
I am using the seek function to extract new lines from a file that is being updated. My code looks like this:
import time

read_data = open('path-to-myfile', 'r')
read_data.seek(0, 2)                 # jump to the end of the file
while True:
    time.sleep(sometime)
    new_data = read_data.readlines()
    # do something with new_data
myfile is a CSV file that is constantly being updated.
The problem is that, usually after several iterations of the while loop, new_data returns nothing (the number of iterations at which this happens varies), even though myfile is still being updated. Is there a problem with my code, or is there another way to do this?
Any help appreciated!
You have two programs accessing the same file on disk? If that is the case, then the resource may be getting locked. I set up an example script that writes to a file, and another script that reads it for changes, based on the code you provided.
So in one instance of python:
import time

while True:
    time.sleep(2)
    with open('test.txt', 'a') as read_data:
        read_data.seek(0, 2)
        read_data.write("bibbity boopity\n")
And in another instance of python
import time

read_data = open('test.txt', 'r')
read_data.seek(0, 2)
while True:
    time.sleep(1)
    new_data = read_data.readlines()
    print(new_data)
In this case the file is being updated more slowly than it is being read, so some of what the reader prints will be empty lists. If I speed up the writes per second, I still see the changes, but there are some instances where not all the updates are seen.
You may want to use asynchronous file reading to catch all the changes. Python 3's asyncio library doesn't support async file read/write, but curio does.
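A very rough sketch of that idea with curio, assuming its aopen() file wrapper exposes seek and readlines as awaitable methods (the polling interval and file name are placeholders):
import curio

async def tail_file(path, interval=1):
    async with curio.aopen(path, 'r') as f:
        await f.seek(0, 2)                  # start at the current end of the file
        while True:
            await curio.sleep(interval)
            new_data = await f.readlines()  # pick up anything appended since the last poll
            if new_data:
                print(new_data)

curio.run(tail_file('test.txt'))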
See also this question
In my code, I write a file to my hard disk. After that, I need to import the generated file and then continue processing it.
for i in xrange(10):
    filename = generateFile()
    # takes some time, I wish to freeze the program here
    # and continue once the file is ready in the system
    file = importFile(filename)
    processFile(file)
If I run the code snippet in one go, most likely file=importFile(filename) will complain that that file does not exist, since the generation takes some time.
I used to manually run filename=generateFile() and wait before running file=importFile(filename).
Now that I'm using a for loop, I'm searching for an automatic way.
You could use time.sleep, and I would expect that if you are loading a module this way you would need to reload it rather than import it again after the first import.
However, unless the file is very large, why not just generate the string and then eval or exec it?
Note that since your file-generation function is not being invoked in a thread, it should be blocking and will only return when it thinks it has finished writing. Possibly you can improve things by ensuring that the file writer ends with outfile.flush() and then outfile.close(), but on some OSs there may still be a moment when the file is not actually available.
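For illustration, a hedged sketch of what the end of such a writer might look like (generateFile and the file name here are placeholders, not the asker's real code):
import os

def generateFile():
    filename = 'generated_output.dat'        # illustrative name
    with open(filename, 'w') as outfile:
        outfile.write('...generated data...\n')
        outfile.flush()                      # push Python's buffer down to the OS
        os.fsync(outfile.fileno())           # ask the OS to commit it to disk
    # the with-block has closed the file, so importFile(filename) should now find it
    return filename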
for i in xrange(10):
    (filename, is_finished) = generateFile()
    if is_finished:                  # only import once the generator reports the file is ready
        file = importFile(filename)
        processFile(file)
I think you should use a flag to test whether the file has been generated.
I wrote a script in python that takes a few files, runs a few tests and counts the number of total_bugs while writing new files with information for each (bugs+more).
To take a couple files from current working directory:
myscript.py -i input_name1 input_name2
When that job is done, I'd like the script to 'return total_bugs', but I'm not sure of the best way to implement this.
Currently, the script prints stuff like:
[working directory]
[files being opened]
[completed work for file a + num_of_bugs_for_a]
[completed work for file b + num_of_bugs_for_b]
...
[work complete]
A bit of help (notes/tips/code examples) would be appreciated here.
Btw, this needs to work for windows and unix.
If you want your script to return values, just do return [1,2,3] from a function wrapping your code, but then you'd have to import your script from another script to even have any use for that information:
Return values (from a wrapping-function)
(again, this would have to be run by a separate Python script and be imported in order to even do any good):
import ...

def main():
    # calculate stuff
    return [1,2,3]
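For instance, a quick sketch of the calling side, assuming the wrapped script is saved as myscript.py (the name is illustrative):
import myscript   # the wrapped script, importable from the caller's working directory

bugs = myscript.main()
print("total bugs: %s" % bugs)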
Exit codes as indicators
(This is generally just good for when you want to indicate to a governor what went wrong, or simply the number of bugs/rows counted or whatever. Normally 0 is a good exit and >=1 is a bad exit, but you could interpret them in any way you want to get data out of it.)
import sys
# calculate and stuff
sys.exit(100)
And exit with a specific exit code depending on what you want that to tell your governor.
I used exit codes when running script by a scheduling and monitoring environment to indicate what has happened.
(os._exit(100) also works, and is a bit more forceful)
Stdout as your relay
If not, you'd have to use stdout to communicate with the outside world (like you've described).
But that's generally a bad idea unless a parser is executing your script and can catch whatever it is you're reporting.
import sys
# calculate stuff
sys.stdout.write('Bugs: 5|Other: 10\n')
sys.stdout.flush()
sys.exit(0)
If you are running your script in a controlled scheduling environment, then exit codes are the best way to go.
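As an illustration, a rough sketch of how a governor process might capture both the stdout line above and the exit code, using subprocess (Python 3.7+; the script name and field format are assumptions taken from this answer):
import subprocess

result = subprocess.run(
    ['python', 'myscript.py', '-i', 'input_name1', 'input_name2'],
    capture_output=True, text=True)

# parse a report line like "Bugs: 5|Other: 10"
fields = dict(part.split(': ', 1) for part in result.stdout.strip().split('|'))
print(fields.get('Bugs'), 'exit code:', result.returncode)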
Files as conveyors
There's also the option to simply write information to a file, and store the result there.
# calculate
with open('finish.txt', 'wb') as fh:
    fh.write(str(5) + '\n')
And pick up the value/result from there. You could even do it in a CSV format for other programs to read easily.
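The reading side can be equally small, for example (assuming the single-integer format written above):
with open('finish.txt') as fh:
    result = int(fh.read().strip())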
Sockets as conveyors
If none of the above work, you can also use network sockets locally (Unix sockets are a great way on *nix systems). These are a bit more intricate and deserve their own post/answer, but I'm adding the option here because it's a good way to communicate between processes, especially if they run multiple tasks and return values.
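A bare-bones sketch of the idea, with the listening "governor" and the reporting script squeezed into one file for demonstration (the socket path and message format are made up):
import os
import socket
import threading
import time

sock_path = '/tmp/myscript.sock'            # illustrative path
if os.path.exists(sock_path):
    os.unlink(sock_path)

def governor():
    """Listens for a single result message from the reporting script."""
    server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    server.bind(sock_path)
    server.listen(1)
    conn, _ = server.accept()
    print('governor received: %s' % conn.recv(1024).decode())
    conn.close()
    server.close()

threading.Thread(target=governor).start()
time.sleep(0.2)                             # give the listener a moment to bind

# The script side: report total_bugs to whoever is listening.
client = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
client.connect(sock_path)
client.sendall(b'total_bugs=5')
client.close()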
I have a file that an application updates every few seconds, and I want to extract a single number field in that file, and record it into a list for use later. So, I'd like to make an infinite loop where the script reads a source file, and any time it notices a change in a particular figure, it writes that figure to an output file.
I'm not sure why I can't get Python to notice that the source file is changing:
#!/usr/bin/python
import re
from time import gmtime, strftime, sleep

def write_data(new_datapoint):
    output_path = '/media/USBHDD/PythonStudy/torrent_data_collection/data_one.csv'
    outfile = open(output_path, 'a')
    outfile.write(new_datapoint)
    outfile.close()

forever = 0
previous_data = "0"

while forever < 1:
    input_path = '/var/lib/transmission-daemon/info/stats.json'
    infile = open(input_path, "r")
    infile.seek(0)
    contents = infile.read()
    uploaded_bytes = re.search('"uploaded-bytes":\s(\d+)', contents)
    if uploaded_bytes:
        current_time = strftime("%Y-%m-%d %X", gmtime())
        current_data = uploaded_bytes.group(1)
        if current_data != previous_data:
            write_data("," + current_time + "$" + uploaded_bytes.group(1))
            previous_data = uploaded_bytes.group(1)
        infile.close()
        sleep(5)
    else:
        print "couldn't write" + strftime("%Y-%m-%d %X", gmtime())
        infile.close()
        sleep(60)
As it is now, the (messy) script writes once correctly, and then, although I can see that my source file (stats.json) is changing, my script never picks up on any changes. It keeps on running, but my output file doesn't grow.
I thought that an open() and a close() would do the trick, and then tried throwing in a .seek(0).
What file method am I missing to ensure that python re-opens and re-reads my source file, (stats.json)?
Unless you are implementing some synchronization mechanism or can somehow guarantee atomic reads and writes, I think you are asking for race conditions and subtle bugs here.
Imagine the "reader" accessing the file while the "writer" hasn't completed its write cycle: there is a risk of reading incomplete/inconsistent data. On "modern" systems you could also hit the cache and not see file modifications "live" as they happen.
I can think of two possible solutions:
You forgot the parentheses on the close() in the else branch of the infinite loop:
infile.close --> infile.close()
Another possibility: the program that is changing the JSON file is not closing the file, and therefore it is not actually changing on disk.
Two problems I see:
Are you sure your file is really updated on the filesystem? I do not know what operating system you are playing with your code on, but caching may kick your a$$ in this case if the file is not flushed by the producer.
Your problem is worth considering a pipe instead of a file; however, I cannot guarantee what transmission will do if it gets stuck writing to the pipe because your consumer is dead.
Answering your problems, consider using one of the following:
pyinotify
watchdog
watcher
These modules are intended to monitor changes on the filesystem and then call the proper actions. The method in your example is primitive, has a big performance penalty, and has a couple of other problems already mentioned in the other answers.
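For example, a minimal sketch with watchdog (the directory path comes from the question; the handler class name is made up):
import time
from watchdog.observers import Observer
from watchdog.events import FileSystemEventHandler

class StatsChangedHandler(FileSystemEventHandler):
    def on_modified(self, event):
        if event.src_path.endswith('stats.json'):
            print('stats.json changed, re-read and record the new value here')

observer = Observer()
observer.schedule(StatsChangedHandler(), '/var/lib/transmission-daemon/info')
observer.start()
try:
    while True:
        time.sleep(1)
except KeyboardInterrupt:
    observer.stop()
observer.join()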
Ilya, would it help to check (with os.path.getmtime) whether stats.json changed before you process the file?
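Something along these lines, as a rough sketch (the 5-second poll interval is an assumption):
import os
import time

stats_path = '/var/lib/transmission-daemon/info/stats.json'
last_mtime = 0
while True:
    mtime = os.path.getmtime(stats_path)
    if mtime != last_mtime:          # the file changed since we last looked
        last_mtime = mtime
        # re-read stats.json and record the new uploaded-bytes value here
    time.sleep(5)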
Moreover, I'd suggest taking advantage of the fact that it's a JSON file:
import json
import os

dir_name = '/home/klaus/.config/transmission/'
# stats.json of daemon might be elsewhere
file_name = 'stats.json'
full_path = os.path.join(dir_name, file_name)

with open(full_path) as fp:
    data = json.load(fp)

print data['uploaded-bytes']
Thanks for all the answers; unfortunately my error was in the shell, not in the Python script.
The cause of the problem turned out to be the way I was putting the script in the background. I was pressing Ctrl+Z, which I thought would put the task in the background. But it does not: Ctrl+Z only suspends the task and returns you to the shell; a subsequent bg command is necessary for the script to keep running its infinite loop in the background.