I've encountered a strange problem with opening/closing files in Python. I am trying to do the same thing in Python that I was doing successfully in MATLAB, and I am having trouble communicating with some software through text files. I've come up with a strange workaround that solves the problem, but I don't understand why it works.
I have software that communicates with some lab equipment. To communicate with this software, I write a file ('wavefile.txt') to a specific folder, containing parameters to send to the device. I then write another file named 'request.txt' containing the location of the first file ('wavefile.txt'). The software constantly checks this folder for a file named 'request.txt'; once it finds it, it reads the parameters from the file specified in 'request.txt' and then deletes 'request.txt'. The software/equipment developer instructs users to allow a 50 ms delay before closing the 'request.txt' file.
Original MATLAB code that works:
home = cd;
cd \\CREOL-FAST-01\data
fileID = fopen('request.txt', 'wt');
proj = 'C:\\dazzler\\data\\wavefile.txt';
fprintf(fileID, proj);
pause(0.05);
fclose('all');
cd(home);
Original Python code that does not work:
home = os.getcwd()
os.chdir(r'\\CREOL-FAST-01\data')
with open('request.txt', 'w') as file:
    proj = r'C:\dazzler\data\wavefile.txt'
    file.write(proj)
    time.sleep(0.05)
os.chdir(home)
Every time the device program reads 'request.txt' when it's working with MATLAB, it deletes it immediately after MATLAB closes it. When I run the Python code above it works SOMETIMES; maybe 1 in every 5 tries is successful and the parameters are sent. The 'request.txt' file is always deleted when using the Python code above, but the parameters I've input are clearly not sent to my lab device. My guess is that when I write the file in Python, the device program is able to read it before Python writes the text to it, so it's just opening the blank file, not applying any parameters, and then deleting it.
My workaround in python:
home = os.getcwd()
os.chdir(r'\\CREOL-FAST-01\data')
fileh = open('request.txt', 'w+')
proj = r'C:\dazzler\data\wavefile.txt'
fileh.write(proj)
time.sleep(0.05)
print(fileh.read())
time.sleep(0.05)
fileh.close()
This method in Python seems to work 100% of the time. I open the file in w+ mode, and using fileh.read() is absolutely necessary; if I delete that line and still include the extra sleep time, it again works only about 1 in 5 tries. This seems really strange to me. Any explanation, or better solutions?
My guess (which could be wrong) is that the file is being read before it is completely flushed. I would try using the flush() method after the write to make sure that the complete data is written to the file. You might also need the os.fsync() method to make sure the data is flushed properly. Try something like this:
home = os.getcwd()
os.chdir(r'\\CREOL-FAST-01\data')
with open('request.txt', 'w') as file:
    proj = r'C:\dazzler\data\wavefile.txt'
    file.write(proj)
    file.flush()
    os.fsync(file.fileno())
    time.sleep(0.05)
os.chdir(home)
Not knowing any details about the particular equipment and other software you are using, it's hard to say. One guess is a difference in buffering on write calls.
From this blog post on Matlab's fwrite: "The default behavior for fprintf and fwrite is to flush the file buffer after each call to either of these functions"
Whereas for Python's open:
When no buffering argument is given, the default buffering policy works as follows:
Binary files are buffered in fixed-size chunks; the size of the buffer is chosen using a heuristic trying to determine the underlying device’s “block size” and falling back on io.DEFAULT_BUFFER_SIZE. On many systems, the buffer will typically be 4096 or 8192 bytes long.
“Interactive” text files (files for which isatty() returns True) use line buffering. Other text files use the policy described above for binary files.
To test this guess, change:
with open('request.txt', 'w') as file:
    proj = r'C:\dazzler\data\wavefile.txt'
to:
with open('request.txt', 'w', buffering=1) as file:
    proj = 'C:\\dazzler\\data\\wavefile.txt\n'
The problem is probably that you are doing the delay while the file is still open and thus not written to disk. Try something like this:
home = os.getcwd()
os.chdir(r'\\CREOL-FAST-01\data')
with open('request.txt', 'w') as file:
    proj = r'C:\dazzler\data\wavefile.txt'
    file.write(proj)
time.sleep(0.05)
os.chdir(home)
The only difference here is that the sleep is done after the file is closed (the file is closed when the with block ends), and thus the delay doesn't happen until after the text is written to disk.
To put it in words, what you are doing is:
Open (and create) file
Write text to a buffer (in memory, not on disk)
Wait 50 ms
Close (and write) the file
What you want to do is:
Open (and create) file
Write text to a buffer (in memory, not on disk)
Close (and write) the file
Wait 50 ms
So what you end up with is a period of at least 50 ms where the text file has been created, but where there is nothing in it because the text is sitting in your computer memory not on disk.
To be honest, I would put as little in the with block as possible, to avoid issues like this. So I would write it like so:
home = os.getcwd()
os.chdir(r'\\CREOL-FAST-01\data')
proj = r'C:\dazzler\data\wavefile.txt'
with open('request.txt', 'w') as file:
    file.write(proj)
time.sleep(0.05)
os.chdir(home)
Also keep in mind that you can't assume the opposite either: that no text is written until you close the file. For small files like this, that will probably be the case, but when the file is actually written to disk depends on a lot of factors. When a file is closed it is written, but it may be written before that too.
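If the root cause really is the device program grabbing 'request.txt' before the text has landed in it, another way to sidestep the ordering problem entirely is to write the contents under a temporary name and then rename it into place, so the watcher can never observe a partially written 'request.txt'. This is only a sketch, assuming Python 3.3+ (for os.replace) and that renames on this network share behave atomically enough:
import os
import time

home = os.getcwd()
os.chdir(r'\\CREOL-FAST-01\data')
proj = r'C:\dazzler\data\wavefile.txt'

# Write under a name the device software is not watching for.
with open('request.tmp', 'w') as f:
    f.write(proj)
    f.flush()
    os.fsync(f.fileno())  # push the bytes out before the rename

# Rename into the watched name; the file appears with its full contents.
os.replace('request.tmp', 'request.txt')
time.sleep(0.05)  # the vendor-recommended delay, kept out of caution
os.chdir(home)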
Related
How can I load an exe file—stored as a base64 encoded string—into memory and execute it without writing it to disk?
The point is to put some kind of control/password/serial system in place and compile it with py2exe. Then I could execute that embedded file whenever I want in my code.
All of the mechanisms Python has for executing a child process require a filename.
And so does the underlying CreateProcess function in the Win32 API, so there's not even an easy way around it by dropping down to that level.
There is a way to do this by dropping down to ZwCreateProcess/NtCreateProcess. If you know how to use the low-level NT API, this post should be all you need to understand it. If you don't… it's way too much to explain in an SO answer.
Alternatively, of course, you can create or use a RAM drive, or even simulate a virtual filesystem, but that's getting a little silly as an attempt to avoid creating a file.
So, the right answer is to write the exe to a file, then execute it. For example, something like this:
import base64
import os
import subprocess
import tempfile

fd, path = tempfile.mkstemp(suffix='.exe')
code = base64.b64decode(encoded_code)
os.write(fd, code)
os.fchmod(fd, 0o711)
os.close(fd)
try:
    result = subprocess.call(path)
finally:
    os.remove(path)
This should work on both Windows and *nix, but it's completely untested, and will probably have bugs on at least one platform.
Obviously, if you want to execute it multiple times, don't remove it until you're done with it. Or just use some appropriate persistent directory, and write it only if it's missing or out of date.
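For the persistent-directory variant, here is one rough sketch (the function name, paths, and hashing scheme are all illustrative, not from the original post): key the cached file on a hash of the payload, so it is only rewritten when the embedded exe actually changes.
import base64
import hashlib
import os
import subprocess

def run_embedded(encoded_code, cache_dir):
    code = base64.b64decode(encoded_code)
    # Name the cached file after its hash, so a new payload gets a new file.
    path = os.path.join(cache_dir, hashlib.sha256(code).hexdigest() + '.exe')
    if not os.path.exists(path):
        with open(path, 'wb') as f:
            f.write(code)
        os.chmod(path, 0o711)
    return subprocess.call(path)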
Encode the exe:
import base64

# encode exe file in base64 data
with open("Sample.exe", 'rb') as f:
    read_exe_to_base64 = base64.b64encode(f.read())

# encoded data will be really big text, don't worry; e.g.:
b'TVqQAAMAAAAEAAAA//8AALgAAAAAAAAAQAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAyAAAAA4fug4AtAnNIbgBTM0hVGhpcyBwcm9ncmFtIGNhbm5vdCBiZSBydW4gaW4gRE9TIG1vZGUuDQ0KJAAAAAAAAAA9AHveeWEVjXlhFY15YRWN+n0bjXhhFY0QfhyNfmEVjZB+GI14YRWNUmljaHlhFY0AAAAAAAAAAAAAAA'

# decode exe file:
with open("Sample2.exe", 'wb') as f:
    f.write(base64.b64decode(read_exe_to_base64))
The exe file will be created in that folder. If you don't want users to see it, just decode it in any random folder and delete it after use.
How do I truncate a csv log file that is being used as std out pipe destination from another process without generating a _csv.Error: line contains NULL byte error?
I have one process running rtlamr > log/readings.txt that is piping radio signal data to readings.txt. I don't think it matters what is piping to the file--any long-running pipe process will do.
I have a file watcher using watchdog (Python file watcher) on that file, which triggers a function when the file is changed. The function reads the file and updates a database.
Then I try to truncate readings.txt so that it doesn't grow infinitely (or back it up).
file = open(dir_path+'/log/readings.txt', "w")
file.truncate()
file.close()
This corrupts readings.txt and generates the error (the start of the file contains garbage characters).
I tried moving the file instead of truncating it, in the hopes that rtlamr will recreate a fresh file, but that only has the effect of stopping the pipe.
EDIT
I noticed that the charset changes from us-ascii to binary but attempting to truncate the file with file = open(dir_path+'/log/readings.log', "w",encoding="us-ascii") does not do anything.
If you truncate a file[1] while another process has it open in w mode, that process will continue to write to the same offsets, making the file sparse. Low offsets will thus be read as zero (NUL) bytes.
As per x11 - Concurrent writing to a log file from many processes - Unix & Linux Stack Exchange and Can two Unix processes simultaneous write to different positions in a single file?, each process that has a file open has its own offset in it, and a ftruncate() doesn't change that.
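A tiny demonstration of that effect (throwaway file name, Unix-like semantics assumed) shows where the NULL bytes in your csv come from:
# Simulate a writer that keeps its own offset while someone else truncates.
w = open('demo.log', 'w')
w.write('line one\n')
w.flush()                      # 9 bytes on disk; the writer's offset is now 9

open('demo.log', 'w').close()  # "truncation" via a second open, as in the question

w.write('line two\n')          # still written at offset 9
w.flush()
w.close()

print(repr(open('demo.log', 'rb').read()))
# b'\x00\x00\x00\x00\x00\x00\x00\x00\x00line two\n' -- the NUL bytes csv chokes on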
If you want the other process to react to truncation, it needs to have the file open in append ('a') mode.
Your approach has fundamental bugs, too. E.g. it's not atomic: you may (and eventually will) truncate the file after the producer has added data but before you have read it, so that data would be lost.
Consider using dedicated data buffering utilities instead like buffer or pv as per Add a big buffer to a pipe between two commands.
[1] Which is superfluous because open(mode='w') already does that. Either truncate or reopen, no need to do both.
I have a homebrew web-based file system that allows users to download their files as zips; however, I found an issue while developing on my local box that is not present on the production system.
On Linux this is a non-issue (the local dev box is a Windows system).
I have the following code:
algo = CipherType('AES-256', 'CBC')
decrypt = DecryptCipher(algo, cur_share.key[:32], cur_share.key[-16:])
file = open(settings.STORAGE_ROOT + 'f_' + str(cur_file.id), 'rb')
temp_file = open(temp_file_path, 'wb+')
data = file.read(settings.READ_SIZE)
while data:
    dec_data = decrypt.update(data)
    temp_file.write(dec_data)
    data = file.read(settings.READ_SIZE)
# Takes a dump right here!
# error in cipher operation (wrong final block length)
final_data = decrypt.finish()
temp_file.write(final_data)
file.close()
temp_file.close()
The above code opens a file, and (using the key for the current file share) decrypts the file and writes it to a temporary location (that will later be stuffed into a zip file).
My issue is with the file = open(settings.STORAGE_ROOT + 'f_' + str(cur_file.id), 'rb') line. Since Windows cares a metric ton about binary files, if I don't specify 'rb' the file will not read to the end in the data read loop; however, for some reason, since I am also writing to temp_file, it never completely reads to the end of the file... UNLESS I add a + after the b: 'rb+'.
If I change the code to file = open(settings.STORAGE_ROOT + 'f_' + str(cur_file.id), 'rb+'), everything works as desired and the code successfully scrapes the entire binary file and decrypts it. If I do not add the plus, it fails and cannot read the entire file...
Another section of the code (for downloading individual files) reads (and works flawlessly no matter the OS):
algo = CipherType('AES-256', 'CBC')
decrypt = DecryptCipher(algo, cur_share.key[:32], cur_share.key[-16:])
file = open(settings.STORAGE_ROOT + 'f_' + str(cur_file.id), 'rb')
filename = smart_str(cur_file.name, errors='replace')
response = HttpResponse(mimetype='application/octet-stream')
response['Content-Disposition'] = 'attachment; filename="' + filename + '"'
data = file.read(settings.READ_SIZE)
while data:
    dec_data = decrypt.update(data)
    response.write(dec_data)
    data = file.read(settings.READ_SIZE)
# no dumps to be taken when finishing up the decrypt process...
final_data = decrypt.finish()
response.write(final_data)
file.close()
Clarification
The cipher error is likely because the file was not read in its entirety. For example, I have a 500MB file I am reading in at 64*1024 bytes at a time. I read until I receive no more bytes; when I don't specify b on Windows, it cycles through the loop twice and returns some crappy data (because Python thinks it is interacting with a text file, not a binary file).
When I specify b it takes 10-15 seconds to completely read in the file, but it does so successfully, and the code completes normally.
When I am concurrently writing to another file as I read from the source file (as in the first example), if I do not specify rb+ it displays the same behavior as not specifying b at all: it only reads a couple of segments from the file before closing the handle and moving on, I end up with an incomplete file, and the decryption fails.
I'm going to take a guess here:
You have some other program that's continually replacing the files you're trying to read.
On Linux, this other program works by atomically replacing the file (that is, writing to a temporary file, then moving the temporary file to the path). So, when you open a file, you get the version from 8 seconds ago. A few seconds later, someone comes along and unlinks it from the directory, but that doesn't affect your file handle in any way, so you can read the entire file at your leisure.
On Windows, there is no such thing as atomic replacement. There are a variety of ways to work around that problem, but what many people do is to just rewrite the file in-place. So, when you open a file, you get the version from 8 seconds ago, start reading it… and then suddenly someone else blanks the file to rewrite it. That does affect your file handle, because they've rewritten the same file. So you hit an EOF.
Opening the file in r+ mode doesn't do anything to solve the problem, but it adds a new problem that hides it: You're opening the file with sharing settings that prevent the other program from rewriting the file. So, now the other program is failing, meaning nobody is interfering with this one, meaning this one appears to work.
In fact, it could be even more subtle and annoying than this. Later versions of Windows try to be smart. If I try to open a file while someone else has it locked, instead of failing immediately, it may wait a short time and try again. The rules for exactly how this works depend on the sharing and access you need, and aren't really documented anywhere. And effectively, whenever it works the way you want, it means you're relying on a race condition. That's fine for interactive stuff like dragging a file from Explorer to Notepad (better to succeed 99% of the time instead of 10% of the time), but obviously not acceptable for code that's trying to work reliably (where succeeding 99% of the time just means the problem is harder to debug). So it could easily work differently between r and r+ modes for reasons you will never be able to completely figure out, and wouldn't want to rely on if you could…
Anyway, if any variation of this is your problem, you need to fix that other program, the one that rewrites the file, or possibly both programs in cooperation, to properly simulate atomic file replacement on Windows. There's nothing you can do from just this program to solve it.*
* Well, you could do things like optimistic check-read-check and start over whenever the modtime changes unexpectedly, or use the filesystem notification APIs, or… But it would be much more complicated than fixing it in the right place.
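For completeness, here is a minimal sketch of the optimistic check-read-check idea (the helper name and retry policy are made up for illustration):
import os
import time

def read_if_stable(path, retries=5, delay=0.2):
    """Read a whole file, retrying if its mtime changes while we read it."""
    for _ in range(retries):
        before = os.stat(path).st_mtime
        with open(path, 'rb') as f:
            data = f.read()
        if os.stat(path).st_mtime == before:
            return data        # nobody rewrote the file while we read it
        time.sleep(delay)      # it changed under us; start over
    raise RuntimeError('file kept changing: ' + path)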
I'm a Windows 7 user.
I happened to read about redirections (like command1 < infile > outfile) in *nix systems, and then I discovered that something similar can be done in Windows (link). And Python can also do something like this with pipes(?) or stdin/stdout(?).
I do not understand how this happens in Windows, so I have a question.
I use a proprietary Windows program (.exe). This program is able to append data to a file.
For simplicity, let's assume that it is the equivalent of something like
from time import ctime, sleep

while True:
    f = open('textfile.txt', 'a')
    f.write(repr(ctime()) + '\n')
    f.close()
    sleep(100)
The question:
Can I use this file (textfile.txt) as stdin?
I mean that the script (while it runs) should continuously (not just once) handle all new data, i.e.
In the "never-ending cycle":
The program (.exe) writes something.
Python script captures the data and processes.
Could you please write how to do this in python, or maybe in win cmd/.bat or somehow else.
This is an insanely cool thing. I want to learn how to do it! :D
If I am reading your question correctly then you want to pipe output from one command to another.
This is normally done as such:
cmd1 | cmd2
However, you say that your program only writes to files. I would double check the documentation to see if there isn't a way to get the command to write to stdout instead of a file.
If this is not possible then you can create what is known as a named pipe. It appears as a file on your filesystem, but is really just a buffer of data that can be written to and read from (the data is a stream and can only be read once). Meaning your program reading it will not finish until the program writing to the pipe stops writing and closes the "file". I don't have experience with named pipes on windows so you'll need to ask a new question for that. One down side of pipes is that they have a limited buffer size. So if there isn't a program reading data from the pipe then once the buffer is full the writing program won't be able to continue and just wait indefinitely until a program starts reading from the pipe.
An alternative is that on Unix there is a program called tail which can be set up to continuously monitor a file for changes and output any data as it is appended to the file (with a short delay).
tail --follow=name --retry textfile.txt | mycmd
# wait for data to be appended to the file and output new data to mycmd
cmd1 >> textfile.txt # append output to file
One thing to note about this is that tail won't stop just because the first command has stopped writing to the file. tail will continue to listen to changes on that file forever or until mycmd stops listening to tail, or until tail is killed (or "sigint-ed").
This question has various answers on how to get a version of tail onto a windows machine.
import sys
sys.stdin = open('textfile.txt', 'r')
for line in sys.stdin:
    process(line)
If the program writes to textfile.txt, you can't change that to redirect to stdin of your Python script unless you recompile the program to do so.
If you were to edit the program, you'd need to make it write to stdout, rather than a file on the filesystem. That way you can use the redirection operators to feed it into your Python script (in your case the | operator).
Assuming you can't do that, you could write a program that polls for changes on the text file, and consumes only the newly written data, by keeping track of how much it read the last time it was updated.
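A rough sketch of that polling approach (the file name and poll interval are only examples): remember how far you got, reopen the file each time, and process only what was appended since the last poll.
import os
import time

path = 'textfile.txt'
offset = 0  # how many bytes we have already processed

while True:
    if os.path.exists(path) and os.path.getsize(path) > offset:
        with open(path, 'rb') as f:
            f.seek(offset)                       # skip what we already handled
            for line in f:
                print('new line:', line.decode().rstrip())  # replace with real processing
            offset = f.tell()                    # remember where we stopped
    time.sleep(1)                                # poll interval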
When you use < to direct the contents of a file to a Python script, that script receives that data on its stdin stream.
Simply read from sys.stdin to get that data:
import sys
for line in sys.stdin:
    # do something with line
I have made a Python script that performs a Nagios check. The functionality of the script is pretty simple: it just parses a log and matches some info, which is used to construct the Nagios check output. The log is a snmptrapd log which records the traps from other servers and logs them in /var/log/snmptrapd, after which I just parse them with the script. In order to have the latest traps I erase the log from Python each time after reading it. In order to preserve the info I have made a cron job that copies the content of the log into another log at a time interval a bit smaller than the Nagios check interval.
The thing that I don't understand is why the log is growing so much (I mean the messages log, which has I guess 1000 times more info, is smaller). From what I've seen in the log there are a lot of special characters like ^# and I think this is caused by the way I'm manipulating the file from Python, but seeing that I only have about three weeks of experience with it, I can't seem to figure out the problem.
The script code is the following:
import sys, os, re

validstring = "OK"
filename = "/var/log/snmptrapd.log"

if os.stat(filename)[6] == 0:
    print validstring
    sys.exit()
else:
    f = open(filename, "r")
    sharestring = ""
    line1 = []
    patte0 = re.compile("[0-9]+-[0-9]+-[0-9]+")
    patte2 = re.compile("NG: [a-zA-Z\s=0-9]+.*")
    for line in f:
        line1 = line.split(" ")
        if re.search(patte0, line1[0]):
            sharestring = sharestring + line1[1] + " "
            continue
        result2 = re.search(patte2, line)
        if result2:
            result22 = result2.group()
            result22 = result22.replace("NG:", "")
            sharestring = sharestring + result22 + " "
    f.close()
    f1 = open(filename, "w")
    f1.close()
    print sharestring
    sys.exit(2)
The log looks like:
2012-07-11 04:17:16 Some IP(via UDP: [this is an ip]:port) TRAP, SNMP v1, community somestring
SNMPv2-SMI::enterprises.OID Some info which is not necessary
SNMPv2-MIB::sysDescrOID = STRING: info which I'm matching
I'm pretty sure that it has something to do with my way of erasing the file, but I can't figure it out. If you have some idea I would be really interested. Thank you.
As for the size: I have 93 lines (so says Vim) and the log occupies 161K, and that is not OK because the lines are quite short.
OK, it has nothing to do with the way I read and erased the file. It is something the snmptrapd daemon does when I erase its log file. I have modified my code and now I send SIGSTOP to snmptrapd right before I open the file, I make my modifications to the file, and then I send SIGCONT after I'm done, but it seems I experience the same behavior. The new code looks like this (the different parts):
else:
    command = "pidof snmptrapd"
    p = subprocess.Popen(shlex.split(command), stdout=subprocess.PIPE)
    pidstring = p.stdout.readline()
    patte1 = re.compile("[0-9]+")
    pidnr = re.search(patte1, pidstring)
    pid = pidnr.group()
    os.kill(int(pid), SIGSTOP)
    time.sleep(0.5)
    f = open(filename, "r+")
    sharestring = ""
and
        sharestring = sharestring + result22 + " "
    f.truncate(0)
    f.close()
    time.sleep(0.5)
    os.kill(int(pid), SIGCONT)
    print sharestring
I'm thinking of stopping the daemon, erasing the file, and after that recreating it with the proper permissions and starting the daemon again.
I don't think you can, but here are some things to try
Truncating a File
f1 = open(filename, 'w')
f1.close()
is a hacky, side-effect way of deleting a file's contents and will probably cause undesired side effects, depending on the underlying OS, if other applications have that file open.
Using the File Object method truncate()
truncate([size])
Truncate the file's size. If the optional size argument is present, the file is truncated to (at most) that size. The size defaults to the current position. The current file position is not changed. Note that if a specified size exceeds the file's current size, the result is platform-dependent: possibilities include that the file may remain unchanged, increase to the specified size as if zero-filled, or increase to the specified size with undefined new content.
Availability: Windows, many Unix variants.
Probably the only deterministic way to do this is to stop the snmptrapd process at the start of the script, use the proper os module function remove, then recreate the file, and restart the snmptrapd daemon at the end of the script.
os.remove(path)
Remove (delete) the file path. If path is a directory, OSError is raised; see rmdir() below to remove a directory. This is identical to the unlink() function documented below. On Windows, attempting to remove a file that is in use causes an exception to be raised; on Unix, the directory entry is removed but the storage allocated to the file is not made available until the original file is no longer in use.
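A rough sketch of that stop / remove / recreate / restart sequence (the stop and start commands below are only placeholders; use whatever actually manages snmptrapd on your system):
import os
import subprocess

LOG = "/var/log/snmptrapd.log"

subprocess.call(["/etc/init.d/snmptrapd", "stop"])   # placeholder: stop the daemon

# ... read LOG and build the nagios output here ...

os.remove(LOG)            # delete the old log
open(LOG, "w").close()    # recreate it empty
os.chmod(LOG, 0o600)      # restore whatever permissions snmptrapd expects

subprocess.call(["/etc/init.d/snmptrapd", "start"])  # placeholder: start it again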
Shared resource concern
You still might have problems with two processes fighting to write to a single file without some kind of locking mechanism, and with non-deterministic things happening to the file. I bet you can send a SIGINT or something similar to your daemon process and get it to re-read the file or something; check your documentation.
Manipulating shared resources, especially file resources, without exclusive locking is going to be trouble, especially with filesystem caching and application-level caching of data.