My Python script is supposed to write to /dev/xconsole. It works as expected when I am reading from /dev/xconsole, for example with tail -F /dev/xconsole. But if I don't have tail running, my script hangs and waits.
I am opening the file as follows:
xconsole = open('/dev/xconsole', 'w')
and writing to it:
for line in sys.stdin:
    xconsole.write(line)
Why does my script hang when nobody is reading the output from /dev/xconsole?
/dev/xconsole is a named pipe (FIFO), created on demand.
A FIFO does not store data on disk: the kernel holds whatever is written in a fixed-size in-memory buffer. If no application reads that data in time, the buffer fills up and the writing application blocks, which is why your script hangs.
To avoid this, keep the pipe drained: alternate writes with reads so the buffer never fills up. On Linux the pipe buffer is usually around 64 KB.
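If blocking is never acceptable, a minimal sketch of an alternative (assuming Linux FIFO semantics) is to open the pipe in non-blocking mode and simply drop the data while nobody is reading:
import errno
import os

# Sketch: open /dev/xconsole without blocking and discard data
# when no reader is attached or the pipe buffer is full.
try:
    fd = os.open('/dev/xconsole', os.O_WRONLY | os.O_NONBLOCK)
except OSError as e:
    if e.errno != errno.ENXIO:   # ENXIO: no process has the FIFO open for reading
        raise
    fd = None

if fd is not None:
    try:
        os.write(fd, b'a log line\n')
    except BlockingIOError:      # pipe buffer (~64 KB) is full, nobody is draining it
        pass
    finally:
        os.close(fd)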
@Vishnudev summarized this nicely already and should be accepted as the correct answer. I'll just add to that answer the following code for resizing your FIFO buffer:
import fcntl

# Linux-specific fcntl constants for getting/setting the pipe buffer size
F_SETPIPE_SZ = 1031
F_GETPIPE_SZ = 1032

fifo_fd = open("/path/to/fifo", "rb")
print(f"fifo buffer size before: {fcntl.fcntl(fifo_fd, F_GETPIPE_SZ)}")
fcntl.fcntl(fifo_fd, F_SETPIPE_SZ, 1000000)
print(f"fifo buffer size after: {fcntl.fcntl(fifo_fd, F_GETPIPE_SZ)}")
Is there any chance to read the output of running terminals?
I have processes running in /dev/pts/# and I want to read the output of all of them (and then maybe save it to a file).
This works quite well, but only for the first process:
import subprocess

with open("path_to_file_with_devices_list") as f:
    content = f.read().splitlines()

for x in content:
    subprocess.check_call(['cat', x])
I get the whole output of the first /dev/pts/# in my terminal, and I understand why it gets stuck: the script captures the first /dev/pts/# and I only ever see that output.
How do I handle this? I mean, how can I capture the output of the other /dev/pts/# terminals as well?
Should I somehow run each next process in another terminal? Or force the script to end each read and move on to the next device?
Any ideas?
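One possible direction, sketched here under the assumption that the device-list file above is readable and that each /dev/pts/# may be read: start one cat per device with subprocess.Popen so that no single device blocks the others.
import subprocess

# Sketch: launch one reader per device instead of waiting for the first
# `cat` to finish; each device's output goes to its own capture file.
with open("path_to_file_with_devices_list") as f:
    devices = f.read().splitlines()

procs = []
for dev in devices:
    logfile = open(dev.strip("/").replace("/", "_") + ".log", "wb")
    procs.append((subprocess.Popen(["cat", dev], stdout=logfile), logfile))

for proc, logfile in procs:
    proc.wait()       # or proc.terminate() when you are done capturing
    logfile.close()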
I've encountered this strange problem with opening/closing files in Python. I am trying to do the same thing in Python that I was doing successfully in MATLAB, and I am getting a problem communicating with some software through text files. I've come up with a strange workaround, but I don't understand why it works.
I have software that communicates with some lab equipment. To communicate with this software, I write a file ('wavefile.txt') to a specific folder, containing the parameters to send to the device. I then write another file named 'request.txt' containing the location of that first file ('wavefile.txt'). The software constantly checks the folder for a file named 'request.txt'; once it finds it, it reads the parameters from the file specified in 'request.txt' and then deletes 'request.txt'. The software/equipment developer instructs users to wait 50 ms before closing the 'request.txt' file.
original matlab code that works:
home = cd;
cd \\CREOL-FAST-01\data
fileID = fopen('request.txt', 'wt');
proj = 'C:\\dazzler\\data\\wavefile.txt';
fprintf(fileID, proj);
pause(0.05);
fclose('all');
cd(home);
original python code that does not work:
import os
import time

home = os.getcwd()
os.chdir(r'\\CREOL-FAST-01\data')
with open('request.txt', 'w') as file:
    proj = r'C:\dazzler\data\wavefile.txt'
    file.write(proj)
    time.sleep(0.05)
os.chdir(home)
Every time the device program reads 'request.txt' when it's working with MATLAB, it deletes it immediately after MATLAB closes it. When I run that code with Python, it works SOMETIMES; maybe 1 in every 5 tries is successful and the parameters are sent. The 'request.txt' file is always deleted with the Python code above, but the parameters I've input are clearly not sent to my lab device. My guess is that when I write the file in Python, the device program is able to read it before Python writes the text to it, so it just opens the blank file, applies no parameters, and then deletes it.
My workaround in python:
home = os.getcwd()
os.chdir(r'\\CREOL-FAST-01\data')
fileh = open('request.txt', 'w+')
proj = r'C:\dazzler\data\wavefile.txt'
fileh.write(proj)
time.sleep(0.05)
print(fileh.read())
time.sleep(0.05)
fileh.close()
This method in Python seems to work 100% of the time. I open the file in w+ mode, and using fileh.read() is absolutely necessary: if I delete that line and still include the extra sleep time, it again works only about 1 in 5 tries. This seems really strange to me. Any explanation, or better solutions?
My guess (which could be wrong) is that the file is being read before it is completely flushed. I would try using the flush() method after the write to make sure that the complete data is written to the file. You might also need the os.fsync() method to make sure the data is flushed properly. Try something like this:
home = os.getcwd()
os.chdir(r'\\CREOL-FAST-01\data')
with open('request.txt', 'w') as file:
    proj = r'C:\dazzler\data\wavefile.txt'
    file.write(proj)
    file.flush()
    os.fsync(file.fileno())  # os.fsync() needs the file descriptor
    time.sleep(0.05)
os.chdir(home)
Not knowing any details about the particular equipment and other software you are using, it's hard to say. One guess is the difference in buffering on write calls.
From this blog post on Matlab's fwrite: "The default behavior for fprintf and fwrite is to flush the file buffer after each call to either of these functions"
Whereas for Python's open:
When no buffering argument is given, the default buffering policy
works as follows:
Binary files are buffered in fixed-size chunks; the size of the buffer
is chosen using a heuristic trying to determine the underlying
device’s “block size” and falling back on io.DEFAULT_BUFFER_SIZE. On
many systems, the buffer will typically be 4096 or 8192 bytes long.
“Interactive” text files (files for which isatty() returns True) use
line buffering. Other text files use the policy described above for
binary files.
To test this guess change:
with open('request.txt', 'w') as file:
    proj = r'C:\dazzler\data\wavefile.txt'
to:
with open('request.txt', 'w', buffering=1) as file:
    proj = 'C:\\dazzler\\data\\wavefile.txt\n'
The problem is probably that you are doing the delay while the file is still open and thus not written to disk. Try something like this:
home = os.getcwd()
os.chdir(r'\\CREOL-FAST-01\data')
with open('request.txt', 'w') as file:
    proj = r'C:\dazzler\data\wavefile.txt'
    file.write(proj)
time.sleep(0.05)
os.chdir(home)
The only difference here is that the sleep is done after the file is closed (the file is closed when the with block ends), and thus the delay doesn't happen until after the text is written to disk.
To put it in words, what you are doing is:
Open (and create) file
Write text to a buffer (in memory, not on disk)
Wait 50 ms
Close (and write) the file
What you want to do is:
Open (and create) file
Write text to a buffer (in memory, not on disk)
Close (and write) the file
Wait 50 ms
So what you end up with is a period of at least 50 ms where the text file has been created, but where there is nothing in it because the text is sitting in your computer memory not on disk.
To be honest, I would put as little in the with block as possible, to avoid issues like this. So I would write it like so:
home = os.getcwd()
os.chdir(r'\\CREOL-FAST-01\data')
proj = r'C:\dazzler\data\wavefile.txt'
with open('request.txt', 'w') as file:
    file.write(proj)
time.sleep(0.05)
os.chdir(home)
Also keep in mind that you can't assume the opposite either: that no text is written until you close the file. For a small file like this that will probably be the case, but exactly when data reaches the disk depends on a lot of factors. A file is written out when it is closed, but it may also be written before that.
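A quick way to see this buffering in action (a throwaway sketch; the filename is made up):
import os

f = open('buffer_demo.txt', 'w')
f.write('short text')
print(os.path.getsize('buffer_demo.txt'))  # most likely 0: the text is still in Python's buffer
f.close()
print(os.path.getsize('buffer_demo.txt'))  # 10: the buffer was flushed out on close
os.remove('buffer_demo.txt')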
I need to parse the output produced by an external program (third party, I have no control over it) which produces large amounts of data. Since the size of the output greatly exceeds the available memory, I would like to parse the output while the process is running and remove the already-processed data from memory.
So far I do something like this:
import subprocess
p_pre = subprocess.Popen("preprocessor",stdout = subprocess.PIPE)
# preprocessor is an external bash script that produces the input for the third-party software
p_3party = subprocess.Popen("thirdparty",stdin = p_pre.stdout, stdout = subprocess.PIPE)
(data_to_parse,can_be_thrown) = p_3party.communicate()
parsed_data = myparser(data_to_parse)
When "thirdparty" output is small enough, this approach works. But as stated in the Python documentation:
The data read is buffered in memory, so do not use this method if the data size is large or unlimited.
I think a better approach (which could actually save me some time) would be to start processing data_to_parse while it is being produced, and once a chunk has been parsed correctly, to "clear" data_to_parse by removing the data that has already been parsed.
I have also tried to use a for loop like:
parsed_data = []
for i in p_3party.stdout:
    parsed_data.append(myparser(i))
but it gets stuck, and I can't understand why.
So I would like to know: what is the best approach to accomplish this? What are the issues to be aware of?
You can use subprocess.Popen() to create a stream from which you read lines.
import subprocess

# `cmd` is a placeholder for the command you want to run
stream = subprocess.Popen(cmd, stdout=subprocess.PIPE).stdout
for line in stream:
    # parse lines as you receive them
    print(line)
You could pass the lines to your myparser() method, or append them to a list until you are ready to use them.. whatever.
In your case, using two sub-processes, it would work something like this:
import subprocess

def method(stream, retries=3):
    while retries > 0:
        line = stream.readline()
        if line:
            yield line
        else:
            retries -= 1

pre_stream = subprocess.Popen(cmd, stdout=subprocess.PIPE).stdout
stream = subprocess.Popen(cmd, stdin=pre_stream, stdout=subprocess.PIPE).stdout

parsed_data = []
for parsed in method(stream):
    # do what you want with the parsed data
    parsed_data.append(parsed)
Iterating over a file as in for i in p_3party.stdout: uses a read-ahead buffer. The readline() method may be more reliable with a pipe -- AFAIK it reads character by character.
while True:
    line = p_3party.stdout.readline()
    if not line:
        break
    parsed_data.append(myparser(line))
I'm a Windows 7 user.
I happened to read about redirection (like command1 < infile > outfile) on *nix systems, and then I discovered that something similar can be done in Windows (link). And Python can also do something like this with pipes(?) or stdin/stdout(?).
I do not understand how this happens in Windows, so I have a question.
I use some kind of proprietary Windows program (.exe). This program is able to append data to a file.
For simplicity, let's assume that it is the equivalent of something like
from time import ctime, sleep

while True:
    f = open('textfile.txt', 'a')
    f.write(repr(ctime()) + '\n')
    f.close()
    sleep(100)
The question:
Can I use this file (textfile.txt) as stdin?
I mean that the script (while it runs) should continuously (not just once) handle all new data, i.e.:
In the "never-ending cycle":
The program (.exe) writes something.
Python script captures the data and processes.
Could you please show how to do this in Python, or maybe in Windows cmd/.bat, or some other way?
This is an insanely cool thing. I want to learn how to do it! :D
If I am reading your question correctly then you want to pipe output from one command to another.
This is normally done as such:
cmd1 | cmd2
However, you say that your program only writes to files. I would double check the documentation to see if there isn't a way to get the command to write to stdout instead of a file.
If this is not possible then you can create what is known as a named pipe. It appears as a file on your filesystem, but is really just a buffer of data that can be written to and read from (the data is a stream and can only be read once). Meaning your program reading it will not finish until the program writing to the pipe stops writing and closes the "file". I don't have experience with named pipes on windows so you'll need to ask a new question for that. One down side of pipes is that they have a limited buffer size. So if there isn't a program reading data from the pipe then once the buffer is full the writing program won't be able to continue and just wait indefinitely until a program starts reading from the pipe.
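For illustration on a Unix-like system (not Windows; the path below is made up), this is what working with a named pipe looks like: the open for writing blocks until a reader opens the other end, and writes block once the pipe's buffer fills.
import os

fifo_path = "/tmp/demo_fifo"       # hypothetical path
if not os.path.exists(fifo_path):
    os.mkfifo(fifo_path)           # create the named pipe

# Blocks here until another process opens the pipe for reading,
# e.g. `cat /tmp/demo_fifo` in a second terminal.
with open(fifo_path, "w") as fifo:
    fifo.write("hello through the pipe\n")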
An alternative is that on Unix there is a program called tail which can be set up to continuously monitor a file for changes and output any data as it is appended to the file (with a short delay).
tail --follow=name --retry textfile.txt | mycmd
# wait for data to be appended to the file and output new data to mycmd
cmd1 >> textfile.txt # append output to file
One thing to note about this is that tail won't stop just because the first command has stopped writing to the file. tail will continue to listen to changes on that file forever or until mycmd stops listening to tail, or until tail is killed (or "sigint-ed").
This question has various answers on how to get a version of tail onto a windows machine.
import sys

sys.stdin = open('textfile.txt', 'r')
for line in sys.stdin:
    process(line)
If the program writes to textfile.txt, you can't change that to redirect to stdin of your Python script unless you recompile the program to do so.
If you were to edit the program, you'd need to make it write to stdout, rather than a file on the filesystem. That way you can use the redirection operators to feed it into your Python script (in your case the | operator).
Assuming you can't do that, you could write a program that polls for changes on the text file, and consumes only the newly written data, by keeping track of how much it read the last time it was updated.
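A rough sketch of that polling idea (the names and the 0.5 second interval are made up): remember the last file offset and only consume what was appended since the previous check.
import time

def follow(path, interval=0.5):
    """Yield complete lines appended to `path`, polling for new data."""
    position = 0
    while True:
        with open(path, 'r') as f:
            f.seek(position)
            while True:
                line = f.readline()
                if line.endswith('\n'):
                    position = f.tell()   # only advance past complete lines
                    yield line.rstrip('\n')
                else:
                    break                 # partial or no line yet; poll again later
        time.sleep(interval)

for line in follow('textfile.txt'):
    print(line)                           # handle each newly appended line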
When you use < to feed the contents of a file to a Python script, that script receives the data on its stdin stream.
Simply read from sys.stdin to get that data:
import sys
for line in sys.stdin:
    pass  # do something with line
I am using tail -F log | python parse.py to monitor and parse a growing log file, but some parsing errors occur that may be caused by reading incomplete lines from the log file.
Is it possible that tail emits incomplete lines?
In the parser, I am reading rows with code like the following:
import csv
import sys
reader = csv.reader(sys.stdin)
for row in reader:
    pass  # process each row
It is possible that tail can emit 'unparsable lines' - but only if invalid lines are written to the file. Kind of a circular bit of argument, but here's an example of how it could happen:
You tail -f on /var/log/syslog
syslog-ng dies in the middle of a write that spans filesystem blocks. Sectors are 512 bytes, and your filesystem block size is most likely larger (although probably not much larger than 4096). So say syslog has 9 KB of data buffered up to write out: it gets through the first 4 KB page, and before it can write the remaining 4 KB + 1 KB, syslog dies. On ext2, at least, that will end up as a partial write even after fsck. ext3? I've been doing embedded work for so long I can't remember, but I'd hope not. And who's to say that the data you're writing is always going to be correct anyway? You might get a non-fatal string formatting error that omits the newline you're expecting.
You'll then have a partial line that isn't terminated with a newline (or even a \0), and the next time syslog starts up and starts appending, it will just append to the end of the file with no notion of 'valid' records. So the first new record will be garbage, but the next one will be OK.
This is easy to exercise:
In one window
tail -f SOMEFILE
In another window
echo FOO >>SOMEFILE
echo BAR >>SOMEFILE
printf NO_NEWLINE >>SOMEFILE
echo I_WILL_HAVE_THE_LAST_LINE_PREFIXED_TO_ME_CAUSING_NERD_RAGE >>SOMEFILE
Since Linux's tail uses inotify by default, whatever is reading its output will not see NO_NEWLINE right away; only when the next newline arrives does it show up, prefixed to what tail considers 'the latest line'.
If you want to do this 'the pythonic' way: if you're using Linux, use inotify; if you're using OSX or BSD, use 'knotty'. Skip using 'tail' as an input pipe and just watch the file yourself.
Tail might do weird things if you use 'resync on truncate' too - i.e. if the file gets zeroed and restarted in the middle of a read, you might get a weird amount of data on that read, since 'tail' will close the previously opened file handle in exchange for the new one.
Since you changed up the question on me.. new answer! :p
reader = csv.reader(sys.stdin)
for row in reader:
    try:
        validate_row_data_somehow(row)
        do_things_with_valid_row(row)
    except Exception:
        print("failed to process row", repr(row))