Passing output from an external program back to it after processing - python

I have a Fortran code that takes a file as input and writes the output into stdout. To avoid read/write cycles, I'd like to run the code inside python and convert the output to a numpy array. I can do this using the following function:
def run_fortran(infile):
    import subprocess
    import numpy as np

    cmd = ['./output.e', infile]
    p = subprocess.Popen(cmd, stdout=subprocess.PIPE)
    out, err = p.communicate()
    if p.returncode == 0:
        return np.array(out.split(), dtype=int)
Now I take the array, modify it and write it into a file. The new file is again passed into run_fortran(infile). Can I avoid this step and somehow use the output of run_fortran instead of passing a filename?
I tried two cases with no success:
(1) converting to string:
arr = run_fortran('input.txt')
new_arr = str(arr).replace("[","").replace("]","")
run_fortran(new_arr)
This returns an empty array.
(2) converting to a file type object using StringIO:
from cStringIO import StringIO
run_fortran(StringIO(new_arr))
This returns an error: TypeError: execv() arg 2 must contain only strings which makes sense.

In Fortran, the read(*,*), read(5,*) or read*, statements read from standard input, and if the data arriving there has the right format, it will just work. If you are dealing with formatted data, that is, anything human readable rather than a binary file, then you probably need a read loop like:
do line = 1, numlines
    read(*,*) array(line,:)
enddo
No open or close statements are needed. So if what you were writing to a file is passed directly, you should be able to remove those statements and change the file unit to 5 or *.
Now there are more efficient ways to do this kind of communication, but any solution is a good solution if it suits your purpose.
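On the Python side, a minimal sketch of the counterpart (my own, assuming ./output.e has been modified to read from standard input as described, and that one whitespace-separated line per row is what its read loop expects) could look like this:

import subprocess
import numpy as np

def run_fortran_stdin(arr):
    # Feed the array over stdin instead of writing a temporary file.
    payload = "\n".join(" ".join(str(v) for v in row)
                        for row in np.atleast_2d(arr)) + "\n"
    p = subprocess.Popen(['./output.e'],                 # no input file argument any more
                         stdin=subprocess.PIPE, stdout=subprocess.PIPE)
    out, err = p.communicate(payload.encode())
    if p.returncode == 0:
        return np.array(out.split(), dtype=int)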

If your Fortran program (that's './output.e' AFAICS) can read from stdin, not only from a regular file, you can do without temporary files by passing it stdin=subprocess.PIPE (other valid values are "an existing file descriptor (a positive integer) [or] an existing file object"). On UNIX, there's always /dev/stdin to put on the command line, btw, and on Windows there's con.
Still, if the program can only work in "sessions" due to the nature of processing (i.e. cannot run continuously and be fed new data as it's available), you'll have to invoke it repeatedly.
Note that you have to handle different pipes in different threads to avoid deadlocks. So, either use communicate() (but then the program cannot run continuously) or spawn an stdout/stderr thread manually (that's what communicate() does; not stdin because the output reading code has to be running by the time you start writing to stdin or the external program may choke with "no space left on device" while writing).
Here's sample code for running the program continuously:
import subprocess
import threading

p = subprocess.Popen(argv, stdin=subprocess.PIPE, stdout=subprocess.PIPE)
while True:
    # the only more elegant way to return a value from a thread is with a Thread subclass,
    # see http://stackoverflow.com/questions/6893968/how-to-get-the-return-value-from-a-thread-in-python
    output_container = []
    ot = threading.Thread(target=_outputreaderthread, args=(p.stdout, output_container, data))
    ot.start()
    p.stdin.write(data)
    ot.join()
    output = output_container[0]
    data = process_output(output)
    if no_more_processing_needed(data): break
p.stdin.close()
if p.wait() != 0: raise subprocess.CalledProcessError(p.returncode, argv)

def _outputreaderthread(stream, container, data):
    # since there's no process termination as an end mark here,
    # you need to read exactly the right number of bytes to avoid getting blocked.
    # use whatever technique is appropriate for your data (e.g. readline()'s)
    output = stream.read(calculate_output_size(data))
    container.append(output)

Related

How to send STDIN twice to Popen process, each time with EOF?

I have this piece of code:
for stdin in stdins:
    p.stdin.write(stdin)
which writes string stdin to process p's STDIN.
The challenge is that the process p expects to see an EOF before it moves on to the next STDIN input.
With the loop above, the problem is that subsequent p.stdin.write(stdin) calls are treated by the process p as part of the first STDIN input, because, as said earlier, p expects to see an EOF before moving on to subsequent fields.
So, my question is: how to solve this problem in Python? The process needs to see something like:
for stdin in stdins:
    p.stdin.write(stdin)
    p.stdin.send_eof()
Constraints: solution must not use pexpect.
EOF is not a character, it just means there is no more data to read.
As such I don't believe what you're after is possible in Python or most other languages.
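One workaround, sketched below (my own suggestion, not part of the answer above): if restarting the program for each input is acceptable, spawning a fresh process per item lets communicate() close stdin and thereby deliver the EOF.

import subprocess

cmd = ["./the_program"]                      # hypothetical command line
for stdin_data in stdins:                    # stdins as in the question
    p = subprocess.Popen(cmd, stdin=subprocess.PIPE, stdout=subprocess.PIPE)
    out, _ = p.communicate(stdin_data)       # writes the data, then closes stdin (EOF)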
I ran into the same problem when trying to build an asynchronous renderer in Python with several subprocesses that need to communicate with the main process at low latency.
When I used subprocess.Popen() with stdin=subprocess.PIPE, I found that the subprocess couldn't get any content until stdin.close() was called or the main process exited, both of which send an EOF but make the pipe unusable afterwards. I tried stdin.writelines(), stdin.flush(), pickle.dump() and so on, but none of them worked.
But there is a way to communicate with a subprocess repeatedly, using NumPy.
ndarray.tofile can send an array directly to a file object. Although the documentation declares it to be equivalent to file.write(a.tobytes()), there is more to it. I was confused until I read this at the end of the doc page:
When fid is a file object, array contents are directly written to the file, bypassing the file object’s write method. As a result, tofile cannot be used with files objects supporting compression (e.g., GzipFile) or file-like objects that do not support fileno() (e.g., BytesIO).
Actually, I think it is file.write()'s fault. Any function that calls the write() method is inevitably unable to send an EOF, unless we bypass the write() method, which is impossible without a C extension like NumPy.
Sending general data through the pipe now comes down to two options:
NumPy supports dtype=object, which means you can pack your message into an object array directly. See also numpy.lib.format:
Stores object arrays, i.e. arrays containing elements that are arbitrary Python objects. Files with object arrays are not to be mmapable, but can be read and written to disk.
You can declare a structured dtype to pack your message if it has a regular pattern, which is my situation. Here is my example.
task = np.dtype([("index", np.uint8),
                 ("text", np.unicode_, 128),
                 ("color", np.uint8, 2),
                 ("size", np.uint8)])

for i in range(123):
    np.empty(1, dtype=task).tofile(s.stdin)  # s is the subprocess' name.
    time.sleep(1)
Then I successfully got the message in the subprocess, 123 separate times.
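For completeness, here is a hedged sketch of what the receiving side in the child script can look like (my own illustration, assuming the same structured dtype is declared there); it reads exactly one record's worth of bytes and decodes it with np.frombuffer:

import sys
import numpy as np

task = np.dtype([("index", np.uint8),
                 ("text", np.unicode_, 128),
                 ("color", np.uint8, 2),
                 ("size", np.uint8)])            # must match the sender's dtype

while True:
    raw = sys.stdin.buffer.read(task.itemsize)   # block until one full record arrives
    if len(raw) < task.itemsize:                 # parent closed the pipe
        break
    msg = np.frombuffer(raw, dtype=task)         # decode the fixed-size record
    # ... use msg["index"], msg["text"], msg["color"], msg["size"] here ...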
I really hope this can help you, because it took me nearly 4 days to find this solution. I was almost resigned to using a real file on disk to accomplish the communication between processes (which is supposed to be slower), but thanks to NumPy my debugging finally came to an end...
Plus, I don't think np.save() works for this, for the same EOF reason. You can try this in a Python console:
>>> import numpy as np
>>> import sys
>>> a = np.arange(100).reshape(10,10)
>>> a.tofile(sys.stdout.buffer)
... some garbled characters ...
>>> a.tofile(sys.stdout)
... some garbled characters ...
>>> np.save(sys.stdout.buffer, a)
... some garbled characters ...
>>> np.save(sys.stdout, a)
Traceback (most recent call last):
File "<input>", line 1, in <module>
File "<__array_function__ internals>", line 5, in save
...
TypeError: write() argument must be str, not bytes
The reason is that sys.stdout.buffer.write() accepts bytes while sys.stdout.write() accepts str. So the fact that array.tofile(sys.stdout) didn't raise any error reveals that it didn't call the write() method, while np.save() did. One remaining problem is that np.fromfile doesn't seem to support dtype=object; sorry about that. Maybe it really is hard to transfer dynamically typed data between processes through a pipe, but I've heard that the ctypes module has some way to share RAM between processes, which may be of some help.
Note that I failed to run the snippet above in a terminal (io.UnsupportedOperation: seek), but it runs fine in PyCharm's Python console. I have no idea why; maybe PyCharm's Python console actually wraps sys.stdin in a proxy too.
Also, it seems subprocess.PIPE has a maximum buffer size, so transferring rendered images this way is impossible. In my experiments, dividing them into blocks doesn't help.

Calling J from Python

I have some J code that I'd like to run on an array variable from my Python script. The Python array is simply a variable with 200 floating point numbers. I am aware of memory mapped files, but this seems very low level and technical.
Is there a simple way to call a J function or script from Python without dropping out to the shell, as in:
import subprocess
subprocess.check_output(["echo function.ijs | ijconsole"])
Using this method, I first need to save out my variable into a temporary file, and the J program needs to load that file. Is there a more elegant way?
If you have a single string of data to pass to the subprocess's input and want to read its output all at once, use Popen.communicate():
j_process = subprocess.Popen(["ijconsole"],
                             stdin=subprocess.PIPE, stdout=subprocess.PIPE)
j_output, _ = j_process.communicate(j_input)
If the interaction is more complex, you can read and write Popen.stdin/Popen.stdout directly instead of using communicate(), but be careful - it's possible to deadlock due to buffering.
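For the 200 floats specifically, a hedged sketch (mine, not the answerer's; the J expression is only a placeholder for whatever function.ijs computes):

import subprocess

values = [0.5, 1.25, 3.0]                        # stand-in for the 200 floats
j_input = ("data =. " + " ".join(map(str, values)) + "\n"
           "+/ data\n").encode()                 # "+/ data" is just a placeholder sentence

j_process = subprocess.Popen(["ijconsole"],
                             stdin=subprocess.PIPE, stdout=subprocess.PIPE)
j_output, _ = j_process.communicate(j_input)
print(j_output.decode())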

How can I redirect outputs from a Python process into a Rust process?

I am trying to spawn a Rust process from a Python program and redirect Python's standard output into its standard input. I have used the following function:
process = subprocess.Popen(["./target/debug/mypro"], stdin=subprocess.PIPE)
and tried to write to the subprocess using:
process.stdin.write(str.encode(json.dumps(dictionnaire[str(index)]))) #Write bytes of Json representation of previous track
I am not getting any errors but standard input in Rust doesn't seem to take any input and standard output isn't printing anything at all.
Here's the version of the Rust code I am currently running:
extern crate rustc_serialize;

use rustc_serialize::json::Json;
use std::fs::File;
use std::io;
use std::env;
use std::str;

fn main() {
    let mut buffer = String::new();
    let stdin = io::stdin();
    //stdin.lock();
    stdin.read_line(&mut buffer).unwrap();
    println!("{}", buffer);
    println!("ok");
}
process.stdin.write(str.encode(json.dumps(dictionnaire[str(index)]))) does not add a newline character by default, so on the Rust side I never reached the end of the line, which made the process block on read_line.
Adding it manually made everything work smoothly.
process.stdin.write(str.encode(json.dumps(dictionnaire[str(index)]) + "\n"))
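Put together, a minimal sketch of the fixed Python side (assuming dictionnaire and index exist as in the question): write newline-terminated JSON and flush, so the Rust read_line() returns right away.

import json
import subprocess

process = subprocess.Popen(["./target/debug/mypro"], stdin=subprocess.PIPE)
line = json.dumps(dictionnaire[str(index)]) + "\n"   # the newline terminates the record
process.stdin.write(line.encode())
process.stdin.flush()                                # push it through the pipe now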
This may be a problem on the Python side:
subprocess.run(["cargo run -- " + str(r)], shell=True)
This assumes that you have a numeric file descriptor that remains open across fork and exec. Spawning processes may close file descriptors either because they're marked as CLOEXEC or due to explicit cleanup code before exec.
Before attempting to pass a numeric file descriptor as a string argument, you should make sure that it will remain valid in the new process.
A better approach is to use some process spawning API that allows you to explicitly map the file descriptors in the new process to open handles or an API that spawns a process with stdin/out tied to pipes.
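A hedged sketch of that last approach (my own, with a hypothetical payload): create a pipe, keep its read end open across exec with pass_fds, and hand the descriptor number to the child as an argument.

import os
import subprocess

read_fd, write_fd = os.pipe()
proc = subprocess.Popen(["./target/debug/mypro", str(read_fd)],
                        pass_fds=(read_fd,))    # keep this descriptor open in the child
os.close(read_fd)                               # the parent only needs the write end
os.write(write_fd, b'{"track": 1}\n')           # hypothetical payload
os.close(write_fd)                              # EOF for the child
proc.wait()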

How to process lines of standard input interactively in Python?

I'd like to use Python as an external process and communicate with it by standard input/standard output. Specifically, I want to
for line in sys.stdin:
    result = compute_something(line)
    print result
    sys.stdout.flush()
The output flush is to get the result back right away, without buffering. I want to do the same with the input--- to process each line right away, without buffering. However, the above code does not respond to each line individually; it waits until a large amount of data is accumulated in the standard input and then processes everything at once.
This is true even if the calling program flushes its standard output with every line. It's also true even if I'm running the above directly on a console. The buffer is in Python, not the calling program.
Moreover, I found that control-D on the console makes Python flush its standard input buffer. (And then I can continue to send more input afterward!) However, that's not useful to me because the calling program can't send the equivalent of control-D at the end of each line.
One more thing: for line in sys.stdin.xreadlines() appears to be equivalent to for line in sys.stdin: they both buffer.
So my question is, how can I write a Python script that does not buffer its input, so that it processes each line of input right away?
(I solved the problem before posting the question, but I think I should share it anyway--- others might encounter this problem and I'm still interested in any comments on why this is happening and what we should all know about how to control Python's or Linux's implicit buffering.)
Here's one way to avoid input buffering:
while True:
    line = sys.stdin.readline()
    result = compute_something(line)
    print result
    sys.stdout.flush()
Apparently, .readline() avoids the input buffer while direct iteration and .xreadlines() do not.
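An equivalent sketch of the same idea (mine, in Python 3 syntax, where iterating over sys.stdin already reads one line at a time via readline() under the hood): wrapping readline() in iter() keeps the loop shape of the original code while still processing each line as it arrives; compute_something stands for the user's own function.

import sys

for line in iter(sys.stdin.readline, ''):   # readline() returns '' only at EOF
    result = compute_something(line)
    print(result)
    sys.stdout.flush()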

subprocess.Popen.stdout - reading stdout in real-time (again)

Again, the same question.
The reason is that I still can't make it work after reading the following:
Real-time intercepting of stdout from another process in Python
Intercepting stdout of a subprocess while it is running
How do I get 'real-time' information back from a subprocess.Popen in python (2.5)
catching stdout in realtime from subprocess
My case is that I have a console app written in C; let's take for example this code in a loop:
tmp = 0.0;
printf("\ninput>>");
scanf_s("%f",&tmp);
printf ("\ninput was: %f",tmp);
It continuously reads some input and writes some output.
My python code to interact with it is the following:
p = subprocess.Popen([path], stdout=subprocess.PIPE, stdin=subprocess.PIPE)
p.stdin.write('12345\n')
for line in p.stdout:
    print(">>> " + str(line.rstrip()))
    p.stdout.flush()
So far, whenever I read from p.stdout it always waits until the process is terminated and then outputs an empty string. I've tried lots of stuff, but still the same result.
I tried Python 2.6 and 3.1, but the version doesn't matter - I just need to make it work somewhere.
Trying to write to and read from pipes to a sub-process is tricky because of the default buffering going on in both directions. It's extremely easy to get a deadlock where one or the other process (parent or child) is reading from an empty buffer, writing into a full buffer or doing a blocking read on a buffer that's awaiting data before the system libraries flush it.
For more modest amounts of data the Popen.communicate() method might be sufficient. However, for data that exceeds its buffering you'd probably get stalled processes (similar to what you're already seeing?)
You might want to look for details on using the fcntl module and making one or the other (or both) of your file descriptors non-blocking. In that case, of course, you'll have to wrap all reads and/or writes to those file descriptors in the appropriate exception handling to handle the "EWOULDBLOCK" events. (I don't remember the exact Python exception that's raised for these).
A completely different approach would be for your parent to use the select module and os.fork(), and for the child process to execve() the target program after directly handling any file dup()ing. (Basically you'd be re-implementing parts of Popen(), but with different parent file descriptor (PIPE) handling.)
Incidentally, .communicate, at least in Python's 2.5 and 2.6 standard libraries, will only handle about 64K of remote data (on Linux and FreeBSD). This number may vary based on various factors (possibly including the build options used to compile your Python interpreter, or the version of libc being linked to it). It is NOT simply limited by available memory (despite J.F. Sebastian's assertion to the contrary) but is limited to a much smaller value.
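A hedged sketch of the fcntl approach mentioned above (my own, not the answerer's code): mark the child's stdout as non-blocking and treat "no data yet" as a normal condition.

import errno
import fcntl
import os
import subprocess

p = subprocess.Popen(["./a.out"], stdin=subprocess.PIPE, stdout=subprocess.PIPE)
flags = fcntl.fcntl(p.stdout, fcntl.F_GETFL)
fcntl.fcntl(p.stdout, fcntl.F_SETFL, flags | os.O_NONBLOCK)

try:
    chunk = os.read(p.stdout.fileno(), 4096)   # returns whatever is available right now
except OSError as e:
    if e.errno != errno.EAGAIN:                # EAGAIN is the EWOULDBLOCK case
        raise
    chunk = b""                                # nothing to read yet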
Push reading from the pipe into a separate thread that signals when a chunk of output is available:
How can I read all availably data from subprocess.Popen.stdout (non blocking)?
The bufsize=256 argument prevents 12345\n from being sent to the child process in a chunk smaller than 256 bytes, as it will be when omitting bufsize or inserting p.stdin.flush() after p.stdin.write(). Default behaviour is line-buffering.
In either case you should at least see one empty line before blocking as emitted by the first printf(\n...) in your example.
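For reference, a hedged sketch (in Python 3 syntax, mine rather than the linked answer's exact code) of that threaded-reader pattern: a daemon thread pushes each output line onto a queue that the main thread polls with a timeout.

import queue
import subprocess
import threading

def enqueue_output(pipe, q):
    for line in iter(pipe.readline, b""):    # b"" means the child closed its stdout
        q.put(line)
    pipe.close()

p = subprocess.Popen(["./a.out"], stdin=subprocess.PIPE, stdout=subprocess.PIPE)
q = queue.Queue()
threading.Thread(target=enqueue_output, args=(p.stdout, q), daemon=True).start()

p.stdin.write(b"12345\n")
p.stdin.flush()
try:
    print(q.get(timeout=1.0))                # first output line, or queue.Empty on timeout
except queue.Empty:
    print("no output yet")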
Your particular example doesn't require "real-time" interaction. The following works:
from subprocess import Popen, PIPE
p = Popen(["./a.out"], stdin=PIPE, stdout=PIPE)
output = p.communicate(b"12345")[0] # send input/read all output
print output,
where a.out is your example C program.
In general, for a dialog-based interaction with a subprocess you could use pexpect module (or its analogs on Windows):
import pexpect
child = pexpect.spawn("./a.out")
child.expect("input>>")
child.sendline("12345.67890") # send a number
child.expect(r"\d+\.\d+") # expect the number at the end
print float(child.after) # assert that we can parse it
child.close()
I had the same problem, and proc.communicate() does not solve it because it waits for the process to terminate.
So here is what is working for me, on Windows with Python 3.5.1:
import subprocess as sp
import time

i = 0
myProcess = sp.Popen(cmd, creationflags=sp.CREATE_NEW_PROCESS_GROUP,
                     stdout=sp.PIPE, stderr=sp.STDOUT)
while i < 40:
    i += 1
    time.sleep(.5)
    out = myProcess.stdout.readline().decode("utf-8").rstrip()
I guess creationflags and the other arguments are not mandatory (but I don't have time to test), so this would be the minimal syntax:
myProcess = sp.Popen(cmd, stdout=sp.PIPE)
for i in range(40):
    time.sleep(.5)
    out = myProcess.stdout.readline()
