How to send STDIN twice to Popen process, each time with EOF? - python

I have this part of a code:
for stdin in stdins:
    p.stdin.write(stdin)
which writes each string stdin to process p's STDIN.
The challenge is: the process p expects to see EOF before it moves on to the next STDIN input.
With the loop above, the problem is that every subsequent p.stdin.write(stdin) is treated by the process p as part of the first input, because, as said earlier, p expects to see an EOF before moving on to the next field.
So, my question is: how to solve this problem in Python? The process needs to see something like:
for stdin in stdins:
    p.stdin.write(stdin)
    p.stdin.send_eof()
Constraints: solution must not use pexpect.

EOF is not a character, it just means there is no more data to read.
As such I don't believe what you're after is possible in Python or most other languages.
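One workaround, not part of this answer, assuming the program can be restarted for each input: spawn a fresh process per item, so that closing stdin (which communicate() does) delivers a real EOF every time. A minimal sketch:
import subprocess

def feed_with_eof(cmd, stdins):
    # Hypothetical helper: run `cmd` once per input so every run sees EOF
    # when its stdin pipe is closed. Assumes the program can be restarted per input.
    outputs = []
    for data in stdins:
        p = subprocess.Popen(cmd, stdin=subprocess.PIPE, stdout=subprocess.PIPE)
        out, _ = p.communicate(data)  # data must be bytes; writes it, closes stdin (EOF), reads output
        outputs.append(out)
    return outputs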

I ran into the same problem when trying to build an asynchronous renderer in Python with several subprocesses that need to communicate with the main process at low latency.
When I used subprocess.Popen() with stdin=subprocess.PIPE, I found that the subprocess couldn't get any content until stdin.close() was called or the main process exited; both send an EOF, but both make the pipe unusable afterwards. Of course I tried stdin.writelines(), stdin.flush(), pickle.dump() and so on, but none of them worked.
But there is a way to communicate with a subprocess repeatedly using NumPy.
ndarray.tofile can send an array directly to a file object. Although the documentation declares that it is equivalent to file.write(a.tobytes()), which sounds reasonable, I was confused until I read this at the end of the doc page:
When fid is a file object, array contents are directly written to the file, bypassing the file object’s write method. As a result, tofile cannot be used with files objects supporting compression (e.g., GzipFile) or file-like objects that do not support fileno() (e.g., BytesIO).
Actually, I think file.write() is the culprit. Any function that goes through the write() method is inevitably unable to send an EOF, unless we bypass write() entirely, which is impossible without some C extension like NumPy.
Sending general data through the pipe now comes down to two options:
NumPy supports dtype=object, which means you can pack your message into an object array directly. See also numpy.lib.format:
Stores object arrays, i.e. arrays containing elements that are arbitrary Python objects. Files with object arrays are not to be mmapable, but can be read and written to disk.
You can declare a structured dtype to pack your message if it has a regular layout, which is my situation. Here is my example.
task = np.dtype([("index", np.uint8),
                 ("text", np.unicode_, 128),
                 ("color", np.uint8, 2),
                 ("size", np.uint8)])
for i in range(123):
    np.empty(1, dtype=task).tofile(s.stdin)  # s is the subprocess' name
    time.sleep(1)
Then I successfully received the message in the subprocess, 123 separate times.
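For reference, here is a sketch of the receiving side (not from the original answer): the child reads fixed-size records from its stdin and decodes them with np.frombuffer, which avoids np.fromfile's need to seek (pipes are not seekable). The dtype mirrors the task dtype above; "U128" is the same as np.unicode_ with length 128.
import os
import sys
import numpy as np

task = np.dtype([("index", np.uint8),
                 ("text", "U128"),
                 ("color", np.uint8, 2),
                 ("size", np.uint8)])

def read_exact(fd, n):
    # Read exactly n bytes from fd; return b'' on EOF (parent closed the pipe).
    buf = b""
    while len(buf) < n:
        chunk = os.read(fd, n - len(buf))
        if not chunk:
            return b""
        buf += chunk
    return buf

while True:
    raw = read_exact(sys.stdin.fileno(), task.itemsize)
    if not raw:
        break
    record = np.frombuffer(raw, dtype=task)[0]
    # ... render / handle the record here ...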
I really hope this can help you, because it took me nearly 4 days to find this solution. I was almost resigned to using a real file on disk for the inter-process communication, which is supposed to be slower, but thanks to NumPy my debugging finally came to an end...
Also, I think np.save() is no use for sending EOF. You can try this in a Python console:
>>> import numpy as np
>>> import sys
>>> a = np.arange(100).reshape(10,10)
>>> a.tofile(sys.stdout.buffer)
... some garbled characters ...
>>> a.tofile(sys.stdout)
... some garbled characters ...
>>> np.save(sys.stdout.buffer, a)
... some garbled characters ...
>>> np.save(sys.stdout, a)
Traceback (most recent call last):
File "<input>", line 1, in <module>
File "<__array_function__ internals>", line 5, in save
...
TypeError: write() argument must be str, not bytes
The reason is that sys.stdout.buffer.write() accepts bytes while sys.stdout.write() accepts str. So the fact that writing to sys.stdout with array.tofile raised no error shows that it never called the write() method, while np.save() did. One remaining problem is that np.fromfile doesn't seem to support dtype=object; sorry about that. Maybe it really is hard to transfer dynamically typed data between processes through a pipe, but I've heard that the ctypes module offers ways to share memory between processes, which may be of some help.
Note that I failed to run the script above in a terminal (io.UnsupportedOperation: seek), but it runs fine in PyCharm's Python console. I have no idea why; maybe PyCharm's Python console wraps sys.stdin in a proxy too.
Also, subprocess.PIPE seems to have a maximum buffer size, so transferring rendered images this way is impossible; in my experiments, splitting them into blocks doesn't help.

Related

Equivalent of subprocess.call() in C [duplicate]

In C, how should I execute an external program and get its results as if it were run in the console?
If there is an executable called dummy that displays a 4-digit number in the command prompt when executed, I want to know how to run that executable and capture the 4-digit number it generated. In C.
popen() handles this quite nicely. For instance if you want to call something and read the results line by line:
char buff[140];
FILE *in;

if (!(in = popen(somecommand, "r"))) {
    exit(1);
}
while (fgets(buff, sizeof(buff), in) != NULL) {
    /* buff is now the output of your command, line by line; do with it what you will */
}
pclose(in);
This has worked for me before; hopefully it's helpful. Make sure to include stdio.h in order to use this.
You can use popen() on UNIX.
This is not actually something ISO C can do on its own (by that I mean the standard itself doesn't provide this capability) - possibly the most portable solution is to simply run the program, redirecting its standard output to a file, like:
system ("myprog >myprog.out");
then use the standard ISO C fopen/fread/fclose to read that output into a variable.
This is not necessarily the best solution since that may depend on the underlying environment (and even the ability to redirect output is platform-specific) but I thought I'd add it for completeness.
There is popen() on unix as mentioned before, which gives you a FILE* to read from.
Alternatively, on unix you can use a combination of pipe(), fork(), exec(), select(), read(), and wait() to accomplish the task in a more generalized/flexible way.
The popen library call invokes fork and pipe under the hood to do its work. Using it, you're limited to simply reading whatever the process dumps to stdout (which you could use the underlying shell to redirect). Using the lower-level functions you can do pretty much whatever you want, including reading stderr and writing stdin.
On Windows, see calls like CreatePipe() and CreateProcess(), with the IO members of STARTUPINFO set to your pipes. You can get a file descriptor to do read()'s on using _open_osfhandle() with the pipe handle. Depending on the app, you may need to read multi-threaded, or it may be okay to block.

How can I redirect outputs from a Python process into a Rust process?

I am trying to spawn a Rust process from a Python program and redirect Python's standard output into its standard input. I have used the following function:
process = subprocess.Popen(["./target/debug/mypro"], stdin=subprocess.PIPE)
and tried to write to the subprocess using:
process.stdin.write(str.encode(json.dumps(dictionnaire[str(index)]))) #Write bytes of Json representation of previous track
I am not getting any errors but standard input in Rust doesn't seem to take any input and standard output isn't printing anything at all.
Here's the version of the Rust code I am currently running:
extern crate rustc_serialize;
use rustc_serialize::json::Json;
use std::fs::File;
use std::io;
use std::env;
use std::str;
fn main() {
    let mut buffer = String::new();
    let stdin = io::stdin();
    //stdin.lock();
    stdin.read_line(&mut buffer).unwrap();
    println!("{}", buffer);
    println!("ok");
}
process.stdin.write(str.encode(json.dumps(dictionnaire[str(index)]))) does not add a newline character, so on the Rust side I was never getting to the end of the line, which made the process block on read_line.
Adding it manually made everything work smoothly:
process.stdin.write(str.encode(json.dumps(dictionnaire[str(index)]) + "\n"))
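For reference, a minimal sketch of the Python side under Python 3, where Popen pipes are block-buffered by default, so an explicit flush() is usually needed in addition to the newline (dictionnaire and index stand in for the question's data):
import json
import subprocess

dictionnaire = {"0": {"track": "example"}}  # placeholder data, as in the question
index = 0

process = subprocess.Popen(["./target/debug/mypro"], stdin=subprocess.PIPE)

line = json.dumps(dictionnaire[str(index)]) + "\n"  # newline so read_line() returns
process.stdin.write(line.encode())
process.stdin.flush()   # push the buffered bytes through the pipe right away
process.stdin.close()   # optional: signals EOF once there is nothing more to send
process.wait()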
This may be a problem on the Python side
subprocess.run(["cargo run -- " + str(r)], shell=True)
This assumes that you have a numeric file descriptor that remains open across fork and exec. Spawning processes may close file descriptors either because they're marked as CLOEXEC or due to explicit cleanup code before exec.
Before attempting to pass a numeric file descriptor as a string argument, you should make sure that it will remain valid in the new process.
A better approach is to use some process spawning API that allows you to explicitly map the file descriptors in the new process to open handles or an API that spawns a process with stdin/out tied to pipes.
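As a sketch of the explicit-mapping approach (assuming Python 3 on a POSIX system, where Popen accepts pass_fds; ./child_program and the fd-number argument are hypothetical):
import os
import subprocess

r, w = os.pipe()                        # the child will inherit the read end

child = subprocess.Popen(
    ["./child_program", str(r)],        # hypothetical child that takes the fd number as argv[1]
    pass_fds=(r,),                      # keep this descriptor open across exec
)
os.close(r)                             # the parent no longer needs the read end
os.write(w, b"hello over an explicit fd\n")
os.close(w)
child.wait()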

Passing output from an external program back to it after processing

I have a Fortran code that takes a file as input and writes the output into stdout. To avoid read/write cycles, I'd like to run the code inside python and convert the output to a numpy array. I can do this using the following function:
def run_fortran(infile):
    import subprocess
    import numpy as np
    cmd = ['./output.e', infile]
    p = subprocess.Popen(cmd, stdout=subprocess.PIPE)
    out, err = p.communicate()
    if p.returncode == 0:
        return np.array(out.split(), dtype=int)
Now I take the array, modify it and write it into a file. The new file is again passed into run_fortran(infile). Can I avoid this step and somehow use the output of run_fortran instead of passing a filename?
I tried two cases with no success:
(1) converting to string:
arr = run_fortran('input.txt')
new_arr = str(arr).replace("[","").replace("]","")
run_fortran(new_arr)
This returns an empty array.
(2) converting to a file type object using StringIO:
from cStringIO import StringIO
run_fortran(StringIO(new_arr))
This returns an error: TypeError: execv() arg 2 must contain only strings which makes sense.
In Fortran, the read(*,*), read(5,*) or read*, statement will read from standard input, and if that input has the right format, it will work. If you are dealing with formatted data, that is, anything human readable rather than a binary file, then you probably need a read loop like:
do line=1,numlines
    read(*,*) array(line,:)
enddo
No open or close statements are needed. So if what you were writing to a file is passed directly, you should be able to remove those statements and change the file unit to 5 or *.
Now there are more efficient ways to do this kind of communication, but any solution is a good solution if it suits your purpose.
If your Fortran program (it's that './output.e' AFAICS) can read from stdin, not only from a regular file, you can do without temporary files by passing it stdin=subprocess.PIPE (other valid values are "an existing file descriptor (a positive integer) [or] an existing file object"). In UNIX, there's always /dev/stdin to put on the command line btw and in Windows, there's con.
Still, if the program can only work in "sessions" due to the nature of processing (i.e. cannot run continuously and be fed new data as it's available), you'll have to invoke it repeatedly.
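As a sketch of that session-per-call variant (assuming the Fortran program accepts the same whitespace-separated integers on standard input; the exact format depends on what ./output.e reads), run_fortran could pass the data through the pipe instead of a file name:
import subprocess
import numpy as np

def run_fortran_stdin(data):
    # Feed `data` (an array of integers) to ./output.e via stdin and parse its stdout.
    payload = " ".join(map(str, np.asarray(data).ravel())).encode()
    p = subprocess.Popen(["./output.e", "/dev/stdin"],  # or drop the argument if it reads unit 5/*
                         stdin=subprocess.PIPE, stdout=subprocess.PIPE)
    out, _ = p.communicate(payload)
    if p.returncode == 0:
        return np.array(out.split(), dtype=int)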
Note that you have to handle the different pipes in different threads to avoid deadlocks. So, either use communicate() (but then the program cannot run continuously), or spawn a stdout/stderr reader thread manually (that's what communicate() does internally; stdin doesn't need its own thread here because the output-reading code has to be running by the time you start writing to stdin, or the external program may choke with "no space left on device" while writing).
Here's sample code for running the program continuously:
import subprocess
import threading

def _outputreaderthread(stream, container, data):
    # Since there's no process termination as an end mark here,
    # you need to read exactly the right number of bytes to avoid getting blocked.
    # Use whatever technique is appropriate for your data (e.g. readline()'s).
    output = stream.read(calculate_output_size(data))
    container.append(output)

p = subprocess.Popen(argv, stdin=subprocess.PIPE, stdout=subprocess.PIPE)
while True:
    # The only more elegant way to return a value from a thread is with a Thread subclass,
    # see http://stackoverflow.com/questions/6893968/how-to-get-the-return-value-from-a-thread-in-python
    output_container = []
    ot = threading.Thread(target=_outputreaderthread, args=(p.stdout, output_container, data))
    ot.start()
    p.stdin.write(data)
    p.stdin.flush()  # make sure the data actually reaches the child before waiting for output
    ot.join()
    output = output_container[0]
    data = process_output(output)
    if no_more_processing_needed(data):
        break
p.stdin.close()
if p.wait() != 0:
    raise subprocess.CalledProcessError(p.returncode, argv)

Passing sys.stdout as an argument to a process

I am passing "sys.stdout" as an argument to a process, and the process then writes to the "sys.stdout" while it does its stuff.
import multiprocessing
import sys

def worker_with(stream):
    stream.write('In the process\n')

if __name__ == '__main__':
    sys.stdout.write('In the main\n')
    lock = multiprocessing.Lock()
    w = multiprocessing.Process(target=worker_with, args=(sys.stdout,))
    w.start()
    w.join()
The code above does not work, it returns the following error: "ValueError: operation on closed file".
I tried running the same code but calling the function directly instead of spawning a process, and it works: it prints to the console.
I also tried referring to sys.stdout directly inside the function and spawning it as a process, and that works too.
The problem seems to be passing sys.stdout as a parameter of the process.
Does anyone have any idea why?
Note: this code is inspired by the tutorial PYMOTW - communication between processes.
EDIT: I am running Python 2.7.10, 32-bit, on Windows 7.
When you pass arguments to a Process, they are pickled in the parent, transmitted to the child, and unpickled there. Unfortunately, it looks like the round trip through pickle silently misbehaves for file objects; with protocol 0, it errors out, but with protocol 2 (the highest Python 2 protocol, and the one used for multiprocessing), it silently produces a junk file object:
>>> import pickle, sys
>>> pickle.loads(pickle.dumps(sys.stdout, pickle.HIGHEST_PROTOCOL))
<closed file '<uninitialized file>', mode '<uninitialized file>' at 0xDEADBEEF>
Same problem occurs for named files too; it's not unique to the standard handles. Basically, pickle can't round trip a file object; even when it claims to succeed, the result is garbage.
Generally, multiprocessing isn't really expected to handle a scenario like this; usually, Processes are worker tasks, and I/O is performed through the main process (because if they all wrote independently to the same file handle, you'd have issues with interleaved writes).
In Python 3.5 at least, they fixed this so the error is immediate and obvious (the file-like objects returned by open, TextIOWrapper and Buffered*, will error out when pickled with any protocol).
The best you could do on Windows would be to send the known file descriptor as an argument:
sys.stdout.flush() # Precaution to minimize output interleaving
w = multiprocessing.Process(target=worker_with, args=(sys.stdout.fileno(),))
then reopen it on the other side using os.fdopen. For fds not part of the standard handles (0, 1 and 2), since Windows uses the "spawn" method of making new Processes, you'd need to make sure any such fd was opened as a consequence of importing the __main__ module when __name__ != "__main__" (Windows simulates a fork by importing the __main__ module, setting the __name__ to something else). Of course, if it's a named file, not a standard handle, you could just pass the name and reopen that. For example, to make this work, you'd change:
def worker_with(stream):
    stream.write('In the process\n')
to:
import os

def worker_with(toopen):
    opener = open if isinstance(toopen, basestring) else os.fdopen
    with opener(toopen, 'a') as stream:
        stream.write('In the process\n')
Note: As written, if the fd is for one of the standard handles, os.fdopen will close the underlying file descriptor when the with statement exits, which may not be what you want. If you need file descriptors to survive the close of the with block, when passed a file descriptor, you may want to use os.dup to duplicate the handle before calling os.fdopen, so the two handles are independent of one another.
Other solutions would include writing results back to the main process over a multiprocessing.Pipe (so the main process is responsible for passing the data along to sys.stdout, possibly launching a thread to perform this work asynchronously), or using higher level constructs (e.g. multiprocessing.Pool().*map*) that return data using return statement instead of explicit file I/O.
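A rough sketch of the Pipe-based alternative (the names here are illustrative, not from the question): the worker sends its text back to the parent, and the parent does all the real writing to sys.stdout.
import multiprocessing
import sys

def worker(conn):
    conn.send('In the process\n')   # send text back instead of writing to a file handle
    conn.close()

if __name__ == '__main__':
    parent_conn, child_conn = multiprocessing.Pipe()
    w = multiprocessing.Process(target=worker, args=(child_conn,))
    w.start()
    sys.stdout.write(parent_conn.recv())   # the main process owns the actual I/O
    w.join()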
If you're really desperate to make this work in general for all file descriptors (and don't care about portability), not just the standard handles and descriptors created on import of __main__, you can use the undocumented Windows utility function multiprocessing.forking.duplicate that is used to explicitly duplicate a file descriptor from one process to another; it would be incredibly hacky (you'd need to look at the rest of the Windows definition of multiprocessing.forking.Popen there to see how it would be used), but it would at least allow passing along arbitrary file descriptors, not just statically opened ones.

subprocess.Popen.stdout - reading stdout in real-time (again)

Again, the same question.
The reason is - I still can't make it work after reading the following:
Real-time intercepting of stdout from another process in Python
Intercepting stdout of a subprocess while it is running
How do I get 'real-time' information back from a subprocess.Popen in python (2.5)
catching stdout in realtime from subprocess
My case is that I have a console app written in C; let's take for example this code in a loop:
tmp = 0.0;
printf("\ninput>>");
scanf_s("%f",&tmp);
printf ("\ninput was: %f",tmp);
It continuously reads some input and writes some output.
My python code to interact with it is the following:
p = subprocess.Popen([path], stdout=subprocess.PIPE, stdin=subprocess.PIPE)
p.stdin.write('12345\n')
for line in p.stdout:
    print(">>> " + str(line.rstrip()))
    p.stdout.flush()
So far, whenever I read from p.stdout it always waits until the process has terminated and then outputs an empty string. I've tried lots of stuff, but still the same result.
I tried Python 2.6 and 3.1, but the version doesn't matter - I just need to make it work somewhere.
Trying to write to and read from pipes to a sub-process is tricky because of the default buffering going on in both directions. It's extremely easy to get a deadlock where one or the other process (parent or child) is reading from an empty buffer, writing into a full buffer or doing a blocking read on a buffer that's awaiting data before the system libraries flush it.
For more modest amounts of data the Popen.communicate() method might be sufficient. However, for data that exceeds its buffering you'd probably get stalled processes (similar to what you're already seeing?)
You might want to look for details on using the fcntl module and making one or the other (or both) of your file descriptors non-blocking. In that case, of course, you'll have to wrap all reads and/or writes to those file descriptors in the appropriate exception handling to handle the "EWOULDBLOCK" events. (I don't remember the exact Python exception that's raised for these).
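A rough POSIX-only sketch of that non-blocking variant (assuming Python 3, where the "would block" condition surfaces as BlockingIOError; path stands for the same executable as in the question):
import fcntl
import os
import subprocess

path = "./a.out"   # placeholder for the compiled example program
p = subprocess.Popen([path], stdin=subprocess.PIPE, stdout=subprocess.PIPE)

# Mark the child's stdout as non-blocking so reads return immediately.
flags = fcntl.fcntl(p.stdout.fileno(), fcntl.F_GETFL)
fcntl.fcntl(p.stdout.fileno(), fcntl.F_SETFL, flags | os.O_NONBLOCK)

p.stdin.write(b'12345\n')
p.stdin.flush()

try:
    chunk = os.read(p.stdout.fileno(), 4096)   # whatever happens to be available right now
except BlockingIOError:
    chunk = b''                                # nothing to read yet; try again later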
A completely different approach would be for your parent to use the select module and os.fork() ... and for the child process to execve() the target program after directly handling any file dup()ing. (Basically you'd be re-implementing parts of Popen(), but with different parent file descriptor (PIPE) handling.)
Incidentally, .communicate, at least in Python's 2.5 and 2.6 standard libraries, will only handle about 64K of remote data (on Linux and FreeBSD). This number may vary based on various factors (possibly including the build options used to compile your Python interpreter, or the version of libc being linked to it). It is NOT simply limited by available memory (despite J.F. Sebastian's assertion to the contrary) but is limited to a much smaller value.
Push reading from the pipe into a separate thread that signals when a chunk of output is available:
How can I read all availably data from subprocess.Popen.stdout (non blocking)?
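A minimal sketch of that thread-plus-queue approach (assuming Python 3; the queue holds whatever lines the child has printed so far, and path again stands for the example executable):
import queue
import subprocess
import threading

def enqueue_output(pipe, q):
    # Runs in a background thread: forward each line from the child to the queue.
    for line in iter(pipe.readline, b''):
        q.put(line)
    pipe.close()

path = "./a.out"   # placeholder for the compiled example program
p = subprocess.Popen([path], stdin=subprocess.PIPE, stdout=subprocess.PIPE)
q = queue.Queue()
threading.Thread(target=enqueue_output, args=(p.stdout, q), daemon=True).start()

p.stdin.write(b'12345\n')
p.stdin.flush()

try:
    line = q.get(timeout=1.0)            # wait up to a second for a line of output
    print(">>> " + line.decode().rstrip())
except queue.Empty:
    print("no output yet")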
The bufsize=256 argument prevents 12345\n from being sent to the child process in a chunk smaller than 256 bytes, as it will be when omitting bufsize or inserting p.stdin.flush() after p.stdin.write(). Default behaviour is line-buffering.
In either case you should at least see one empty line before blocking as emitted by the first printf(\n...) in your example.
Your particular example doesn't require "real-time" interaction. The following works:
from subprocess import Popen, PIPE
p = Popen(["./a.out"], stdin=PIPE, stdout=PIPE)
output = p.communicate(b"12345")[0] # send input/read all output
print output,
where a.out is your example C program.
In general, for a dialog-based interaction with a subprocess you could use the pexpect module (or its analogs on Windows):
import pexpect
child = pexpect.spawn("./a.out")
child.expect("input>>")
child.sendline("12345.67890") # send a number
child.expect(r"\d+\.\d+") # expect the number at the end
print float(child.after) # assert that we can parse it
child.close()
I had the same problem, and "proc.communicate()" does not solve it because it waits for the process to terminate.
So here is what works for me, on Windows with Python 3.5.1:
import subprocess as sp
import time

myProcess = sp.Popen(cmd, creationflags=sp.CREATE_NEW_PROCESS_GROUP, stdout=sp.PIPE, stderr=sp.STDOUT)
i = 0
while i < 40:
    i += 1
    time.sleep(.5)
    out = myProcess.stdout.readline().decode("utf-8").rstrip()
I guess creationflags and the other arguments are not mandatory (but I don't have time to test), so this would be the minimal syntax:
myProcess = sp.Popen(cmd, stdout=sp.PIPE)
for i in range(40):
    time.sleep(.5)
    out = myProcess.stdout.readline()
