I'm currently struggling with communicating with a subprocess.
To explain the situation:
I have a class that endlessly writes to sys.stdout:
def runGenerator(self):
    while self.runEvent.is_set():
        sys.stdout.buffer.write(struct.pack('I', self.next()))
(self.next() returns an unsigned 32-bit integer)
This works fine so far, even though sys.stdout.buffer.write is a bit slow.
But now I create a subprocess in which I call a program named dieharder like this:
def runSub(self):
    args = [DIEHARDER, GENERATOR_NUMBER, '-d0']
    dieharderTestProc = subprocess.Popen(
        args, stdin=subprocess.PIPE, stdout=subprocess.PIPE)
    print(dieharderTestProc.stdout.readlines())
    self.runEvent.clear()
And my goal now is to use the sys.stdout stream (produced by the previous function) as the stdin of this subprocess, but I can't seem to find a solution.
I want to keep the output in sys.stdout because my end goal is to have multiple subprocesses that can all use this one sys.stdout, so that I only need to generate these numbers once.
Also, these numbers are generated while the subprocess is running, in another process (for timing reasons).
That means I can't just give the subprocess a list or some object with my numbers, because they are generated on the spot.
The only way I got the communication to work was like this:
while dieharderTestProc.returncode is None:
    dieharderTestProc.stdin.write(struct.pack('I', self.next()))
    dieharderTestProc.poll()
But this way I would have to generate these numbers separately for every subprocess that I call, which just costs too much time.
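A rough, untested sketch of what I mean, writing each packed value to every subprocess's stdin so the numbers are only generated a single time (procs would be a list of Popen objects like the one above, and self.next() is the generator from my code):
def runSubs(self, procs):
    # pack each value once, then fan it out to every still-running subprocess
    while any(p.returncode is None for p in procs):
        chunk = struct.pack('I', self.next())
        for p in procs:
            if p.returncode is None:
                p.stdin.write(chunk)
                p.poll()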
Thanks for any solution, ideas or tips you can provide me :)
Related
I have a Fortran code that takes a file as input and writes the output to stdout. To avoid read/write cycles, I'd like to run the code inside Python and convert the output to a numpy array. I can do this using the following function:
def run_fortran(infile):
    import subprocess
    import numpy as np
    cmd = ['./output.e', infile]
    p = subprocess.Popen(cmd, stdout=subprocess.PIPE)
    out, err = p.communicate()
    if p.returncode == 0:
        return np.array(out.split(), dtype=int)
Now I take the array, modify it and write it into a file. The new file is again passed into run_fortran(infile). Can I avoid this step and somehow use the output of run_fortran instead of passing a filename?
I tried two cases with no success:
(1) converting to string:
arr = run_fortran('input.txt')
new_arr = str(arr).replace("[","").replace("]","")
run_fortran(new_arr)
This returns an empty array.
(2) converting to a file type object using StringIO:
from cStringIO import StringIO
run_fortran(StringIO(new_arr))
This returns an error, TypeError: execv() arg 2 must contain only strings, which makes sense.
In Fortran, the read(*,*), read(5,*) or read*, statement will read from standard input, and if that has the right format, it will work. If you are dealing with formatted data, that is, anything that is human readable and not a binary file, then you probably need a read loop like:
do line=1,numlines
    read(*,*) array(line,:)
enddo
No open or close statements are needed. So if what you were writing to a file is passed directly, you should be able to remove those statements and change the file unit to 5 or *.
Now there are more efficient ways to do this kind of communication, but any solution is a good solution if it suits your purpose.
If your Fortran program (it's that './output.e' AFAICS) can read from stdin, not only from a regular file, you can do without temporary files by passing it stdin=subprocess.PIPE (other valid values are "an existing file descriptor (a positive integer) [or] an existing file object"). In UNIX, there's always /dev/stdin to put on the command line btw and in Windows, there's con.
Still, if the program can only work in "sessions" due to the nature of processing (i.e. cannot run continuously and be fed new data as it's available), you'll have to invoke it repeatedly.
Note that you have to handle different pipes in different threads to avoid deadlocks. So, either use communicate() (but then the program cannot run continuously) or spawn an stdout/stderr thread manually (that's what communicate() does; not stdin because the output reading code has to be running by the time you start writing to stdin or the external program may choke with "no space left on device" while writing).
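For instance, a minimal sketch of the simple communicate() variant, assuming ./output.e accepts /dev/stdin as its input-file argument and reads the same formatted data from there:
import subprocess
import numpy as np

def run_fortran_from_memory(data):
    # data is the text that would have gone into the input file (bytes on Python 3);
    # feed it through the pipe instead of writing a temporary file
    p = subprocess.Popen(['./output.e', '/dev/stdin'],
                         stdin=subprocess.PIPE, stdout=subprocess.PIPE)
    out, err = p.communicate(data)
    if p.returncode == 0:
        return np.array(out.split(), dtype=int)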
Here's sample code for running the program continuously:
import subprocess
import threading

p = subprocess.Popen(argv, stdin=subprocess.PIPE, stdout=subprocess.PIPE)
while True:
    # the only more elegant way to return a value from a thread is with a Thread subclass,
    # see http://stackoverflow.com/questions/6893968/how-to-get-the-return-value-from-a-thread-in-python
    output_container = []
    ot = threading.Thread(target=_outputreaderthread,
                          args=(p.stdout, output_container, data))
    ot.start()
    p.stdin.write(data)
    ot.join()
    output = output_container[0]
    data = process_output(output)
    if no_more_processing_needed(data):
        break
p.stdin.close()
if p.wait() != 0:
    raise subprocess.CalledProcessError(p.returncode, argv)
def _outputreaderthread(stream, container, data):
    # since there's no process termination as an end mark here,
    # you need to read exactly the right number of bytes to avoid getting blocked.
    # use whatever technique is appropriate for your data (e.g. readline()'s)
    output = stream.read(calculate_output_size(data))
    container.append(output)
I am testing sorting algorithms, and therefore I would like to combine, in my Python code, the Linux command "time" (because it takes some interesting arguments) with, for example, the call of quicksort.
from subprocess import Popen
import quicksort
import rand
time=Popen("time quicksort.main(rand.main())")
This is totally wrong, but it is the closest I managed to get. I haven't grasped the idea of the subprocess module: is it possible to combine method calls with Linux commands, or can you only run commands like "grep ..." in Python and send the output to a variable?
If you use Popen from subprocess you need to do a lot of things differently.
I believe what you are looking for is check_output, another function belonging to the subprocess module.
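For example, with a stand-in command just to show the call shape:
from subprocess import check_output

# runs the command and returns its stdout as a byte string
listing = check_output(['ls', '-l'])
print(listing.decode())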
But in order to further your understanding, since you are sort-of close, here is what you need to change to get it to work:
The command string "time quicksort.main(rand.main())" is not going to mean anything to bash; that is Python. But even if it were valid bash, it would need to be split on word boundaries (as bash would normally do), so you would turn it into a list:
['time', '...','...']
The only time you can pass Popen a command STRING (not a list) is when you set shell=True in the keywords to Popen.
But let's just leave shell at False, do some word-splitting for bash, and pass in a list. On to the next part.
Popen returns something you can communicate to/at/with, not the result of the process's stdout. Use subprocess.PIPE for the stdin and stdout keywords to Popen.
Once you have made a Popen object as described, you can call its communicate method.
The result is two things, stdout and stderr.
You're after the first one. One use case for Popen is when you need to keep errors and output separate. Obviously this isn't turning out to be the best option here, but oh well. Let's deal with stdout.
stdout will probably need to be decoded:
stdout.decode()
or perhaps even have newlines stripped as well:
stdout.decode().rstrip()
So as you can see, Popen does not fit the use case you have in mind. There is no need to use subprocess and make system calls in order to time Python code. Look into timeit.
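For instance, a sketch that assumes quicksort.main and rand.main from the question are importable:
import timeit

# time 10 runs of quicksort on freshly generated input
duration = timeit.timeit('quicksort.main(rand.main())',
                         setup='import quicksort, rand',
                         number=10)
print(duration)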
Background: I have a Python subprocess that connects to a shell-like application, which uses the readline library to handle input, and that app has a TAB-complete routine for command input, just like bash. The child process is spawned, like so:
def get_cli_subprocess_handle():
    return subprocess.Popen(
        '/bin/myshell',
        shell=False,
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
    )
Everything works great, except tab-complete. Whenever my Python program passes the tab character '\t' to the subprocess, I get 5 spaces on the subprocess's stdin instead of triggering the readline library's tab-complete routine. :(
Question: What can I send to the subprocess's STDIN to trigger the child's tab-complete function? Maybe asked another way: How do I send the TAB key as opposed to the TAB character, if that is even possible?
Related but Unanswered and Derailed:
trigger tab completion for python batch process built around readline
The shell-like application is probably differentiating between a terminal being connected to stdin and a pipe being connected to it. Many Unix utilities do just that to optimise their buffering (line vs. block), and shell-like utilities are likely to disable command-completion facilities on batch input (i.e. a pipe) to avoid unexpected results. Command completion is really an interactive feature which requires terminal input.
Check out the pty module and try using a master/slave pair as the pipe for your subprocess.
There really is no such thing as sending a tab key to a pipe. A pipe can only accept strings of bits, and if the tab character isn't doing it, there may not be a solution.
There is a project that does something similar called pexpect. Just looking at its interact() code, I'm not seeing anything obvious that makes it work and yours not. Given that, the most likely explanation is that pexpect actually does some work to make itself look like a pseudo-terminal. Perhaps you could incorporate its code for that?
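If you do go the pexpect route, a minimal sketch might look like this (the partial command 'he' and the expected completion 'help' are made-up examples):
import pexpect

child = pexpect.spawn('/bin/myshell')
child.send('he\t')        # partial command plus a real TAB keypress
child.expect('help')      # hypothetical completion text to wait for
print(child.before + child.after)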
Based on isedev's answer, I modified my code as follows:
import os, pty

def get_cli_subprocess_handle():
    masterPTY, slaveTTY = pty.openpty()
    return masterPTY, slaveTTY, subprocess.Popen(
        '/bin/myshell',
        shell=False,
        stdin=slaveTTY,
        stdout=slaveTTY,
        stderr=slaveTTY,
    )
Using this returned tuple, I was able to perform select.select([masterPTY],[],[]) and os.read(masterPTY, 1024) as needed, and I wrote to the master-pty with a function that is very similar to a private method in the pty module source:
def write_all(masterPTY, data):
    """Successively write all of data into a file descriptor."""
    while data:
        chars_written = os.write(masterPTY, data)
        data = data[chars_written:]
    return data
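For reference, a rough sketch of how the returned handles can be used together (the partial command text is made up):
import os, select

masterPTY, slaveTTY, proc = get_cli_subprocess_handle()

write_all(masterPTY, b'he\t')                     # partial command plus TAB
readable, _, _ = select.select([masterPTY], [], [], 1.0)
if readable:
    print(os.read(masterPTY, 1024))               # whatever the shell echoed back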
Thanks to all for the good solutions. Hope this example helps someone else. :)
I have a set of command line tools that I'd like to run in parallel on a series of files. I've written a python function to wrap them that looks something like this:
def process_file(fn):
    print os.getpid()
    cmd1 = "echo " + fn
    p = subprocess.Popen(shlex.split(cmd1))
    # after cmd1 finishes
    other_python_function_to_do_something_to_file(fn)
    cmd2 = "echo " + fn
    p = subprocess.Popen(shlex.split(cmd2))
    print "finish"
if __name__ == "__main__":
    import multiprocessing
    p = multiprocessing.Pool()
    for fn in files:
        RETURN = p.apply_async(process_file, args=(fn,), kwds={some_kwds})
While this works, it does not seem to be running multiple processes; it seems like it's just running in serial (I've tried using Pool(5) with the same result). What am I missing? Are the calls to Popen "blocking"?
EDIT: Clarified a little. I need cmd1, then some python command, then cmd2, to execute in sequence on each file.
EDIT2: The output from the above has the pattern:
pid
finish
pid
finish
pid
finish
whereas a similar call, using map in place of apply (but without any provision for passing kwds) looks more like
pid
pid
pid
finish
finish
finish
However, the map call sometimes (always?) hangs after apparently succeeding
Are the calls to Popen "blocking"?
No. Just creating a subprocess.Popen returns immediately, giving you an object that you could wait on or otherwise use. If you want to block, that's simple:
subprocess.check_call(shlex.split(cmd1))
Meanwhile, I'm not sure why you're putting your args together into a string and then trying to shlex them back to a list. Why not just write the list?
cmd1 = ["echo", fn]
subprocess.check_call(cmd1)
While this works, it does not seem to be running multiple processes; it seems like it's just running in serial
What makes you think this? Given that each process just kicks off two processes into the background as fast as possible, it's going to be pretty hard to tell whether they're running in parallel.
If you want to verify that you're getting work from multiple processes, you may want to add some prints or logging (and throw something like os.getpid() into the messages).
Meanwhile, it looks like you're trying to exactly duplicate the effects of multiprocessing.Pool.map_async with a loop around multiprocessing.Pool.apply_async, except that instead of accumulating the results you're stashing each one in a variable called RETURN and then throwing it away before you can use it. Why not just use map_async?
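Something like this sketch, with the keyword arguments baked into a module-level wrapper since map_async passes each item as a single positional argument (process_file and files are the names from your question, and the kwds shown are hypothetical):
import multiprocessing

def process_file_with_kwds(fn):
    # wrapper that fixes the keyword arguments for every file
    return process_file(fn, **{'some_kwd': 'value'})

if __name__ == "__main__":
    pool = multiprocessing.Pool()
    result = pool.map_async(process_file_with_kwds, files)
    result.get()   # blocks until done and re-raises any worker exception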
Finally, you asked whether multiprocessing is the right tool for the job. Well, you clearly need something asynchronous: check_call(args(file1)) has to block other_python_function_to_do_something_to_file(file1), but at the same time not block check_call(args(file2)).
I would probably have used threading, but really, it doesn't make much difference. Even if you're on a platform where process startup is expensive, you're already paying that cost because the whole point is running N * M child processes, so another pool of 8 isn't going to hurt anything. And there's little risk of accidentally creating races by sharing data between threads, or of accidentally writing code that looks like it shares data between processes but doesn't, since there's nothing to share. So, whichever one you like more, go for it.
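If you prefer the thread-based variant, multiprocessing.dummy exposes the same Pool interface backed by threads, so only the import changes (reusing the wrapper from the sketch above):
from multiprocessing.dummy import Pool   # thread pool, same API as multiprocessing.Pool

pool = Pool(8)
results = pool.map(process_file_with_kwds, files)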
The other alternative would be to write an event loop. Which I might actually start doing myself for this problem, but I'd regret it, and you shouldn't do it…
I'm having a very strange issue with Python's subprocess.Popen. I'm using it to call several times an external exe and keep the output in a list.
Every time you call this external exe, it will return a different string. However, if I call it several times using Popen, it will always return the SAME string. =:-O
It looks like Popen always returns the same value from stdout, without re-running the exe. Maybe it's doing some sort of caching without actually calling the exe again.
This is my code:
def get_key():
    from subprocess import Popen, PIPE
    args = [C_KEY_MAKER, '/26', USER_NAME, ENCRYPTION_TEMPLATE, '0', ]
    process = Popen(args, stdout=PIPE)
    output = process.communicate()[0].strip()
    return output
if __name__ == '__main__':
    print get_key() # Returns a certain string
    print get_key() # Should return another string, but returns the same!
What on Earth am I doing wrong?!
It is possible (if C_KEY_MAKER's random behaviour is based on the current time in seconds, or similar) that when you run it twice on the command line, the time has changed in between runs and so you get a different output, but when python runs it, it runs it twice in such quick succession that the time hasn't changed and so it returns the same value twice in a row.
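To illustrate that theory with a made-up stand-in for C_KEY_MAKER: a program seeded with seconds-resolution time behaves exactly like this when called twice in quick succession:
import random
import time

def fake_key_maker():
    # hypothetical: seed derived from the current time in whole seconds
    random.seed(int(time.time()))
    return random.random()

print(fake_key_maker())
print(fake_key_maker())  # same value when both calls land in the same second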
Nothing. That works fine in my own tests (aside from your indentation error at the bottom). The problem is either in your exe or elsewhere.
To clarify, I created a Python program tfile.py:
cat > tfile.py
#!/usr/bin/env python
import random
print random.random()
And then I altered the program to get rid of the indentation problem at the bottom, and to call tfile.py. It did give two different results.
I don't know what is going wrong with your example; I cannot replicate this behaviour. However, try a more by-the-book approach:
def get_key():
    from subprocess import Popen, PIPE
    args = [C_KEY_MAKER, '/26', USER_NAME, ENCRYPTION_TEMPLATE, '0', ]
    output = Popen(args, stdout=PIPE).stdout
    data = output.read().strip()
    output.close()
    return data
Your code is not executable as is so it's hard to help you out much. Consider fixing indentation and syntax and making it self-contained, so that we can give it a try.
On Linux, it seems to work fine according to Devin Jeanpierre.