I have some J code that I'd like to run on an array variable from my Python script. The Python array is simply a list of 200 floating-point numbers. I am aware of memory-mapped files, but that seems very low-level and technical.
Is there a simple way to call a J function or script from Python without dropping out to the shell, as in:
import subprocess
subprocess.check_output("echo function.ijs | ijconsole", shell=True)
Using this method, I first need to save out my variable into a temporary file, and the J program needs to load that file. Is there a more elegant way?
If you have a single string to pass data to a subprocess's input, and want to read its output all at once, use Popen.communicate().
# j_input holds the bytes to send on the subprocess's stdin
j_process = subprocess.Popen(["ijconsole"],
                             stdin=subprocess.PIPE, stdout=subprocess.PIPE)
j_output, _ = j_process.communicate(j_input)
If the interaction is more complex, you may use Popen.stdin and Popen.stdout directly, but be careful - it's possible to deadlock due to buffering.
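For the J question above, here is a minimal sketch of sending the array through ijconsole's stdin without a temporary file. It assumes ijconsole accepts J sentences on standard input (as the shell pipeline in the question implies); the noun name x and the verb f are placeholders, not part of the original post:

import subprocess

data = [0.1, 0.2, 0.3]  # stands in for the 200-element Python array
# build J sentences: assign the numbers to a noun, then apply some verb to it
j_input = ("x =. " + " ".join(str(v) for v in data) + "\n"
           "f x\n").encode()
j_process = subprocess.Popen(["ijconsole"],
                             stdin=subprocess.PIPE, stdout=subprocess.PIPE)
j_output, _ = j_process.communicate(j_input)
print(j_output.decode())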
I'm currently struggling with communicating to a subprocess.
To explain the situation:
I have a class that endlessly writes to sys.stdout:
def runGenerator(self):
    while self.runEvent.is_set():
        sys.stdout.buffer.write(struct.pack('I', self.next()))
(self.next() returns an unsigned 32-bit integer.)
This works fine so far, even though sys.stdout.buffer.write is a bit slow.
But now I create a subprocess in which I call a program named dieharder like this:
def runSub(self):
    args = [DIEHARDER, GENERATOR_NUMBER, '-d0']
    dieharderTestProc = subprocess.Popen(
        args, stdin=subprocess.PIPE, stdout=subprocess.PIPE)
    print(dieharderTestProc.stdout.readlines())
    self.runEvent.clear()
My goal is to use the sys.stdout stream produced by the previous function as the stdin of this subprocess, but I can't seem to find a solution.
I want to keep the output on sys.stdout because my end goal is to have multiple subprocesses that can all consume this one stream, so that I only need to generate these numbers once.
These numbers are also generated while the subprocess is running, in a separate process (for speed). That means I can't just hand the subprocess a list or some other object containing my numbers, because they are produced on the fly.
The only way I got the communication to work was like this:
while dieharderTestProc.returncode is None:
    dieharderTestProc.stdin.write(struct.pack('I', self.next()))
    dieharderTestProc.poll()
But this way I would have to generate these numbers separately for every subprocess that I call, which just costs too much time.
Thanks for any solutions, ideas or tips you can provide :)
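One way to realize the "generate once, feed every consumer" goal described above is to start all the consumer processes first and write each packed value to every stdin pipe. This is only a hypothetical sketch (procs, NUM_TESTS and generator are illustrative names, not from the original post), and each consumer's stdout still has to be drained, e.g. in separate threads, to avoid the buffering deadlock mentioned elsewhere on this page:

import struct
import subprocess

procs = [subprocess.Popen([DIEHARDER, GENERATOR_NUMBER, '-d0'],
                          stdin=subprocess.PIPE, stdout=subprocess.PIPE)
         for _ in range(NUM_TESTS)]
while runEvent.is_set():
    chunk = struct.pack('I', generator.next())  # one unsigned 32-bit value
    for proc in procs:
        proc.stdin.write(chunk)  # every consumer gets a copy of the same bytes
for proc in procs:
    proc.stdin.close()           # signal end of input to each consumer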
I am trying to spawn a Rust process from a Python program and redirect Python's standard output into its standard input. I have used the following call:
process = subprocess.Popen(["./target/debug/mypro"], stdin=subprocess.PIPE)
and tried to write to the subprocess using:
process.stdin.write(str.encode(json.dumps(dictionnaire[str(index)]))) #Write bytes of Json representation of previous track
I am not getting any errors but standard input in Rust doesn't seem to take any input and standard output isn't printing anything at all.
Here's the version of the Rust code I am currently running:
extern crate rustc_serialize;
use rustc_serialize::json::Json;
use std::fs::File;
use std::io;
use std::env;
use std::str;

fn main() {
    let mut buffer = String::new();
    let stdin = io::stdin();
    //stdin.lock();
    stdin.read_line(&mut buffer).unwrap();
    println!("{}", buffer);
    println!("ok");
}
process.stdin.write(str.encode(json.dumps(dictionnaire[str(index)]))) does not add a newline character, so on the Rust side I was never getting to the end of the line, which made the process block on read_line.
Adding it manually made everything work smoothly.
process.stdin.write(str.encode(json.dumps(dictionnaire[str(index)]) + "\n"))
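Depending on the Python version and the bufsize passed to Popen, the pipe to the child may also be block-buffered, so it can be necessary to flush (or close) stdin before the Rust side sees anything. A small sketch of that extra step:

process.stdin.write(str.encode(json.dumps(dictionnaire[str(index)]) + "\n"))
process.stdin.flush()   # push the buffered bytes to the child now
# or, if nothing more will be sent:
process.stdin.close()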
This may be a problem on the Python side:
subprocess.run("cargo run -- " + str(r), shell=True)
This assumes that you have a numeric file descriptor that remains open across fork and exec. Spawning a process may close file descriptors, either because they're marked close-on-exec (CLOEXEC) or because of explicit cleanup code that runs before exec.
Before passing a numeric file descriptor as a string argument, you should make sure that it will remain valid in the new process.
A better approach is to use a process-spawning API that lets you explicitly map file descriptors in the new process to open handles, or an API that spawns the process with stdin/stdout tied to pipes.
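On POSIX, subprocess.Popen's pass_fds argument is one way to do that explicit mapping. The sketch below is hypothetical: it assumes a child program (./child here) that parses the descriptor number from its arguments, which is not what the Rust program in the question does:

import os
import subprocess

read_fd, write_fd = os.pipe()
child = subprocess.Popen(
    ["./child", str(read_fd)],   # child is expected to parse the fd number from argv
    pass_fds=(read_fd,),         # keep this descriptor open across fork/exec
)
os.close(read_fd)                # the parent only writes
with os.fdopen(write_fd, "w") as w:
    w.write('{"example": 1}\n')
child.wait()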
I have a Fortran code that takes a file as input and writes its output to stdout. To avoid read/write cycles, I'd like to run the code from Python and convert the output to a numpy array. I can do this using the following function:
def run_fortran(infile):
    import subprocess
    import numpy as np
    cmd = ['./output.e', infile]
    p = subprocess.Popen(cmd, stdout=subprocess.PIPE)
    out, err = p.communicate()
    if p.returncode == 0:
        return np.array(out.split(), dtype=int)
Now I take the array, modify it and write it into a file. The new file is again passed into run_fortran(infile). Can I avoid this step and somehow use the output of run_fortran instead of passing a filename?
I tried two cases with no success:
(1) converting to string:
arr = run_fortran('input.txt')
new_arr = str(arr).replace("[","").replace("]","")
run_fortran(new_arr)
This returns an empty array.
(2) converting to a file type object using StringIO:
from cStringIO import StringIO
run_fortran(StringIO(new_arr))
This returns an error: TypeError: execv() arg 2 must contain only strings which makes sense.
In Fortran, the read(*,*), read(5,*) or read*, statements will read from standard input, and if the data has the right format, it will work. If you are dealing with formatted data, that is, anything that is human readable and not a binary file, then you probably need a read loop like:
do line=1,numlines
    read(*,*) array(line,:)
enddo
No open or close statements are needed. So if what you were writing to a file is passed directly, you should be able to remove those statements and change the file unit to 5 or *.
Now there are more efficient ways to do this kind of communication, but any solution is a good solution if it suits your purpose.
If your Fortran program (it's that './output.e' AFAICS) can read from stdin, not only from a regular file, you can do without temporary files by passing it stdin=subprocess.PIPE (other valid values are "an existing file descriptor (a positive integer) [or] an existing file object"). On UNIX, there's always /dev/stdin to put on the command line, by the way, and on Windows there's con.
Still, if the program can only work in "sessions" due to the nature of processing (i.e. cannot run continuously and be fed new data as it's available), you'll have to invoke it repeatedly.
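Here is a minimal sketch of that per-session variant, assuming ./output.e has been modified as in the answer above to read the same formatted numbers from standard input (run_fortran_stdin is an illustrative name, not from the question):

import subprocess
import numpy as np

def run_fortran_stdin(values):
    # send the numbers as text on stdin instead of naming a file on the command line
    text = " ".join(str(v) for v in values) + "\n"
    p = subprocess.Popen(['./output.e'],
                         stdin=subprocess.PIPE, stdout=subprocess.PIPE)
    out, err = p.communicate(text.encode())
    if p.returncode == 0:
        return np.array(out.split(), dtype=int)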
Note that you have to handle different pipes in different threads to avoid deadlocks. So, either use communicate() (but then the program cannot run continuously) or spawn an stdout/stderr thread manually (that's what communicate() does; not stdin because the output reading code has to be running by the time you start writing to stdin or the external program may choke with "no space left on device" while writing).
Here's sample code for running the program continuously:
def _outputreaderthread(stream, container, data):
    # since there's no process termination as an end mark here,
    # you need to read exactly the right number of bytes to avoid getting blocked.
    # use whatever technique is appropriate for your data (e.g. readline()'s)
    output = stream.read(calculate_output_size(data))
    container.append(output)

p = subprocess.Popen(argv, stdin=subprocess.PIPE, stdout=subprocess.PIPE)
data = initial_data()   # placeholder for however the first chunk of input is produced
while True:
    # the only more elegant way to return a value from a thread is with a Thread subclass,
    # see http://stackoverflow.com/questions/6893968/how-to-get-the-return-value-from-a-thread-in-python
    output_container = []
    ot = threading.Thread(target=_outputreaderthread,
                          args=(p.stdout, output_container, data))
    ot.start()
    p.stdin.write(data)
    ot.join()
    output = output_container[0]
    data = process_output(output)
    if no_more_processing_needed(data):
        break
p.stdin.close()
if p.wait() != 0:
    raise subprocess.CalledProcessError(p.returncode, argv)
I'm currently creating a script that will run a set of subprocesses in a loop and then wait for all of them to finish. I have to add variables to the subprocess commands before running them, so I was thinking of writing each one as a string and then converting the string into a command. Does something like that exist?
For example, I have these strings:
"p1 = subprocess.Popen(['python', 'hello.py'])"
"p2 = subprocess.Popen(['python', 'hello2.py'])"
How would I execute them so that I can call p1 or p2 later on in the script (e.g. p1.wait())?
Using strings is a bad idea; I'd use a list:
options = [('python', 'hello.py'), ('python', 'hello2.py')]
processes = []
for option in options:
    processes.append(subprocess.Popen(option))
    # do something here
# the Popen objects are kept in processes, so later you can call e.g. processes[0].wait()
exec("p1 = subprocess.Popen('python','hello.py')")
Note that exec executes statements, while eval evaluates expressions.
But I agree with the other answer that it's better to do this in a different way if you can. One thing you should definitely never do is execute arbitrary strings whose source you don't know, for instance if they could come from a user.
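For illustration, a tiny (hypothetical) example of that distinction at module scope:

# exec runs statements and returns nothing; the assignment binds a name as a side effect
exec("x = 40 + 2")
print(x)               # -> 42
# eval evaluates a single expression and returns its value
print(eval("40 + 2"))  # -> 42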
I have a set of command line tools that I'd like to run in parallel on a series of files. I've written a python function to wrap them that looks something like this:
import os
import shlex
import subprocess

def process_file(fn):
    print os.getpid()
    cmd1 = "echo " + fn
    p = subprocess.Popen(shlex.split(cmd1))
    # after cmd1 finishes
    other_python_function_to_do_something_to_file(fn)
    cmd2 = "echo " + fn
    p = subprocess.Popen(shlex.split(cmd2))
    print "finish"
if __name__=="__main__":
import multiprocessing
p = multiprocessing.Pool()
for fn in files:
RETURN = p.apply_async(process_file,args=(fn,),kwds={some_kwds})
While this works, it does not seem to be running multiple processes; it seems like it's just running in serial (I've tried using Pool(5) with the same result). What am I missing? Are the calls to Popen "blocking"?
EDIT: Clarified a little. I need cmd1, then some python command, then cmd2, to execute in sequence on each file.
EDIT2: The output from the above has the pattern:
pid
finish
pid
finish
pid
finish
whereas a similar call, using map in place of apply (but without any provision for passing kwds) looks more like
pid
pid
pid
finish
finish
finish
However, the map call sometimes (always?) hangs after apparently succeeding
Are the calls to Popen "blocking"?
No. Just creating a subprocess.Popen returns immediately, giving you an object that you could wait on or otherwise use. If you want to block, that's simple:
subprocess.check_call(shlex.split(cmd1))
Meanwhile, I'm not sure why you're putting your args together into a string and then trying to shlex them back to a list. Why not just write the list?
cmd1 = ["echo", fn]
subprocess.check_call(cmd1)
While this works, it does not seem to be running multiple processes; it seems like it's just running in serial
What makes you think this? Given that each process just kicks off two processes into the background as fast as possible, it's going to be pretty hard to tell whether they're running in parallel.
If you want to verify that you're getting work done by multiple processes, you may want to add some prints or logging (and throw something like os.getpid() into the messages).
Meanwhile, it looks like you're trying to exactly duplicate the effects of multiprocessing.Pool.map_async with a loop around multiprocessing.Pool.apply_async, except that instead of accumulating the results you're stashing each one in a variable called RETURN and then throwing it away before you can use it. Why not just use map_async?
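A hedged sketch of what that could look like for the loop above; since map_async passes only one argument per item, the keyword arguments are bound once with functools.partial (an assumption, not part of the original answer):

import functools
import multiprocessing

if __name__ == "__main__":
    pool = multiprocessing.Pool()
    worker = functools.partial(process_file, **some_kwds)  # bind the kwds once
    result = pool.map_async(worker, files)
    result.get()    # wait for every file and re-raise any worker exception
    pool.close()
    pool.join()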
Finally, you asked whether multiprocessing is the right tool for the job. Well, you clearly need something asynchronous: check_call(args(file1)) has to block other_python_function_to_do_something_to_file(file1), but at the same time not block check_call(args(file2)).
I would probably have used threading, but really, it doesn't make much difference. Even if you're on a platform where process startup is expensive, you're already paying that cost because the whole point is running N * M child processes, so another pool of 8 isn't going to hurt anything. And there's little risk of either accidentally creating races by sharing data between threads, or accidentally writing code that looks like it shares data between processes but doesn't, since there's nothing to share. So, whichever one you like more, go for it.
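If the threading route appeals, the thread-backed Pool in multiprocessing.dummy exposes the same interface, so switching is essentially just the import; a small sketch (again an illustration, not the answer's own code):

from multiprocessing.dummy import Pool  # a thread pool with the multiprocessing.Pool API

pool = Pool(8)                  # eight worker threads; each mostly waits on child processes
pool.map(process_file, files)   # blocks until every file has been handled
pool.close()
pool.join()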
The other alternative would be to write an event loop. Which I might actually start doing myself for this problem, but I'd regret it, and you shouldn't do it…