I have a command-line Golang executable on Windows named cnki-downloader.exe (open-sourced here: https://github.com/amyhaber/cnki-downloader). I want to run this executable from Python and interact with it (read its output, then write some input, then read more output, and so on).
It's a command-line program, so I assumed it would behave the same as a normal Windows command-line program built by MSVC. My code looks like this:
# coding=gbk
from subprocess import Popen, PIPE
p = Popen(["cnki-downloader.exe"], stdin=PIPE, stdout=PIPE)
#p = Popen(["WlanHelper.exe"], stdin=PIPE, stdout=PIPE )
p.stdin.write( 'XXXXXX\n' )
result1 = p.stdout.read() # <---- we never return here
print result1
p.stdin.write( '1\n' )
result2 = p.stdout.read()
print result2
My Python script hangs at the p.stdout.read() call when running cnki-downloader.exe (see the comment in the code above). I then tried a C command-line program built by MSVC (named WlanHelper.exe), and it runs fine: my script gets the output from the exe.
So I suspect that the command-line I/O mechanism of a Golang executable differs from that of a native C/C++ program, making it difficult for other languages (like Python) to call it and interact with it.
So I want to know: how can I interact with a Golang executable on Windows from another language like Python? If this is impossible, I could also consider modifying the Golang program's source code (since it is open source), but I hope it won't come to that. Thanks!
NOTE:
If possible, I want to call this Golang executable directly rather than turning it into a library for Python to import. If I have to rewrite the Golang program as a library, why not just drop the interactive interface and turn everything into command-line parameters? There would be no need to bother writing a Golang library. So please, let's assume the Golang program is closed-source. I can't believe there is no way for Python to drive a command-line Golang program; if that really is the case, then Golang badly needs to improve its interoperability with other languages.
Looking at the documentation for the subprocess module you are using, there is a warning about deadlocks when using stdout.read() and friends:
Warning This will deadlock when using stdout=PIPE and/or stderr=PIPE
and the child process generates enough output to a pipe such that it
blocks waiting for the OS pipe buffer to accept more data. Use
communicate() to avoid that.
So, maybe try using communicate instead?
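For completeness, here is a minimal sketch of the communicate() approach (Python 3 syntax; decoding as gbk is an assumption based on the # coding=gbk line in your script). Note that communicate() writes all the input at once, closes stdin and waits for the process to exit, so it supports a single exchange rather than an ongoing dialogue:

from subprocess import Popen, PIPE

p = Popen(["cnki-downloader.exe"], stdin=PIPE, stdout=PIPE)
# send both answers up front; communicate() closes stdin and waits for exit
out, _ = p.communicate(input=b"XXXXXX\n1\n")
print(out.decode("gbk", errors="replace"))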
Related
I am using a 3rd-party python module which is normally called through terminal commands. When called through terminal commands it has a verbose option which prints to terminal in real time.
I then have another python program which calls the 3rd-party program through subprocess. Unfortunately, when called through subprocess the terminal output no longer flushes, and is only returned on completion (the process takes many hours so I would like real-time progress).
I can see the source code of the 3rd-party module and it does not set printing to be flushed such as print('example', flush=True). Is there a way to force the flushing through my module without editing the 3rd-party source code? Furthermore, can I send this output to a log file (again in real time)?
Thanks for any help.
The issue is most likely that many programs behave differently depending on whether they run interactively in a terminal or as part of a pipeline (i.e. called using subprocess). It has very little to do with Python itself and much more with the Unix/Linux architecture.
As you have noted, it is possible to force a program to flush stdout even when run in a pipeline, but that requires changes to the source code, by manually adding sys.stdout.flush() calls.
Another way to print to the screen is to "trick" the program into thinking it is talking to an interactive terminal, using a so-called pseudo-terminal. There is a supporting module for this in the Python standard library, namely pty. Using that, you will not explicitly call subprocess.run (or Popen or ...). Instead you have to use the pty.spawn call:
import os
import pty

def prout(fd):
    # called by pty.spawn with the pty's master fd; read until EOF and echo
    data = os.read(fd, 1024)
    while data:
        print(data.decode(), end="")
        data = os.read(fd, 1024)

pty.spawn("./callee.py", prout)
As can be seen, this requires a special function for handling stdout. Above, I just print it to the terminal, but it is of course possible to do other things with the text as well (such as logging or parsing).
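If you also want the output in a log file in real time (as you asked), the callback is a natural place to do it; here is a sketch where progress.log is just an example path:

import os
import pty

def prout_and_log(fd):
    # like prout above, but also append each chunk to a log file as it arrives
    with open("progress.log", "ab") as log:
        data = os.read(fd, 1024)
        while data:
            print(data.decode(), end="")
            log.write(data)
            log.flush()  # keep the file current in real time
            data = os.read(fd, 1024)

pty.spawn("./callee.py", prout_and_log)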
Another way to trick the program is to use an external program called unbuffer. unbuffer takes your command as input and makes the program think (as with the pty call) that it is called from a terminal. This is arguably simpler if unbuffer is installed, or you are allowed to install it on your system (it is part of the expect package). All you have to do then is change your subprocess call to
p = subprocess.Popen(["unbuffer", "./callee.py"], stdout=subprocess.PIPE)
and then of course handle the output as usual, e.g. with some code like
for line in p.stdout:
    print(line.decode(), end="")
print(p.communicate()[0].decode(), end="")
or similar. But this last part I think you have already covered, as you seem to be doing something with the output.
I have recently come across a few posts on Stack Overflow saying that subprocess is much better than os.system, but I am having difficulty finding the exact advantages.
Some examples of things I have run into:
https://docs.python.org/3/library/os.html#os.system
"The subprocess module provides more powerful facilities for spawning new processes and retrieving their results; using that module is preferable to using this function."
No idea in what ways it is more powerful, though. I know subprocess is easier to use in many ways, but is it actually more powerful in some way?
Another example is:
https://stackoverflow.com/a/89243/3339122
The advantage of subprocess vs system is that it is more flexible (you can get the stdout, stderr, the "real" status code, better error handling, etc...).
That post has 2600+ votes. Again, I could not find any elaboration on what was meant by better error handling or the "real" status code.
Top comment on that post is:
Can't see why you'd use os.system even for quick/dirty/one-time. subprocess seems so much better.
Again, I understand it makes some things slightly easier, but I can hardly see why, for example:
subprocess.call("netsh interface set interface \"Wi-Fi\" enable", shell=True)
is any better than
os.system("netsh interface set interface \"Wi-Fi\" enabled")
Can anyone explain some reasons it is so much better?
First of all, you are cutting out the middleman; subprocess.call by default avoids spawning a shell that examines your command, and directly spawns the requested process. This is important because, besides the efficiency side of the matter, you don't have much control over the default shell behavior, and it actually typically works against you regarding escaping.
In particular, do not do this:
subprocess.call('netsh interface set interface "Wi-Fi" enable')
since
If passing a single string, either shell must be True (see below) or else the string must simply name the program to be executed without specifying any arguments.
Instead, you'll do:
subprocess.call(["netsh", "interface", "set", "interface", "Wi-Fi", "enable"])
Notice that here all the escaping nightmares are gone. subprocess handles escaping (if the OS wants arguments as a single string - such as Windows) or passes the separated arguments straight to the relevant syscall (execvp on UNIX).
Compare this with having to handle the escaping yourself, especially in a cross-platform way (cmd doesn't escape the same way as POSIX sh), and especially with the shell in the middle messing with your stuff (trust me, you don't want to know what an unholy mess it is to provide 100% safe escaping for your command when calling cmd /k).
Also, when using subprocess without a shell in the middle, you can be sure you are getting correct return codes. If there's a failure launching the process, you get a Python exception; if you get a return code, it's actually the return code of the launched program. With os.system you have no way to know whether the return code you get comes from the launched command (which is generally the case if the shell manages to launch it) or is some error from the shell (if it didn't manage to launch it).
Besides argument splitting/escaping and return codes, you have far better control over the launched process. Even with subprocess.call (which is the most basic utility function built on top of the subprocess machinery) you can redirect stdin, stdout and stderr, possibly communicating with the launched process. check_call is similar but avoids the risk of silently ignoring a failure exit code. check_output covers the common use case of check_call plus capturing all the program's output into a string variable.
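For instance, a quick sketch of those helpers, reusing the netsh command from the question (the "show interface" variant is just for illustration):

import subprocess

# raises CalledProcessError if netsh exits with a non-zero status
subprocess.check_call(["netsh", "interface", "set", "interface", "Wi-Fi", "enable"])

# like check_call, but also captures stdout and returns it as bytes
listing = subprocess.check_output(["netsh", "interface", "show", "interface"])
print(listing.decode(errors="replace"))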
Once you get past call and friends (which block, just like os.system), there is far more powerful functionality; in particular, the Popen object allows you to work with the launched process asynchronously. You can start it, possibly talk with it through the redirected streams, check from time to time whether it is still running while doing other stuff, wait for it to complete, send signals to it and kill it - all things well beyond the mere synchronous "start process with default stdin/stdout/stderr through the shell and wait for it to finish" that os.system provides.
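A rough sketch of that asynchronous style (some-long-task.exe and do_other_work are hypothetical placeholders, not from the question):

import subprocess
import time

def do_other_work():
    pass  # stand-in for whatever the main program does meanwhile

p = subprocess.Popen(["some-long-task.exe"])
while p.poll() is None:   # None means the child is still running
    do_other_work()       # p.terminate() / p.kill() are also available here
    time.sleep(1)
print("exit code:", p.returncode)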
So, to sum it up, with subprocess:
even at the most basic level (call & friends), you:
avoid escaping problems by passing a Python list of arguments;
avoid the shell messing with your command line;
get either an exception or the true exit code of the process you launched, with no confusion between program and shell exit codes;
can capture stdout and, in general, redirect the standard streams;
when you use Popen:
you aren't restricted to a synchronous interface, and can actually do other stuff while the subprocess runs;
you can control the subprocess (check if it is running, communicate with it, kill it).
Given that subprocess does far more than os.system can - and in a safer, more flexible way (if you need it) - there's just no reason to use system instead.
There are many reasons, but the main reason is mentioned directly in the docstring:
>>> os.system.__doc__
'Execute the command in a subshell.'
For almost all cases where you need a subprocess, it is undesirable to spawn a subshell. It is unnecessary and wasteful, it adds an extra layer of complexity, and it introduces several new failure modes and vulnerabilities. Using the subprocess module cuts out the middleman.
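A short sketch of the difference in error handling (Python 3; the nonexistent command is deliberate):

import os
import subprocess

# os.system: the shell runs the string and you get back a shell-encoded
# status, even when the command doesn't exist - no exception is raised
status = os.system("definitely-not-a-real-command")
print(status)  # non-zero, but you must decode it yourself

# subprocess without a shell: a missing program raises a Python exception
try:
    subprocess.call(["definitely-not-a-real-command"])
except FileNotFoundError as e:
    print("caught:", e)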
I have a Python program from which I spawn a sub-program to process some files without holding up the main program. I'm currently using bash for the sub-program, started with a command and two parameters like this:
result = os.system('sub-program.sh file.txt file.txt &')
That works fine, but I (eventually!) realised that I could use Python for the sub-program, which would be far preferable, so I have converted it. The simplest way of spawning it might be:
result = os.system('python3 sub-program.py file.txt file.txt &')
Some research has shown several more sophisticated alternatives, but I have the impression that the latest and most approved method is this one:
subprocess.Popen(["python3", "-u", "sub-program.py"])
Am I correct in thinking that that is the most appropriate way of doing it? Would anyone recommend a different method and why? Simple would be good as I'm a bit of a Python novice.
If this is the recommended method, I can probably work out what the "-u" does and how to add the parameters for myself.
Optional extras:
Send a message back from the sub-program to the main program.
Make the sub-program quit when the main program does.
Yes, using subprocess is the recommended way to go according to the documentation:
The subprocess module provides more powerful facilities for spawning new processes and retrieving their results; using that module is preferable to using this function.
However, subprocess.Popen may not be what you're looking for. As opposed to os.system, you will get back a Popen object that corresponds to the subprocess, and you'll have to wait on it explicitly for its completion, e.g.:
proc = subprocess.Popen(["python3", "-u", "sub-program.py"])
do_something()
res = proc.wait()
If you want to just run a program and wait for completion you should probably use subprocess.run (or maybe subprocess.call, subprocess.check_call or subprocess.check_output) instead.
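For example, a minimal sketch with subprocess.run, reusing the arguments from your os.system version:

import subprocess

# runs the sub-program to completion and returns a CompletedProcess
result = subprocess.run(["python3", "-u", "sub-program.py", "file.txt", "file.txt"])
print(result.returncode)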
Thanks skyking!
With
import subprocess
at the beginning of the main program, this does what I want:
with open('output.txt', 'w') as f:
    # parameter1 and parameter2 are variables holding the two arguments
    subprocess.Popen(["spawned.py", parameter1, parameter2], stdout=f)
The first line opens a file for the output from the sub-program started in the second line. The square brackets contain the stuff for the sub-program: its name (as a string) followed by the two parameters, which are available in the sub-program as sys.argv[1] and sys.argv[2]. After that come the subprocess keyword arguments: stdout=f says to send the output to the text file opened above.
Is there any particular reason it has to be another program entirely? Why not just spawn another process which runs one of the functions defined within your script?
I suggest that you read up on multiprocessing. Python has a module just for that: https://docs.python.org/dev/library/multiprocessing.html
Here you can find info on spawning new processes, communicating between them and synchronizing them.
Be warned though: if you really want to speed up your file processing, you'll want processes rather than threads (because of Python's global interpreter lock, threads only help with I/O-bound work and can actually slow CPU-bound work down, which is confusing at first).
Also check out this page: https://pymotw.com/2/multiprocessing/basics.html
It has some code samples that will help you out a lot.
Don't forget this guard in your script:
if __name__ == '__main__':
It is very important: with the spawn start method (the default on Windows), the child process re-imports your module, and without the guard the spawning code would run again in every child. ;)
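A minimal sketch of the pattern (process_file is a hypothetical worker standing in for your file processing):

from multiprocessing import Process

def process_file(path):
    print("processing", path)  # placeholder for the real work

if __name__ == '__main__':  # the guard: on Windows the child re-imports
                            # this module, so unguarded code would run twice
    p = Process(target=process_file, args=("file.txt",))
    p.start()  # runs process_file in a separate process
    p.join()   # wait for it to finish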
I am seeing weird behaviour while managing a third-party executable from my Python code. Conceptually, I have the following code in Python:
import subprocess
p = subprocess.Popen([r'c:\path\to\programme.exe', '-d'], stdout=subprocess.PIPE, stderr=subprocess.PIPE)
out, err = p.communicate()
print p.returncode, out, err
And the tool crashes, printing its traceback into out and returning an error code that means "unhandled exception". I have tried a simple os.system(...) call with the same results.
But, here comes the fun part: when I just paste the command into the Windows shell, it works perfectly...
C:\> c:\path\to\programme.exe -d
The Python interpreter is a 32-bit 2.7.2 version.
So... what can be the difference between these two calls that leads to the crash? Thanks in advance.
Extra info
I am not quite sure if this helps, but this external tool connects to a database and performs some operations. With some RDBMSs it works when called from Python code, but when it connects to an Oracle DB, it crashes. So the Python code seems to be right; there is just some factor of difference that I don't know.
Well, you really don't provide much info. I will make a guess based on my own experience dealing with situations like this.
Make sure you're running the Python app as admin, in case the third-party app requires privileges.
Check there is no problem with the working directory. Meaning: if the program opens some file or references a relative path in any way, you must change your working directory when executing from Python. See the code below for how to do this.
If the program you're executing is a built-in Windows shell command (dir, copy, etc.), consider using shell=True when creating the Popen object. See the Popen constructor reference.
Check whether Python sets or modifies some environment variable that is needed/used by your third-party application (see the sketch after the code below).
Code for changing the working directory within the running Python app:
import os
os.chdir('/path_you_need/python/work_from')
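And if the environment-variable point turns out to be the problem, you can hand the child an explicit environment; ORACLE_HOME and its value here are purely hypothetical examples:

import os
import subprocess

env = os.environ.copy()
env["ORACLE_HOME"] = r"c:\oracle\product\11.2.0"  # hypothetical value
p = subprocess.Popen([r'c:\path\to\programme.exe', '-d'],
                     stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                     env=env)  # the child sees exactly this environment
out, err = p.communicate()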
You are supposed to use a raw string:
p = subprocess.Popen([r'c:\path\to\programme.exe', '-d'], stdout=subprocess.PIPE, stderr=subprocess.PIPE)
Otherwise, you can use \\ instead of a raw string, like this:
p = subprocess.Popen(['c:\\path\\to\\programme.exe', '-d'], stdout=subprocess.PIPE, stderr=subprocess.PIPE)
How can I run an external program, let's say "Firefox", from my Python script and make sure that its process will remain alive after the termination of my Python script? I want to make it cross-platform if that's doable.
There is no cross-platform way to do this with just the stdlib. However, if you write code for POSIX and for Windows, that's usually good enough, right?
On Windows, you want to pass a creationflags argument. Read the docs (both there and at MSDN) and decide whether you want a console-detached process, a new-console process, or a new-process-group process, then use the appropriate flag. You may also want to set some of the flags in startupinfo; again, MSDN will tell you what they mean.
On POSIX, if you just want the simplest behavior and you're using 3.2+, you want to pass start_new_session=True. In earlier Python versions, or for other cases, you want to pass a preexec_fn that performs whatever daemonization you want. That could be as little as os.setsid() (which is what start_new_session does), or a whole lot more. See PEP 3143 -- Standard daemon process library for a discussion of all the different things you might want to do here.
So, the simplest version is:
import subprocess

def launch_in_background(args):
    try:
        subprocess.CREATE_NEW_PROCESS_GROUP
    except AttributeError:
        # not Windows, so assume POSIX; if not, we'll get a usable exception
        p = subprocess.Popen(args, start_new_session=True)
    else:
        # Windows
        p = subprocess.Popen(args, creationflags=subprocess.CREATE_NEW_PROCESS_GROUP)
    return p
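Hypothetical usage, assuming firefox is on your PATH:

proc = launch_in_background(["firefox"])
print("detached child pid:", proc.pid)  # keeps running after this script exits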
If you're willing to go outside the stdlib, there are dozens of "shell-type functionality" libraries out there, many of which have some kind of "detach" functionality. Just search shell, cli, or subprocess at PyPI and find the one you like best.