python calling perl, file isn't created by Popen until read() - python

Edit: Oops! I accidentally posted code that didn't match my question. I started writing this post before I was done experimenting, so I posted code from an intermediate point in my testing process. I've amended this question to reflect what I really meant to post by changing the code slightly.
The following Python code works as expected, with one problem: if verbose == False, the file produced by the Perl script is not created. Why do I have to call output.stdout.read() for the underlying Perl script to successfully create the file?
import subprocess

cmdStringList = ["perl", "script.pl", "arg1", ...]
output = subprocess.Popen(cmdStringList, stdout=subprocess.PIPE)
if verbose:
    print output.stdout.read()
I didn't even realize anything was wrong until I tried to execute my Python script with verbose=False in a production environment. I did some google-fu to try to understand the behavior of Popen and subprocess, but I haven't come up with a reason for this behavior. Any help would be greatly appreciated.

You actually have the following code, right?
if verbose:
    outputRead = output.stdout.read()
    print outputRead
It's quite likely that the problem is that the child is blocked trying to write to STDOUT. Until you make some space in the pipe by reading from it, the child won't be able to finish writing to STDOUT and move on to where it creates the file to which you refer.
If you want to prevent the child from blocking without reading, redirect its stdout to nul (Windows) or /dev/null (elsewhere). This should work everywhere but Windows:
cmdStringList = ["sh", "-c", '"$0" "$@" >/dev/null', "perl", "script.pl", "arg1", ...]
(Pardon any syntax errors. I don't know Python at all.)
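A portable alternative is to do the redirection from Python itself; os.devnull is /dev/null on Unix and nul on Windows. This is just a sketch of the same idea (the perl command is the placeholder from the question, not something this answer supplies):

import os
import subprocess

# Send the child's stdout to the null device so it can never block on a full pipe.
with open(os.devnull, "w") as devnull:
    subprocess.call(["perl", "script.pl", "arg1"], stdout=devnull)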

You should keep reading the stream; otherwise the pipe buffer may fill up and your Perl process will block. This has nothing to do with any particular language; it's the behavior of the underlying operating system's pipes.
I encountered a similar problem when I was using Java.
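A minimal sketch of "keep reading the stream", assuming the same perl command as in the question: communicate() drains stdout for you, so the child can never block on a full pipe buffer, and you only print the captured output when verbose is set.

import subprocess

verbose = False  # stand-in for the flag from the question

output = subprocess.Popen(["perl", "script.pl", "arg1"],
                          stdout=subprocess.PIPE, universal_newlines=True)
stdout_data, _ = output.communicate()  # reads everything, then waits for exit
if verbose:
    print(stdout_data)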

Related

Starting process in Google Colab with Prefix "!" vs. "subprocess.Popen(..)"

I've been using Google Colab for a few weeks now and I've been wondering what the difference is between the two following commands (for example):
!ffmpeg ...
subprocess.Popen(['ffmpeg', ...
I was wondering because I ran into some issues when I started either of the commands above and then tried to stop execution midway. Both of them cancel on KeyboardInterrupt, but I noticed that afterwards the runtime needs a factory reset because it somehow got stuck. Checking ps aux in the Linux console listed a process [ffmpeg] <defunct> which apparently was still running, or at least still holding on to some resources.
I then did some research and came across some similar posts asking how to terminate a subprocess correctly (1, 2, 3). Based on those posts I came to the conclusion that the subprocess.Popen(..) variant provides more flexibility when it comes to handling the subprocess: defining different stdout handling, reacting to different return codes, etc. But I'm still unsure what the first command, using the ! prefix, actually does under the hood.
Using the first command is much easier and requires way less code to start this process. And assuming I don't need a lot of logic handling the process flow it would be a nice way to execute something like ffmpeg - if I were able to terminate it as expected. Even following the answers from the other posts using the 2nd command never got me to a point where I could terminate the process fully once started (even when using shell=False, process.kill() or process.wait() etc.). This got me frustrated, because restarting and re-initializing the Colab instance itself can take several minutes every time.
So, finally, I'd like to understand in more general terms what the difference is and was hoping that someone could enlighten me. Thanks!
! commands are executed by the notebook (or more specifically by the ipython interpreter), and are not valid Python commands. If the code you are writing needs to work outside of the notebook environment, you cannot use ! commands.
As you correctly note, you are unable to interact with the subprocess you launch via !, so it's also less flexible than an explicit subprocess call, though similar in this regard to subprocess.call.
As the documentation mentions, you should generally avoid the bare subprocess.Popen unless you specifically need the detailed flexibility it offers, at the price of having to duplicate the higher-level functionality which subprocess.run et al. already implement. The code to run a command and wait for it to finish is simply
subprocess.check_call(['ffmpeg', ... ])
with variations for capturing its output (check_output) and the more modern run which can easily replace all three of the legacy high-level calls, albeit with some added verbosity.
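If you do need the bare Popen for the extra control, a rough sketch of avoiding the <defunct> leftover is to reap the child explicitly when you interrupt it (the ffmpeg arguments below are placeholders, not taken from the question):

import subprocess

proc = subprocess.Popen(['ffmpeg', '-i', 'in.mp4', 'out.mp4'])
try:
    proc.wait()
except KeyboardInterrupt:
    proc.terminate()  # ask ffmpeg to shut down...
    proc.wait()       # ...and reap it so it doesn't linger as a zombie
    raise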

Python3 curses code not working in pipeline

I am writing a python script that I want to use in a unix pipeline. My goal is to write to the screen using curses (which should only be seen by the person running the command, not the pipe), and then write the "return value" to stdout at the end so it can continue down the pipeline, something along the lines of ./myscript.py | consumer_script
This was failing in mysterious ways until I found this. The suggested solution was to use newterm instead of initscr.
My problem is that I am using Python, and from what I could find in the documentation, newterm doesn't exist. All I was able to find was a single reference to newterm, and it didn't come with a link.
Could someone please either point me towards the Python newterm, or suggest another way of working with pipes and curses?
I think you're making this more complicated than it needs to be... the simple answer is to write the curses stream to a handle other than stdout. If that works for you, stderr is the obvious choice. In short, anything that gets written to stdout goes into the pipeline, and if you don't want it there, you need a different handle.
Check out this thread for ways to write to stderr in python:
How to print to stderr in Python?
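As a rough sketch of that idea, assuming stderr is still attached to the terminal while stdout is piped, you can temporarily point file descriptor 1 at the terminal while curses runs and restore it afterwards (the UI function and return value are just placeholders):

import curses
import os
import sys

def run_ui_on_terminal(ui):
    # Run a curses UI on the terminal (via stderr's descriptor) even if stdout is piped.
    sys.stdout.flush()
    saved_stdout = os.dup(1)      # remember the pipe
    os.dup2(2, 1)                 # point fd 1 at the terminal for curses
    try:
        return curses.wrapper(ui)
    finally:
        sys.stdout.flush()
        os.dup2(saved_stdout, 1)  # restore the pipe
        os.close(saved_stdout)

def ui(stdscr):
    stdscr.addstr(0, 0, "Press any key to continue")
    stdscr.getkey()
    return "value-for-the-pipeline"

if __name__ == "__main__":
    print(run_ui_on_terminal(ui))  # only this line goes down the pipe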

sys.stdout.flush() not working properly with python and electronjs

I'm building an electronjs python application and I'm using the pythonshell module. The electron application is supposed to log any messages my python script prints to the console, but rather than printing each message when it's supposed to be printed it waits until the script has finished executing and then prints everything. I've tried using sys.stdout.write("message") and then sys.stdout.flush(), but it still doesn't work.
The question I'm linking has a similar problem to the one I'm having, but the answer that worked for them didn't work for me in the Electron application. It's flushing properly on the Python backend; the frontend is what's causing the problem.
Similar question: Python sys.stdout.flush() doesn't work
file.flush() does not necessarily write the file's data to disk. You need to use flush() followed by os.fsync(fd) to ensure this behavior.
see below:
import os

sys.stdout.flush()
os.fsync(sys.stdout.fileno())
os.fsync(fd) documentation from python docs
Force write of file with file descriptor fd to disk. On Unix, this calls the native fsync() function; on Windows, the MS _commit() function.
If you're starting with a Python file object f, first do f.flush(), and then do os.fsync(f.fileno()), to ensure that all internal buffers associated with f are written to disk.
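A minimal sketch of that flush-then-fsync pattern with an ordinary file object (the file name is just an example):

import os

with open("output.log", "w") as f:
    f.write("message\n")
    f.flush()             # move Python's internal buffer to the OS
    os.fsync(f.fileno())  # ask the OS to commit it to disk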

Is there a simple way to launch a background task from a Python CGI script without waiting around for it to terminate?

In Windows, that is.
I think the answer to this question is that I need to create a Windows service. This seems ludicrously heavyweight for what I am trying to do.
I'm just trying to slap together a little prototype here for my manager, I'm not going to be responsible for productizing it... in fact, it may never even BE productized; it might just be something that a few researchers play around with.
I have a CGI script that receives a file for upload, stores it to a temporary location, then launches a background process to do some serious number-crunching on the file. Then some Javascript stuff sits around calling other CGI scripts to check on the status and update the page as needed.
All of this works, except the damn web server won't close the connection as long as the subprocess is running. I've done some searching, and it appears the answer on Unix is to make it a daemon, but I'm stuck on Windows right now and I guess the answer there is to make it a Windows service?!? This seems incredibly heavyweight just to, you know, launch a damn process and then close the server connection.
That's really the only way?
Edit: Okay, found a nifty little hack over here (the choice (3) that the guy gives):
How to completely background a process in Perl CGI under IIS
I was able to modify this to make it even simpler, and although this is a klugey solution, it is perfect for the quick-and-dirty little prototype I am trying to make.
So I initially had my main script doing this:
subprocess.Popen(["python.exe", "myscript.py", "arg1", "arg2"])
Which doesn't work, as I've described. Instead, I now have my main script emit this little bit of Javascript which runs after the document is fully loaded:
$("#somecrap").load("launchBackgroundProcess.py", {arg1:"foo",arg2:"bar"});
And then launchBackgroundProcess.py does the subprocess.Popen.
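For illustration, launchBackgroundProcess.py could be as small as the following sketch (the argument names and script name are placeholders, not the OP's actual code):

#!/usr/bin/env python
# launchBackgroundProcess.py -- sketch of the helper CGI described above
import cgi
import subprocess
import sys

form = cgi.FieldStorage()
subprocess.Popen([sys.executable, "myscript.py",
                  form.getfirst("arg1", ""), form.getfirst("arg2", "")])

print("Content-Type: text/plain\n")
print("started")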
This solution would never scale, since it still leaves the browser connection open during the entire time the background task is running. But since this little thinger I am whipping up might someday have two simultaneous users at most (even then I doubt it) resources are not a concern. This allows the user to see the main page and get the Javascript updates even though there is still an http connection hanging open for no good reason.
Thanks for the answers! If I'm ever asked to productize this, I'll take a look at the resources Profane recommends.
If you haven't much experience with Windows programming and don't wish to peruse the MSDN docs (I don't blame you), you may want to pick up a copy of Mark Hammond's canonical guide to all things Python and Windows. It somehow never goes out of date on many of these sorts of recurring questions. Instead of launching the process with the every-platform solution, you'd probably be better off using the win32process module. Chapter 17 of the Hammond book covers this extensively, but you could probably get all you need by downloading the PythonWin IDE (I think it comes bundled in the Windows extensions, which you can download from PyPI) and looking through the help docs it has on Python's Windows API. Here's an example of using the API, from a project I was working on recently. It may in fact do some of what you want with a little adaptation. You'd probably want to focus on CreationFlags. In particular, win32process.DETACHED_PROCESS is "often used to execute console programs in the background." Many other flags are available and conveniently wrapped, however.
import subprocess

if subprocess.mswindows:
    su = subprocess.STARTUPINFO()
    su.dwFlags |= subprocess._subprocess.STARTF_USESHOWWINDOW
    process = subprocess.Popen(['program', 'flag', 'flag2'], bufsize=-1,
                               stdout=subprocess.PIPE, startupinfo=su)
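For the DETACHED_PROCESS flag mentioned above, a hedged adaptation of the same snippet might look like this (my guess at the wiring, not the answerer's code):

import subprocess
import win32process

# The child runs without a console and inherits no handles,
# so it can keep running after the CGI request finishes.
process = subprocess.Popen(['program', 'flag', 'flag2'],
                           creationflags=win32process.DETACHED_PROCESS,
                           close_fds=True)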
The simplest, though not the most efficient, way would be to just run another Python executable:
from subprocess import Popen
Popen("python somescript.py")
You can just use a system call with the Windows "start" command. That way your Python script will not wait for the started program to finish.
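A sketch of that approach (the script name is a placeholder); "start" is a cmd.exe built-in, so it has to go through the shell, and it returns immediately:

import os

# /B runs the program without opening a new console window.
os.system('start /B python myscript.py arg1 arg2')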
CGI scripts are run with standard output redirected, either directly to the TCP socket or to a pipe. Typically, the connection won't close until the handle, and all copies of it, are closed. By default, the subprocess will inherit a copy of the handle.
There are two ways to prevent the connection from waiting on the subprocess. One is to prevent the subprocess from inheriting the handle, the other is for the subprocess to close its copy of the handle when it starts.
If the subprocess is in Perl, I think you could close the handle very simply:
close(STDOUT);
If you want to prevent the subprocess from inheriting the handle, you could use the SetHandleInformation function (if you have access to the Win32 API) or set bInheritHandles to FALSE in the call to CreateProcess. Alternatively, close the handle before launching the subprocess.
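In Python terms (an assumption on my part; the answer describes the raw Win32 calls), not letting the child inherit the handle roughly corresponds to close_fds=True, which makes CreateProcess pass bInheritHandles=FALSE on Windows:

import subprocess
import sys

# With close_fds=True and no std handles redirected, the child inherits nothing,
# so it cannot keep the web server's connection handle alive.
subprocess.Popen([sys.executable, "myscript.py", "arg1", "arg2"],
                 close_fds=True)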

Running multiple processes and capturing the output in python with pygtk

I'd like to write a simple application that runs multiple programs and displays their output in multiple terminal (style) windows. In addition, I want to be able to read the stdout/stderr of these processes and search for keywords in the output.
I've tried implementing this two ways in python, the first using subprocess.Popen and the second using vte (python-vte).
I've only gotten Popen to work w/ polling. I have to constantly check to see if the processes have data to be read, read the data, and then send it to my TextArea. It's been recommended to use gobject.io_add_watch() instead, but whenever I try that my program hangs on the second call to io_add_watch--it's like it can only handle one file descriptor at a time.
vte works great, but I haven't found a reliable way to capture the output. You can get a callback when the cursor moves and then screen-scrape with get_text(), but I've already run into cases where the programs I'm viewing generate an obscene amount of output in one go and it scrolls off the screen. There doesn't appear to be a callback that provides the new text to be added to the window.
Any ideas?
I did something similar to this using the subprocess.Popen. For each process I actually ended up redirecting the stdout and stderr to a temporary file, then periodically checking the file for updates and dumping the output into a TextView.
The reason for not using a pipe to the process was that the processes themselves were volatile and prone to segfaults. When that happened I sometimes lost data between the last read and the segfault (which was the most needed data to determine the cause of the segfault).
As it turned out, sometimes I'd want to save the output from a specific process, so this method worked well for me.
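A rough sketch of that temp-file approach (the program name and polling details are placeholders, not the original code):

import subprocess
import tempfile

log = tempfile.NamedTemporaryFile(prefix="proc-", suffix=".log", delete=False)
proc = subprocess.Popen(["some_program", "--flag"],
                        stdout=log, stderr=subprocess.STDOUT)
read_pos = [0]  # how much of the log has already been shown

def poll_output(text_buffer):
    # Call this periodically, e.g. gobject.timeout_add(500, poll_output, buf).
    with open(log.name) as f:
        f.seek(read_pos[0])
        new_text = f.read()
        read_pos[0] = f.tell()
    if new_text:
        text_buffer.insert(text_buffer.get_end_iter(), new_text)
    return True  # keep the timeout alive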
If you go with igkuk's suggestion, I got some good advice on watching files for changes in a related question. That worked pretty well for me (I was watching a log file for changes).
You want to use select to monitor the pipes from your subprocesses. It's better than polling.
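A minimal select() sketch (POSIX only; the command lines are placeholders):

import os
import select
import subprocess
import sys

procs = [subprocess.Popen(cmd, stdout=subprocess.PIPE)
         for cmd in (["prog1"], ["prog2", "--flag"])]
pipes = {p.stdout.fileno(): p for p in procs}

while pipes:
    readable, _, _ = select.select(list(pipes), [], [])
    for fd in readable:
        chunk = os.read(fd, 4096)
        if chunk:
            sys.stdout.write(chunk.decode("utf-8", "replace"))  # or push to a text view
        else:
            del pipes[fd]  # EOF: that child is done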
