Best way to call executable and get output in python - python

First a simple example to illustrate the situation. First the source .cpp that has been compiled into an executable:
#include <iostream>
using std::cout;
using std::endl;
int main()
{
cout << "Hello World!" << endl;
return 0;
}
This file has been made into an exe. Now in python it is called as follows:
import subprocess
import sys
sys.path.append('C:\Foo\Bar')
output = subprocess.Popen("hello_world.exe", stdout=subprocess.PIPE)
print output.stdout.read()
So this works fine for calling the executable and returning an output from the standard output. The question is, is there any feasible way to perform this to return potentially large arrays? From what I understand this is basically like returning and reading a text file, which would be quite slow! While I know SWIG or Cython are options for extending python with C++, I would find separate executable for each function more organized and modular!
TLDR: Can you return large arrays from executables back to python at a reasonable speed? Or is extending with Cython/SWIG/ctypes the only way to go?

You have two options.
subprocess.check_output or similar will return the entire program output as a string
subprocess.popen and incrementally read.
In the 1st scenario the output must fit in ram, otherwise very bad things will happen. In the second scenario you can set the buffer size and your subprocess should then halt until you read from the buffer at which point it will continue again.
cython/swig etc. etc. do nothing to reduce the memory footprint. If this is memory bound then you most likely want to use python and incrementally read/write to disk. Python is very efficient at passing around memory (i.e. it will not do eleventy seven memory copies unless you tell it to do that).

Related

How can I pass variables from Python back to C++?

I have 2 files - a .cpp file and a .py file. I use system("python something.py"); to run the .py file and it has to get some input. How do I pass the input back to the .cpp file? I don't use the Python.h library, I have two separate files.
system() is a very blunt hammer and doesn't support much in the way of interaction between the parent and the child process.
If you want to pass information from the Python script back to the C++ parent process, I'd suggest having the python script print() to stdout the information you want to send back to C++, and have the C++ program parse the python script's stdout-output. system() won't let you do that, but you can use popen() instead, like this:
#include <stdio.h>
int main(int, char **)
{
FILE * fpIn = popen("python something.py", "r");
if (fpIn)
{
char buf[1024];
while(fgets(buf, sizeof(buf), fpIn))
{
printf("The python script printed: [%s]\n", buf);
// Code to parse out values from the text in (buf) could go here
}
pclose(fpIn); // note: be sure to call pclose(), *not* fclose()
}
else printf("Couldn't run python script!\n");
return 0;
}
If you want to get more elaborate than that, you'd probably need to embed a Python interpreter into your C++ program and then you'd be able to call the Python functions directly and get back their return values as Python objects, but that's a fairly major undertaking which I'm guessing you want to avoid.

How to send and receive data (and / or data structures) from a C ++ script to a Python script?

I am working on a project that needs to do the following:
[C++ Program] Checks a given directory, extracts all the names (full paths) of the found files and records them in a vector<string>.
[C++ Program] "Send" the vector to a Python script.
[Python Script] "Receive" the vector and transform it into a List.
[Python Script] Compares the elements of the List (the paths) against the records of a database and removes the matches from the List (removes the paths already registered).
[Python Script] "Sends" the processed List back to the C++ Program.
[C++ Program] "Receives" the List, transforms it into a vector and continues its operations with this processed data.
I would like to know how to send and receive data structures (or data) between a C ++ Script and a Python Script.
For this case I put the example of a vector transforming into a List, however I would like to know how to do it for any structure or data in general.
Obviously I am a beginner, that is why I would like your help on what documentation to read, what concepts should I start with, what technique should I use (maybe there is some implicit standard), what links I could review to learn how to communicate data between Scripts of the languages ​​I just mentioned.
Any help is useful to me.
You could check out redis. It can be used as a message broker between programs. https://redis.io/
They have a clients for many langues including c++ and python.
https://redis.io/clients
You can set up channels for each program to publish messages through and have the other program subscribe to that channel to receive those messages.
Check out the pubsub part of the documentation: https://redis.io/topics/pubsub
You can pass arguments to any process/script no matter what language they are written in.
In C++ they are represented as argc (number of arguments) and argv (actual arguments).
#include <iostream>
int main(int argc, char** argv) {
std::cout << "Have " << argc << " arguments:" << std::endl;
for (int i = 0; i < argc; ++i) {
std::cout << argv[i] << std::endl;
}
}
For example, if you're using g++ as your compiler. Then running:
g++ file.cpp -o main
./main hello world
Should output:
Have 3 arguments:
hello
world
Note that size of the arguments is 3 although you're passing hello and world only, it's because the first element in argv is always the name of your program.
Its equivalent in Python would be:
import sys
print('Have ' + len(sys.argv) + 'arguments:')
print("Argument List:", str(sys.argv))
If I were you, I would start by serializing the vector/list and pass it as an argument back and forth between the Python and C++ processes as above.
If the idea is to execute the python script from the c++ process, then the easiest would be to design the python script to accept input_file and output_file as arguments and the c++ program should write the input_file, start the script and read the output_file.
For simple structures like list-of-strings, you can simply write them as text files and share, but for more complex types, you can use google-protocolbuffers to do the marshalling/unmarshalling.
if the idea is to send/receive data between two already stared process, then you can use the same protocol buffers to encode data and send/receive via sockets between each other. Check gRPC

Why captured sys.exit() code is different from the provided one?

I have a c++ program which runs a python program using system() command. I have return some value using the error code using sys.exit() in python code. But when I capture the returned value back in c++, it differs from one I coded in the python program.
My python code: test.py
import sys
sys.exit(10)
My c++ code: test.cpp
#include "iostream"
using namespace std;
int main ()
{
string str = "python test.py";
const char *command = str.c_str();
int value = system(command);
cout<<value;
return 0;
}
when I run run test.cpp, I got 2560
Why this happens?
The C++ standard defers to C (C++17 to C11) for this aspect, and it has this to say:
If the argument is a null pointer, the system function returns nonzero only if a
command processor is available. If the argument is not a null pointer, and the system function does return, it returns an implementation-defined value.
Assuming you're using Linux or some other POSIX-y type system, the return code matches that of the status that gets populated by the wait() call.
It's not a simple mapping, since it has to be able to return all sorts of meta-information about why the process exited, above and beyond the simple exit code (think of signals killing the process, for example).
That means you probably need to use the same macros you would for wait(), if you want to get the correct values.
Specifically, you should be using something like:
if (WIFEXITED(value))
printf("Normal exit, returned %d\n", WEXITSTATUS(value));
In your particular implementation, it probably shifts the return value eight bits left and uses the remaining bits to specify all those other useful things. That's an implementation detail so may not necessarily be correct but it's a good educated guess.
Just as paxdiablo said, it is indeed X*256 where X is the return code. Browsing through C++ documentation, the output of system MAY contain the error code, or may not, implementation defined:
Check: this and this. If you want to use the output and the return code on POSIX systems, you should be able to use wait

Python "print" not working when embedded into MPI program

I have an Python 3 interpreter embedded into an C++ MPI application. This application loads a script and passes it to the interpreter.
When I execute the program on 1 process without the MPI launcher (simply calling ./myprogram), the script is executed properly and its "print" statements output to the terminal. When the script has an error, I print it on the C++ side using PyErr_Print().
However when I lauch the program through mpirun (even on a single process), I don't get any output from the "print" in the python code. I also don't get anything from PyErr_Print() when my script has errors.
I guess there is something in the way Python deals with standard output that do not match the way MPI (actuall Mpich here) deals with redirecting the processes' output to the launcher and finally to the terminal.
Any idea on how to solve this?
[edit, following the advice from this issue]
You need to flush_io() after each call to PyErr_Print, where flush_io could be this function:
void flush_io(void)
{
PyObject *type, *value, *traceback;
PyErr_Fetch(&type, &value, &traceback); // in Python/pythonrun.c, they save the traceback, let's do the the same
for (auto& s: {"stdout", "stderr"}) {
PyObject *f = PySys_GetObject(s);
if (f) PyObject_CallMethod(f, "flush", NULL);
else PyErr_Clear();
}
PyErr_Restore(type, value, traceback);
}
[below my old analysis, it still has some interesting info]
I ended up with the same issue (PyErr_Print not working from an mpirun). Tracing back (some gdb of python3 involved) and comparing the working thing (./myprogram) and non-working thing (mpirun -np 1 ./myprogram), I ended up in _io_TextIOWrapper_write_impl at ./Modules/_io/textio.c:1277 (python-3.6.0 by the way).
The only difference between the 2 runs is that self->line_buffering is 1 vs. 0 (at this point self represents sys.stderr).
Then, in pylifecycle.c:1128, we can see who decided this value:
if (isatty || Py_UnbufferedStdioFlag)
line_buffering = Py_True;
So it seems that MPI does something to stderr before launching the program, which makes it not a tty. I haven't investigated if there's an option in mpirun to keep the tty flag on stderr ... if someone knows, it'd be interesting (though on second thought mpi probably has good reasons to put his file descriptors in place of stdout&stderr, for its --output-filename for example).
With this info, I can come out with 3 solutions (the first 2 are quick-fixes, the 3rd is better):
1/ in the C code that starts the python interpreter, set the buffering flag before creating sys.stderr. The code becomes :
Py_UnbufferedStdioFlag = 1; // force line_buffering for _all_ I/O
Py_Initialize();
This brings Python's traceback back to screen in all situations; but will probably give catastrophic I/O ... so only an acceptable solution in debug mode.
2/ in the python (embedded) script, at the very beginning add this :
import sys
#sys.stderr.line_buffering = True # would be nice, but readonly attribute !
sys.stderr = open("error.log", 'w', buffering=1 )
The script then dumps the traceback to this file error.log.
I also tried adding a call to fflush(stderr) or fflush(NULL) right after the PyErr_Print() ... but this didn't work (because sys.stderr has its own internal buffering). That'd be a nice solution though.
3/ After a little more digging, I found the perfect function in
Python/pythonrun.c:57:static void flush_io(void);
It is in fact called after each PyErr_Print in this file.
Unfortunately it's static (only exists in that file, no reference to it in Python.h, at least in 3.6.0). I copied the function from this file to myprogram and it turns out to do exactly the job.

Why does Python compile modules but not the script being run?

Why does Python compile libraries that are used in a script, but not the script being called itself?
For instance,
If there is main.py and module.py, and Python is run by doing python main.py, there will be a compiled file module.pyc but not one for main. Why?
Edit
Adding bounty. I don't think this has been properly answered.
If the response is potential disk permissions for the directory of main.py, why does Python compile modules? They are just as likely (if not more likely) to appear in a location where the user does not have write access. Python could compile main if it is writable, or alternatively in another directory.
If the reason is that benefits will be minimal, consider the situation when the script will be used a large number of times (such as in a CGI application).
Files are compiled upon import. It isn't a security thing. It is simply that if you import it python saves the output. See this post by Fredrik Lundh on Effbot.
>>>import main
# main.pyc is created
When running a script python will not use the *.pyc file.
If you have some other reason you want your script pre-compiled you can use the compileall module.
python -m compileall .
compileall Usage
python -m compileall --help
option --help not recognized
usage: python compileall.py [-l] [-f] [-q] [-d destdir] [-x regexp] [directory ...]
-l: don't recurse down
-f: force rebuild even if timestamps are up-to-date
-q: quiet operation
-d destdir: purported directory name for error messages
if no directory arguments, -l sys.path is assumed
-x regexp: skip files matching the regular expression regexp
the regexp is searched for in the full path of the file
Answers to Question Edit
If the response is potential disk permissions for the directory of main.py, why does Python compile modules?
Modules and scripts are treated the same. Importing is what triggers the output to be saved.
If the reason is that benefits will be minimal, consider the situation when the script will be used a large number of times (such as in a CGI application).
Using compileall does not solve this.
Scripts executed by python will not use the *.pyc unless explicitly called. This has negative side effects, well stated by Glenn Maynard in his answer.
The example given of a CGI application should really be addressed by using a technique like FastCGI. If you want to eliminate the overhead of compiling your script you may want eliminate the overhead of starting up python too, not to mention database connection overhead.
A light bootstrap script can be used or even python -c "import script", but these have questionable style.
Glenn Maynard provided some inspiration to correct and improve this answer.
Nobody seems to want to say this, but I'm pretty sure the answer is simply: there's no solid reason for this behavior.
All of the reasons given so far are essentially incorrect:
There's nothing special about the main file. It's loaded as a module, and shows up in sys.modules like any other module. Running a main script is nothing more than importing it with a module name of __main__.
There's no problem with failing to save .pyc files due to read-only directories; Python simply ignores it and moves on.
The benefit of caching a script is the same as that of caching any module: not wasting time recompiling the script every time it's run. The docs acknowledge this explicitly ("Thus, the startup time of a script may be reduced ...").
Another issue to note: if you run python foo.py and foo.pyc exists, it will not be used. You have to explicitly say python foo.pyc. That's a very bad idea: it means Python won't automatically recompile the .pyc file when it's out of sync (due to the .py file changing), so changes to the .py file won't be used until you manually recompile it. It'll also fail outright with a RuntimeError if you upgrade Python and the .pyc file format is no longer compatible, which happens regularly. Normally, this is all handled transparently.
You shouldn't need to move a script to a dummy module and set up a bootstrapping script to trick Python into caching it. That's a hackish workaround.
The only possible (and very unconvincing) reason I can contrive is to avoid your home directory from being cluttered with a bunch of .pyc files. (This isn't a real reason; if that was an actual concern, then .pyc files should be saved as dotfiles.) It's certainly no reason not to even have an option to do this.
Python should definitely be able to cache the main module.
Pedagogy
I love and hate questions like this on SO, because there's a complex mixture of emotion, opinion, and educated guessing going on and people start to get snippy, and somehow everybody loses track of the actual facts and eventually loses track of the original question altogether.
Many technical questions on SO have at least one definitive answer (e.g. an answer that can be verified by execution or an answer that cites an authoritative source) but these "why" questions often do not have just a single, definitive answer. In my mind, there are 2 possible ways to definitively answer a "why" question in computer science:
By pointing to the source code that implements the item of concern. This explains "why" in a technical sense: what preconditions are necessary to evoke this behavior?
By pointing to human-readable artifacts (comments, commit messages, email lists, etc.) written by the developers involved in making that decision. This is the real sense of "why" that I assume the OP is interested in: why did Python's developers make this seemingly arbitrary decision?
The second type of answer is more difficult to corroborate, since it requires getting in the mind of the developers who wrote the code, especially if there's no easy-to-find, public documentation explaining a particular decision.
To date, this thread has 7 answers that solely focus on reading the intent of Python's developers and yet there is only one citation in the whole batch. (And it cites a section of the Python manual that does not answer the OP's question.)
Here's my attempt at answering both of the sides of the "why" question along with citations.
Source Code
What are the preconditions that trigger compilation of a .pyc? Let's look at the source code. (Annoyingly, the Python on GitHub doesn't have any release tags, so I'll just tell you that I'm looking at 715a6e.)
There is promising code in import.c:989 in the load_source_module() function. I've cut out some bits here for brevity.
static PyObject *
load_source_module(char *name, char *pathname, FILE *fp)
{
// snip...
if (/* Can we read a .pyc file? */) {
/* Then use the .pyc file. */
}
else {
co = parse_source_module(pathname, fp);
if (co == NULL)
return NULL;
if (Py_VerboseFlag)
PySys_WriteStderr("import %s # from %s\n",
name, pathname);
if (cpathname) {
PyObject *ro = PySys_GetObject("dont_write_bytecode");
if (ro == NULL || !PyObject_IsTrue(ro))
write_compiled_module(co, cpathname, &st);
}
}
m = PyImport_ExecCodeModuleEx(name, (PyObject *)co, pathname);
Py_DECREF(co);
return m;
}
pathname is the path to the module and cpathname is the same path but with a .pyc extension. The only direct logic is the boolean sys.dont_write_bytecode. The rest of the logic is just error handling. So the answer we seek isn't here, but we can at least see that any code that calls this will result in a .pyc file under most default configurations. The parse_source_module() function has no real relevance to the flow of execution, but I'll show it here because I'll come back to it later.
static PyCodeObject *
parse_source_module(const char *pathname, FILE *fp)
{
PyCodeObject *co = NULL;
mod_ty mod;
PyCompilerFlags flags;
PyArena *arena = PyArena_New();
if (arena == NULL)
return NULL;
flags.cf_flags = 0;
mod = PyParser_ASTFromFile(fp, pathname, Py_file_input, 0, 0, &flags,
NULL, arena);
if (mod) {
co = PyAST_Compile(mod, pathname, NULL, arena);
}
PyArena_Free(arena);
return co;
}
The salient aspect here is that the function parses and compiles a file and returns a pointer to the byte code (if successful).
Now we're still at a dead end, so let's approach this from a new angle. How does Python load it's argument and execute it? In pythonrun.c there are a few functions for loading code from a file and executing it. PyRun_AnyFileExFlags() can handle both interactive and non-interactive file descriptors. For interactive file descriptors, it delegates to PyRun_InteractiveLoopFlags() (this is the REPL) and for non-interactive file descriptors, it delegates to PyRun_SimpleFileExFlags(). PyRun_SimpleFileExFlags() checks if the filename ends in .pyc. If it does, then it calls run_pyc_file() which directly loads compiled byte code from a file descriptor and then runs it.
In the more common case (i.e. .py file as an argument), PyRun_SimpleFileExFlags() calls PyRun_FileExFlags(). This is where we start to find our answer.
PyObject *
PyRun_FileExFlags(FILE *fp, const char *filename, int start, PyObject *globals,
PyObject *locals, int closeit, PyCompilerFlags *flags)
{
PyObject *ret;
mod_ty mod;
PyArena *arena = PyArena_New();
if (arena == NULL)
return NULL;
mod = PyParser_ASTFromFile(fp, filename, start, 0, 0,
flags, NULL, arena);
if (closeit)
fclose(fp);
if (mod == NULL) {
PyArena_Free(arena);
return NULL;
}
ret = run_mod(mod, filename, globals, locals, flags, arena);
PyArena_Free(arena);
return ret;
}
static PyObject *
run_mod(mod_ty mod, const char *filename, PyObject *globals, PyObject *locals,
PyCompilerFlags *flags, PyArena *arena)
{
PyCodeObject *co;
PyObject *v;
co = PyAST_Compile(mod, filename, flags, arena);
if (co == NULL)
return NULL;
v = PyEval_EvalCode(co, globals, locals);
Py_DECREF(co);
return v;
}
The salient point here is that these two functions basically perform the same purpose as the importer's load_source_module() and parse_source_module(). It calls the parser to create an AST from Python source code and then calls the compiler to create byte code.
So are these blocks of code redundant or do they serve different purposes? The difference is that one block loads a module from a file, while the other block takes a module as an argument. That module argument is — in this case — the __main__ module, which is created earlier in the initialization process using a low-level C function. The __main__ module doesn't go through most of the normal module import code paths because it is so unique, and as a side effect, it doesn't go through the code that produces .pyc files.
To summarize: the reason why the __main__ module isn't compiled to .pyc is that it isn't "imported". Yes, it appears in sys.modules, but it gets there via a very different code path than real module imports take.
Developer Intent
Okay, so we can now see that the behavior has more to do with the design of Python than with any clearly expressed rationale in the source code, but that doesn't answer the question of whether this is an intentional decision or just a side effect that doesn't bother anybody enough to be worth changing. One of the benefits of open source is that once we've found the source code that interests us, we can use the VCS to help trace back to the decisions that led to the present implementation.
One of the pivotal lines of code here (m = PyImport_AddModule("__main__");) dates back to 1990 and was written by the BDFL himself, Guido. It has been modified in intervening years, but the modifications are superficial. When it was first written, the main module for a script argument was initialized like this:
int
run_script(fp, filename)
FILE *fp;
char *filename;
{
object *m, *d, *v;
m = add_module("`__main__`");
if (m == NULL)
return -1;
d = getmoduledict(m);
v = run_file(fp, filename, file_input, d, d);
flushline();
if (v == NULL) {
print_error();
return -1;
}
DECREF(v);
return 0;
}
This existed before .pyc files were even introduced into Python! Small wonder that the design at that time didn't take compilation into account for script arguments. The commit message enigmatically says:
"Compiling" version
This was one of several dozen commits over a 3 day period... it appears that Guido was deep into some hacking/refactoring and this was the first version that got back to being stable. This commit even predates the creation of the Python-Dev mailing list by about five years!
Saving the compiled bytecode was introduced 6 months later, in 1991.
This still predates the list serve, so we have no real idea of what Guido was thinking. It appears that he simply thought that the importer was the best place to hook into for the purpose of caching bytecodes. Whether he considered the idea of doing the same for __main__ is unclear: either it didn't occur to him, or else he thought that it was more trouble than it was worth.
I can't find any bugs on bugs.python.org that are related to caching the bytecodes for the main module, nor can I find any messages on the mailing list about it, so apparently nobody else thinks it's worth the trouble to try adding it.
To summarize: the reason why all modules are compiled to .pyc except __main__ is that it's a quirk of history. The design and implementation for how __main__ works was baked into the code before .pyc files even existed. If you want to know more than that, you'll need to e-mail Guido and ask.
Glenn Maynard's answer says:
Nobody seems to want to say this, but I'm pretty sure the answer is simply: there's no solid reason for this behavior.
I agree 100%. There's circumstantial evidence to support this theory and nobody else in this thread has provided a single shred of evidence to support any other theory. I upvoted Glenn's answer.
To answer your question, reference to 6.1.3. “Compiled” Python files in Python official document.
When a script is run by giving its name on the command line, the bytecode for the script is never written to a .pyc or .pyo file. Thus, the startup time of a script may be reduced by moving most of its code to a module and having a small bootstrap script that imports that module. It is also possible to name a .pyc or .pyo file directly on the command line.
Since:
A program doesn’t run any faster when it is read from a .pyc or .pyo file than when it is read from a .py file; the only thing that’s faster about .pyc or .pyo files is the speed with which they are loaded.
That is unnecessary to generate .pyc file for main script. Only the libraries which might be loaded many times should be compiled.
Edited:
It seem you didn't get my point. First, knowing the whole idea of compiling into .pyc file is to make the same file executing faster at the second time. However, consider if Python did compile the script being run. The interpreter will write bytecode into a .pyc file at the first running, this takes time. So it will even run a bit slower. You might argue that it will run faster after. Well, it just a choice. Plus, as this says:
Explicit is better than implicit.
If one wants a speedup by using .pyc file, one should compile it manually and run the .pyc file explicitly.
Because the script being run may be somewhere where it is inappropriate to generate .pyc files, such as /usr/bin.
Because different versions of Python (3.6, 3.7 ...) have different bytecode representations, and trying to design a compile system for that was deemed too complicated. PEP 3147 discusses the rationale.

Categories

Resources