C helper program for Python package - python

I am writing a Python package that, in normal operation, needs to run a helper program that's written in C. The helper program ships as part of the package, and it doesn't make sense to try to run it independently.
How do I persuade Distutils to compile, and install to an appropriate location, an independent C program rather than a C extension module?
How should the Python part of the code locate and start the helper program?
N.B. Porting the actual code (especially the C helper) to Windows would necessitate a >90% rewrite, so I only care about making installation work on Unix.

This is pretty interesting. I've never done this, but I think you can use the distutils compiler directly.
I checked out github for some possible examples that might give you inspiration. Check out this one
The filter I used was distutils ccompiler language:python filename:setup.py in case you want to extend it/trim it down

Related

Distribute a standalone Python software with Julia dependencies

I have a software that is mostly written in Python and, for now, I'm using PyInstaller to bundle and distribute the software in a user-friendly way (it's part of my CI pipeline, for Linux and Windows).
However, my performance is terrible and I want to rewrite some heavy parts in Julia while keeping the front-end in Python. I can use PyJulia to do this, but it means that the user has to install Julia manually in order to use my program.
Julia does have the equivalent of PyInstaller, which is PackageCompiler.jl, but I don't know how to call something compiled with PackageCompiler.jl from the Python side.
How can I make this work, so I can bundle and distribute an executable that has Python, Julia and everything it needs to run?
A little more details
My end user is someone (chemists and pharmacists) that have no idea what programming is. They don't have Python, Julia or Docker (and they don't even want to install it).
In my current approach, the software bundled with PyInstaller consists of a single executable with everything inside it (Python and everything it needs). What I really want is to keep the same user experience, but also with Julia running on the backstage.
I'll implement several functions in the Julia side, and I want (almost) the same level of integration as I get with PyJulia.
Maybe I'll go to Rust and just use the C interface, but I really would like to use Julia.
Thank y'all for your time.
Per here: https://julialang.github.io/PackageCompiler.jl/stable/apps.html#Creating-an-app-1 you can get what is basically an executable file and then you can just follow this post: Python Script execute commands in Terminal to execute the file you create.
You could try using JuliaWin. This provides a standalone Julia runtime that works fully self-contained.
We deployed a tool using PyInstaller with the JuliaWin included as one of the datas as PyInstaller calls them. Iirc it was not PyJulia but JuliaCall+PythonCall doing the interfacing in our case.
Unfortunately the startup was ridiculous (order of minutes). This is in part due to Julia's startup time, but largely due to the generated .exe first having to unpack the JuliaWin. So we are currently investigating using PackageCompiler, possibly also switching to providing the user with 2 files instead of 1.

Running python script without installed libraries

I have working Python script using scipy and numpy functions and I need to run it on the computer with installed Python but without modules scipy and numpy. How should I do that? Is .pyc the answer or should I do something more complex?
Notes:
I don't want to use py2exe. I am aware of it but it doesn't fit to the problem.
I have read, these questions (What is the difference between .py and .pyc files?, Python pyc files (main file not compiled?)) with obvious connection to this problem but since I am a physicist, not a programmer, I got totally lost.
It is not possible.
A pyc-file is nothing more than a python file compiled into byte-code. It does not contain any modules that this file imports!
Additionally, the numpy module is an extension written in C (and some Python). A substantial piece of it are shared libraries that are loaded into Python at runtime. You need those for numpy to work!
Python first "compiles" a program into bytecode, and then throws this bytecode through an interpreter.
So if your code is all Python code, you would be able to one-time generate the bytecode and then have the Python runtime use this. In fact I've seen projects such as this, where the developer has just looked through the bytecode spec, and implemented a bytecode parsing engine. It's very lightweight, so it's useful for e.g. "Python on a chip" etc.
Problem comes with external libraries not entirely written in Python, (e.g. numpy, scipy).
Python provides a C-API, allowing you to create (using C/C++ code) objects that appear to it as Python objects. This is useful for speeding things up, interacting with hardware, making use of C/C++ libs.
Take a look at Nuitka. If you'll be able to compile your code (not necessarily a possible or easy task), you'll get what you want.

Python's freeze.py doesn't install on Windows

I have been looking for the freeze.py utility which is supposed to come bundled with Python 3 in a Python 3.3 Windows install (albeit with distribute and pip installed) and haven't found it. The utility can be downloaded directly out of the Python svn repository here, but I'm wondering: does freeze come with a standard Windows Python 3 install?
It looks like Windows binary installations of Python don't come with the freeze tool. And there's apparently a good reason for this. According to the freeze README in the source tree:
Under Windows 95 or NT, you must use the -p option and point it to the top of the Python source tree.
If you read the whole section, it comes down to this: On Windows, freeze only works if you've built Python from source, and have the resulting tree sitting around to be used for freezing. So, there's no good reason to give you freeze in binary installations.
Meanwhile, I probably should have asked this in the first place, but… are you sure you want freeze in the first place?
The freeze utility is very out of date (you might have guessed that from the README talking about requiring VC++ 5.0, Windows 95 or NT 4.0, etc.). It also never worked that well on Windows (as you can tell from the documentation describing it as a utility "… to compile executables for Unix systems"). And there's just a lot of things it can't handle, or handles badly. At this point should probably be considered more as example code than as a useful tool.
There are a number of third-party alternatives out there: cx_freeze, py2exe, PyInstaller, etc. If you search PyPI for "freeze" (and other terms that seem reasonable), you will find a bunch of these alternatives. If your goal is to create a standalone executable out of your Python script (which, btw, freeze can never do on Windows anyway), experiment with a few of these and pick the one you like best.
If your goal is something different, the right tool will be different—you might be better off using venv or just zipping up a user site-packages directory or creating a local PyPI server.
In the comments, you said:
What I was actually looking for is a tool to convert Python code to C code. Apparently, that's impossible.
It's not impossible, it's just not what freeze (or its successors/competitors) does. Cython compiles almost a strict superset of Python to C code, although it's C code that uses Python runtime objects (except where you explicitly statically declare variables and functions with C types). If C++ is an acceptable alternative to C, Shed Skin compiles a restricted subset of Python 2.6 (using native C++ objects, and using type inference so you don't have to statically declare your types).
The question is why you want to compile Python code to C.
If you're looking to optimize some slow code, Cython is great at speeding up small pieces of bottleneck code. It takes a bit of effort (deciding what to move to Cython, what static type declarations to put in, etc.), but the curve of payoff to effort is pretty solid. Shed Skin takes a lot less effort—if it works, it just speeds up everything, automatically—but it also means you can't write a lot of idiomatic Python code in the first place. But really, before looking at either, you should consider PyPy, a complete implementation of Python 2.7.3 (and hopefully 3.3 soon) in a JIT-compiling interpreter, that often offers similar speedups, with pretty much no tradeoffs at all. Or, alternatively, you may just need to rewrite slow code to take advantage of already-optimized libraries (numpy instead of mapping over lists, itertools instead of explicit loops, lxml instead of html.parse, …).
If you're looking to write Python code that can interact directly with C code, without all the headaches of ctypes (or manually building Python bindings), Cython scores again. Cython code can effectively natively call both Python code and C code, and the compiler makes it all work like magic.
If you're looking to get C code that you can read, maintain, and improve on… there, you're out of luck. And this one may actually be impossible. Idiomatic Python code is just so different from idiomatic C code that it's hard to imagine how you could translate one into the other.
If you're wondering what the underlying problem is:
As far as I can tell, freeze makes a lot of assumptions about how things are laid out. It should be enough to have any Python installation that can build C extension modules and embedding apps, but it's not, because freeze goes under the covers and expects that building to work in specific ways. A standard binary installation on almost every *nix platform ends up looking like what freeze expects,* but a standard binary installation on Windows looks completely different.
It's not impossible to hack things up using Windows symlinks (at least if you have Vista or later and a drive with a modern version of NTFS) to get everything organized the way freeze expects (I found a blog where someone did that with 2.7.1…), but really, I don't think it's worth trying. It will be a lot of work (especially if you're just learning this stuff), and there's no guarantee you won't immediately run into another problem.
* This isn't actually true. On a Mac, both Apple's pre-installed Python and the binary installers at python.org actually give you the files organized as a Mac framework—but they provide a bunch of symlinks that simulate the traditional layout, which is good enough. On most linux distros, and many other platforms, the binary python package doesn't include any of the development files at all—but once you install an add-on binary package named something like python-devel, then you've got the right layout. Anyway, none of this matters to you, because if you wanted to learn about dpkg dependencies or framework builds you wouldn't be using Windows, right?

Embedding Python on Windows: why does it have to be a DLL?

I'm trying to write a software plug-in that embeds Python. On Windows the plug-in is technically a DLL (this may be relevant). The Python Windows FAQ says:
1.Do not build Python into your .exe file directly. On Windows, Python must be a DLL to handle importing modules that are themselves DLL’s. (This is the first key undocumented fact.) Instead, link to pythonNN.dll; it is typically installed in C:\Windows\System. NN is the Python version, a number such as “23” for Python 2.3.
My question is why exactly Python must be a DLL? If, as in my case, the host application is not an .exe, but also a DLL, could I build Python into it? Or, perhaps, this note means that third-party C extensions rely on pythonN.N.dll to be present and other DLL won't do? Assuming that I'd really want to have a single DLL, what should I do?
I see there's the dynload_win.c file, which appears to be the module to import C extensions on Windows and, as far as I can see, it scans the extension file to find which pythonX.X.dll it imports; but I'm not experienced with Windows and I don't quite understand all the code there.
You need to link to pythonXY.dll as a DLL, instead of linking the relevant code directly into your executable, because otherwise the Python runtime can't load other DLLs (the extension modules it relies on.) If you make your own DLL you could theoretically link all the Python code in that DLL directly, since it doesn't end up in the executable but still in a DLL. You'll have to take care to do the linking correctly, however, as pretty much none of the standard tools (like distutils) will do this for you.
However, regardless of how you embed Python, you can't make do with just the DLL, nor can you make do with just any DLL. The ABI changes between Python versions, so if you compiled your code against Python 2.6, you need python26.dll; you can't use python25.dll or python27.dll. And Python isn't just a DLL; it also needs its standard library, which includes extension modules (which are DLLs themselves, although they have the .pyd extension.) The code in dynload_win.c you ran into is for loading those DLLs, and are not related to loading of pythonXY.dll.
In short, in order to embed Python in your plugin, you need to either ship Python with the plugin, or require that the right Python version is already installed.
(Sorry, I did a stupid thing, I first wrote the question, and then registered, and now I cannot alter it or comment on the replies, because StackOverflow's engine doesn't think I'm the author. I cannot even properly thank those who replied :( So this is actually an update to the question and comments.)
Thanks for all the advice, it's very valuable. As far as I understand with some effort I can link Python statically into a custom DLL, provided that I compile other dynamically loaded extensions myself and link them against the same DLL. (I know I need to ship the standard library too; my plan was to append a zipped archive to the DLL file. As far as I understand, I will even be able to import pure Python modules from it.)
I also found an interesting place in dynload_win.c. (I understand it loads dynamic extensions that use Python C API, e.g. _ctypes.) As far as I can see it not only looks for init_ctypes symbol or whatever the extension name is, but also scans the .pyd file's import table looking for (regex) python\d+\. and then compares the found symbol with known pythonNN. string to make sure the extension was compiled for this version of Python. If the import table doesn't have such a symbol or it refers to another version, it raises an error.
For me it means that:
If I link an extension against pythonNN.dll and try to load it from my custom DLL that includes a statically linked Python, it will pass the check, but — well, here I'm not sure: will it fail because there's no pythonNN.dll (i.e. even before getting to the check) or it will happily load the symbols?
And if I link it against my custom DLL, it will find symbols, but won't pass the check :)
I guess I could rewrite this piece to suit my needs... Are there any other such places, I wonder.
Python needs to be a dll (with a standard name) such that your application, and the plugin, can use the same instance of python.
Plugin dlls are already going to expect to be loading (and using python from) a python26.dll (or whichever version) - if your python is statically embedded in your exe, then two different instances of the python library would be managing the same data structures.
If the python libraries use no static variables at all, and the compile settings are exactly the same this should not be a problem. However, generally its far safer to simply ensure that only one instance of the python interpreter is being used.
On *nix, all shared objects in a process, including the executable, contribute their exported names into a common pool; any of the shared objects can then pull any of the names from the pool and use them as they like. This allows e.g. cStringIO.so to pull the relevant Python library functions from the main executable when the Python library is statically-linked.
On Windows, each shared object has its own independent pool of names it can use. This means that it must read the relevant different shared objects it needs functions from. Since it is a lot of work to get all the names from the main executable, the Python functions are separated out into their own DLL.

Python package with compiled code

I'm looking into releasing a python package which includes an existing fortran or C program. The fortran/C program is compiled by running
./configure
make
The python code calls the resulting binary through subprocess calls (i.e. the code is not really wrapped as such). What I would like is that when the user types
python setup.py install
the fortran/C program is first compiled using the ./configure and make commands, then I want the python module to be installed, and the binary to be installed in the python bin/ directory alongside executables that are usually installed via the scripts= option in distutils.core.setup.
First, are there any problems with doing this? And if not, what is the best way to do it via setup.py? Are there existing functions to automate the ./configure and make, since this is pretty standard? Or should I just use os.system calls? And either way, where should those commands go in setup.py? Then should I have make output the binary to e.g. scripts/ and then have scripts=['scripts/mybinary'] in the setup() function?
Don't make this too complex.
Just provide them as separate items with a README that says -- basically -- what you said in the question.
Build the Fortran/C with ./configure; make; make install.
Setup Python with python setup.py install.
It doesn't appear to be rocket science. Trying to over-simplify the installation means that you must account for every OS vagary and oddness.
It's easier to trust the users to do "standard" installations so that the Fortran/C is on the system PATH, and your Python script should be configured to find them on the system PATH.
People who want to use your software are then free to reconfigure it to their own unique needs. They will anyway. Don't overpackage and force them to fight against you to reconfigure things.
consider writing a python C extension as a wrapper for your C code, and a f2py extension as a wrapper for your fortran code. Then you can just use them in your python code as fast calls instead of using subprocess.

Categories

Resources