Why do some Python modules have to be "compiled"? - python

I've noticed that Python modules/packages come in two sorts. Some are just pure Python scripts and can simply be copied and pasted to the Python directory. Others however require, and I think these are usually wrappers for or based on C/C++ code, that the code is "built" and/or "compiled" with setup.py to produce a set of new files.
My questions is about the second type of module/package. Why is it that they have to be compiled, is there a particular reason for it? Couldn't the distributor just provide all the files from the beginning?
The reason I ask is because I wish to distribute such C++ based packages as part of my own packages, so that the user don't have to worry about installing dependencies on their own, and not having to worry about compiling, etc. I've always wondered why module distributors don't just include the dependencies instead of asking the user to install them themselves.
I suspect the answer might be bc those extra files have to be written in a specific way depending on what type of OS the computer is, and whether it is 32 or 64 bit. Could this mean that distributing the compiled files will work, but only if the user has the same specific OS and bit-system as the one where the files were compiled.
Anyway, curious to know the answer.

Why packages just don't include dependencies: License. You can't just add somebody else's code, compiled or not, without asking the person/company or even pay fees.
In Python, these built modules you talk about are Python Extensions. They are usually there to improve performance or access low-level functionality which is not in Python. Sometimes also to include proprietary functionality.

Related

How to compile the CPython python interpreter from source and build an installer for it

I don't even know the right way to put my question but i will try my best.
I downloaded (Python-3.4.2.tar) the source code of a python interpreter from www.python.org
I extracted the files(using 7-zip).
Now lets say i latter want to use the unziped/extracted fies to create an installer i.e put it in a form that i can double click and Python-3.4.2 will be installed in my computer.
i guess it is called creating a build distriution.
I know i can just download Python-3.4.2.exe from the site and install right away but i want to know how it goes from being source code to becoming something one can install.
Here's a starting point for compiling:
https://github.com/python/cpython/blob/2.7/PCbuild/readme.txt
I was able to compile 2.7.13 with:
cd PCBuild
cmd /c get_externals.bat
cmd /c build.bat -e --no-tkinter "/p:PlatformToolset=v100"
3.4 may require a later compiler.
Here's how to build an installer:
https://github.com/python/cpython/blob/2.7/Tools/msi/README.txt
For building the CPython python interpreter from source, you'll want to have a look at the instructions at https://docs.python.org/devguide/
These instructions probably also (somewhere) contain the steps to produce a Windows installer, which seems to be done with some tool called PCbuild.
Basically, I think what you are asking is how you can distribute the Python runtime along with your program. The process is pretty simple. First, you may want to take a look at Python's distutils. Secondly, you will need to distribute the Python runtime. Python is currently released within installation binaries for pretty much every major operating system. You have the option of compiling target system(s) yourself in order to make Python silently install.
You will then need to decide where you want the files to go. Most people don't like this sort of behavior, so I recommend putting the files in your programs directory. Shortcuts, system variables, et cetera will need to be looked into.
As another option you can consider porting your script(s) to Jython, as most people tend to have at least the Java runtime. In the process of porting your code there is a way of constructing Java .class files with Jython when coded a certain way. They can then be placed into an executable Jar file. Easy peasy, but I haven't tried this myself, to be honest.
If you want to stick with pure Python, the last hurtle is putting everything in one file. There are plenty of tools for that. I would try Github for starters. For bonus points you can always code your own self-extracting binary.
Post Script: With the title changed, I think you are looking for Creating Build Distributions. It is the second link within the Google results for the query "how to compile python installer". Further down the same page I found this gem.

Distributing a Python module with ffmpeg dependency

Rookie software developer here. I’m working on a Python module that harnesses some functionality from the FFmpeg framework - specifically, the ebur128 filter function. Ideally the module will stand on its own as an independent, platform agnostic tool for verifying that audio clips comply with EBU loudness standards. It’s being designed so that end users need only perform one simple, (hopefully!) painless installation procedure, which will encompass the installation of both the FFmpeg libraries and my Python wrapper/GUI.
I apologize for the rather vague question, but does anyone have general advice for creating Python module with external dependencies, or specific advice for standardizing the FFmpeg installation across platforms? Distutils seems pretty helpful – are there other guidelines or standard practices for developing a neatly packaged Python tool? I want to minimize any installation headaches for end users.
Thanks very much.
For Windows
I think it will be easy to find ffmpeg binaries that work on any system, just like for Qt or whatever GUI library you are using. You can ship these binaries with your project and things will work (you may want to distinguish 32 bit and 64 bit systems, though).
It looks like you want to create a software that is self-contained and easily installable for end-users. Inkscape is such an example -- its installer contains Python and all other dependencies, in binary form (if required). That is, for Windows, you do not need to create a real Python package (which would allow installation with pip), and you do not need to look into distutils (which supports building C extensions). Both you do not need/want, I guess.
Maybe it will be enough for you to assemble a good directory structure and to distribute a ZIP archive with your software. This is enough if you do not need to interact with the Windows registry, for instance. Such programs are usually called "standalone", in the Windows world. However, you might still want to have a real Windows installer (even if it is just a self-extracting archive). The following article covers your requirements, I believe: http://cyrille.rossant.net/create-a-standalone-windows-installer-for-your-python-application/
It suggests using http://www.jrsoftware.org/isinfo.php for creating such an installer.
Other platforms
On other operating systems it will be more difficult. For instance, I think it will be almost impossible to create ffmpeg binaries that run on every Linux system, because ffmpeg itself has so many binary dependencies. I do not know whether you can statically build ffmpeg at all.

Difference between installing and importing modules

New to Python, so excuse my lack of specific technical jargon. Pretty simple question really, but I can't seem to grasp or understand the concept.
It seems that a lot of modules require using pip or easy_install and running setup.py to "install" into your python installation or your virtualenv. What is the difference between installing a module and simply taking it and importing the into another script? It seems that you access the modules the same way.
Thanks!
It's like the difference between:
Uploading a photo to the internet
Linking the photo URL inside an HTML page
Installing puts the code somewhere python expects those kinds of things to be, and the import statement says "go look there for something named X now, and make the data available to me for use".
For a single module, it usually doesn't make any difference. For complicated webs of modules, though, an installation program may do many things that wouldn't be immediately obvious. For example, it may also copy data files into locations the new modules can find them, put executables (binary libraries, or DLLs on Windws, for example) where the new modules can find them, do different things depending on which version of Python you have, and so on.
If deploying a web of modules were always easy, nobody would have written setup programs to begin with ;-)

Python's freeze.py doesn't install on Windows

I have been looking for the freeze.py utility which is supposed to come bundled with Python 3 in a Python 3.3 Windows install (albeit with distribute and pip installed) and haven't found it. The utility can be downloaded directly out of the Python svn repository here, but I'm wondering: does freeze come with a standard Windows Python 3 install?
It looks like Windows binary installations of Python don't come with the freeze tool. And there's apparently a good reason for this. According to the freeze README in the source tree:
Under Windows 95 or NT, you must use the -p option and point it to the top of the Python source tree.
If you read the whole section, it comes down to this: On Windows, freeze only works if you've built Python from source, and have the resulting tree sitting around to be used for freezing. So, there's no good reason to give you freeze in binary installations.
Meanwhile, I probably should have asked this in the first place, but… are you sure you want freeze in the first place?
The freeze utility is very out of date (you might have guessed that from the README talking about requiring VC++ 5.0, Windows 95 or NT 4.0, etc.). It also never worked that well on Windows (as you can tell from the documentation describing it as a utility "… to compile executables for Unix systems"). And there's just a lot of things it can't handle, or handles badly. At this point should probably be considered more as example code than as a useful tool.
There are a number of third-party alternatives out there: cx_freeze, py2exe, PyInstaller, etc. If you search PyPI for "freeze" (and other terms that seem reasonable), you will find a bunch of these alternatives. If your goal is to create a standalone executable out of your Python script (which, btw, freeze can never do on Windows anyway), experiment with a few of these and pick the one you like best.
If your goal is something different, the right tool will be different—you might be better off using venv or just zipping up a user site-packages directory or creating a local PyPI server.
In the comments, you said:
What I was actually looking for is a tool to convert Python code to C code. Apparently, that's impossible.
It's not impossible, it's just not what freeze (or its successors/competitors) does. Cython compiles almost a strict superset of Python to C code, although it's C code that uses Python runtime objects (except where you explicitly statically declare variables and functions with C types). If C++ is an acceptable alternative to C, Shed Skin compiles a restricted subset of Python 2.6 (using native C++ objects, and using type inference so you don't have to statically declare your types).
The question is why you want to compile Python code to C.
If you're looking to optimize some slow code, Cython is great at speeding up small pieces of bottleneck code. It takes a bit of effort (deciding what to move to Cython, what static type declarations to put in, etc.), but the curve of payoff to effort is pretty solid. Shed Skin takes a lot less effort—if it works, it just speeds up everything, automatically—but it also means you can't write a lot of idiomatic Python code in the first place. But really, before looking at either, you should consider PyPy, a complete implementation of Python 2.7.3 (and hopefully 3.3 soon) in a JIT-compiling interpreter, that often offers similar speedups, with pretty much no tradeoffs at all. Or, alternatively, you may just need to rewrite slow code to take advantage of already-optimized libraries (numpy instead of mapping over lists, itertools instead of explicit loops, lxml instead of html.parse, …).
If you're looking to write Python code that can interact directly with C code, without all the headaches of ctypes (or manually building Python bindings), Cython scores again. Cython code can effectively natively call both Python code and C code, and the compiler makes it all work like magic.
If you're looking to get C code that you can read, maintain, and improve on… there, you're out of luck. And this one may actually be impossible. Idiomatic Python code is just so different from idiomatic C code that it's hard to imagine how you could translate one into the other.
If you're wondering what the underlying problem is:
As far as I can tell, freeze makes a lot of assumptions about how things are laid out. It should be enough to have any Python installation that can build C extension modules and embedding apps, but it's not, because freeze goes under the covers and expects that building to work in specific ways. A standard binary installation on almost every *nix platform ends up looking like what freeze expects,* but a standard binary installation on Windows looks completely different.
It's not impossible to hack things up using Windows symlinks (at least if you have Vista or later and a drive with a modern version of NTFS) to get everything organized the way freeze expects (I found a blog where someone did that with 2.7.1…), but really, I don't think it's worth trying. It will be a lot of work (especially if you're just learning this stuff), and there's no guarantee you won't immediately run into another problem.
* This isn't actually true. On a Mac, both Apple's pre-installed Python and the binary installers at python.org actually give you the files organized as a Mac framework—but they provide a bunch of symlinks that simulate the traditional layout, which is good enough. On most linux distros, and many other platforms, the binary python package doesn't include any of the development files at all—but once you install an add-on binary package named something like python-devel, then you've got the right layout. Anyway, none of this matters to you, because if you wanted to learn about dpkg dependencies or framework builds you wouldn't be using Windows, right?

Organizing Python projects with shared packages

What is the best way to organize and develop a project composed of many small scripts sharing one (or more) larger Python libraries?
We have a bunch of programs in our repository that all use the same libraries stored in the same repository. So in other words, a layout like
trunk
libs
python
utilities
projects
projA
projB
When the official runs of our programs are done, we want to record what version of the code was used. For our C++ executables, things are simple because as long as the working copy is clean at compile time, everything is fine. (And since we get the version number programmatically, it must be a working copy, not an export.) For Python scripts, things are more complicated.
The problem is that, often one project (e.g. projA) will be running, and projB will need to be updated. This could cause the working copy revision to appear mixed to projA during runtime. (The code takes hours to run, and can be used as inputs for processes that take days to run, hence the strong traceability goal.)
My current workaround is, if necessary, check out another copy of the trunk to a different location, and run off there. But then I need to remember to change my PYTHONPATH to point to the second version of lib/python, not the one in the first tree.
There's not likely to be a perfect answer. But there must be a better way.
Should we be using subversion keywords to store the revision number, which would allow the data user to export files? Should we be using virtualenv? Should we be going more towards a packaging and installation mechanism? Setuptools is the standard, but I've read mixed things about it, and it seems designed for non-developer end users (of which we have none).
The much better solution involves not storing all your projects and their shared dependencies in the same repository.
Use one repository for each project, and externals for the shared libraries.
Make use of tags in the shared library repositories, so consumer projects may use exactly the version they need in their external.
Edit: (just copying this from my comment) use virtualenv if you need to provide isolated runtime environments for the different apps on the same server. Then each environment can contain a unique version of the library it needs.
If I'm understanding your question properly, then you definitely want virtualenv. Add in some virtualenvwrapper goodness to make it that much better.

Categories

Resources