I need to make some modifications to scikit-learn, including changes to the cython code.
I haven't worked on cython before, so could do with some guidance - so far I have got all the dependencies going in a python virtualenv, and cloned and installed the sklearn git.
Now, what is a good workflow for modifying the .pyx files? Should I make modifications and then reinstall to see the effects? Or build instead?
Is there any way to avoid recompiling all the stuff that is unchanged?
I have heard of import pyximport; pyximport.install() but for me this throws a compile error with sklearn -> is there a way to ensure it uses the same options as the Makefile which runs successfully?
In general I am looking for guidance on how to modify a large cython project without spending decades waiting for unmodified files to recompile.
You could simply run,
python setup.py develop
after each modification. Unlike the install command this will not copy any files and only creates a symbolic link to the working directory. It will also automatically build all the necessary extensions in place, in an equivalent of
python setup.py build_ext --inplace
If you change a Cython file in your project, only that file will be recompiled the next time you run the develop command.
The pyximport module is nice for standalone Cython functions. However, for a more complex project, with multiple files, the above approach would probably be simpler.
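To illustrate why unchanged files are skipped, here is a minimal sketch of a timestamp-based staleness check. It only shows the general idea. Cython and setuptools do their own dependency tracking, and the function names below (needs_rebuild, stale_sources) are hypothetical, not part of either tool.

```python
import os


def needs_rebuild(source_path, target_path):
    """Return True if the target is missing or older than its source."""
    if not os.path.exists(target_path):
        return True
    return os.path.getmtime(source_path) > os.path.getmtime(target_path)


def stale_sources(pairs):
    """Given (source, target) path pairs, list the sources that need rebuilding."""
    return [source for source, target in pairs if needs_rebuild(source, target)]
```

With pairs such as ("foo.pyx", "foo.c"), only the entries whose source was edited after the last build would be returned.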
I'm using jsonpickle in my program, but found it to be a performance bottleneck. So I'm trying to see if I can compile it to C using Nuitka, then use the C version in my program (through some wrappers perhaps).
But to be honest, I'm new to Nuitka, so I don't even know if this is a legit use case. Can someone give me some hints?
Note: this question is not about how to make a program faster. I'm building a library, not an application, so certain approaches like PyPy won't work. I'm also aware of Cython and am investigating it too, but this question is not about Cython either.
Figured it out myself. Just git clone the repo, go into the directory, and run
python -m nuitka --module jsonpickle --include-package=jsonpickle
Nuitka will generate a .so file, and you can import and use it just like the original package.
This method should be universal and could apply to any package and not just jsonpickle.
Do note that Nuitka is meant to be used for applications and is not a good fit for building a library. The main pain point is that it doesn't support cross-compilation.
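If you want to script the build, the command above can be wrapped in a small helper. This is only a sketch: the flags are exactly the ones shown above, while the function names (nuitka_module_command, compile_package) and the repo_dir parameter are hypothetical. It assumes Nuitka is installed for the interpreter that runs the script.

```python
import subprocess
import sys


def nuitka_module_command(package):
    """Build the argument list for compiling a package into one extension module."""
    return [
        sys.executable, "-m", "nuitka",
        "--module", package,
        f"--include-package={package}",
    ]


def compile_package(package, repo_dir):
    """Run Nuitka from inside the cloned repository; raises if the build fails."""
    subprocess.run(nuitka_module_command(package), cwd=repo_dir, check=True)
```

Calling compile_package("jsonpickle", "path/to/clone") would then run the same command from the cloned directory.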
I have a C++ library (we'll call it Example in the following) for which I wrote Python bindings using the boost.python library. This Python-wrapped library will be called pyExample. The entire project is built using CMake and the resulting Python-wrapped library is a file named libpyExample.so.
When I use the Python bindings from a Python script located in the same directory as libpyExample.so, I simply have to write:
import libpyExample
libpyExample.hello_world()
and this executes a hello_world() function exposed by the wrapping process.
What I want to do
For convenience, I would like my pyExample library to be available from anywhere simply using the command
import pyExample
I also want pyExample to be easily installable in any virtualenv in just one command. So I thought a convenient process would be to use setuptools to make that happen. That would therefore imply:
Making libpyExample.so visible for any Python script
Changing the name under which the module is accessed
I did find many things about compiling C++ extensions with setuptools, but nothing about packaging a pre-compiled C++ extension. Is what I want to do even possible?
What I do not want to do
I don't want to build the pyExample library with setuptools, I would like to avoid modifying the existing project too much. The CMake build is just fine, I can retrieve the libpyExample.so file very easily.
If I understand your question correctly, you have the following situation:
you have an existing CMake-based build of a C++ library with Python bindings
you want to package this library with setuptools
The latter then allows you to call python setup.py install --user, which installs the lib in the site-packages directory and makes it available from every path in your system.
What you want is possible, if you overload the classes that setuptools uses to build extensions, so that those classes actually call your CMake build system. This is not trivial, but you can find a working example here, provided by the pybind11 project:
https://github.com/pybind/cmake_example
Have a look into setup.py, you will see how the classes build_ext and Extension are inherited from and modified to execute the CMake build.
This should work out of the box for you or with little modification - if your build requires special -D flags to be set.
I hope this helps!
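If you would rather keep the CMake build completely untouched, another option is to treat the finished libpyExample.so as package data. The sketch below stages the prebuilt file into a package directory named pyExample and writes a thin __init__.py that re-exports it. The function name and the staging layout are assumptions for illustration, not something the pybind11 example does.

```python
import shutil
from pathlib import Path


def stage_prebuilt_extension(so_path, staging_dir, package_name="pyExample"):
    """Copy a prebuilt extension into a package directory with a thin __init__.py."""
    so_path = Path(so_path)
    package_dir = Path(staging_dir) / package_name
    package_dir.mkdir(parents=True, exist_ok=True)
    # The file keeps its original name, because the compiled module's init
    # function is tied to the name it was built with.
    shutil.copy2(so_path, package_dir / so_path.name)
    module_name = so_path.name.split(".")[0]
    # Re-export the extension's public names under the package name.
    (package_dir / "__init__.py").write_text(f"from .{module_name} import *\n")
    return package_dir
```

A setup.py placed next to the staged directory could then list packages=["pyExample"] and package_data={"pyExample": ["*.so"]}. The resulting distribution is only valid for the platform and Python version the .so was built for.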
This is probably a question that has a very easy and straightforward answer, however, despite having a few years programming experience, for some reason I still don't quite get the exact concepts of what it means to "build" and then to "install". I know how to use them and have used them a lot, but have no idea about the exact processes which happen in the background...
I have looked across the web, wikipedia, etc... but there is no one simple answer to it, neither can I find one here.
A good example, which I tried to understand, is adding new modules to python:
http://docs.python.org/2/install/index.html#how-installation-works
It says that "the build command is responsible for putting the files to install into a build directory"
And then for the install command: "After the build command runs (whether you run it explicitly, or the install command does it for you), the work of the install command is relatively simple: all it has to do is copy everything under build/lib (or build/lib.plat) to your chosen installation directory."
So essentially what this is saying is:
1. Copy everything to the build directory and then...
2. Copy everything to the installation directory
There must be a process missing somewhere in the explanation... compilation?
Would appreciate some straightforward not too techy answer but in as much detail as possible :)
Hopefully I am not the only one who doesn't know the detailed answer to this...
Thanks!
Aivoric
Building means compiling the source code to binary in a sandbox location where it won't affect your system if something goes wrong, like a build subdirectory inside the source code directory.
Install means copying the built binaries from the build subdirectory to a place in your system path, where they become easily accessible. This is rarely done by a straight copy command, and it's often done by some package manager that can track the files created and easily uninstall them later.
Usually, a build command does all the compiling and linking needed, but Python is an interpreted language, so if there are only pure Python files in the library, there's no compiling step in the build. Indeed, everything is copied to a build directory, and then copied again to a final location. You only get a compiling step if the library depends on code written in other languages that needs to be compiled.
You want a new chair for your living-room and you want to make it yourself. You browse through a catalog and order a pile of parts. When they arrive at your door, you can't immediately use them. You have to build the chair at your workshop. After a bit of elbow-grease, you can sit down in it. Afterwards, you install the chair in your living-room, in a convenient place to sit down.
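For a pure Python library, the two steps can be imitated with plain file copies. The sketch below is a toy model, not what distutils actually runs: the build, install and demo functions and the directory names are made up for illustration. It shows that the module only becomes importable once its final location is on the import path.

```python
import importlib
import shutil
import sys
import tempfile
from pathlib import Path


def build(source_dir, build_dir):
    """'Build' a pure Python project: stage its .py files under build/lib."""
    lib_dir = Path(build_dir) / "lib"
    lib_dir.mkdir(parents=True, exist_ok=True)
    for py_file in Path(source_dir).glob("*.py"):
        shutil.copy2(py_file, lib_dir / py_file.name)
    return lib_dir


def install(lib_dir, install_dir):
    """'Install': copy everything from build/lib into the target directory."""
    install_dir = Path(install_dir)
    install_dir.mkdir(parents=True, exist_ok=True)
    for built_file in Path(lib_dir).iterdir():
        shutil.copy2(built_file, install_dir / built_file.name)
    return install_dir


def demo():
    """Build and install a one-file library, then import it from its new home."""
    with tempfile.TemporaryDirectory() as tmp:
        tmp = Path(tmp)
        source = tmp / "src"
        source.mkdir()
        (source / "mycustomlibrary.py").write_text("def greet():\n    return 'hello'\n")
        lib_dir = build(source, tmp / "build")
        target = install(lib_dir, tmp / "site-packages")
        sys.path.insert(0, str(target))  # stands in for the real site-packages
        try:
            importlib.invalidate_caches()
            return importlib.import_module("mycustomlibrary").greet()
        finally:
            sys.path.remove(str(target))
            sys.modules.pop("mycustomlibrary", None)
```

A real build of a library with C extensions would replace the copy in build() with a compiler invocation, while install() would stay essentially the same.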
The chair is a program you want to use. It arrives at your house as source code. You build it by compiling it into a runnable program. You install it by making it easier to use.
The build and install commands you are referring to come from the setup.py file, right?
Setup.py (http://docs.python.org/2/distutils/setupscript.html)
This file is created by 3rd party applications / extensions of Python. They are not part of:
Python source code (bunch of c files, etc)
Python libraries that come bundled with Python
When a developer makes a library for python that he wants to share to the world he creates a setup.py file so the library can be installed on any computer that has python. Maybe this is the MISSING STEP
Setup.py sdist
This creates a python module (the tar.gz files). What this does is copy all the files used by the python library into a folder. Creates a setup.py file for the module and archives everything so the library can be built somewhere else.
Setup.py build
This builds the python module back into a library (SPECIFICALLY FOR THIS OS).
As you may know, the computer that the python library originally came from will be different from the computer that you are installing on.
It might have a different version of python
It might have a different operating system
It might have a different processor / motherboard / etc
For all the reasons listed above the code will not work on another computer. So setup.py sdist creates a module with only the source files needed to rebuild the library on another computer.
What setup.py does exactly is similar to what a makefile would do. It compiles sources / creates libraries all that stuff.
Now we have a copy of all the files we need in the library and they will work on our computer / operating system.
Setup.py install
Great we have all the files needed. But they won't work. Why? Well they have to be added to Python that's why. This is where install comes in. Now that we have a local copy of the library we need to install it into python so you can use it like so:
import mycustomlibrary
In order to do this we need to do several things including:
Copy files to their library folders in our version of python.
Make sure library can be imported using import command
Run any special install instructions for this library. (setting up paths, etc)
This is the most complicated part of the task. What if our library uses BeautifulSoup? This is not a part of Python Library. We'd have to install it in a way such that our library and any others can use BeautifulSoup without interfering with each other.
Also what if python was installed someplace else? What if it was installed on a server with many users?
Install handles all these problems transparently. What it does is make the library that we just built able to run. All you have to do is use the import command, install handles the rest.
I am writing a program in Python to be sent to other people, who are running the same Python version; however, there are some 3rd party modules that need to be installed to use it.
Is there a way to compile it into a .pyc (I only say .pyc because it's a Python compiled file) that has all the dependent modules inside it as well?
So they can run the programme without needing to install the modules separately?
Edit:
Sorry if it wasn't clear, but I am aware of things such as cx_freeze etc. What I'm trying to get is just a single Python file.
So they can just type "python myapp.py" and then it will run. No installation of anything. As if all the module codes are in my .py file.
If you are on python 2.3 or later and your dependencies are pure python:
If you don't want to go the setuptools or distutils routes, you can provide a zip file with the .pyc files for your code and all of its dependencies. You will have to do a little work to make any complex pathing inside the zip file available (if the dependencies are just lying around at the root of the zip, this is not necessary). Then just add the zip location to your path and it should work just as if the dependency files had been installed.
If your dependencies include .pyds or other binary dependencies you'll probably have to fall back on distutils.
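A minimal sketch of the zip approach, using only the standard library. For simplicity it puts .py sources rather than .pyc files at the root of the archive. The helper names (make_bundle, import_from_bundle) are hypothetical.

```python
import importlib
import sys
import zipfile


def make_bundle(zip_path, modules):
    """Write {module_name: source_code} entries at the root of a zip archive."""
    with zipfile.ZipFile(zip_path, "w") as archive:
        for name, source in modules.items():
            archive.writestr(f"{name}.py", source)
    return zip_path


def import_from_bundle(zip_path, module_name):
    """Put the archive on sys.path and import a module stored inside it."""
    sys.path.insert(0, str(zip_path))
    importlib.invalidate_caches()
    return importlib.import_module(module_name)
```

In the shipped program, the sys.path.insert line would go at the top of myapp.py, before the dependencies are imported.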
You can simply include .pyc files for the libraries required, but no - a .pyc cannot work as a container for multiple files (unless you collect all the source into one .py file and then compile it).
It sounds like what you're after is the ability for your end users to run one command, e.g. install my_custom_package_and_all_required_dependencies, and have it assemble everything it needs.
This is a perfect use case for distutils, with which you can make manifests for your own code that link out to external dependencies. If your 3rd party modules are available publicly in a standard format (they should be, and if they're not, it's pretty easy to package them yourself), then this approach has the benefit of allowing you to very easily change what versions of 3rd party libraries your code runs against (see this section of the above linked doc). If you're dead set on packaging others' code with your own, you can always include the required files in the .egg you create with distutils.
Two options:
build a package that will install the dependencies for them (I don't recommend this if the only dependencies are python packages that are installed with pip)
Use virtual environments. You use an existing python on their system but python modules are installed into the virtualenv.
or I suppose you could just punt, and create a shell script that installs them, and tell them to run it once before they run your stuff.
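For the virtual environment option, the standard library's venv module can create the environment from a script. This is a sketch under the assumption that the recipients run Python 3. The function name is hypothetical, and with_pip=False is used here only to keep the example quick. Installing dependencies into the environment afterwards would need with_pip=True.

```python
import venv
from pathlib import Path


def create_environment(env_dir):
    """Create a virtual environment and return the path of its config file."""
    venv.create(env_dir, with_pip=False)
    return Path(env_dir) / "pyvenv.cfg"
```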
I'm working on a Python Django package whose front-end components employ a bit of CoffeeScript.
Right now, I have a rather brain-dead external script that takes care of the CoffeeScript compilation. It simply runs a coffee compile command for every *.coffee file in a src/coffee/ directory and stores the output in src/static/js -- this is similar to how python ./setup.py build_ext --inplace stores a C extension's build files in the development source tree.
That works for now, but it's pretty cheesy -- it forces a flat directory structure, and modifies the files in src/static (which is the opposite of what "static" implies).
I want to be maximally pythonic about things, so I looked into modifying distutils.ccompiler.CCompiler to run coffee as a subcommand of the setup.py "build_ext" subcommand -- I was envisioning the ability to do things like this:
% python ./setup.py build_coffee
% python ./setup.py build_coffee --inplace
% python ./setup.py build_ext --inplace # implying 'build_coffee --inplace'
... but I found distutils' compiler API to be way too focussed on C compilation nuances that have no analog in this case e.g. preprocessing, linking, etc. I also looked at Cython's code (specifically at Cython's CCompiler subclass, which preprocesses .pyx files into .c source) but this looked similarly specialized, and not so appropriate for this case.
Does anyone have a good solution for compiling CoffeeScript with a distutils setup.py script? Or, barring that, a good alternative suggestion?
You can roll this into a custom manage.py command.
See the official Django documentation on custom management commands. This way the script will be run every time the server is run, always resulting in a clean build of your JS.
You could have a pre-commit hook* that compiles CoffeeScript into JavaScript.
So every time you commit a change in the CoffeeScript, the JavaScript version is updated.
*pre-commit hook: the way to do it depends on the VCS you use, and depends on you using a sane VCS.
Maybe take a look at DukPy... It's a simple javascript interpreter for Python and can compile CoffeeScript, TypeScript, BabelJS and JSX. Usage is very simple, just import and compile like this:
import dukpy
dukpy.coffee_compile("CoffeeScript goes here!")
Note: DukPy is the successor to the Python-CoffeeScript package, which is no longer maintained.
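Whichever compiler you pick, the flat-directory limitation can be avoided by walking the source tree and mirroring it in the output. The sketch below takes the compiler as a plain callable, so dukpy.coffee_compile or a wrapper around the coffee command could be passed in. The build_coffee function and its parameters are hypothetical, and it is not wired into distutils.

```python
from pathlib import Path


def build_coffee(source_root, output_root, compile_source):
    """Compile every .coffee file under source_root into a mirrored .js tree.

    compile_source is any callable that maps CoffeeScript text to JavaScript text.
    """
    source_root = Path(source_root)
    output_root = Path(output_root)
    written = []
    for coffee_file in sorted(source_root.rglob("*.coffee")):
        # Keep the sub-directory structure and swap the extension.
        relative = coffee_file.relative_to(source_root).with_suffix(".js")
        target = output_root / relative
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(compile_source(coffee_file.read_text()))
        written.append(target)
    return written
```

A call such as build_coffee("src/coffee", "build/js", dukpy.coffee_compile) would write the output outside src/static, and the same function could be invoked from a custom setup.py command or a manage.py command.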