I have an .so file called tissue-classifier.cpython-37m-x86_64-linux-gnu.so from an external library that I want to import so that I can extend it in one of my local classes. Since I am extending a class, I need to import it as an extension type using cimport and I am wondering if this is even possible. If I use a normal import statement then I will be left with a Python compiled version which cannot be used to extend a cdef class in my current file.
When I try to cimport the tissue_classifier file, I get the error that a tissue_classifier.pxd file could not be found, which makes sense since it is in .so format. Sorry if this is a dumb question; I just haven't been able to figure this out for a while.
No, a *.so-file cannot be cimported.
If you have a C/C++ background, then the pyx/pxd/so business is probably easiest to understand using the following model:
the resulting extension (*.so file) corresponds to the final artifact in the C/C++ world, which could be an executable, a shared object (*.so), or a library/collection of object files. If you just run the resulting program, it is all you need. For example, you can use (and probably do use) a CPython interpreter without having built it or having its source code. In analogy, if you have a binary extension (*.so) you can import and use it without having to build it (or even having the corresponding pyx-files or a compiler on your machine) - that is what a wheel provides.
*.pyx corresponds to c/cpp-files, which contain the definitions of the functionality. These files are needed if you want to build the resulting artifact from source. In the C/C++ world this build process would be triggered by using make or similar. pyx-files are needed if you install the package via python setup.py install - which corresponds to calling make.
*.pxd corresponds to headers (h/hpp-files): it describes the functionality in the resulting so-files, so it can be reused. For example, just having a CPython interpreter isn't enough to build extensions - one has to install the dev version so the includes (Python.h & Co.) are also present on the machine.
So what can be done?
First possibility:
If the authors of the package consider the *.pxd-files part of the public API, then they can put the corresponding pxd-files next to the *.so-files in the installation, so the C-interface of the module can be used/extended.
If they don't put the pxd-files into the installation, then chances are high that this C-interface is an implementation detail and you should not be using it, as it may change without notice in the future.
However, it is possible to take the risk and copy the necessary pxd-files into the installation by hand, first making sure that it is the right pxd-version (i.e. the same one with which the so-files in the installation were built).
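With the pxd in place, extending the extension type could look roughly like this (a minimal sketch; TissueClassifier and classify are hypothetical names for whatever the pxd actually declares):

# my_classifier.pyx - assumes tissue_classifier.pxd lies next to the
# tissue-classifier .so and declares a cdef class TissueClassifier
from tissue_classifier cimport TissueClassifier

cdef class MyClassifier(TissueClassifier):
    # extend the external extension type with extra state/behaviour
    cdef int n_calls

    def classify(self, sample):
        self.n_calls += 1
        # delegate to the base class implementation
        return TissueClassifier.classify(self, sample)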
Second possibility:
The easiest way to ensure that the right pxd-version is used is to build the package from source, i.e.
downloading the right version from github (master or the latest release)
calling python setup.py install or whatever the README tells you to do
Now, instead of copying pxd-files into the installation, one could add the downloaded package-source to the search path, via the include_path argument of the cythonize-function or by adding it to sys.path.
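For the cythonize route, that could look roughly like this (a sketch; the source path is of course an assumption):

# setup.py for your own extension, which cimports from the downloaded source
from setuptools import setup
from Cython.Build import cythonize

setup(
    ext_modules=cythonize(
        "my_classifier.pyx",
        # directory containing the downloaded package source, so its
        # pxd-files can be found when cimporting (adjust the path)
        include_path=["/path/to/downloaded/package"],
    ),
)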
Alternatively, as @BeforeFlight has pointed out in the comments, one could use python setup.py develop (or pip install -e on the same folder, so it can be uninstalled), and because it creates a link instead of copying data, the pxd-files will be found.
The above solutions will help to build the module; distributing it is a completely different story, however.
Related
What I wish to understand is what is good/bad practice, and why, when it comes to imports - specifically, whether there is an agreed-upon view in the community, e.g. in a PEP document or similar.
What I normally see is that people have a python environment, use conda/pip to install packages, and all that's needed in the code is "import X" (and variants). In my current understanding this is the right way to do things.
Whenever python interacts with C++ at my firm, though, it always ends up with the need to use sys.path and absolute imports (but we have some standardized paths to use as "base" and usually define relative paths based on those).
There are 2 major cases:
Python wrapper for a C++ library (pybind/ctypes/etc) - in this case the user's python code must use sys.path to specify where the C++ library to import is.
A project that establishes communication between python and C++ (say a C++ server, python clients, TCP connections and flatbuffer serialization between the two) - here the python code lives together with the C++ code, and some python files end up using sys.path to import python modules from the same project that live in a different directory. Essentially we deploy the python together with the C++ through our C++ deployment procedure.
I am not fully sure we can do anything better for case #1, but case #2 seems quite unnecessary, basically forced just by the choice not to deploy the python code through a python package manager. That choice ends up forcing us to use sys.path in both the library and the user code.
This seems very bad to me, as this way of doing things doesn't let us fully manage our python environments (we have some libraries that we import even though they are not technically installed in the environment), and that is probably why I have a negative view of using sys.path for imports. But I need to find out if I'm right, and if so I need some official (or almost official) documents to support my case if I'm to propose fixes to our procedures.
For your scenario 2, my understanding is that you have some C++ and accompanying python in one place, and a separate python project wants to import that python.
Could you structure the imported python as a package and install it to your environment with pip install path/to/package? If it's a package that you'll continue to edit, you can add the -e flag to pip install so that when the package changes your imports get the latest code.
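A minimal sketch of what that could look like (the package name and layout here are made up):

# path/to/package/setup.py
from setuptools import setup, find_packages

setup(
    name="mycpp-python",              # hypothetical package name
    version="0.1.0",
    packages=find_packages(),         # picks up the python modules in this folder
)

# in the consuming environment:
#   pip install path/to/package       # regular install
#   pip install -e path/to/package    # editable install; imports track your edits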
It is pretty much what the title says: if code in my github repository uses a non-built-in library and someone copies the code, that person will have to have that library installed, right?
Short answer: Yes.
Long answer: Yes, but you can do the following to make the script runnable on other systems.
Add a requirements.txt file, which specifies the libraries used and needing installation. Usually this is used with a virtual environment, which makes sure that the packages/libraries used will not get mixed up with the main python installation.
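For example, a requirements.txt is just a list of the packages your code imports (the entries below are placeholders):

# requirements.txt
requests>=2.28
numpy==1.24.2

Whoever copies the repository then runs pip install -r requirements.txt, ideally inside a fresh virtual environment.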
This is a rough solution, and I would use it only in very extreme scenarios. (I used it when I had to run python code on AWS Lambda where the library I used was compiled in C beforehand.) You can directly copy the library folder into your code and use it. Mind you, this will increase the code size and is absolutely not recommended.
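If you do go the copy-the-folder route, the key is making sure the copied folder is importable (a sketch; "vendor" and "somelib" are placeholder names):

import os
import sys

# make the bundled copy importable; assumes the library was copied
# into a "vendor" folder next to this script
sys.path.insert(0, os.path.join(os.path.dirname(os.path.abspath(__file__)), "vendor"))

import somelib  # now resolves to vendor/somelib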
Users should install our python package via pip, or it can be cloned from a github repo and installed from source. Users should not be running import Foo from within the source tree directory, for a number of reasons, e.g. C extensions are missing (numpy has the same issue: read here). So we want to check if the user is running import Foo from within the source tree, but how do we do this cleanly, efficiently, and robustly, with support for Python 3 and 2?
Edit: Note the source tree here is defined as where the code is downloaded to (e.g. via git or from the source archive); it contrasts with the installation directory, where the code is installed to.
We considered the following:
Check for setup.py, or another file like PKG-INFO, which should only be present in the source. It's not that elegant, and checking for the presence of a file is not exactly cheap, given this check will happen every time someone does import Foo. Also, there is nothing to stop someone from putting a setup.py outside the source tree, in their lib/python3.X/site-packages/ directory or similar.
Parsing the contents of setup.py for the package name, but this also adds overhead and is not that clean to parse.
Create a dummy flag file that is only present in the source tree.
Some clever, but likely overcomplicated and error-prone, ideas like modifying Foo/__init__.py during installation to note that we are now outside of the source tree.
Since you mention numpy in your comments, wanting to do it like they do but not fully understanding how, I figured I would break that down and see if you could implement a similar process.
__init__.py
The error you are seeking starts here, which is what you linked to in your comments and answers, so you already know that. It's just attempting an import of __config__.py and failing if it isn't there or can't be imported.
try:
    from numpy.__config__ import show as show_config
except ImportError:
    msg = """Error importing numpy: you should not try to import numpy from
        its source directory; please exit the numpy source tree, and relaunch
        your python interpreter from there."""
    raise ImportError(msg)
So where does the __config__.py file come from then and how does that help? Let's follow below...
setup.py
When the package is installed, setup is run, and it in turn performs some configuration actions. This is essentially what ensures that the package is properly installed rather than being run from the download directory (which I think is what you want to ensure).
The key here is this line:
config.make_config_py() # installs __config__.py
misc_util.py
That is imported from numpy's distutils/misc_util.py, which we can follow all the way down to here.
def make_config_py(self, name='__config__'):
    """Generate package __config__.py file containing system_info
    information used during building the package.

    This file is installed to the
    package installation directory.
    """
    self.py_modules.append((self.name, name, generate_config_py))
That then runs here, which writes out the __config__.py file with some system information and your show() function.
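The generated file ends up looking roughly like this (a simplified sketch, not numpy's exact output):

# __config__.py - written by setup at build/install time (simplified sketch)
blas_info = {'libraries': ['blas'], 'library_dirs': ['/usr/lib']}

def show():
    # print the build-time configuration captured above
    for name, value in blas_info.items():
        print("%s: %s" % (name, value))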
Summary
The import of __config__.py is attempted and fails, which generates the error you want to raise if setup.py wasn't run - and running setup.py is what triggers that file to be created. That ensures not only that a file check is being done, but that the file only exists in the installation directory. There is still some overhead from importing an additional file on every import, but no matter what you do you're adding some amount of overhead by making this check in the first place.
Suggestions
I think that you could implement a much lighter weight version of what numpy is doing while accomplishing the same thing.
Remove the distutils subfunction and create the checked file within your setup.py file as part of the standard install, as in the sketch below. It would then exist only in the installed directory after installation and never elsewhere, unless a user faked it (in which case they could probably get around just about anything you try).
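A minimal sketch of that idea, assuming a package named foo (the marker module name _installed.py is made up):

# setup.py - write a marker module during build, so it exists only in
# built/installed copies and never in the source tree
import os
from setuptools import setup, find_packages
from setuptools.command.build_py import build_py

class BuildWithMarker(build_py):
    def run(self):
        build_py.run(self)
        marker = os.path.join(self.build_lib, "foo", "_installed.py")
        with open(marker, "w") as f:
            f.write("# generated at build time; absent from the source tree\n")

setup(
    name="foo",
    version="0.1.0",
    packages=find_packages(),
    cmdclass={"build_py": BuildWithMarker},
)

Then foo/__init__.py mirrors numpy's check:

try:
    from foo import _installed  # noqa: F401
except ImportError:
    raise ImportError("you appear to be importing foo from its source tree; "
                      "please install it and relaunch python elsewhere.")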
As an alternative (without knowing your application and what your setup file is doing): perhaps you have a function that is normally imported anyway, one that isn't key to running the application but is good to have available (in numpy's case, the functions report information about the installation, like version()). Instead of keeping those functions where you put them now, you make them part of this generated file. Then you are at least loading something that you would otherwise load anyway from somewhere else.
Using this method you are either importing something no matter what (which has some overhead) or raising the error. As far as ways to raise an error when someone isn't working out of the installed directory go, I think it's pretty clean and straightforward. Whatever method you use carries some overhead, so I would focus on keeping it low, simple, and unlikely to cause errors.
I wouldn't do something that is complicated like parsing the setup file or modifying necessary files like __init__.py somewhere. I think you are right that those methods would be more error prone.
Checking if setup.py exists could work, but I would consider it less clean than attempting an import, which is already optimized as a standard Python operation. They accomplish similar things, but I think implementing the numpy style is going to be more straightforward.
Given that every library is python code itself, I think it's logical that, instead of using the import command, we could copy the whole code of that library and paste it at the top of our main.py.
I'm working on a remote pc and cannot install libraries. Can I use a library by just doing this?
Forgive me if this a very silly question.
Thanks
In some cases, yes. But there are (a lot of) libraries that have some of their functionality written in C and compiled to binary (e.g. the famous numpy). You can't just paste those.
Another thing the pasting might introduce is naming collisions. If you use
import module
then any name in the module module can be safely used in the importing module as module.name, even if the name name is already defined somewhere in the importing module. If you just paste the code, this won't work.
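A tiny illustration of the difference (the names are made up):

# module.py
def process(data):
    return sorted(data)

# main.py
import module

def process(data):                  # our own function with the same name
    return list(reversed(data))

print(process([3, 1, 2]))           # main.py's process -> [2, 1, 3]
print(module.process([3, 1, 2]))    # module.py's process -> [1, 2, 3]
# had module.py been pasted at the top of main.py, one definition of
# process would silently shadow the other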
While pasting the entire library at the top of your main file can work, I don't think it's the best way to go about solving your problem.
One option is to move the library into the same folder as your main.py file. I believe the import statement checks the script's own directory for the library before looking elsewhere.
Another option is to use a virtual environment (virtualenv) and then install all the required libraries within it. I'm not sure that this will work for you, since you said you cannot install libraries, and virtualenv requires pip. If you are interested in learning more about python virtual environments, take a look here.
Many popular modules are actually written (at least partly) in C, Pygame for example; CPython, the reference Python interpreter, is itself written in C. So you can't jump to conclusions, but if the library is pure Python, I'd suggest copying the package into your project directory and importing it, rather than copying and pasting code snippets.
This is probably a question that has a very easy and straightforward answer; however, despite having a few years of programming experience, for some reason I still don't quite get the exact concepts of what it means to "build" and then to "install". I know how to use them and have used them a lot, but I have no idea about the exact processes which happen in the background...
I have looked across the web, wikipedia, etc., but there is no one simple answer to it, nor can I find one here.
A good example, which I tried to understand, is adding new modules to python:
http://docs.python.org/2/install/index.html#how-installation-works
It says that "the build command is responsible for putting the files to install into a build directory"
And then for the install command: "After the build command runs (whether you run it explicitly, or the install command does it for you), the work of the install command is relatively simple: all it has to do is copy everything under build/lib (or build/lib.plat) to your chosen installation directory."
So essentially what this is saying is:
1. Copy everything to the build directory and then...
2. Copy everything to the installation directory
There must be a process missing somewhere in the explanation... compilation?
Would appreciate some straightforward not too techy answer but in as much detail as possible :)
Hopefully I am not the only one who doesn't know the detailed answer to this...
Thanks!
Aivoric
Building means compiling the source code to binary in a sandbox location where it won't affect your system if something goes wrong, like a build subdirectory inside the source code directory.
Install means copying the built binaries from the build subdirectory to a place on your system path, where they become easily accessible. This is rarely done with a straight copy command; it's often done by some package manager that can track the files created and easily uninstall them later.
Usually, a build command does all the compiling and linking needed, but Python is an interpreted language, so if there are only pure Python files in the library, there's no compiling step in the build. Indeed, everything is copied to a build directory, and then copied again to a final location. You'll only have a compiling step if the library depends on code written in other languages that needs to be compiled.
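Concretely, for a pure-Python package the two steps boil down to copies (the package name foo is illustrative):

python setup.py build      # copies foo/*.py into build/lib/foo/
python setup.py install    # copies build/lib/foo/ into site-packages/foo/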
You want a new chair for your living room and you want to make it yourself. You browse through a catalog and order a pile of parts. When they arrive at your door, you can't immediately use them. You have to build the chair at your workshop. After a bit of elbow grease, you can sit down in it. Afterwards, you install the chair in your living room, in a convenient place to sit.
The chair is a program you want to use. It arrives at your house as source code. You build it by compiling it into a runnable program. You install it by making it easier to use.
The build and install commands you are referring to come from a setup.py file, right?
Setup.py (http://docs.python.org/2/distutils/setupscript.html)
This file is created by 3rd party applications / extensions of Python. They are not part of:
Python source code (bunch of c files, etc)
Python libraries that come bundled with Python
When developers make a library for python that they want to share with the world, they create a setup.py file so the library can be installed on any computer that has python. Maybe this is the MISSING STEP.
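For reference, a minimal setup.py might look like this (the name and the dependency are placeholders):

# setup.py - minimal example
from setuptools import setup, find_packages

setup(
    name="mycustomlibrary",
    version="1.0.0",
    packages=find_packages(),
    install_requires=["beautifulsoup4"],  # third-party dependencies installed alongside
)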
Setup.py sdist
This creates a source distribution (the tar.gz file). What it does is copy all the files used by the python library into a folder, create a setup.py file for the module, and archive everything so the library can be built somewhere else.
Setup.py build
This builds the python module back into a library (SPECIFICALLY FOR THIS OS).
As you may know, the computer that the python library originally came from will be different from the computer that you are installing the library on.
It might have a different version of python
It might have a different operating system
It might have a different processor / motherboard / etc
For all the reasons listed above, code built on one computer will not necessarily work on another. So setup.py sdist creates a module with only the source files needed to rebuild the library on another computer.
What setup.py build does is similar to what a makefile would do: it compiles sources / creates libraries, all that stuff.
Now we have a copy of all the files we need in the library and they will work on our computer / operating system.
Setup.py install
Great, we have all the files needed. But they won't work. Why? Well, they have to be added to Python, that's why. This is where install comes in. Now that we have a local copy of the library, we need to install it into python so you can use it like so:
import mycustomlibrary
In order to do this we need to do several things including:
Copy files to their library folders in our version of python.
Make sure the library can be imported using the import command
Run any special install instructions for this library (setting up paths, etc.)
This is the most complicated part of the task. What if our library uses BeautifulSoup? That is not part of the Python standard library. We'd have to install it in a way such that our library and any others can use BeautifulSoup without interfering with each other.
Also what if python was installed someplace else? What if it was installed on a server with many users?
Install handles all these problems transparently. What it does is make the library that we just built able to run. All you have to do is use the import command; install handles the rest.