I have an environment with some extreme constraints that require me to reduce the size of a planned Python 3.8.1 installation. The OS is not connected to the internet, and a user will never open an interactive shell or attach a debugger.
There are of course lots of ways to do this, and one of the ways I am exploring is to remove some core modules, for example python3-email. I am concerned that there are 3rd-party packages that future developers may include in their apps that have unused but required dependencies on core python features. For example, if python3-email is missing, what 3rd-party packages might not work that one would expect too? If a developer decides to use a logging package that contains an unreferenced EmailLogger class in a referenced module, it will break, simply because import email appears at the top.
Do package design requirements or guidelines exist that address this?
It's an interesting question, but it is too broad to be cleanly answered here. In short, the Python standard library is expected to always be there, even though sometimes it broken up in multiple parts (Debian for example). But you say it yourself, you don't know what your requirements are since you don't know yet what future packages will run on this interpreter... This is impossible to answer. One thing you could do is to use something like modulefinder on the future code before letting it run on that constrained Python interpreter.
I was able to get to a solution. The issue was best described to me as cascading imports. It is possible to stop a module from being loaded, by adding an entry to sys.modules. For example, when importing the asyncio module ssl and _ssl modules will be loaded, even though they are not needed outside of ssl. This can be stopped with the following code. This can be verified both by seeing the python process is 3MB smaller, but also by using module load hooks to watch each module as it loads:
import importhook
import sys
sys.modules['ssl'] = None
#importhook.on_import(importhook.ANY_MODULE)
def on_any_import(module):
print(module.__spec__.name)
assert module.__spec__.name not in ['ssl', '_ssl']
import asyncio
For my original question about 3rd-party design guidelines, some recommend placing the import statements within the class rathe that at the module level, however this is not routinely done.
In Python ( CPython) we can import module:
import module and module can be just *.py file ( with a python code) or module can be a module written in C/C++ ( be extending python). So, a such module is just compiled object file ( like *.so/*.o on the Unix).
I would like to know how is it executed by the interpreter exactly.
I think that python module is compiled to a bytecode and then it will be interpreted. In the case of C/C++ module functions from a such module are just executed. So, jump to the address and start execution.
Please correct me if I am wrong/ Please say more.
When you import a C extension, python uses the platform's shared library loader to load the library and then, as you say, jumps to a function in the library. But you can't load just any library or jump to any function this way. It only works for libs specifically implemented to support python and to functions that are exported by the library as a python object. The lib must understand python objects and use those objects to communicate.
Alternately, instead of importing, you can use a foreign-function library like ctypes to load the library and convert data to the C view of data to make calls.
I am trying to debug a file for a project I am working on, and the first thing I made sure to do is install/build all of the modules that the file is importing. Thisis the first line of the file:
from scitbx.array_family import flex
which in turn reads from flex.py,
from __future__ import division
import boost.optional # import dependency
import boost.std_pair # import dependency
import boost.python
I entered the commands in ipython individually and get stuck on importing boost.optional. Since they are all from the same module I tried searching for the module named boost.
I found the site: http://www.boost.org/doc/libs/1_57_0/more/getting_started/unix-variants.html
and installed the related .bz2 file in the same directory as my other modules to make sure it is within sys.path. However I still can't get ipython to import anything. Am I completely off base in my approach or is there some other boost module that I can't find? I should mention that I am a complete novice with computers, and am learning as I go along. Any advice is much appreciated!
The library you have installed is called Boost. This is a collection of C++ libraries, one of which is Boost.Python. However this library doesn't provide Python modules that you can import directly - it doesn't provide boost.optional. Instead it enables interoperability between Python and C++ - you can write a C++ library using Boost.Python that can then be used in a normal Python interpreter.
In you case boost.optional is provided by the CCTBX collection of software, which does depend on Boost and Boost.Python. So you are not too far off. This thread in the mailing list covers your error message and some potential solutions.
Essentially you need to use the custom cctbx.python command (or scitbx.python, they are equivalent) to run python as this sets the PYTHONPATH correctly for their requirements. It's also documented on this page.
I have background in Java and I am new to Python. I want to make sure I understand correctly Python terminology before I go ahead.
My understanding of a module is: a script which can be imported by many scripts, to make reading easier. Just like in java you have a class, and that class can be imported by many other classes.
My understanding of a library is: A library contains many modules which are separated by its use.
My question is: Are libraries like packages, where you have a package e.g. called food, then:
chocolate.py
sweets.py
biscuts.py
are contained in the food package?
Or do libraries use packages, so if we had another package drink:
milk.py
juice.py
contained in the package. The library contains two packages?
Also, an application programming interface (API) usually contains a set of libraries is this at the top of the hierarchy:
API
Library
Package
Module
Script
So an API will consist off all from 2-5?
From The Python Tutorial - Modules
Module:
A module is a file containing Python definitions and statements. The file name is the module name with the suffix .py appended.
Package:
Packages are a way of structuring Python’s module namespace by using “dotted module names”.
If you read the documentation for the import statement gives more details, for example:
Python has only one type of module object, and all modules are of this
type, regardless of whether the module is implemented in Python, C, or
something else. To help organize modules and provide a naming
hierarchy, Python has a concept of packages.
You can think of packages as the directories on a file system and
modules as files within directories, but don’t take this analogy too
literally since packages and modules need not originate from the file
system. For the purposes of this documentation, we’ll use this
convenient analogy of directories and files. Like file system
directories, packages are organized hierarchically, and packages may
themselves contain subpackages, as well as regular modules.
It’s important to keep in mind that all packages are modules, but not
all modules are packages. Or put another way, packages are just a
special kind of module. Specifically, any module that contains a
__path__ attribute is considered a package.
Hence the term module refers to a specific entity: it's a class whose instances are the module objects you use in python programs. It is also used, by analogy, to refer to the file in the file system from which these instances "are created".
The term script is used to refer to a module whose aim is to be executed. It has the same meaning as "program" or "application", but it is usually used to describe simple and small programs(i.e. a single file with at most some hundreds of lines). Writing a script takes minutes or few hours.
The term library is simply a generic term for a bunch of code that was designed with the aim of being usable by many applications. It provides some generic functionality that can be used by specific applications.
When a module/package/something else is "published" people often refer to it as a library. Often libraries contain a package or multiple related packages, but it could be even a single module.
Libraries usually do not provide any specific functionality, i.e. you cannot "run a library".
The API can have different meanings depending on the context. For example:
it can define a protocol like the DB API or the buffer protocol.
it can define how to interact with an application(e.g. the Python/C API)
when related to a library/package it simply the interface provided by that library for its functionality(set of functions/classes/constants etc.)
In any case an API is not python code. It's a description which may be more or less formal.
Only package and module have a well-defined meaning specific to Python.
An API is not a collection of code per se - it is more like a "protocol" specification how various parts (usually libraries) communicate with each other. There are a few notable "standard" APIs in python. E.g. the DB API
In my opinion, a library is anything that is not an application - in python, a library is a module - usually with submodules. The scope of a library is quite variable - for example the python standard library is vast (with quite a few submodules) while there are lots of single purpose libraries in the PyPi, e.g. a backport of collections.OrderedDict for py < 2.7
A package is a collection of python modules under a common namespace. In practice one is created by placing multiple python modules in a directory with a special __init__.py module (file).
A module is a single file of python code that is meant to be imported. This is a bit of a simplification since in practice quite a few modules detect when they are run as script and do something special in that case.
A script is a single file of python code that is meant to be executed as the 'main' program.
If you have a set of code that spans multiple files, you probably have an application instead of script.
Library : It is a collection of modules.
(Library either contains built in modules(written in C) + modules written in python).
Module : Each of a set of standardized parts or independent units that can be used to construct a more complex structure.
Speaking in informal language, A module is set of lines of code which are used for a specific purpose and can be used in other programs as it is , to avoid DRY(Don’t Repeat Yourself) as a team and focusing on main requirement. source
API is an interface for other applications to interact with your library without having direct access.
Package is basically a directory with files.
Script means series of commands within a single file.
I will try to answer this without using terms the earliest of beginners would use,and explain why or how they used differently, along with the most "official" and/or most understood or uniform use of the terms.
It can be confusing, and I confused myself thinking to hard, so don't think to much about it. Anyways context matters, greatly.
Library- Most often will refer to the general library or another collection created with a similar format and use. The General Library is the sum of 'standard', popular and widely used Modules, witch can be thought of as single file tools, for now or short cuts making things possible or faster. The general library is an option most people enable when installing Python. Because it has this name "Python General Library" it is used often with similar structure, and ideas. Witch is simply to have a bunch of Modules, maybe even packages grouped together, usually in a list. The list is usually to download them. Generally it is just related files, with similar interests. That is the easiest way to describe it.
Module- A Module refers to a file. The file has script 'in it' and the name of the file is the name of the module, Python files end with .py. All the file contains is code that ran together makes something happen, by using functions, strings ect.
Main modules you probably see most often are popular because they are special modules that can get info from other files/modules.
It is confusing because the name of the file and module are equal and just drop the .py. Really it's just code you can use as a shortcut written by somebody to make something easier or possible.
Package- This is a termis used to generally sometimes, although context makes a difference. The most common use from my experience is multiple modules (or files) that are grouped together. Why they are grouped together can be for a few reasons, that is when context matters.
These are ways I have noticed the term package(s) used. They are a group of Downloaded, created and/or stored modules. Which can all be true, or only 1, but really it is just a file that references other files, that need to be in the correct structure or format, and that entire sum is the package itself, installed or may have been included in the python general library. A package can contain modules(.py files) because they depend on each other and sometimes may not work correctly, or at all. There is always a common goal of every part (module/file) of a package, and the total sum of all of the parts is the package itself.
Most often in Python Packages are Modules, because the package name is the name of the module that is used to connect all the pieces. So you can input a package because it is a module, also allows it to call upon other modules, that are not packages because they only perform a certain function, or task don't involve other files. Packages have a goal, and each module works together to achieve that final goal.
Most confusion come from a simple file file name or prefix to a file, used as the module name then again the package name.
Remember Modules and Packages can be installed. Library is usually a generic term for listing, or formatting a group of modules and packages. Much like Pythons general library. A hierarchy would not work, APIs do not belong really, and if you did they could be anywhere and every ware involving Script, Module, and Packages, the worl library being such a general word, easily applied to many things, also makes API able to sit above or below that. Some Modules can be based off of other code, and that is the only time I think it would relate to a pure Python related discussion.
I am working on Python version 2.7.
I have a module extension for Python written in C.
The module initialization function PyMODINIT_FUNC initmymodule contains some code for initializing OpenSSL library. My module built as shared library and loading via imp.load_dynamic
This module may loading many times and I can't control it. Django and python doing that. And when it loading twice then OPENSSL_config function calling twice too. And it leading to process crash.
I can't control it from C-code, I can't control it from Python-code.
Here look at the docs
http://docs.python.org/2.7/library/imp.html
It says:
imp.load_dynamic Load and initialize a module implemented as a
dynamically loadable shared library and return its module object. If
the module was already initialized, it will be initialized again.
Nice.
I found that the similar problem was solved in Python version 3.4
http://hg.python.org/cpython/file/ad51ed93377c/Python/import.c#l459
Modules which do support multiple initialization set their m_size
field to a non-negative number (indicating the size of the
module-specific state). They are still recorded in the extensions
dictionary, to avoid loading shared libraries twice.
But what shall I do in Python 2.7?
Maybe to do workaround by registering own custom import hook where you could control the case which causes you problem (prevent double initialization). Some references for writing custom import hooks:
Python import hooks article
PEP-302 New Import Hooks - python 2.3+
create and register custom import/reload functions - example of implementation in project lazy_reload
This is hackish solution, so I suggest extra caution if this is to be used in production systems.
I have found the cause of my problem. It happens because my django application uses driver to connect PostgreSQL and this driver loads OpenSSL library. It leads to conflict just as user315052 showed in this comment
I think I have to take out all crypto-functionality of my application to a separate process.