What's the difference between a module and a library in Python?

I have a background in Java and I am new to Python. I want to make sure I understand Python terminology correctly before I go ahead.
My understanding of a module is: a script which can be imported by many scripts, to make reading easier. Just as in Java you have a class, and that class can be imported by many other classes.
My understanding of a library is: a library contains many modules, which are separated by their use.
My question is: Are libraries like packages, where you have a package e.g. called food, then:
chocolate.py
sweets.py
biscuts.py
are contained in the food package?
Or do libraries use packages, so if we had another package drink:
milk.py
juice.py
contained in the package. The library contains two packages?
Also, an application programming interface (API) usually contains a set of libraries. Is it at the top of the hierarchy?
1. API
2. Library
3. Package
4. Module
5. Script
So will an API consist of all of items 2-5?

From The Python Tutorial - Modules
Module:
A module is a file containing Python definitions and statements. The file name is the module name with the suffix .py appended.
Package:
Packages are a way of structuring Python’s module namespace by using “dotted module names”.
The documentation for the import statement gives more details, for example:
Python has only one type of module object, and all modules are of this
type, regardless of whether the module is implemented in Python, C, or
something else. To help organize modules and provide a naming
hierarchy, Python has a concept of packages.
You can think of packages as the directories on a file system and
modules as files within directories, but don’t take this analogy too
literally since packages and modules need not originate from the file
system. For the purposes of this documentation, we’ll use this
convenient analogy of directories and files. Like file system
directories, packages are organized hierarchically, and packages may
themselves contain subpackages, as well as regular modules.
It’s important to keep in mind that all packages are modules, but not
all modules are packages. Or put another way, packages are just a
special kind of module. Specifically, any module that contains a
__path__ attribute is considered a package.
Hence the term module refers to a specific entity: it's a class whose instances are the module objects you use in python programs. It is also used, by analogy, to refer to the file in the file system from which these instances "are created".
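A quick way to see the "all packages are modules" point for yourself is to inspect a few standard-library imports. This is only a small sketch; json, json.decoder and math are just convenient examples of a package, a module inside a package, and a C-implemented module.

```python
import json          # a package: json/__init__.py plus submodules
import json.decoder  # a regular module inside that package
import math          # a built-in module implemented in C
import types

for mod in (json, json.decoder, math):
    # Every one of them is an instance of the single module type;
    # only packages carry a __path__ attribute.
    is_package = hasattr(mod, "__path__")
    print(mod.__name__, type(mod) is types.ModuleType, is_package)
```

Only json reports a __path__, which is what makes it a package.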
The term script is used to refer to a module whose aim is to be executed. It has the same meaning as "program" or "application", but it is usually used to describe simple and small programs (i.e. a single file with at most some hundreds of lines). Writing a script takes minutes or a few hours.
The term library is simply a generic term for a bunch of code that was designed with the aim of being usable by many applications. It provides some generic functionality that can be used by specific applications.
When a module/package/something else is "published", people often refer to it as a library. Libraries often contain a package or multiple related packages, but a library can even be a single module.
Libraries usually do not provide any application-specific functionality on their own, i.e. you cannot "run a library".
The term API can have different meanings depending on the context. For example:
it can define a protocol, like the DB API or the buffer protocol.
it can define how to interact with an application (e.g. the Python/C API).
when related to a library/package, it is simply the interface provided by that library for its functionality (a set of functions, classes, constants, etc.).
In any case an API is not Python code. It's a description which may be more or less formal.

Only package and module have a well-defined meaning specific to Python.
An API is not a collection of code per se - it is more like a "protocol" specification of how various parts (usually libraries) communicate with each other. There are a few notable "standard" APIs in Python, e.g. the DB API.
In my opinion, a library is anything that is not an application - in Python, a library is a module, usually with submodules. The scope of a library is quite variable - for example, the Python standard library is vast (with quite a few submodules), while there are lots of single-purpose libraries on PyPI, e.g. a backport of collections.OrderedDict for Python < 2.7.
A package is a collection of Python modules under a common namespace. In practice one is created by placing multiple Python modules in a directory with a special __init__.py module (file).
A module is a single file of python code that is meant to be imported. This is a bit of a simplification since in practice quite a few modules detect when they are run as script and do something special in that case.
A script is a single file of python code that is meant to be executed as the 'main' program.
If you have a set of code that spans multiple files, you probably have an application instead of a script.
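The "modules detect when they are run as a script" remark above refers to the usual __name__ check. A minimal sketch, assuming a hypothetical file called greeting.py (the function names are made up for illustration):

```python
def greet(name):
    """Reusable function: available to any code that imports this file."""
    return f"Hello, {name}!"


def main():
    # Only meant to run when the file is executed as a script.
    print(greet("world"))


if __name__ == "__main__":
    # True when run as `python greeting.py`, False when imported.
    main()
```

Imported from elsewhere, the file behaves as a module and nothing is printed; run directly, it behaves as a script.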

Library: a collection of modules.
(A library can contain both built-in modules (written in C) and modules written in Python.)
Module: each of a set of standardized parts or independent units that can be used to construct a more complex structure.
Informally, a module is a set of lines of code that serves a specific purpose and can be reused as-is in other programs, following DRY (Don't Repeat Yourself) so that a team can focus on the main requirement.
API: an interface through which other applications interact with your library without direct access to its internals.
Package: basically a directory with files.
Script: a series of commands within a single file.

I will try to answer this using only terms that even the earliest beginners would use, and explain why or how they are used differently, along with the most "official" and/or most widely understood use of the terms.
It can be confusing, and I confused myself by thinking too hard, so don't think too much about it. Anyway, context matters greatly.
Library - Most often this refers to the standard library, or to another collection created with a similar format and use. The standard library is the sum of the 'standard', popular and widely used modules, which for now can be thought of as single-file tools or shortcuts that make things possible or faster. The standard library comes with a normal Python installation. Because of the name "Python Standard Library", the word library is often used for things with a similar structure and idea, which is simply a bunch of modules, maybe even packages, grouped together, usually in a list. The list is usually there so you can download them. Generally a library is just related files with similar interests. That is the easiest way to describe it.
Module - A module refers to a file. The file has code in it, and the name of the file is the name of the module; Python files end with .py. All the file contains is code that, run together, makes something happen by using functions, strings, etc.
The main modules you probably see most often are popular because they are special modules that can get information from other files/modules.
It is confusing because the module name is the file name with the .py dropped. Really, a module is just code written by somebody that you can use as a shortcut to make something easier or possible.
Package - This term is sometimes used loosely, although context makes a difference. The most common use in my experience is multiple modules (or files) that are grouped together. Why they are grouped together can vary, and that is where context matters.
These are the ways I have noticed the term package(s) used: a group of downloaded, created and/or stored modules. All of those can be true, or only one, but really a package is just a file that references other files, which need to be in the correct structure or format, and that entire sum is the package itself, whether it was installed or included in the Python standard library. A package contains modules (.py files) that depend on each other and may not work correctly, or at all, on their own. Every part (module/file) of a package shares a common goal, and the total sum of all the parts is the package itself.
Most often in Python, packages are modules, because the package name is the name of the module that is used to connect all the pieces. So you can import a package because it is a module, and it can in turn call upon other modules that are not packages, because they only perform a certain function or task and don't involve other files. Packages have a goal, and each module works together to achieve that final goal.
Most confusion comes from a simple file name, or the prefix of a file name, being used as the module name and then again as the package name.
Remember that modules and packages can be installed. Library is usually a generic term for listing or grouping modules and packages, much like Python's standard library. A hierarchy would not work: APIs do not really belong in one, and if you included them they could be anywhere and everywhere among script, module, and package. The word library is such a general word, easily applied to many things, that an API could sit above or below it. Some modules can be based on other code, and that is the only time I think an API would relate to a purely Python-related discussion.

Related

Can an arbitrary Python program have its dependencies inlined?

In the JavaScript ecosystem, "compilers" exist which will take a program with a significant dependency chain (of other JavaScript libraries) and emit a standalone JavaScript program (often, with optimizations applied).
Does any equivalent tool exist for Python, able to generate a script with all non-standard-library dependencies inlined? Are there other tools/practices available for bundling in dependencies in Python?
Since your goal is to be cross-architecture, any Python program which relies on native C modules will not be possible with this approach.
In general, using virtualenv to create a target environment will mean that even users who don't have permission to install new system-level software can install dependencies under their own home directory; thus, what you ask about is not often needed in practice.
However, if you wanted to do things that are considered evil / bad practice, pure-Python modules can in fact be bundled into a script; thus, a tool of this sort would be possible for modules with only pure-Python dependencies!
If I were writing such a tool, I might start the following way:
Use pickle to serialize content of modules on the "sending" side
In the loader code, use imp.new_module() (types.ModuleType in current Python versions, where the imp module has been removed) to create new module objects, and assign unpickled objects to them.
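A rough sketch of the loader side might look like the following. Note that it swaps pickle for embedded source text, because pickle stores functions by reference rather than by value, and it uses types.ModuleType to create the module objects. BUNDLED_SOURCES, load_bundled and tiny_dep are hypothetical names for illustration only.

```python
import sys
import types

# Hypothetical bundle: module name -> source text, produced by a build step.
BUNDLED_SOURCES = {
    "tiny_dep": "def double(x):\n    return 2 * x\n",
}


def load_bundled(name):
    """Create a module object from embedded source and register it."""
    module = types.ModuleType(name)
    code = compile(BUNDLED_SOURCES[name], f"<bundled {name}>", "exec")
    exec(code, module.__dict__)
    sys.modules[name] = module  # later `import tiny_dep` statements find it here
    return module


load_bundled("tiny_dep")
import tiny_dep  # resolved from sys.modules, no file on disk needed

print(tiny_dep.double(21))
```

This only works for pure-Python dependencies, as noted above, and a real tool would also have to deal with packages, relative imports and load order.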

What is the argument for Python to seemingly frown on importing from different directories?

This might be a more broad question, and more related to understanding Python's nature and probably good programming practices in general.
I have a file, called util.py. It has a lot of different small functions I've collected over the past few months that are useful when doing various machine learning tasks.
My thinking is this: I'd like to continue adding important functions to this script as I go. As such, I will want to use import util often, now and in the future, in many unrelated projects.
But Python seems to feel like I should only be able to access the code in this file if it lives in my current directory, even if the functions in this file are useful for scripts in different directories. I sense some reason behind the way that works that I don't fully grasp; to me, it seems like I'll be forced to make unnecessary copies often.
If I should have to create a new copy of util.py every time I'm working from within a new directory, on a different project, it won't be long until I have many different versions / iterations of this file, scattered all over my hard drive, in various states. I don't desire this degree of modularity in my programming -- for the sake of simplicity, repeatability, and clarity, I want only one file in only one location, accessible to many projects.
The question in a nutshell: What is the argument for Python to seemingly frown on importing from different directories?
If your util.py file contains functions you're using in a lot of different projects, then it's actually a library, and you should package it as such so you can install it in any Python environment with a single line (python setup.py install), and update it if required (Python's packaging ecosystem has several features to track and update library versions).
An added benefit is that right now, if you're doing what the other answers suggested, you have to remember to manually put util.py in your PYTHONPATH (the "dirty" way). If you try to run one of your programs and you haven't done that, you'll get a cryptic ImportError that doesn't explain much: is it a missing dependency? A typo in the program?
Now think about what happens if someone other than you tries to run the program(s) and gets those error messages.
If you have a library, on the other hand, trying to set up your program will either complain in clear, understandable language that the library is missing or out of date, or (if you've taken the appropriate steps) automatically download and install it so things are ready to roll.
On a related topic, having a file/module/namespace called "util" is a sign of bad design. What are these utilities for? It's the programming equivalent of a "miscellaneous" folder: eventually, everything will end up in it and you'll have no way to know what it contains other than opening it and reading it all.
Another way is to add the directory you want to import from to the path from within the scripts that need it.
You should have a file __init__.py in the same folder where utils.py lives, to tell Python to treat the folder as a package. The file __init__.py may be empty, or you can define other things in it.
Example:
/home/marcos/python/proj1/
    __init__.py
    utils.py
/home/marcos/school_projects/final_assignment/
    my_script.py
And then inside my_script.py:
import sys
sys.path.append('/home/marcos/python/')
from proj1 import utils
MAX_HEIGHT = utils.SOME_CONSTANT
a_value = utils.some_function()
First, define an environment variable. If you are using bash, for example, then put the following in the appropriate startup file:
export PYTHONPATH=/path/to/my/python/utilities
Now, put your util.py and any of your other common modules or packages in that directory. Now you can import util from anywhere and python will find it.
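Both approaches above come down to the same thing: getting the shared directory onto sys.path. Here is a small self-contained sketch of that effect. The temporary directory stands in for /path/to/my/python/utilities, and shared_helpers is a made-up module name chosen so it won't clash with anything real.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Stand-in for the shared utilities directory (hypothetical location).
shared_dir = Path(tempfile.mkdtemp())
(shared_dir / "shared_helpers.py").write_text("def add(a, b):\n    return a + b\n")

# PYTHONPATH entries end up in sys.path at startup; appending here has the
# same effect, but for the current process only.
sys.path.append(str(shared_dir))
importlib.invalidate_caches()

shared_helpers = importlib.import_module("shared_helpers")
print(shared_helpers.add(2, 3))
```

The environment variable has the advantage that individual scripts don't need to hard-code the path.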

Module naming convention (avoid collisions)

I'm learning Python and I have already created a few ad-hoc utility modules that I use for whatever. I don't intend to install them anywhere; having them simply lying around and copying them wherever needed is OK for me now.
So I typically just create a file named like mymodule.py and import mymodule from a script in the same directory. For names I use lowercase letters only (i.e. no _s). So now, after I have seen a couple of "real" Python modules and realized that the convention is mostly the same, I'm starting to wonder about clashes.
Is there a convention for naming own ad-hoc modules, so that one can avoid clashes with core or "pip" modules (even future ones that are yet to be added)?
Similar convention exists in Perl community (especially because of CPAN), where all such modules should start with Local::, as in Local::MyCrazyModule.
Note: There is a similar question here on SO, but that one does not seem to ask specifically about modules, but rather about variable names clashing with modules.
The easiest way to avoid conflicts with pip modules is to create a dummy module with your name and publish it to PyPI. Then you can use this top-level namespace for all your modules, published or not.
For more information about naming conventions you can read: http://www.python.org/dev/peps/pep-0423/
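If you just want a quick local sanity check before settling on a name, something like this sketch can tell you whether a name is already taken in your current environment. name_is_taken is a hypothetical helper, and it cannot know about packages you haven't installed or about names added in the future.

```python
import importlib.util
import sys


def name_is_taken(name):
    """True if `name` is a stdlib or already-importable top-level module."""
    # sys.stdlib_module_names exists on Python 3.10+; fall back to empty.
    stdlib_names = getattr(sys, "stdlib_module_names", frozenset())
    if name in stdlib_names:
        return True
    # find_spec returns None when no top-level module of that name is found.
    return importlib.util.find_spec(name) is not None


print(name_is_taken("json"))
print(name_is_taken("zz_my_adhoc_helpers"))
```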

Embedding Python on Windows: why does it have to be a DLL?

I'm trying to write a software plug-in that embeds Python. On Windows the plug-in is technically a DLL (this may be relevant). The Python Windows FAQ says:
1.Do not build Python into your .exe file directly. On Windows, Python must be a DLL to handle importing modules that are themselves DLL’s. (This is the first key undocumented fact.) Instead, link to pythonNN.dll; it is typically installed in C:\Windows\System. NN is the Python version, a number such as “23” for Python 2.3.
My question is why exactly Python must be a DLL? If, as in my case, the host application is not an .exe, but also a DLL, could I build Python into it? Or, perhaps, this note means that third-party C extensions rely on pythonN.N.dll to be present and other DLL won't do? Assuming that I'd really want to have a single DLL, what should I do?
I see there's the dynload_win.c file, which appears to be the module to import C extensions on Windows and, as far as I can see, it scans the extension file to find which pythonX.X.dll it imports; but I'm not experienced with Windows and I don't quite understand all the code there.
You need to link to pythonXY.dll as a DLL, instead of linking the relevant code directly into your executable, because otherwise the Python runtime can't load other DLLs (the extension modules it relies on.) If you make your own DLL you could theoretically link all the Python code in that DLL directly, since it doesn't end up in the executable but still in a DLL. You'll have to take care to do the linking correctly, however, as pretty much none of the standard tools (like distutils) will do this for you.
However, regardless of how you embed Python, you can't make do with just the DLL, nor can you make do with just any DLL. The ABI changes between Python versions, so if you compiled your code against Python 2.6, you need python26.dll; you can't use python25.dll or python27.dll. And Python isn't just a DLL; it also needs its standard library, which includes extension modules (which are DLLs themselves, although they have the .pyd extension). The code in dynload_win.c you ran into is for loading those DLLs, and is not related to the loading of pythonXY.dll.
In short, in order to embed Python in your plugin, you need to either ship Python with the plugin, or require that the right Python version is already installed.
(Sorry, I did a stupid thing, I first wrote the question, and then registered, and now I cannot alter it or comment on the replies, because StackOverflow's engine doesn't think I'm the author. I cannot even properly thank those who replied :( So this is actually an update to the question and comments.)
Thanks for all the advice, it's very valuable. As far as I understand with some effort I can link Python statically into a custom DLL, provided that I compile other dynamically loaded extensions myself and link them against the same DLL. (I know I need to ship the standard library too; my plan was to append a zipped archive to the DLL file. As far as I understand, I will even be able to import pure Python modules from it.)
I also found an interesting place in dynload_win.c. (I understand it loads dynamic extensions that use Python C API, e.g. _ctypes.) As far as I can see it not only looks for init_ctypes symbol or whatever the extension name is, but also scans the .pyd file's import table looking for (regex) python\d+\. and then compares the found symbol with known pythonNN. string to make sure the extension was compiled for this version of Python. If the import table doesn't have such a symbol or it refers to another version, it raises an error.
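To make the check described above concrete, here is a Python sketch of the logic as I understand it from reading the code. It is not the actual C implementation: check_extension_imports and its inputs are hypothetical, and the real code reads the names out of the .pyd file's import table.

```python
import re

# Pattern quoted in the text above: "python", digits, then a dot.
PYTHON_DLL_PATTERN = re.compile(r"python\d+\.", re.IGNORECASE)


def check_extension_imports(imported_dlls, expected_prefix):
    """Mimic the described check on a list of DLL names from an import table.

    `expected_prefix` is something like "python26." (hypothetical input).
    """
    for dll_name in imported_dlls:
        match = PYTHON_DLL_PATTERN.match(dll_name)
        if match:
            if match.group(0).lower() != expected_prefix.lower():
                raise ImportError(
                    f"extension built against {dll_name}, expected {expected_prefix}*"
                )
            return dll_name
    raise ImportError("extension does not import any pythonNN DLL")


print(check_extension_imports(["KERNEL32.dll", "python26.dll"], "python26."))
```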
For me it means that:
If I link an extension against pythonNN.dll and try to load it from my custom DLL that includes a statically linked Python, it will pass the check, but — well, here I'm not sure: will it fail because there's no pythonNN.dll (i.e. even before getting to the check) or it will happily load the symbols?
And if I link it against my custom DLL, it will find symbols, but won't pass the check :)
I guess I could rewrite this piece to suit my needs... Are there any other such places, I wonder.
Python needs to be a dll (with a standard name) such that your application, and the plugin, can use the same instance of python.
Plugin dlls are already going to expect to be loading (and using python from) a python26.dll (or whichever version) - if your python is statically embedded in your exe, then two different instances of the python library would be managing the same data structures.
If the Python libraries use no static variables at all, and the compile settings are exactly the same, this should not be a problem. However, generally it's far safer to simply ensure that only one instance of the Python interpreter is being used.
On *nix, all shared objects in a process, including the executable, contribute their exported names into a common pool; any of the shared objects can then pull any of the names from the pool and use them as they like. This allows e.g. cStringIO.so to pull the relevant Python library functions from the main executable when the Python library is statically-linked.
On Windows, each shared object has its own independent pool of names it can use. This means that each one must name the specific shared objects it needs functions from. Since it is a lot of work to get all the names from the main executable, the Python functions are separated out into their own DLL.

How to organise the file structure of my already working plugin system?

I am working on a project whose main design guiding principle is extensibility.
I implemented a plugin system by defining a metaclass that registers - with a class method - the class name of any plugin that gets loaded (each type of plugin inherits from a specific class defined in the core code, as there are different types of plugins in the application). Basically this means that a developer will have to define his class as
class PieChart(ChartPluginAncestor):
    # Duck typing:
    # Implement compulsory methods for Plugins
    # extending Chart functionality
    pass
and the main program will know of its presence because PieChart will be included in the list of registered plugins available at ChartPluginAncestor.plugins.
Because the mounting method is a class method, all plugins get registered when their class code is loaded into memory (so even before an object of that class is instantiated).
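For readers who haven't seen this pattern, a minimal sketch of such a registering metaclass could look like this. It is not my actual code: PluginMount is a made-up name, the registration happens in the metaclass __init__ rather than in a separate class method, and draw is a placeholder plugin method.

```python
class PluginMount(type):
    """Metaclass that records every subclass of a plugin ancestor."""

    def __init__(cls, name, bases, namespace):
        super().__init__(name, bases, namespace)
        if not hasattr(cls, "plugins"):
            # First class using this metaclass: the ancestor itself.
            cls.plugins = []
        else:
            # Any later subclass: register it on the shared list.
            cls.plugins.append(cls)


class ChartPluginAncestor(metaclass=PluginMount):
    """Base class for chart plugins."""


class PieChart(ChartPluginAncestor):
    def draw(self):  # hypothetical plugin method
        return "pie"


print([plugin.__name__ for plugin in ChartPluginAncestor.plugins])
```

Registration happens as soon as the class statement is executed, without any instance being created.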
The system works good enough™ for me (although I am always open to suggestions on how to improve the architecture!) but I am now wondering what would be the best way to manage the plugin files (i.e. where and how the files containing the plugins should be stored).
So far I am using - for development purposes - a package that I called "plugins". I put all my *.py files containing plugin classes in the package directory, and I simply issue import plugins in the main.py file, for all the plugins to get mounted properly.
EDIT: Jeff pointed out in the comments that with import plugins alone, the classes contained in the various modules of the package won't be readily available (I did not realise this as I was - for debugging purposes - importing each class separately with from plugins.myAI import AI).
However this system is only good while I am developing and testing the code, as:
Plugins might come with their own unittests, and I do not want to load those in memory.
All plugins are currently loaded into memory, but indeed there are certain plugins which are alternative versions of the same feature, so you really just need to know that you can switch between the two, but you want to load into memory just the one you picked from the config pane.
At some point, I will want to have a double location for installing plugins: a system-wide location (for example somewhere under /usr/local/bin/) and a user-specific one (for example somewhere under /home/<user>/.myprogram/).
So my questions are really - perhaps - three:
Plugin container: what is the most sensible choice for my goal? Single files? Packages? A simple directory of .py files?
Recognise the presence of plugins without necessarily loading (importing) them: what is a smart way to use Python introspection to do so?
Placing plugins in two different locations: is there a standard way / best practice (under gnu/linux, at least) to do that?
The question is hard to address, because the needs are complex.
Anyway I will try with some suggestions.
About "Placing plugins in two different locations: is there a standard way / best practice (under gnu/linux, at least) to do that?"
A good approach is virtualenv. Virtualenv is a Python tool for building "isolated" Python installations. It is a good way to keep separate projects from interfering with each other.
You get a brand new site-packages directory where you can put your plugins with the relevant project modules.
Give it a try: http://pypi.python.org/pypi/virtualenv
"Plugin container: what is the most sensible choice for my goal? single files? packages? a simple directory of .py files?"
A good approach is a Python package which can do a "self registration" upon import: simply define a proper __init__.py inside the package directory.
An example can be http://www.qgis.org/wiki/Writing_Python_Plugins
and also the API described here http://twistedmatrix.com/documents/current/core/howto/plugin.html
See also http://pypi.python.org/pypi/giblets/0.2.1
Giblets is a simple plugin system based on the component architecture of Trac. In a nutshell, giblets allows you to declare interfaces and discover components that implement them without coupling.
Giblets also includes plugin discovery based on file paths or entry points along with flexible means to manage which components are enabled or disabled in your application.
I also have a plugin system with three types of plugins, though I don't claim to have done it well. You can see some details here.
For internal plugins, I have a package (e.g., MethodPlugins) and in this package is a module for each plugin (e.g., MethodPlugins.IRV). Here is how I load the plugins:
Load the package (import MethodPlugins)
Use pkgutil.iter_modules to find and load all the modules there (e.g., MethodPlugins.IRV)
All the plugins descend from a common base class so I can use __subclasses__() to identify them all.
I believe this would allow you to recognize plugins without actually loading them, though I don't do that as I just load them all.
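Roughly, the steps above can be sketched like this. The package built here (demo_method_plugins with a MethodPlugin base class) is a throwaway stand-in, not my real code. One thing the sketch makes visible: listing names with pkgutil.iter_modules does not import anything, but __subclasses__() only sees classes whose module has actually been imported.

```python
import importlib
import pkgutil
import sys
import tempfile
from pathlib import Path

# Build a throwaway package so the sketch runs on its own (hypothetical names).
root = Path(tempfile.mkdtemp())
pkg_dir = root / "demo_method_plugins"
pkg_dir.mkdir()
(pkg_dir / "__init__.py").write_text("class MethodPlugin:\n    pass\n")
(pkg_dir / "irv.py").write_text(
    "from demo_method_plugins import MethodPlugin\n"
    "class IRV(MethodPlugin):\n    pass\n"
)

sys.path.insert(0, str(root))
importlib.invalidate_caches()
package = importlib.import_module("demo_method_plugins")

# Step 1: list submodule names; this scans the directory without importing them.
available = [info.name for info in pkgutil.iter_modules(package.__path__)]
print(available)

# Step 2: import the ones you want; subclasses only exist after the import.
for name in available:
    importlib.import_module(f"{package.__name__}.{name}")
print([cls.__name__ for cls in package.MethodPlugin.__subclasses__()])
```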
For external plugins, I have a specified directory where users can put them, and I use os.listdir to find the files and then import them. The user is required to use the right base class so I can find them.
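As a sketch of how the directory scan could be extended to two locations (system-wide and per-user), assuming you want the later directory to override the earlier one: load_plugins_from is a hypothetical helper, and the temporary directories stand in for the real install locations.

```python
import importlib.util
import os
import tempfile


def load_plugins_from(directories):
    """Import every .py file found in the given directories, later ones winning."""
    loaded = {}
    for directory in directories:
        if not os.path.isdir(directory):
            continue
        for filename in sorted(os.listdir(directory)):
            if not filename.endswith(".py") or filename.startswith("_"):
                continue
            name = filename[:-3]
            path = os.path.join(directory, filename)
            spec = importlib.util.spec_from_file_location(name, path)
            module = importlib.util.module_from_spec(spec)
            spec.loader.exec_module(module)
            loaded[name] = module
    return loaded


# Temporary directories stand in for a system-wide and a per-user location.
system_dir = tempfile.mkdtemp()
user_dir = tempfile.mkdtemp()
with open(os.path.join(system_dir, "bars.py"), "w") as f:
    f.write("KIND = 'system'\n")
with open(os.path.join(user_dir, "bars.py"), "w") as f:
    f.write("KIND = 'user'\n")

plugins = load_plugins_from([system_dir, user_dir])
print(plugins["bars"].KIND)
```

Here the user's copy of bars.py shadows the system one because its directory is listed last.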
I would be interested in improving this as well, but it also works good enough for me. :)
