I'm learning Python and I have already created a few ad-hoc utility modules that I use for all sorts of things. I don't intend to install them anywhere; having them simply lying around and copying them wherever needed is OK for me for now.
So I typically just create a file named something like mymodule.py and import mymodule from a script in the same directory. For names I use only lowercase letters (i.e. no underscores). Now that I have seen a couple of "real" Python modules and realized that the convention is mostly the same, I'm starting to wonder about clashes.
Is there a convention for naming own ad-hoc modules, so that one can avoid clashes with core or "pip" modules (even future ones that are yet to be added)?
A similar convention exists in the Perl community (especially because of CPAN), where all such modules should start with Local::, as in Local::MyCrazyModule.
Note: There is a similar question here on SO, but that one does not seem to ask specifically about modules, but rather about variable names clashing with modules.
The easiest way not to conflict with pip modules is to create a dummy module with your name and publish it to PyPI. Then you can use this top-level namespace for all your modules, whether published or not.
For more information about naming conventions you can read: http://www.python.org/dev/peps/pep-0423/
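Before settling on a top-level name, you can at least ask the import system whether it is already taken in the current environment. This is a minimal sketch: name_is_taken is a hypothetical helper, sys.stdlib_module_names requires Python 3.10+ (the code falls back to an empty set on older versions), and it can only see what is installed right now, not modules that the standard library or PyPI might add later.

```python
import importlib.util
import sys


def name_is_taken(name: str) -> bool:
    """Return True if `name` is a stdlib/builtin module or already importable.

    Hypothetical helper: it only checks top-level names in the *current*
    environment, so it cannot predict future stdlib or PyPI additions.
    """
    # sys.stdlib_module_names exists on Python 3.10+; fall back to an empty set.
    stdlib = getattr(sys, "stdlib_module_names", frozenset())
    if name in stdlib or name in sys.builtin_module_names:
        return True
    # find_spec returns None when nothing on sys.path provides the name.
    return importlib.util.find_spec(name) is not None


for candidate in ("json", "mycrazymodule"):
    print(candidate, name_is_taken(candidate))
```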
I'm building an application that depends on some_package (which is rather large) as installed through pip or conda. I would like to reuse parts of some_package directly in the application; to that end, I have forked some_package, installed it locally, and modified its functionality as needed. The application now depends on two (diverging) versions of the same package of the same name for different functionality.
How do I refer to the pip/conda managed ~/anaconda3/envs/my_env/lib/python3.7/site-packages/some_package/ for internal dependency, and the modified ~/my_project/dependencies/some_package/ for use in my application?
There are several questions on Stack Overflow, but they are either quite old or not the same question:
Python: Two packages with the same name; how do you specify which is loaded?
Is it possible to use two Python packages with the same name?
Importing from builtin library when module with same name exists
What I've tried:
conda develop <local package path> : in this case, the site-package is not visible and breaks internal dependencies
changing the name of the local package folder and importing: there are internal references to the package name that would mean renaming everywhere, and create a management mess if I ever wish to pull new code on the fork
a comment suggested import some_package as package_dev: this obviously won't work as I have no way to refer to both packages in the first place
In the linked questions (and others), there are a number of hacks that will kind of work but break the import system in subtle ways (reload, for package updates, etc). Is there a "pythonic"/recommended way to accomplish this?
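The underlying constraint is that a top-level import name is resolved by walking sys.path in order, and the first match is cached in sys.modules; every later import of that name, including the package's own internal imports, gets the same object. A minimal, self-contained demonstration with two throwaway directories (the package name dup_pkg is hypothetical):

```python
import importlib
import sys
import tempfile
from pathlib import Path


def make_pkg(root: Path, name: str, marker: str) -> None:
    """Create root/name/__init__.py that records which copy it is."""
    pkg = root / name
    pkg.mkdir(parents=True)
    (pkg / "__init__.py").write_text(f"WHERE = {marker!r}\n")


first = Path(tempfile.mkdtemp())
second = Path(tempfile.mkdtemp())
make_pkg(first, "dup_pkg", "first")
make_pkg(second, "dup_pkg", "second")

# Both copies are on the path; the one listed earlier wins.
sys.path[:0] = [str(first), str(second)]
importlib.invalidate_caches()

import dup_pkg

print(dup_pkg.WHERE)  # the copy found first on sys.path

# A later import does not search again: it returns the cached module object.
print(sys.modules["dup_pkg"] is importlib.import_module("dup_pkg"))
```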
This might be a broader question, and more related to understanding Python's nature and probably good programming practices in general.
I have a file, called util.py. It has a lot of different small functions I've collected over the past few months that are useful when doing various machine learning tasks.
My thinking is this: I'd like to continue adding important functions to this script as I go. As such, I will want to import util often, now and in the future, in many unrelated projects.
But Python seems to expect that I can only access the code in this file if it lives in my current directory, even if the functions in this file are useful for scripts in different directories. I sense some reason behind the way this works that I don't fully grasp; to me, it seems like I'll be forced to make unnecessary copies often.
If I have to create a new copy of util.py every time I'm working from within a new directory, on a different project, it won't be long until I have many different versions/iterations of this file, scattered all over my hard drive, in various states. I don't desire this degree of modularity in my programming; for the sake of simplicity, repeatability, and clarity, I want only one file in only one location, accessible to many projects.
The question in a nutshell: What is the argument for Python to seemingly frown on importing from different directories?
If your util.py file contains functions you're using in a lot of different projects, then it's actually a library, and you should package it as such so you can install it in any Python environment with a single line (python setup.py install), and update it if required (Python's packaging ecosystem has several features to track and update library versions).
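Once util is packaged and installed as a distribution, its packaging metadata makes the installed version queryable at runtime, which is part of what makes tracking and updating practical. A small sketch using the standard-library importlib.metadata (Python 3.8+); the distribution name my-ml-utils is hypothetical:

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# "my-ml-utils" is a hypothetical name for the packaged util library.
v = installed_version("my-ml-utils")
if v is None:
    print("my-ml-utils is not installed; install it with pip first")
else:
    print("using my-ml-utils", v)
```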
An added benefit: right now, if you're doing what the other answers suggested, you have to remember to manually put util.py in your PYTHONPATH (the "dirty" way). If you try to run one of your programs and you haven't done that, you'll get a cryptic ImportError that doesn't explain much: is it a missing dependency? A typo in the program?
Now think about what happens if someone other than you tries to run the program(s) and gets those error messages.
If you have a library, on the other hand, trying to set up your program will either complain in clear, understandable language that the library is missing or out of date, or (if you've taken the appropriate steps) automatically download and install it so things are ready to roll.
On a related topic, having a file/module/namespace called "util" is a sign of bad design. What are these utilities for? It's the programming equivalent of a "miscellaneous" folder: eventually, everything will end up in it and you'll have no way to know what it contains other than opening it and reading it all.
Another way is to add the directory you want to import from to sys.path from within the scripts that need it.
You should have an __init__.py file in the same folder where utils.py lives, to tell Python to treat the folder as a package. The __init__.py file may be empty, or you can define other things in it.
Example:
/home/marcos/python/proj1/
__init__.py
utils.py
/home/marcos/school_projects/final_assignment/
my_script.py
And then inside my_script.py
import sys
sys.path.append('/home/marcos/python/')
from proj1 import utils
MAX_HEIGHT = utils.SOME_CONSTANT
a_value = utils.some_function()
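A slightly more defensive variant of the same idea avoids adding the same directory twice and puts it at the front of sys.path so it is searched first. The helper name add_to_path and the temporary proj1 layout are illustrative assumptions, not part of the original answer; note that putting your directory first also means it can shadow same-named installed packages.

```python
import sys
import tempfile
from pathlib import Path


def add_to_path(directory: str) -> None:
    """Put `directory` at the front of sys.path unless it is already there."""
    resolved = str(Path(directory).resolve())
    if resolved not in sys.path:
        sys.path.insert(0, resolved)


# Build a throwaway copy of the proj1 layout (hypothetical contents).
root = Path(tempfile.mkdtemp())
(root / "proj1").mkdir()
(root / "proj1" / "__init__.py").write_text("")
(root / "proj1" / "utils.py").write_text(
    "SOME_CONSTANT = 42\n"
    "def some_function():\n"
    "    return 'hello'\n"
)

add_to_path(str(root))
add_to_path(str(root))  # second call is a no-op

from proj1 import utils

print(utils.SOME_CONSTANT, utils.some_function())
```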
First, define an environment variable. If you are using bash, for example, then put the following in the appropriate startup file:
export PYTHONPATH=/path/to/my/python/utilities
Now, put your util.py and any of your other common modules or packages in that directory. Now you can import util from anywhere and python will find it.
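To confirm that PYTHONPATH is picked up, you can start a fresh interpreter with the variable set and import from it. A sketch using subprocess; the temporary directory stands in for /path/to/my/python/utilities, and the child runs from a separate empty directory so util.py can only be found through PYTHONPATH:

```python
import os
import subprocess
import sys
import tempfile

# Stand-in for /path/to/my/python/utilities, holding a tiny util.py.
utilities = tempfile.mkdtemp()
with open(os.path.join(utilities, "util.py"), "w") as f:
    f.write("GREETING = 'found via PYTHONPATH'\n")

# Run from an unrelated, empty directory so util.py can only come from PYTHONPATH.
elsewhere = tempfile.mkdtemp()
env = dict(os.environ, PYTHONPATH=utilities)
result = subprocess.run(
    [sys.executable, "-c", "import util; print(util.GREETING)"],
    env=env,
    cwd=elsewhere,
    capture_output=True,
    text=True,
    check=True,
)
print(result.stdout.strip())
```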
I have a background in Java and I am new to Python. I want to make sure I understand Python terminology correctly before I go ahead.
My understanding of a module is: a script which can be imported by many scripts, to make reading easier. Just like in Java you have a class, and that class can be imported by many other classes.
My understanding of a library is: a library contains many modules, grouped by their use.
My question is: Are libraries like packages, where you have a package e.g. called food, then:
chocolate.py
sweets.py
biscuts.py
are contained in the food package?
Or do libraries use packages? Say we had another package, drink, with:
milk.py
juice.py
contained in it. Would the library then contain two packages?
Also, an application programming interface (API) usually contains a set of libraries. Is this the hierarchy, with the API at the top?
1. API
2. Library
3. Package
4. Module
5. Script
So an API will consist of all of 2-5?
From The Python Tutorial - Modules
Module:
A module is a file containing Python definitions and statements. The file name is the module name with the suffix .py appended.
Package:
Packages are a way of structuring Python’s module namespace by using “dotted module names”.
The documentation for the import statement gives more details, for example:
Python has only one type of module object, and all modules are of this type, regardless of whether the module is implemented in Python, C, or something else. To help organize modules and provide a naming hierarchy, Python has a concept of packages.
You can think of packages as the directories on a file system and modules as files within directories, but don’t take this analogy too literally since packages and modules need not originate from the file system. For the purposes of this documentation, we’ll use this convenient analogy of directories and files. Like file system directories, packages are organized hierarchically, and packages may themselves contain subpackages, as well as regular modules.
It’s important to keep in mind that all packages are modules, but not all modules are packages. Or put another way, packages are just a special kind of module. Specifically, any module that contains a __path__ attribute is considered a package.
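The last point of the quoted documentation is easy to check with standard-library modules: json is a package (it has a __path__), while json.decoder is a plain module inside it, yet both are instances of the same module type.

```python
import json
import json.decoder
import types

for mod in (json, json.decoder):
    print(
        mod.__name__,
        isinstance(mod, types.ModuleType),  # true for every module, package or not
        hasattr(mod, "__path__"),           # only packages have __path__
    )
```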
Hence the term module refers to a specific entity: it's a type (types.ModuleType) whose instances are the module objects you use in Python programs. By analogy, it is also used to refer to the file in the file system from which these instances "are created".
The term script is used to refer to a module whose aim is to be executed. It has the same meaning as "program" or "application", but it is usually used to describe simple and small programs (i.e. a single file with at most a few hundred lines). Writing a script takes minutes or a few hours.
The term library is simply a generic term for a bunch of code that was designed with the aim of being usable by many applications. It provides some generic functionality that can be used by specific applications.
When a module/package/something else is "published" people often refer to it as a library. Often libraries contain a package or multiple related packages, but it could be even a single module.
Libraries usually do not provide any specific functionality, i.e. you cannot "run a library".
The API can have different meanings depending on the context. For example:
it can define a protocol like the DB API or the buffer protocol.
it can define how to interact with an application (e.g. the Python/C API)
when related to a library/package, it is simply the interface provided by that library for its functionality (set of functions/classes/constants etc.)
In any case an API is not python code. It's a description which may be more or less formal.
Only package and module have a well-defined meaning specific to Python.
An API is not a collection of code per se; it is more like a "protocol" specification of how various parts (usually libraries) communicate with each other. There are a few notable "standard" APIs in Python, e.g. the DB API.
In my opinion, a library is anything that is not an application; in Python, a library is a module, usually with submodules. The scope of a library is quite variable: for example, the Python standard library is vast (with quite a few submodules), while there are lots of single-purpose libraries on PyPI, e.g. a backport of collections.OrderedDict for Python < 2.7.
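As an illustration of an API being a specification rather than code: the DB API (PEP 249) describes connect(), cursors, execute() and the fetch methods, and the standard-library sqlite3 module is one implementation of it. A minimal sketch using an in-memory database (the table and data are made up):

```python
import sqlite3

# Module-level attributes required by PEP 249.
print(sqlite3.apilevel, sqlite3.paramstyle)

conn = sqlite3.connect(":memory:")  # connect() is defined by the DB API
try:
    cur = conn.cursor()
    cur.execute("CREATE TABLE fruit (name TEXT)")
    # "qmark" paramstyle: placeholders are written as ?
    cur.executemany("INSERT INTO fruit VALUES (?)", [("apple",), ("pear",)])
    conn.commit()
    cur.execute("SELECT name FROM fruit ORDER BY name")
    rows = cur.fetchall()
    print(rows)
finally:
    conn.close()
```

Any other DB API driver is expected to expose the same connect/cursor/execute/fetch shape, which is what lets code move between databases with relatively small changes.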
A package is a collection of python modules under a common namespace. In practice one is created by placing multiple python modules in a directory with a special __init__.py module (file).
A module is a single file of python code that is meant to be imported. This is a bit of a simplification since in practice quite a few modules detect when they are run as a script and do something special in that case.
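The usual way a module detects that it is being run as a script is to check __name__, which is "__main__" only for the file Python was asked to execute. A minimal sketch (the file name greet.py and its contents are illustrative):

```python
"""greet.py: importable as a module, runnable as a script."""


def greet(name: str) -> str:
    """Reusable function that other modules can import."""
    return f"Hello, {name}!"


def main() -> None:
    # Runs only when the file is executed directly, e.g. `python greet.py`.
    print(greet("world"))


if __name__ == "__main__":
    main()
```

Another script can then do `from greet import greet` without triggering the print in main().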
A script is a single file of python code that is meant to be executed as the 'main' program.
If you have a set of code that spans multiple files, you probably have an application instead of script.
Library: a collection of modules.
(A library may contain built-in modules, written in C, as well as modules written in Python.)
Module: each of a set of standardized parts or independent units that can be used to construct a more complex structure.
Speaking informally, a module is a set of lines of code used for a specific purpose that can be reused in other programs as-is, so that a team avoids repeating itself (DRY, Don't Repeat Yourself) and can focus on the main requirement.
API: an interface for other applications to interact with your library without having direct access.
Package: basically a directory with files.
Script: a series of commands within a single file.
I will try to answer this in terms even a beginner can follow, explain why or how the terms are used differently, and give the most "official" and/or most widely understood use of each.
It can be confusing, and I confused myself by thinking too hard, so don't think too much about it. Anyway, context matters greatly.
Library: most often this refers to the standard library, or to another collection created with a similar format and use. The standard library is the sum of the 'standard', popular and widely used modules, which can be thought of (for now) as single-file tools or shortcuts that make things possible or faster. It ships with Python itself. Because of this, the word "library" is often used for things with a similar structure and idea: simply a bunch of modules, maybe even packages, grouped together, usually in a list (often a list of things to download). Generally it is just related files with similar purposes. That is the easiest way to describe it.
Module: a module refers to a file. The file has code in it, and the name of the file (minus the .py extension that Python files end with) is the name of the module. All the file contains is code that, run together, makes something happen by using functions, strings, etc.
The main modules you probably see most often are popular because they are special modules that can get info from other files/modules.
It is confusing because the module name is just the file name with the .py dropped. Really, it's just code you can use as a shortcut, written by somebody to make something easier or possible.
Package: this term is sometimes used loosely, although context makes a difference. The most common use, in my experience, is multiple modules (or files) that are grouped together. Why they are grouped together can vary, and that is when context matters.
These are the ways I have noticed the term package(s) used: a group of downloaded, created and/or stored modules. All of these can be true, or only one, but really a package is a file that references other files, which need to be in the correct structure or format, and that entire sum is the package itself, whether installed separately or included in the Python standard library. A package groups modules (.py files) together because they depend on each other and may not work correctly, or at all, on their own. Every part (module/file) of a package shares a common goal, and the total sum of all the parts is the package itself.
Most often in Python, packages are modules, because the package name is the name of the module that is used to connect all the pieces. So you can import a package because it is a module, and it can also call upon other modules that are not packages because they only perform a certain function or task and don't involve other files. Packages have a goal, and each module works together to achieve that final goal.
Most confusion comes from the same simple name being used as the file name, then as the module name, and then again as the package name.
Remember that modules and packages can be installed. Library is usually a generic term for listing or formatting a group of modules and packages, much like Python's standard library. A hierarchy would not work: APIs do not really belong in it, and if you included them they could sit anywhere alongside scripts, modules, and packages. The word library, being such a general word that is easily applied to many things, also lets API sit above or below it. Some modules can be based on other code, and that is the only time I think it would relate to a pure Python discussion.
From Namespace Packages in distribute, I know I can make use of namespace packages to separate a big Python package into several smaller ones. It is really awesome. The document also mentions:
Note, by the way, that your project’s source tree must include the namespace packages’ __init__.py files (and the __init__.py of any parent packages), in a normal Python package layout. These __init__.py files must contain the line:
__import__('pkg_resources').declare_namespace(__name__)
This code ensures that the namespace package machinery is operating and that the current package is registered as a namespace package.
I'm wondering whether there are any benefits to keeping the hierarchy of directories the same as the hierarchy of packages. Or is this just a technical requirement of the namespace packages feature of distribute/setuptools?
For example,
I would like to provide a sub-package foo.bar, so I have to build the following hierarchy of folders and prepare an __init__.py to make setup.py work with the namespace package:
~foo.bar/
~foo.bar/setup.py
~foo.bar/foo/__init__.py <= one-lined file dedicated to namespace packages
~foo.bar/foo/bar/__init__.py
~foo.bar/foo/bar/foobar.py
I'm not familiar with namespace packages but it looks to me that 1) foo/bar and 2) (nearly) one-lined __init__.py are routine tasks. They do provide some hints of "this is a namespace package" but I think we already have that information in setup.py?
edit:
As illustrated in the following block, can I have a namespace package without that nested directory and the one-line __init__.py in my working directory? That is, can we ask setup.py to generate those automatically just by adding the line namespace_packages = ['foo']?
~foo.bar/
~foo.bar/setup.py
~foo.bar/src/__init__.py <= for bar package
~foo.bar/src/foobar.py
A namespace package mainly has an effect when it comes time to import a sub-package. Basically, here's what happens when importing foo.bar:
the importer scans through sys.path looking for something that looks like foo.
when it finds something, it will look inside of the discovered foo for bar.
if bar is not found:
if foo is a normal package, an ImportError is raised, indicating that foo.bar doesn't exist.
if foo is a namespace package, the importer goes back to looking through sys.path for the next match of foo. The ImportError is only raised if all paths have been exhausted.
So that's what it does, but it doesn't explain why you might want that. Suppose you designed a big, useful library (foo), but as part of that, you also developed a small but very useful utility (foo.bar) that other Python programmers find useful, even when they don't have a use for the bigger library.
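The "normal package" case in the list above can be reproduced with two throwaway directories. In this sketch (the package name regpkg is hypothetical), both directories contain a regular regpkg package with its own __init__.py, but only the second one has bar; because the first regpkg found is a normal package, the importer never looks at the second copy.

```python
import importlib
import sys
import tempfile
from pathlib import Path

first = Path(tempfile.mkdtemp())
second = Path(tempfile.mkdtemp())
for root in (first, second):
    (root / "regpkg").mkdir()
    (root / "regpkg" / "__init__.py").write_text("")  # regular package
(second / "regpkg" / "bar.py").write_text("VALUE = 1\n")  # bar exists only here

sys.path[:0] = [str(first), str(second)]
importlib.invalidate_caches()

try:
    importlib.import_module("regpkg.bar")
    found = True
except ModuleNotFoundError:  # subclass of ImportError
    found = False
print("regpkg.bar importable:", found)
```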
You could distribute them together as one big blob of a package (as you designed it) even though most of the people using it only ever import the sub-module. Your users would find this terribly inconvenient because they'd have to download the whole thing (all 200MB of it!) even though they are only really interested in a 10 line utility class. If you have an open license, you'll probably find that several people end up forking it and now there are a half dozen diverging versions of your utility module.
You could rewrite your whole library so that the utility lives outside the foo namespace (just bar instead of foo.bar). You'll be able to distribute the utility separately, and some of your users will be happy, but that's a lot of work, especially considering that there actually are lots of users using the whole library, and they'll have to rewrite their programs to use the new import path.
So what you really want is a way to install foo.bar on its own, but happily coexist with foo when that's desired too.
A namespace package allows exactly this: two totally independent installations of a foo package can coexist. setuptools will recognize that the two packages are designed to live next to each other and politely shift the folders/files in such a way that both are on the path and appear as foo, one containing foo.bar and the other containing the rest of foo.
You'll have two different setup.py scripts, one for each. foo/__init__.py in both packages have to indicate that they are namespace packages so the importer knows to continue regardless of which package is discovered first.
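Since Python 3.3 there is also a built-in form of this, implicit namespace packages (PEP 420): if a package directory has no __init__.py at all, portions of it found in different sys.path entries are merged into one package. This is a different mechanism from the pkg_resources.declare_namespace line quoted in the question, shown here only as a sketch; the package name nsdemo and the two temporary "installations" are hypothetical.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Two independent "installations", each providing one part of nsdemo.
install_a = Path(tempfile.mkdtemp())
install_b = Path(tempfile.mkdtemp())
(install_a / "nsdemo" / "bar").mkdir(parents=True)
(install_a / "nsdemo" / "bar" / "__init__.py").write_text("NAME = 'bar'\n")
(install_b / "nsdemo" / "core").mkdir(parents=True)
(install_b / "nsdemo" / "core" / "__init__.py").write_text("NAME = 'core'\n")
# Note: there is deliberately no nsdemo/__init__.py in either location.

sys.path[:0] = [str(install_a), str(install_b)]
importlib.invalidate_caches()

import nsdemo.bar
import nsdemo.core

print(nsdemo.bar.NAME, nsdemo.core.NAME)
print(len(list(nsdemo.__path__)))  # one entry per portion found on sys.path
```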
I've got a number of scripts that use common definitions. How do I split them into multiple files? Furthermore, the application cannot be installed in any way in my scenario; it must be possible to have an arbitrary number of versions running concurrently, and it must work without superuser rights. Solutions I've come up with are:
1. Duplicate code in every script. Messy, and probably the worst scheme.
2. Put all scripts and common code in a single directory, and use from . import to load them. The downside of this approach is that I'd like to put my libraries in a different directory from the applications.
3. Put common code in its own directory, write an __init__.py that imports all submodules, and finally use from . import to load them. This keeps code organized, but it's a little bit of overhead to maintain __init__.py and qualify names.
4. Add the library directory to sys.path and import. I tend towards this, but I'm not sure whether fiddling with sys.path is nice code.
5. Load using execfile (exec in Python 3). This combines the advantages of the previous two approaches: only one line per module is needed, and I can use a dedicated directory. On the other hand, this evades the Python module concept and pollutes the global namespace.
6. Write and install a module using distutils. This installs the library for all Python scripts, needs superuser rights, and impacts other applications, so it is not applicable in my case.
What is the best method?
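If you lean towards the execfile/exec option, the standard-library runpy.run_path executes a file and hands back its globals as a dictionary, so the definitions do not leak into the caller's namespace. A sketch under the assumption that the shared code lives in a plain file (common_defs.py is a hypothetical name); note that it still bypasses the module cache, so every call re-executes the file.

```python
import runpy
import tempfile
from pathlib import Path

# Stand-in for a shared definitions file kept in its own directory.
lib_dir = Path(tempfile.mkdtemp())
defs_file = lib_dir / "common_defs.py"
defs_file.write_text(
    "TIMEOUT = 30\n"
    "def scale(x):\n"
    "    return x * 2\n"
)

# run_path returns the executed file's globals without touching ours.
common = runpy.run_path(str(defs_file))
print(common["TIMEOUT"], common["scale"](21))
```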
Adding to sys.path (usually using site.addsitedir) is quite common and not particularly frowned upon. You will certainly want your common shared code to be in modules somewhere convenient.
If you are using Python 2.6+, there's already a user-level modules folder you can use without having to add to sys.path or PYTHONPATH. On Unix-likes it's ~/.local/lib/python2.6/site-packages (the version part of the path matches your Python version); see PEP 370 for more information.
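A minimal sketch of the site.addsitedir route: it appends the directory to sys.path and also processes any .pth files found in it. The temporary directory and the module name mysharedlib are placeholders for your own library location.

```python
import site
import tempfile
from pathlib import Path

lib_dir = Path(tempfile.mkdtemp())  # placeholder for your shared library dir
(lib_dir / "mysharedlib.py").write_text("ANSWER = 42\n")

site.addsitedir(str(lib_dir))  # appends to sys.path and processes .pth files

import mysharedlib

print(mysharedlib.ANSWER)
```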
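Because the exact per-user directory depends on the platform and Python version, you can ask the site module for it instead of hard-coding the path (these functions exist in current Python 3 releases; the same path is printed by `python -m site --user-site`):

```python
import site

# Per-user site-packages directory from PEP 370, e.g. under ~/.local on Linux.
print("user site-packages:", site.getusersitepackages())
# True if enabled, False if disabled by the user (e.g. python -s),
# None if disabled for security reasons or by an administrator.
print("enabled:", site.ENABLE_USER_SITE)
```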
You can set the PYTHONPATH environment variable to the directory where your library files are located. This adds that path to the library search path and you can use a normal import to import them.
If you have multiple environments which have various combinations of dependencies, a good solution is to use virtualenv to create sandboxed Python environments, each with their own set of installed packages. Each environment will function in the same way as a system-wide Python site-packages setup, but no superuser rights are required to create local environments.
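The standard-library venv module (Python 3.3+) provides the same kind of sandbox as virtualenv without an extra install; from the shell you would normally just run python -m venv followed by a directory. A hedged sketch that creates an environment programmatically, without pip to keep it fast and offline, and shows how code can tell whether it is running inside one (the directory name project_env is arbitrary):

```python
import sys
import tempfile
import venv
from pathlib import Path


def in_virtualenv() -> bool:
    """True when running inside a venv (sys.prefix differs from the base)."""
    return sys.prefix != sys.base_prefix


env_dir = Path(tempfile.mkdtemp()) / "project_env"
# No superuser rights needed: everything lives under env_dir.
venv.create(env_dir, with_pip=False)

print("created:", (env_dir / "pyvenv.cfg").exists())
print("this interpreter in a venv:", in_virtualenv())
```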
Another alternative to manually adding the path to sys.path is to use the environment variable PYTHONPATH.
Also, distutils allows you to specify a custom installation directory using
python setup.py install --home=/my/dir
However, neither of these may be practical if you need to have multiple versions running simultaneously with the same module names. In that case you're probably best off modifying sys.path.
I've used the fourth approach (adding the directories to sys.path) for more than one project, and I think it's a valid approach.