In the JavaScript ecosystem, "compilers" exist which will take a program with a significant dependency chain (of other JavaScript libraries) and emit a standalone JavaScript program (often, with optimizations applied).
Does any equivalent tool exist for Python, able to generate a script with all non-standard-library dependencies inlined? Are there other tools/practices available for bundling dependencies in Python?
Since your goal is to be cross-architecture, this approach cannot work for any Python program that relies on native C extension modules.
In general, using virtualenv to create a target environment will mean that even users who don't have permission to install new system-level software can install dependencies under their own home directory; thus, what you ask about is not often needed in practice.
However, if you wanted to do things that are considered evil / bad practice, pure-Python modules can in fact be bundled into a script; a tool of this sort would thus be possible for programs with only pure-Python dependencies!
If I were writing such a tool, I might start the following way:
Use pickle to serialize the content of each module on the "sending" side
In the loader code, use imp.new_module() (or types.ModuleType in modern Python) to create new module objects, and assign the unpickled objects to them.
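A minimal sketch of that idea follows. One caveat: plain pickle stores functions and classes by reference, so a workable variant serializes each module's source text and reconstructs the module objects in the loader (mymodule is a hypothetical pure-Python dependency):

    import inspect, pickle, sys, types

    # "sending" side: capture the source of each pure-Python dependency
    import mymodule
    payload = pickle.dumps({"mymodule": inspect.getsource(mymodule)})

    # loader side, embedded in the emitted standalone script:
    def load_bundled(payload):
        for name, source in pickle.loads(payload).items():
            module = types.ModuleType(name)
            exec(source, module.__dict__)  # run the module body in its own namespace
            sys.modules[name] = module     # a later "import mymodule" now finds it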
What I wish to understand is what counts as good or bad practice, and why, when it comes to imports. I want to know the community's agreed-upon view on the matter, if one is recorded in a PEP or similar document.
What I normally see is that people have a Python environment, use conda/pip to install packages, and all the code then needs to do is "import X" (and variants). In my current understanding this is the right way to do things.
Whenever Python interacts with C++ at my firm, though, we always end up needing sys.path manipulation and absolute imports (we do have some standardized paths to use as a "base", and usually define relative paths from those).
There are 2 major cases:
Python wrapper for a C++ library (pybind11/ctypes/etc.) - in this case the user's Python code must use sys.path to specify where the C++ library to import lives (see the sketch after this list).
A project that establishes communication between Python and C++ (say a C++ server, Python clients, TCP connections, and FlatBuffers serialization between the two) - here the Python code lives together with the C++ code, and some Python files end up using sys.path to import Python modules that belong to the same project but live in a different directory. Essentially we deploy the Python together with the C++ through our C++ deployment procedure.
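Concretely, the pattern in both cases looks like this (the path and module name are hypothetical):

    import sys

    # standardized "base" path agreed upon within the firm
    sys.path.append("/opt/firm/cpp/wrappers")

    import device_wrapper  # pybind11-built module living outside the environment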
I am not fully sure we can do better for case #1, but case #2 seems quite unnecessary, forced on us basically by the choice not to deploy the Python code through a Python package manager. That choice ends up forcing us to use sys.path in both the library and the user code.
This seems very bad to me, because this way of doing things doesn't allow us to fully manage our Python environments (we have libraries we import even though they are not technically installed in the environment), and that is probably why I have a negative view of using sys.path for imports. But I need to find out whether I'm right, and if so I need some official (or near-official) documents to support my case if I'm to propose fixes to our procedures.
For your scenario 2, my understanding is you have some C++ and accompanying python in one place, and a separate python project wants to import that python.
Could you structure the imported python as a package and install it to your environment with pip install path/to/package? If it's a package that you'll continue to edit, you can add the -e flag to pip install so that when the package changes your imports get the latest code.
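A minimal sketch of that, with hypothetical names:

    # layout of the shared Python that lives next to the C++:
    #
    #   cpp_project/python/mylib/
    #       setup.py          (or pyproject.toml)
    #       mylib/
    #           __init__.py
    #           client.py

    pip install -e cpp_project/python/mylib  # editable: imports track your edits
    python -c "import mylib.client"          # works without touching sys.path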
I need to integrate a body of Python code into an existing OSGi (Apache Felix) deployment.
I assume, or at least hope, that packages exist to help with this effort.
If it helps, the Python code is still relatively new and small, so it can probably be re-architected to meet whatever constraints are needed. However, it must remain in Python, because of dependencies on third-party libraries.
What are suggested best practices?
The trick is to make this an extender, see 1 and 2. You want your Python code to be separate from the code that handles the interaction with the interpreter. So what you do is wrap the Python code and any native libraries in a bundle. This is trivial since a bundle is just a zip file.
You then develop a bundle that listens for starting bundles (see the BundleTracker) that contain Python code. A manifest header is often used, but you can also look for a directory in the JAR. When you detect such code, you extract any native libraries and run the code in the interpreter of your choice.
If you can use Jython then that would be highly recommended. You can then carry the interpreter as an OSGi bundle that runs on the VM. If you need to use a native interpreter your life is less rosy. You can rely on the environment to provide you with an interpreter, but then why use OSGi in the first place? You basically lose the write-once-run-anywhere advantage. You could go the full monty by creating bundles that contain Python installers for all the platforms you support. It can be done, and it's not even that hard, but it is a maintenance nightmare. Believe me, native code sucks; it just does it a bit faster than Java.
I have a background in Java and I am new to Python. I want to make sure I understand Python terminology correctly before I go ahead.
My understanding of a module is: a script which can be imported by many other scripts, to make reading easier. Just like in Java you have a class, and that class can be imported by many other classes.
My understanding of a library is: a library contains many modules which are separated by their use.
My question is: Are libraries like packages, where you have a package e.g. called food, then:
chocolate.py
sweets.py
biscuits.py
are contained in the food package?
Or do libraries use packages, so if we had another package drink:
milk.py
juice.py
are contained in the package, so the library contains the two packages?
Also, an application programming interface (API) usually contains a set of libraries. Is this at the top of the hierarchy:
API
Library
Package
Module
Script
So an API will consist of everything from 2-5?
From The Python Tutorial - Modules
Module:
A module is a file containing Python definitions and statements. The file name is the module name with the suffix .py appended.
Package:
Packages are a way of structuring Python’s module namespace by using “dotted module names”.
The documentation for the import statement gives more details, for example:
Python has only one type of module object, and all modules are of this
type, regardless of whether the module is implemented in Python, C, or
something else. To help organize modules and provide a naming
hierarchy, Python has a concept of packages.
You can think of packages as the directories on a file system and
modules as files within directories, but don’t take this analogy too
literally since packages and modules need not originate from the file
system. For the purposes of this documentation, we’ll use this
convenient analogy of directories and files. Like file system
directories, packages are organized hierarchically, and packages may
themselves contain subpackages, as well as regular modules.
It’s important to keep in mind that all packages are modules, but not
all modules are packages. Or put another way, packages are just a
special kind of module. Specifically, any module that contains a
__path__ attribute is considered a package.
Hence the term module refers to a specific entity: it's a class whose instances are the module objects you use in Python programs. It is also used, by analogy, to refer to the file in the file system from which these instances "are created".
The term script is used to refer to a module whose aim is to be executed. It has the same meaning as "program" or "application", but it is usually used to describe simple and small programs (i.e. a single file with at most a few hundred lines). Writing a script takes minutes or a few hours.
The term library is simply a generic term for a bunch of code that was designed with the aim of being usable by many applications. It provides some generic functionality that can be used by specific applications.
When a module/package/something else is "published" people often refer to it as a library. Often libraries contain a package or multiple related packages, but it could be even a single module.
Libraries usually do not provide any specific functionality, i.e. you cannot "run a library".
The API can have different meanings depending on the context. For example:
it can define a protocol like the DB API or the buffer protocol.
it can define how to interact with an application (e.g. the Python/C API)
when related to a library/package, it is simply the interface provided by that library for its functionality (a set of functions/classes/constants, etc.)
In any case an API is not Python code. It's a description, which may be more or less formal.
Only package and module have a well-defined meaning specific to Python.
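You can also see the __path__ rule from the quoted documentation directly in the interpreter (json happens to be a package in the standard library, json.decoder a plain module):

    >>> import json, json.decoder
    >>> hasattr(json, "__path__")          # a package (a directory)
    True
    >>> hasattr(json.decoder, "__path__")  # a plain module (a single file)
    False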
An API is not a collection of code per se - it is more like a "protocol" specification of how various parts (usually libraries) communicate with each other. There are a few notable "standard" APIs in Python, e.g. the DB API.
In my opinion, a library is anything that is not an application - in Python, a library is a module, usually with submodules. The scope of a library is quite variable - for example the Python standard library is vast (with quite a few submodules), while there are lots of single-purpose libraries on PyPI, e.g. a backport of collections.OrderedDict for py < 2.7.
A package is a collection of python modules under a common namespace. In practice one is created by placing multiple python modules in a directory with a special __init__.py module (file).
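For example, the food package from the question would be laid out as:

    food/
        __init__.py     # marks the directory as a package
        chocolate.py
        sweets.py
        biscuits.py

    import food.sweets  # the directory name becomes the dotted prefix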
A module is a single file of Python code that is meant to be imported. This is a bit of a simplification, since in practice quite a few modules detect when they are run as a script and do something special in that case.
A script is a single file of python code that is meant to be executed as the 'main' program.
If you have a set of code that spans multiple files, you probably have an application instead of script.
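The detection mentioned above is usually done with the __name__ check, which lets one file serve as both a module and a script:

    def main():
        print("doing the real work")

    if __name__ == "__main__":  # true only when executed as the 'main' program
        main()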
Library: a collection of modules.
(A library contains either built-in modules (written in C), modules written in Python, or both.)
Module: Each of a set of standardized parts or independent units that can be used to construct a more complex structure.
Speaking informally, a module is a set of lines of code used for a specific purpose that can be reused in other programs as-is, keeping a team DRY (Don't Repeat Yourself) and focused on the main requirement.
API is an interface for other applications to interact with your library without having direct access.
Package is basically a directory with files.
Script means a series of commands within a single file.
I will try to answer this without using terms beyond what the earliest of beginners would know, and explain why or how each term is used differently, along with its most "official" and/or most widely understood use.
It can be confusing, and I confused myself by thinking too hard about it, so don't think too much about it. Anyway, context matters, greatly.
Library - Most often this will refer to the general library, or another collection created with a similar format and use. The General Library is the sum of "standard", popular, and widely used modules, which can be thought of as single-file tools or shortcuts, making things possible or faster. The general library is an option most people enable when installing Python. Because it has this name, "Python General Library", it is often imitated with a similar structure and idea: simply a bunch of modules, maybe even packages, grouped together, usually in a list. The list is usually there to download them. Generally it is just related files with similar interests. That is the easiest way to describe it.
Module - A module refers to a file. The file has a script "in it", and the name of the file is the name of the module; Python files end with .py. All the file contains is code that, run together, makes something happen, using functions, strings, etc.
The main modules you probably see most often are popular because they are special modules that can get info from other files/modules.
It is confusing because the names of the file and the module are equal once you drop the .py. Really it's just code you can use as a shortcut, written by somebody to make something easier or possible.
Package - This is a term used somewhat loosely, and context makes a difference. The most common use in my experience is multiple modules (or files) that are grouped together. Why they are grouped together can be for a few reasons; that is when context matters.
These are the ways I have noticed the term package(s) used: a group of downloaded, created, and/or stored modules. All of these can be true, or only one, but really a package is just a file that references other files, which need to be in the correct structure or format; that entire sum is the package itself, whether installed or included in the Python general library. A package can contain modules (.py files) that depend on each other; on their own they might not work correctly, or at all. There is always a common goal shared by every part (module/file) of a package, and the total sum of all the parts is the package itself.
Most often in Python, packages are modules, because the package name is the name of the module that is used to connect all the pieces. So you can import a package, because it is a module, and it in turn can call upon other modules that are not packages, because they only perform a certain function or task and don't involve other files. Packages have a goal, and each module works together to achieve that final goal.
Most confusion comes from a simple file name or prefix being used first as the module name and then again as the package name.
Remember that modules and packages can be installed. Library is usually a generic term for listing or formatting a group of modules and packages, much like Python's general library. A hierarchy would not really work: APIs do not belong in it, and if you included them they could sit anywhere above or below Script, Module, and Package, the word "library" being such a general word, so easily applied to many things, that API could sit above or below it too. Some modules can be based on other code, and that is the only time I think it would relate to a purely Python-related discussion.
I have a python modules question:
I need to write a script that will take the output from various pieces of hardware and then store it in a database. I know about the current hardware, but I do not know about any additional hardware that will be added in the future. Sorry, let me be more specific.
The script will retrieve power information from hardware modules connected to power distribution boards. These modules take the raw sensor information and make it available over TCP/IP. I have written functions into the test script that interrogate the hardware modules and gather the info. The problem is that this will need to be deployed where different hardware modules will be used, not just the ones that I know about. All the hardware modules will make the same info available, but they will all do it in different ways. Some use telnet or SSH, some provide a web page, some provide an XML output, some will use SNMP. I want to create a mechanism whereby a Python module can be written for a new device that pulls out the info and provides it to the script in a standard way, so that the script can populate the database and life goes on as normal.
So instead of adding a new function to the script each time a new hardware module type is encountered, and then wrapping it in "if" statements etc., is there a mechanism to offload that function into a module that can be dynamically added to the script? A plugin-type mechanism comes to mind, I suppose, but I don't think that quite fits.
Any suggestions or directions to places where something like this has been used or implemented would be greatly appreciated.
You could import modules dynamically in your code. Depending on your Python version, try importlib (added in Python 2.7) or the __import__ function.
You could create a plugin for each hardware device you're going to support. Each such plugin could then register itself somewhere (its main module path put in an environment variable, or a known path in the Windows registry). Your code would then check that location (periodically, or on startup, depending on your needs) and load each plugin found. You could also use distutils and have plugins packed as egg packages.
Please note that while this works quite well when you're running from Python sources, it gets more complex if you plan to build executables with PyInstaller or py2exe.
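As a minimal sketch of such a plugin loader (the plugins package, the get_power_info() convention, and store_in_database() are hypothetical names, assuming one module per hardware type):

    import importlib
    import pkgutil

    import plugins  # a package directory with one module per hardware type

    def load_plugins():
        # import every module in the plugins package and collect its reader
        readers = {}
        for finder, name, ispkg in pkgutil.iter_modules(plugins.__path__):
            module = importlib.import_module("plugins." + name)
            # each plugin is expected to expose a get_power_info() callable
            if hasattr(module, "get_power_info"):
                readers[name] = module.get_power_info
        return readers

    for name, read in load_plugins().items():
        store_in_database(name, read())  # your existing database code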
What is the best way to organize and develop a project composed of many small scripts sharing one (or more) larger Python libraries?
We have a bunch of programs in our repository that all use the same libraries stored in the same repository. So in other words, a layout like
trunk
    libs
        python
        utilities
    projects
        projA
        projB
When the official runs of our programs are done, we want to record what version of the code was used. For our C++ executables, things are simple because as long as the working copy is clean at compile time, everything is fine. (And since we get the version number programmatically, it must be a working copy, not an export.) For Python scripts, things are more complicated.
The problem is that, often one project (e.g. projA) will be running, and projB will need to be updated. This could cause the working copy revision to appear mixed to projA during runtime. (The code takes hours to run, and can be used as inputs for processes that take days to run, hence the strong traceability goal.)
My current workaround is, if necessary, to check out another copy of the trunk to a different location and run from there. But then I need to remember to change my PYTHONPATH to point to the second version of libs/python, not the one in the first tree.
There's not likely to be a perfect answer. But there must be a better way.
Should we be using subversion keywords to store the revision number, which would allow the data user to export files? Should we be using virtualenv? Should we be going more towards a packaging and installation mechanism? Setuptools is the standard, but I've read mixed things about it, and it seems designed for non-developer end users (of which we have none).
A much better solution is not to store all your projects and their shared dependencies in the same repository.
Use one repository for each project, and externals for the shared libraries.
Make use of tags in the shared library repositories, so consumer projects may use exactly the version they need in their external.
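For example (repository paths hypothetical, using the URL-first externals format from Subversion 1.5+):

    # from projA's working copy: pin the shared library at a tag
    svn propset svn:externals "^/libs/python/tags/1.2.0 libs/python" .
    svn commit -m "pin shared python lib at tag 1.2.0"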
Edit: (just copying this from my comment) use virtualenv if you need to provide isolated runtime environments for the different apps on the same server. Then each environment can contain a unique version of the library it needs.
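A minimal sketch of that isolation (paths hypothetical):

    virtualenv envA                         # one environment per project
    envA/bin/pip install libs/python        # each env pins its own library version
    envA/bin/python projects/projA/main.py  # no PYTHONPATH juggling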
If I'm understanding your question properly, then you definitely want virtualenv. Add in some virtualenvwrapper goodness to make it that much better.