I am planning to implement some functionality in Python to be used as part of a larger non-Python project, so as to take advantage of some Python libraries. I have done some scripting in Python before, but nothing this substantial.
From the advice I've gotten, it seems like we will definitely want to use a virtual environment to manage dependencies. I am exploring venv and conda and haven't committed to either yet, though it seems like conda would have the advantage of providing pre-built versions of Cython dependencies.
With both conda and venv, though, the documentation I've been finding seems to be oriented toward working interactively inside the environment. For our purposes, I want to be able to run the programs we've written in Python programmatically, without going through the system shell.
Is there an established, recommended way to do this?
I've been trying to look at what the Bash scripts to activate a virtual environment actually do, and it looks like they basically just set up some environment variables. Both add their virtual environment's bin directory to the beginning of PATH; beyond that, venv sets VIRTUAL_ENV, and conda sets a number of CONDA_* environment variables. Interestingly, it doesn't look like either sets, say, PYTHONPATH.
For programmatic use, is it sufficient to set these environment variables and then run the equivalent of python3 -m mymodule, or is there more setup that needs to be done? I would particularly like to know if this is documented anywhere, for conda, venv, or both: relying on having figured out what environment variables need to be set to what values seems a bit fragile.
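To make the question concrete, here is roughly what I am imagining; the paths and the environment name myenv are placeholders, and I have not verified that this is the sanctioned approach:

# venv: skip activation and invoke the environment's interpreter directly
/path/to/venv/bin/python3 -m mymodule

# or reproduce what "activate" does by setting the variables by hand
VIRTUAL_ENV=/path/to/venv PATH=/path/to/venv/bin:$PATH python3 -m mymodule

# conda: I gather there is a "conda run" subcommand for running inside a named environment
conda run -n myenv python3 -m mymodule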
Related
TL;DR: Is there really no way to just tell jupyter console to run in some conda environment, without first unnecessarily installing (and hence depending on) Jupyter in that environment?
I really did try to make this not appear entirely like a rant... I hope you see there is an actual question here.
It seems like getting Jupyter to work with conda environments requires either
Installing a new Jupyter in every conda environment you want to use, or
Installing ipykernel in every conda environment you want to use (which depends on the jupyter package...), and creating a new kernel from within the environment.
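Concretely, the second option looks something like this, as far as I can tell (myenv is just an example name):

conda activate myenv
conda install ipykernel
python -m ipykernel install --user --name myenv --display-name "Python (myenv)"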
I find this a bit astonishing, since I do not think of Jupyter as a requirement of the project, but rather as just another editor/IDE-like thing, making use of environments. Conda's purpose is to manage reproducible dependencies; Jupyter's should be to interpret code within the environment I tell it. Since I'd like to store the environment.yml in git and share it with others, I see no purpose in also requiring them to install Jupyter; they might not even use it.
Yet, it seems not to work that way at all. It feels as if, in order to use Emacs with an environment, I'd have to install an "emacskernel" package into every environment. That's not how it works.
What I would like is to have one globally installed Jupyter, which can just be pointed to different environments -- similar to how the Julia REPL with julia --project=... works (yeah I know that conda is not a built-into-the-language package manager, but you should get the analogy...). (This would kinda work if conda environments would "inherit", i.e., fall back to the "global environment" for unfound dependencies, and you could just use the global Jupyter from within each one; but as I understand, they don't?)
Is this possible at all? What am I missing? Are there any better alternatives providing global Jupyter + local environments? (I must admit I have never used virtualenv or the like...)
(This older question seems to cover the same topic for pipenv, but there's no real answer there... neither a definite NO, nor an explanation why.)
Is it correct to expect a Conda environment to provide complete isolation and containment for pip/pipenv usage?
Let's say I create and activate a Conda environment and name it "pip-pip", then proceed with my project, which uses pipenv, while completely ignoring the fact that this is happening with a Conda environment activated.
Will all traces of that pipenv project be contained in "pip-pip", or is there a possibility of a spillover?
Will the fact that pip/pipenv is used from within "pip-pip" negatively affect the experience in any way?
This arrangement should work fine, as long as your shell and environment variables are configured correctly.
If you try to activate the Pipenv-managed environment without the "pip-pip" Conda environment active, you might see breakage or other unpredictable behavior, since Pipenv was installed with one Python and is being run with another. The extent of the breakage depends on the implementation details of Pipenv.
As a general rule, it should be possible to nest such "environment" programs arbitrarily, as long as they are well-designed, and as long as you activate the chain of environments in the order that they were originally installed. Whether this negatively affects your experience depends on your tolerance for annoyance.
However, Pipenv by default creates virtual environments in a global location. I'm not sure what that location is, but it's possible that you could end up with Pipenv environments installed alongside each other that depend on different Python versions. This, I think, might constitute "spillover" in the sense of your question.
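If you want to see exactly where Pipenv put a given project's environment, or keep it out of the global location altogether, something like the following should work (run from the project directory):

# print the location of the virtualenv Pipenv created for the current project
pipenv --venv

# or keep the environment inside the project directory instead of the global location
PIPENV_VENV_IN_PROJECT=1 pipenv install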
I currently have a rather complex Python configuration that has evolved over the years, and I'd like to clean it up and "modernize" it.
The existing configuration has the default macOS Python, and Homebrew's Python 3 and Python 2, all existing side by side, along with their associated pips. I also have some Python command-line tools that these Pythons or their associated installed packages have created, and which I use more or less frequently.
What I'd like to do is:
Leave macOS Python untouched
Eliminate all Homebrew Pythons
Remove non-macOS Python 2 entirely
Switch to Conda Python as my Python 3
Have access to mkvirtualenv (as an alternative way of creating environments) via virtualenvwrapper
Have access to Jupyter
I'm not sure how to do this without creating problems, and want to confirm that the obvious thing is the safe thing:
use Homebrew to uninstall its Pythons,
install Conda, and then
use (Conda's) pip to install virtualenvwrapper (which provides mkvirtualenv), Jupyter, and any other tools I subsequently need
Is that the correct procedure? If so, are there particular forms of the commands I should use, or options I should choose for them?
The biggest and/or first issue is how to not break existing functionality that relies on Python. There are two broad camps here:
1) tools and other scripts that hard-code the Python executable's location, and
2) tools and other scripts that rely on the/a system PATH variable.
#1 is the easier one. If you aren't going to remove any Python versions, these tools require no work at all; they will keep working. If you do want to uninstall some versions, then you have to switch any tools relying on those versions over to another version that also works for each tool. The hard-coded path is commonly in a shebang ('#! xxx') line at the top of the tool's main script, pointing at a specific Python binary, but there are other ways the path to the Python binary can be formed. In short, why uninstall anything? Disk space is cheap. Maybe instead just make sure that these unwanted versions are not referenced by any PATH variables.
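For example, you can see which interpreter a given tool is pinned to by looking at its first line (the tool name and the output shown are purely illustrative):

head -1 "$(command -v some-tool)"
# e.g. prints: #!/usr/local/opt/python@3.9/bin/python3.9 -- this tool breaks if that Python is removed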
#2 is the hard one. It isn't necessarily the case that all of the tools in this category are using the version of Python you get when you just type "python" at a command prompt for your primary account. There can be other modes of operation that initialize the execution environment (the PATH variable) in different ways, and so may be running different Python versions despite depending on the value of PATH.
Part of #2 is worrying about not just "python" references, but "python2", "python3", and possibly other variants as well.
Only once you've got a plan for dealing with the above so you don't break things can you worry about possibly getting rid of Python versions and installing new ones. Hopefully, Brew does a good job of uninstalling the versions it's installed, so if you can remove dependencies on one or more of them, they can potentially be easily removed. If you've got self-installed Python versions, those should be easy to uninstall as well by just removing references to them in PATH variables (or not...shouldn't be a big problem if you miss some) and then deleting the install directory.
Then there's adding the new version(s) of Python. This can only affect #2 above. You have to think about that one and know what effect you're going to have if the new install(s) manipulate any PATH variables. If it only manipulates your own user's PATH, or it leaves it to you to do so, this is a much easier task to understand, but any change to the environment is a chance to break existing functionality.
Finally, there's the mechanisms for choosing different Python versions for new development, including the use of virtual envs. This is probably the easiest part, as you can do research, try things, and test that you can do whatever you want to do. This part of the problem is the best bounded.
I don't know anything about Jupyter, other than knowing vaguely what it is, so I don't know how that complicates all this.
UPDATE: A final note. As you may already know, Python does a good job of isolating itself in terms of each version keeping its unique identity. If you use the right 'pip' and 'easy_install' that are sitting right next to the 'python' binary you're going to run with, you should be cleanly affecting just that one environment. I can't know that it's this easy for all Python versions, but I've never seen this convention broken by a version of Python that I've used. The complications here, of course, involve which versions of these tools you're getting in various situations when they are found via a PATH variable.
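One way to be sure of that is to invoke pip through the specific interpreter you intend to affect, rather than relying on whichever pip happens to be first on PATH (the paths and package name here are illustrative):

/usr/local/bin/python3.9 -m pip install somepackage   # affects only that interpreter's site-packages
python3 -m pip list                                   # whatever "python3" currently resolves to on PATH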
First, install anaconda or miniconda. The installation is non-destructive and does not conflict with your other Python installations. Check that it works before you consider removing homebrew installed Pythons.
The conda command is used both as a package manager and as an environment manager. You cannot avoid creating conda environments: the default installation is already part of an environment named base. I'm not sure why you would want to, either.
You can use pip to install any package you choose into a conda environment, but since you can use conda install for any package available on any conda channel (e.g. 'defaults', 'conda-forge'), using pip is often redundant.
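For example (the environment and package names are just illustrations):

conda activate myenv
conda install numpy                 # preferred: resolved from a conda channel
pip install some-pypi-only-package  # fall back to pip only when no conda package exists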
You could use non-conda virtual environments, but again: why? conda create -n foo python=x.x jupyter #etc and then conda activate foo is all you need to get one up and running.
I am a beginner in Python.
I read virtualenv is preferred during Python project development.
I couldn't understand this point at all. Why is virtualenv preferred?
Virtualenv keeps your Python packages in a virtual environment localized to your project, instead of forcing you to install your packages system-wide.
There are a number of benefits to this:

The first and principal one is that you can have multiple virtualenvs, and thus multiple sets of packages for different projects, even if those sets of packages would normally conflict with one another. For instance, if one project you are working on runs on Django 1.4 and another runs on Django 1.6, virtualenvs can keep those projects fully separate so you can satisfy both requirements at once (see the sketch after this list).

The second is that it makes it easy to release your project with its own dependent modules; in particular, it makes it easy to create your requirements.txt file.

The third is that it allows you to switch to another installed Python interpreter for that project. Very useful (think old 2.x scripts), but sadly not available in the now built-in venv.
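A minimal illustration of the first two points (the project names and Django versions are just examples):

virtualenv proj_a_env
proj_a_env/bin/pip install "Django==1.4"

virtualenv proj_b_env
proj_b_env/bin/pip install "Django==1.6"

proj_a_env/bin/pip freeze > requirements.txt   # record exactly what project A needs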
Note that virtualenv is about "virtual environments" but is not the same as "virtualization" or "virtual machines" (this is confusing to some). For instance, VMWare is totally different from virtualenv.
A Virtual Environment, put simply, is an isolated working copy of Python which allows you to work on a specific project without worry of affecting other projects.
For example, you can work on a project which requires Django 1.3 while also maintaining a project which requires Django 1.0.
VirtualEnv helps you create a local environment (not system-wide), specific to the project you are working on.
Hence, as you start working on multiple projects, your projects will have different dependencies (e.g. different Django versions), so you will need a different virtual environment for each project. VirtualEnv does this for you.
Since you are using VirtualEnv, try VirtualEnvWrapper: https://pypi.python.org/pypi/virtualenvwrapper
It provides some utilities to create, switch between, and remove virtualenvs easily, e.g.:
mkvirtualenv <name>: To create a new Virtualenv
workon <name> : To use a specified virtualenv
and some others
Suppose you are working on multiple projects, where one project requires a certain version of Python and another requires some other version. If you are not working in a virtual environment, both projects will use the same version installed locally on your machine, which can lead to errors for one of them.
With a virtual environment, on the other hand, you create an isolated environment in which that project's libraries and versions are stored separately. Each time, you can create a new virtual environment and work in it as if it were a fresh installation.
There is no real point to them in 2022; they are a mechanism to accomplish what C#, Java, Node, and many other ecosystems have done for years without virtual environments.
Projects need to be able to specify their package and interpreter dependencies in isolation from other projects. Virtual environments are a fine but legacy solution to that issue (versus a config file which specifies the interpreter version and a local __pypackages__ directory).
PEP 582 aims to address this gap in the Python ecosystem.
I have looked into other Python module distribution questions. My need is a bit different (I think! I am a Python newbie+).
I have a bunch of python scripts that I need to execute on remote machines. Here is what the target environment looks like;
The machines will have a base Python runtime installed
I will have an SSH account; I can log in or execute commands remotely using ssh
I can copy files (scp) into my home dir
I am NOT allowed to install anything on the machine; the machines may not even have access to the Internet
my scripts may use some 'exotic' python modules -- most likely they won't be present in the target machine
after the audit, my home directory will be nuked from the machine (leave no trace)
So what I like to do is:
copy a directory structure of Python scripts + modules to the remote machine (say into /home/audituser/scripts, with modules copied into /home/audituser/scripts/python_lib)
then execute a script (say /home/audituser/scripts/myscript.py). This script will need to resolve all modules used from 'python_lib' sub directory.
Is this possible? Or is there a better way of doing this? I guess what I am looking for is a way to 'relocate' the 3rd party modules into the scripts dir.
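To make it concrete, what I am picturing is roughly this (untested; directory names as above, host name is a placeholder):

# from my machine
scp -r scripts audituser@target:/home/audituser/

# then run the script so that it finds its modules in python_lib
ssh audituser@target 'PYTHONPATH=/home/audituser/scripts/python_lib python /home/audituser/scripts/myscript.py'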
thanks in advance!
Are the remote machines the same as each other? And, if so, can you set up a development machine that's effectively the same as the remote machines?
If so, virtualenv makes this almost trivial. Create a virtualenv on your dev machine, use the virtualenv copy of pip to install any third-party modules into it, build your script within it, then just copy that entire environment to each remote machine.
There are three things that make it potentially non-trivial:
If the remote machines don't (and can't) have virtualenv installed, you need to do one of the following:
In many cases, just copying a --relocatable environment over just works. See the documentation section on "Making Environments Relocatable".
You can always bundle virtualenv itself, and pip install --user virtualenv (and, if they don't even have pip, a few steps before that) on each machine. This will leave the user account in a permanently-changed state. (But fortunately, your user account is going to be nuked, so who cares?)
You can write your own manual bootstrapping. See the section on "Creating Your Own Bootstrap Scripts".
By default, you get a lot more than you need—the Python executable, the standard library, etc.
If the machines aren't identical, this may not work, or at least might not be as efficient.
Even if they are, you're still often making your bundle orders of magnitude bigger.
See the documentation sections on Using Virtualenv without bin/python, --system-site-packages, and possibly bootstrapping.
If any of the Python modules you're installing also need C libraries (e.g., libxml2 for lxml), virtualenv doesn't help with that. In fact, you will need the C libraries to be almost exactly the same (same path, compatible version).
Three other alternatives:
If your needs are simple enough (or the least-simple parts involve things that virtualenv doesn't help with, like installing libxml2), it may be easier to just bundle the .egg/.tgz/whatever files for third-party modules, and write a script that does a pip install --user and so on for each one (see the sketch after this list), and then you're done.
Just because you don't need a full app-distribution system doesn't mean you can't use one. py2app, py2exe, cx_freeze, etc. aren't all that complicated, especially in simple cases, and having a click-and-go executable to copy around is even easier than having an explicit environment.
zc.buildout is an amazingly flexible and manageable tool that can do the equivalent of any of the three alternatives. The main downside is that there's a much, much steeper learning curve.
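For the first alternative, the bundling can be as simple as something like this (the directory name is arbitrary; --no-index keeps pip from trying to reach the network):

# on the dev machine: fetch the dependencies as installable archives
pip download -d bundle/ -r requirements.txt

# on the target machine: install only from the bundled files, into the user site
pip install --user --no-index --find-links bundle/ -r requirements.txt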
You can use virtualenv to create a self-contained environment for your project. This can house your own script, as well as any dependency libraries. Then you can make the env relocatable (--relocatable), and sync it over to the target machine, activate it, and run your scripts.
If these machines do have network access (not internet, but just local network), you can also place the virtualenv on a shared location and activate from there.
It looks something like this:
virtualenv --no-site-packages portable_proj   # create an isolated environment
cd portable_proj/
source bin/activate
# install some deps
pip install xyz
virtualenv --relocatable .                    # rewrite the env's scripts to use relative paths
Now portable_proj can be distributed to other machines.
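Syncing it to the target and running a script there might look something like this (the host, user, and script name are placeholders):

rsync -a portable_proj/ user@target:/home/user/portable_proj/
ssh user@target '/home/user/portable_proj/bin/python /home/user/portable_proj/myscript.py'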