Here's an excerpt from my environment.yml:
name: my-project
channels:
- pytorch-nightly
- defaults
dependencies:
- pytorch=1.13.0.*
- pip:
  - https://github.com/explosion/spacy-models/releases/download/nb_core_news_md-3.3.0/nb_core_news_md-3.3.0-py3-none-any.whl
prefix: ~/opt/miniconda3/envs/my-project
When I create my environment (conda env create -f environment.yml) and re-export it to environment.yml (conda env export > environment.yml), the file gets changed:
name: my-project
channels:
- pytorch-nightly
- defaults
dependencies:
- pytorch=1.13.0.dev20220614=py3.9_0
- pip:
  - nb-core-news-md==3.3.0
prefix: ~/opt/miniconda3/envs/my-project
Then, when I re-create my environment the next day, Conda complains that pytorch=1.13.0.dev20220614=py3.9_0 does not exist: it has been replaced by a newer PyTorch Preview (Nightly) build, so the dev20220614=py3.9_0 build is no longer available.
Conda also complains that nb-core-news-md==3.3.0 does not exist. It was installed via a direct URL to the wheel, and that URL was removed from the exported environment.yml.
How can I prevent conda env export from changing these two dependencies? I still want Conda to lock down the specifics for all other dependencies, just not for these two.
I get why PyTorch deletes old nightly builds, but it is disruptive for a workflow like yours. What you're asking for cannot be expressed with Conda CLI commands alone. Instead, consider exporting the regular YAML and then running a few sed commands to rewrite those two requirements before recreating the environment.
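For example, something along these lines (a rough sketch assuming GNU sed; adjust the patterns to whatever conda env export actually writes for you):
# Re-export, then loosen the two pins back to the specs you actually want
conda env export > environment.yml
sed -i 's|pytorch=1\.13\.0\.dev.*|pytorch=1.13.0.*|' environment.yml
sed -i 's|nb-core-news-md==3\.3\.0|https://github.com/explosion/spacy-models/releases/download/nb_core_news_md-3.3.0/nb_core_news_md-3.3.0-py3-none-any.whl|' environment.yml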
Related
I am attempting to use Conda to create an environment from a Pip requirements file. The contents of the file are
requirements.txt
numpy==1.18.2
torch==1.4.0
torchvision==0.5.0
scikit-learn==0.22.2.post1
Pillow==8.3.2
pydicom==1.4.2
pandas==1.0.3
Running the command
conda create -n $name --file requirements.txt
gives a PackagesNotFoundError, as the required channels are missing.
How do I amend this?
Possible Issues
There are a few potential issues.
Conda pytorch
First, not all packages in Conda go by the same name as they do in other repositories. Part of this is due to the nature of Conda being a general package repository, rather than a language-specific one. In particular, the torch module is delivered via the Conda pytorch package.
So that has to change.
NumPy version unavailable
That particular version of NumPy does not appear to be available in either the defaults or conda-forge channels.
$ mamba search numpy=1.18.2
No match found for: numpy=1.18.2. Search: *numpy*=1.18.2
PackagesNotFoundError: The following packages are not available from current channels:
- numpy=1.18.2
Current channels:
- https://conda.anaconda.org/conda-forge/osx-64
- https://conda.anaconda.org/conda-forge/noarch
- https://conda.anaconda.org/bioconda/osx-64
- https://conda.anaconda.org/bioconda/noarch
- https://repo.anaconda.com/pkgs/main/osx-64
- https://repo.anaconda.com/pkgs/main/noarch
- https://repo.anaconda.com/pkgs/r/osx-64
- https://repo.anaconda.com/pkgs/r/noarch
Why would this happen? For most Python packages, Conda works downstream of the PyPI repository. When new releases come out, the Conda Forge bot (for example) will auto-generate a pull request to the corresponding feedstock. Sometimes these don't "just work" and need some troubleshooting to get built. Occasionally, the process to get the builds working won't finish before a new release hits, so a newer pull request supersedes the previous one and the old one can end up abandoned. This leaves gaps in Conda Forge's coverage of PyPI, and that is exactly what happened here.
If you can tolerate a different version, conda-forge does provide v1.18.1 (below) and v1.18.4 (above).
Otherwise, if you require exact replication of package versions, then you will have to source this from PyPI. I'll show this at the end.
Channel issues
Missing channels
OP does not indicate the channel configuration. The torchvision==0.5.0 package, for example, is only available through the pytorch channel.
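You can verify where (and whether) a specific version is published with something like:
# Check whether the pytorch channel actually provides the requested version
conda search -c pytorch 'torchvision=0.5.0'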
Masked channels
Another issue here could be the use of the channel_priority: strict setting. If this setting is in effect, a channel that does provide the required version can be excluded a priori by the SAT solver, simply because the package (though not the correct version) is also available in a higher-priority channel. These days channel_priority: flexible is the default; it can be set with:
conda config --set channel_priority flexible
Solutions
Exact replication (PyPI only)
Given the package names and versions, these packages likely originated from PyPI. If you need to exactly replicate the original environment - say, for reproducing scientific results - then I'd recommend sourcing everything from PyPI. The best way to do this is to use Conda to source Python and Pip, then let Pip install the requirements.txt.
Judging from the package versions, we're talking Python 3.7 or 3.8. You'd probably be fine with just python=3.8, but a precise guesstimate from release dates would be python=3.8.2. So, try something like:
environment.yaml
name: my_env
channels:
- conda-forge
dependencies:
- python=3.8.2
- pip
- pip:
  - -r requirements.txt
Then create the environment with
conda env create -n $name -f environment.yaml
making sure the requirements.txt is in the folder with the YAML.
If adding packages to this environment later, I would recommend only using pip install. Otherwise, Conda may have trouble accounting for the Pip-installed packages.
Conda-only environment
Assuming numpy=1.18.2 can be substituted, a Conda-only environment might look something like:
environment.yaml
name: my_env
channels:
- pytorch
- conda-forge
dependencies:
- python=3.8
- numpy=1.18.1 # alternatively, 1.18.4
- pytorch=1.4.0
- torchvision=0.5.0
- scikit-learn=0.22.2.post1
- pillow=8.3.2
- pydicom=1.4.2
- pandas=1.0.3
Again, creating with:
conda env create -n $name -f environment.yaml
Note that in Conda YAML files only a single = is used (not Pip's ==). This would be the best approach if you plan to install additional packages through Conda in an ad hoc manner (e.g., conda install).
Mixed Conda-Pip environment
You could also try a mixed environment mostly similar to the last one, but having Pip specifically provide numpy==1.18.2. I wouldn't recommend this, since the other dependencies will definitely bring in NumPy first from Conda, and then Pip will clobber it to provide the exact version.
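For reference, such a mixed file might look roughly like this (a sketch only, reusing the channels from above):
name: my_env
channels:
- pytorch
- conda-forge
dependencies:
- python=3.8
- pytorch=1.4.0
- torchvision=0.5.0
- scikit-learn=0.22.2.post1
- pillow=8.3.2
- pydicom=1.4.2
- pandas=1.0.3
- pip
- pip:
  # Pip reinstalls NumPy over whatever Conda pulled in, to pin the exact version
  - numpy==1.18.2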
I'm trying to save my conda environment so I can send it to others to reproduce my work. Within my activated environment:
(env_name) c:\eric\ conda env export > environment.yml
This environment has dozens of packages installed (including numpy, matplotlib). When I open the resulting environment.yml file, I get only:
name: env_name
channels:
- conda-forge
- defaults
prefix: C:\Users\eric\Miniconda3\envs\env_name\envs\env_name
That is the entire file: there isn't even a dependencies: line, which normally shows up and lists the packages in the env.
I am using miniconda, version 4.12.0, and have run conda update conda.
Conda issues
I found two issues on the Conda issue tracker that have come up with this:
https://github.com/conda/conda/issues/8839
https://github.com/conda/conda/issues/10997
One solution at the first issue is to use the following command:
conda env export -p path-to-folder > environment.yml
Unfortunately this did not work for me, but it seems to have worked for many.
Comparison to related question
Note that the prefix value is indeed pretty strange; it typically is something like:
C:\Users\eric\.conda\envs\env_name
Frankly I am not concerned about that, unlike the related question, Anaconda export Environment file, which focuses on removing that prefix; its accepted answer does so with bash shell commands.
My question is more first-order: why aren't dependencies showing up in my yaml file?
e.g., something like the following should be appearing:
dependencies:
- pandas=1.0.3=py37h47e9c7a_0
- qt=5.9.7=vc14h73c81de_0
- pip:
  - imageio==2.9.0
  - scikit-image==0.17.2
But literally there are zero dependencies, and not even a dependencies: key.
It's possible the environment isn't properly activated, and therefore the export isn't capturing what you want. Fortunately, most Conda commands provide arguments to specify an environment explicitly. Specifically, try the --name/-n argument, like
conda env export -n env_name > env_name.yaml
Personally, I try to always use a --name/-n or --prefix/-p flag, because I find context-sensitive commands are more prone to errors (e.g., installing into the wrong environment).
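Equivalently, you can target the environment by path with --prefix/-p (the path here is only illustrative; use whatever conda env list reports for your environment):
conda env export -p C:\Users\eric\.conda\envs\env_name > env_name.yaml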
Does it make sense to use Conda + Poetry for a Machine Learning project? Allow me to share my (novice) understanding and please correct or enlighten me:
As far as I understand, Conda and Poetry have different purposes but are largely redundant:
Conda is primarily an environment manager (in fact, not necessarily for Python), but it can also manage packages and dependencies.
Poetry is primarily a Python package manager (say, an upgrade of pip), but it can also create and manage Python environments (say, an upgrade of Pyenv).
My idea is to use both and compartmentalize their roles: let Conda be the environment manager and Poetry the package manager. My reasoning is that (it sounds like) Conda is best for managing environments and can be used for compiling and installing non-Python packages, especially CUDA drivers (for GPU capability), while Poetry is more powerful than Conda as a Python package manager.
I've managed to make this work fairly easily by using Poetry within a Conda environment. The trick is to not use Poetry to manage the Python environment: I'm not using commands like poetry shell or poetry run, only poetry init, poetry install etc (after activating the Conda environment).
For full disclosure, my environment.yml file (for Conda) looks like this:
name: N
channels:
- defaults
- conda-forge
dependencies:
- python=3.9
- cudatoolkit
- cudnn
and my pyproject.toml file looks like this:
[tool.poetry]
name = "N"
authors = ["B"]
[tool.poetry.dependencies]
python = "3.9"
torch = "^1.10.1"
[build-system]
requires = ["poetry-core>=1.0.0"]
build-backend = "poetry.core.masonry.api"
To be honest, one of the reasons I proceeded this way is that I was struggling to install CUDA (for GPU support) without Conda.
Does this project design look reasonable to you?
I have experience with a Conda + Poetry setup, and it's been working fine. The great majority of my dependencies are specified in pyproject.toml, but when there's something that's unavailable on PyPI, or easier to install with Conda, I add it to environment.yml. Moreover, Conda is used as a virtual environment manager, which works well with Poetry: there is no need to use poetry run or poetry shell; it is enough to activate the right Conda environment.
Tips for creating a reproducible environment
Add Poetry, possibly with a version number (if needed), as a dependency in environment.yml, so that you get Poetry installed when you run conda env create, along with Python and other non-PyPI dependencies.
Add conda-lock, which gives you lock files for Conda dependencies, just like you have poetry.lock for Poetry dependencies.
Consider using mamba, which is generally compatible with conda but better at resolving conflicts, and also much faster. An additional benefit is that all users of your setup will use the same package resolver, independent from the locally-installed version of Conda.
By default, use Poetry for adding Python dependencies. Install packages via Conda if there's a reason to do so (e.g. in order to get a CUDA-enabled version). In such a case, it is best to specify the package's exact version in environment.yml, and after it's installed, to add an entry with the same version specification to Poetry's pyproject.toml (without ^ or ~ before the version number). This will let Poetry know that the package is there and should not be upgraded.
If you use different channels that provide the same packages, it might not be obvious which channel a particular package will be downloaded from. One solution is to specify the channel for the package using the :: notation (see the pytorch entry below), and another is to enable strict channel priority. Unfortunately, in Conda 4.x there is no way to enable this option through environment.yml.
Note that Python adds user site-packages to sys.path, which may cause lack of reproducibility if the user has installed Python packages outside Conda environments. One possible solution is to make sure that the PYTHONNOUSERSITE environment variable is set to True (or to any other non-empty value).
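For the last point, one way to do this (a sketch; conda env config vars requires Conda 4.8 or newer) is to store the variable in the environment itself:
# Persist the variable inside the environment, then re-activate so it takes effect
conda activate my_project_env
conda env config vars set PYTHONNOUSERSITE=True
conda activate my_project_env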
Example
environment.yml:
name: my_project_env
channels:
- pytorch
- conda-forge
# We want to have a reproducible setup, so we don't want default channels,
# which may be different for different users. All required channels should
# be listed explicitly here.
- nodefaults
dependencies:
- python=3.10.* # or don't specify the version and use the latest stable Python
- mamba
- pip # pip must be mentioned explicitly, or conda-lock will fail
- poetry=1.* # or 1.1.*, or no version at all -- as you want
- tensorflow=2.8.0
- pytorch::pytorch=1.11.0
- pytorch::torchaudio=0.11.0
- pytorch::torchvision=0.12.0
# Non-standard section listing target platforms for conda-lock:
platforms:
- linux-64
virtual-packages.yml (may be used e.g. when we want conda-lock to generate CUDA-enabled lock files even on platforms without CUDA):
subdirs:
  linux-64:
    packages:
      __cuda: 11.5
First-time setup
You can avoid playing with the bootstrap env and simplify the example below if you have conda-lock, mamba and poetry already installed outside your target environment.
# Create a bootstrap env
conda create -p /tmp/bootstrap -c conda-forge mamba conda-lock poetry='1.*'
conda activate /tmp/bootstrap
# Create Conda lock file(s) from environment.yml
conda-lock -k explicit --conda mamba
# Set up Poetry
poetry init --python=~3.10 # version spec should match the one from environment.yml
# Fix package versions installed by Conda to prevent upgrades
poetry add --lock tensorflow=2.8.0 torch=1.11.0 torchaudio=0.11.0 torchvision=0.12.0
# Add conda-lock (and other packages, as needed) to pyproject.toml and poetry.lock
poetry add --lock conda-lock
# Remove the bootstrap env
conda deactivate
rm -rf /tmp/bootstrap
# Add Conda spec and lock files
git add environment.yml virtual-packages.yml conda-linux-64.lock
# Add Poetry spec and lock files
git add pyproject.toml poetry.lock
git commit
Usage
The above setup may seem complex, but it can be used in a fairly simple way.
Creating the environment
conda create --name my_project_env --file conda-linux-64.lock
conda activate my_project_env
poetry install
Activating the environment
conda activate my_project_env
Updating the environment
# Re-generate Conda lock file(s) based on environment.yml
conda-lock -k explicit --conda mamba
# Update Conda packages based on re-generated lock file
mamba update --file conda-linux-64.lock
# Update Poetry packages and re-generate poetry.lock
poetry update
To anyone using @michau's answer but having issues including Poetry in the environment.yml: currently, Poetry versions 1.2 or greater aren't available from conda-forge. You can still include Poetry 1.2 in the .yml by installing it through Pip instead:
dependencies:
- python=3.9.*
- mamba
- pip
- pip:
- "poetry>=1.2"
I have a process to set up an initial project whereby I create a standardised conda environment based on a yaml file.
conda env create -f environment.yaml
The yaml file looks something like this
name: testA
channels:
- conda-forge
- defaults
- anaconda
dependencies:
- python=3.6.3=1
- yaml=0.1.7=0
- connexion=1.1.10=py36_0
- setuptools=38.4.0=py36_0
- pymongo=3.4.0=py36_0
- gunicorn=19.8.1=py36_0
- flask-cors=3.0.6=py36_0
As the project progresses, a user might add another library:
conda install scikit-learn
Is it possible to append the updated conda environment to the original environment.yaml? I know I can overwrite it with
conda env export > environment.yaml