Poetry Install crashes because excluded dependency cannot be found - python

One of our repositories relies on another first-party one. Because we're in the middle of a migration from (a privately hosted) GitLab to Azure, some of our dependencies aren't available in GitLab, which is where the problem comes up.
Our pyproject.toml file has this poetry group:
# pyproject.toml
[tool.poetry.group.cli.dependencies]
cli-parser = { path = "../cli-parser" }
In GitLab CI, this path cannot be resolved, so we want to run the pipelines without this dependency. No code that runs in CI actually relies on this library, and no files are imported from it, so we factored it out into a separate Poetry group. In the GitLab CI configuration, that looks like this:
# .gitlab-ci.yml
install-poetry-requirements:
  stage: install
  script:
    - /opt/poetry/bin/poetry --version
    - /opt/poetry/bin/poetry install --without cli --sync
As shown, Poetry is instructed to omit the cli dependency group. However, it still crashes on it:
# Gitlab CI logs
$ /opt/poetry/bin/poetry --version
Poetry (version 1.2.2)
$ /opt/poetry/bin/poetry install --without cli --sync
Directory ../cli-parser does not exist
If I comment out the cli-parser line in pyproject.toml, it will install successfully (and the pipeline passes), but we cannot do that because we need it in production.
I can't find another way to tell poetry to omit this library. Is there something I missed, or is there a workaround?

Good, Permanent Solution
As finswimmer mentioned in a comment, Poetry 1.4 should handle this correctly. If you're reading this after Poetry 1.4 has been released, the configuration in the question should work as-is.
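If you're stuck on an older Poetry in the CI image, a minimal sketch of upgrading it in place before the install step could look like this (assuming Poetry was installed with the official installer, so that poetry self update is available; the paths are the ones from the question):
/opt/poetry/bin/poetry self update                    # upgrade to a release that includes the fix (1.4+)
/opt/poetry/bin/poetry --version                      # confirm the new version in the CI logs
/opt/poetry/bin/poetry install --without cli --sync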
Hacky, Bad, Temporary Solution
Since the original problem was in GitLab CI pipelines, I used a workaround there. Right before the install command, I run the following:
sed -i '/cli-parser/d' pyproject.toml
This modifies the project's pyproject.toml in-place, removing the line that references my module, so Poetry never parses the dependency.
See the sed man page for more information on how this works.
Keep in mind that if your pipeline has any permanent effects, for example building your package into an installable wheel or producing build artifacts that are used elsewhere, this WILL break your setup.
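For reference, a sketch of what the job's script section boils down to with this workaround in place (all commands are the ones from the question and the answer above):
/opt/poetry/bin/poetry --version
sed -i '/cli-parser/d' pyproject.toml                 # drop the local path dependency before Poetry parses it
/opt/poetry/bin/poetry install --without cli --sync   # install everything except the cli group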

Related

Poetry and buildkit mount=type=cache not working when building over airflow image

I have two example Dockerfiles; one works and the other does not. The main difference between the two is the base image.
Simple Python base image Dockerfile:
# syntax = docker/dockerfile:experimental
FROM python:3.9-slim-bullseye
RUN apt-get update -qy && apt-get install -qy \
build-essential tini libsasl2-dev libssl-dev default-libmysqlclient-dev gnutls-bin
RUN pip install poetry==1.1.15
COPY pyproject.toml .
COPY poetry.lock .
RUN poetry config virtualenvs.create false
RUN --mount=type=cache,mode=0777,target=/root/.cache/pypoetry poetry install
Airflow base image Dockerfile:
# syntax = docker/dockerfile:experimental
FROM apache/airflow:2.3.3-python3.9
USER root
RUN apt-get update -qy && apt-get install -qy \
build-essential tini libsasl2-dev libssl-dev default-libmysqlclient-dev gnutls-bin
USER airflow
RUN pip install poetry==1.1.15
COPY pyproject.toml .
COPY poetry.lock .
RUN poetry config virtualenvs.create false
RUN poetry config cache-dir /opt/airflow/.cache/pypoetry
RUN --mount=type=cache,uid=50000,mode=0777,target=/opt/airflow/.cache/pypoetry poetry install
Before building the Docker image, run poetry lock in the same folder as the pyproject.toml file!
pyproject.toml file:
[tool.poetry]
name = "Airflow-test"
version = "0.1.0"
description = ""
authors = ["Lorem ipsum"]
[tool.poetry.dependencies]
python = "~3.9"
apache-airflow = { version = "2.3.3", extras = ["amazon", "crypto", "celery", "postgres", "hive", "jdbc", "mysql", "ssh", "slack", "statsd"] }
prometheus_client = "^0.8.0"
isodate = "0.6.1"
dacite = "1.6.0"
sqlparse = "^0.3.1"
python3-openid = "^3.1.0"
flask-appbuilder = ">=3.4.3"
alembic = ">=1.7.7"
apache-airflow-providers-google = "^8.1.0"
apache-airflow-providers-databricks = "^3.0.0"
apache-airflow-providers-amazon = "^4.0.0"
pendulum = "^2.1.2"
[tool.poetry.dev-dependencies]
[build-system]
requires = ["poetry-core>=1.0.0"]
build-backend = "poetry.core.masonry.api"
To build the images, this is the command I use:
DOCKER_BUILDKIT=1 docker build --progress=plain -t airflow-test -f Dockerfile .
For both images, the first build needs poetry install to download all dependencies. The interesting part is that the second time I build the image, the Python-based image is a lot faster because the dependencies are already cached, but the Airflow-based image tries to download all 200 dependencies once again.
From what I know, by specifying --mount=type=cache that directory will be stored in the image repository so it can be reused the next time the image is built. This also trims the final image size.
When the image runs, how do the dependencies appear? If I run docker run -it --user 50000 --entrypoint /bin/bash image, a simple Python import works on the Airflow image but not on the Python image. When and how will the dependencies be reattached to the image?
If you want to try it out, here is a dummy project that can be cloned locally and played around with:
https://github.com/ioangrozea/Docker-dummy
Maybe this does not answer the question directly, but I think what you are trying to do makes very little sense in the first place, so I would recommend changing the approach completely, especially since what you are trying to achieve is very well described in the official Airflow image documentation, including plenty of examples to follow. No matter how hard you try, what you are doing will end up with an image that is more than 200 MB bigger (at least 20%) than what you can get by following the official documentation.
Using Poetry to build that image makes very little sense and is not recommended (and there is absolutely no need to use Poetry in this case).
See the comment here.
While there are some successes with using other tools like poetry or pip-tools, they do not share the same workflow as pip - especially when it comes to constraint vs. requirements management. Installing via Poetry or pip-tools is not currently supported. If you wish to install airflow using those tools you should use the constraints and convert them to appropriate format and workflow that your tool requires.
Poetry and pip have completely different ways of resolving dependencies. While Poetry is a cool tool for managing dependencies of small projects, and I really like Poetry, its opinionated choice to treat libraries and applications differently makes it unsuitable for managing Airflow's dependencies: Airflow is both an application to install and a library for developers to build on top of, and Poetry's limitations simply do not work for Airflow.
I explained this in more detail in a talk I gave last year; you can watch it if you are interested in the "why".
Then, how do you solve your problem? Don't use --mount=type=cache and Poetry in this case. Use the multi-segment image of Apache Airflow and the "customising" option rather than "extending" the image. This will give you a lot more savings, because you will not have build-essential added to your final image (on its own it adds ~200 MB to the image size, and the only way to get rid of it is to split your image into two segments: one that has build-essential and lets you build Python packages, and one that you use as the "runtime", where you only copy the built Python libraries).
This is exactly the approach the official Airflow Python image takes. It is highly optimised for size and rebuild speed, and while its internals are pretty complex, actually building your highly optimised, completely custom image is as simple as downloading the Airflow Dockerfile and running the right docker buildx build . --build-arg ... --build-arg ... command; the Airflow Dockerfile will do all the optimisations for you, resulting in as small an image as humanly possible. It also allows you to reuse the build cache, especially if you use BuildKit, which is a modern, slick and very well optimised way of building images (the Airflow Dockerfile requires BuildKit as of Airflow 2.3).
You can see all the details on how to build the customised image here, with plenty of examples, an explanation of why it works the way it works, and the kinds of optimisations you can get. There are examples of how to add dependencies, Python packages, etc. While this is pretty sophisticated, you seem to be doing sophisticated things with your image, which is why I suggest you follow that route.
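As a rough sketch of that route (the build arguments and download URL below are only illustrative; check the linked documentation for the exact Dockerfile and arguments matching your Airflow release, since ADDITIONAL_PYTHON_DEPS and the URL here are assumptions on my part):
# Fetch the official Airflow Dockerfile (use the one matching your release, per the docs)
curl -fsSL -o Dockerfile https://raw.githubusercontent.com/apache/airflow/main/Dockerfile
# Build a customised image with BuildKit; extra PyPI packages go in ADDITIONAL_PYTHON_DEPS
docker buildx build . \
  --build-arg AIRFLOW_VERSION="2.3.3" \
  --build-arg ADDITIONAL_PYTHON_DEPS="dacite==1.6.0 isodate==0.6.1" \
  -t my-custom-airflow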
Also, if you are interested in the rest of the reasoning, you can watch my talk from Airflow Summit 2020. While the talk was given two years ago and some small details have changed, the explanation of how and why we build the image the way we do in Airflow still holds very strongly. It has gotten a little simpler since the talk was given (i.e. the only thing you need now is the Dockerfile; no full Airflow sources are needed), and you need to use BuildKit, but all the rest remains the same.

Poetry could not find a pyproject.toml file in C:\

I'm running Python 3.9.1 and I have successfully installed Poetry version 1.1.4.
When I am trying to add requests ($ poetry add requests) I am facing
RuntimeError
Poetry could not find a pyproject.toml file in C:\...
I have just installed it and I am not sure if I have missed something.
Could anyone advise please?
You have to create a pyproject.toml first. Go into your project folder, run poetry init and follow the instructions. As an alternative, you can run poetry new myproject to create a basic folder structure and a pyproject.toml. Also have a look at the docs.
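For example (my-project is just a placeholder; the generated layout can differ slightly between Poetry versions):
cd my-project
poetry init               # interactive prompts; writes pyproject.toml into the current folder
# or let Poetry scaffold a fresh project with a basic layout and a pyproject.toml:
poetry new my-project
# my-project/
# ├── pyproject.toml
# ├── README
# ├── my_project/__init__.py
# └── tests/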
In my case, I was in a Docker container with a legacy Python version (3.6). I had to use pip instead of conda, and therefore I installed Poetry to keep the dependencies right.
bash-4.4# docker exec -it MY_CONTAINER bash
starts the command prompt of the container.
Now turning to answer the question, which is not a Docker question.
In the next command, you might need to write /usr/local/bin/poetry instead of just poetry.
bash-4.4# poetry init
This command will guide you through creating your pyproject.toml config.
Package name []: test
Version [0.1.0]: 1.0.0
Description []: test
Author [None, n to skip]: n
License []:
Compatible Python versions [^3.6]:
Would you like to define your main dependencies interactively? (yes/no) [yes] no
Would you like to define your development dependencies interactively? (yes/no) [yes] no
Generated file
[tool.poetry]
name = "test"
version = "0.1.0"
description = "test"
authors = ["Your Name <you#example.com>"]
[tool.poetry.dependencies]
python = "^3.6"
[tool.poetry.dev-dependencies]
[build-system]
requires = ["poetry-core>=1.0.0"]
build-backend = "poetry.core.masonry.api"
Do you confirm generation? (yes/no) [yes]
A very easy side remark which should be clear to most: if you press Enter at a field with filled brackets, it will just use what is written in the brackets. For example, if you press Enter at Version [0.1.0]:, the version becomes 0.1.0, unless you enter your own. That is also to say that those brackets [] do not mean you have to enter a list; they just show what is used when you press Enter.
After this, I could run:
bash-4.4# poetry add pandas
Another Docker side note: it turned out that apk (Alpine containers) on legacy Python 3.6 cannot handle basic packages well enough, with or without Poetry; see Installing pandas in docker Alpine. I had to switch to a newer Python version. You can also install Poetry in the Dockerfile itself rather than only in the container's bash; see Integrating Python Poetry with Docker.
Wrapping up:
It was strange to me that I had to enter things I would only use when publishing a self-written package (see Package name []), although I was expecting a general setup of a package manager for many packages as a whole. In the end, I just followed the menu and entered some irrelevant placeholders. The right Python version, the only important part of the pyproject.toml file for my purposes, was already suggested automatically. That was all that was needed.

Python program runs through sublime but not command line

I am attempting to make a web app with flask, and when I attempt to run my script through the command line, I get a "ModuleNotFoundError: No module named 'google.cloud'". However, when I run the script in Sublime, I do not get this error.
I have already attempted installing google, google-cloud, and conda using pip.
Here are the lines that are involved in importing from google.cloud. The console states that the first line is the one the compilation is failing at.
from google.cloud import vision
from google.cloud.vision import types
I was expecting the code to be output to my localhost, but this compile time error is preventing this.
The library/package that you need is called google-cloud-vision; see:
https://pypi.org/project/google-cloud-vision/
You could add this directly to your project (at its current version) using:
pip install "google-cloud-vision==0.36.0"
However...
Your problem may be a consequence of different python environments and I encourage you to review virtualenv:
https://virtualenv.pypa.io/en/latest/
Among other things, virtualenv enables (a) the creation of isolated python environments; (b) "clean room" like behavior wherein you can recreate python environments easily and predictably. This latter benefit may help with your "it works .... but doesn't work ... " issue.
One additional good practice, with or without virtualenv, is to persist your dependencies to (conventionally) requirements.txt and install them with:
pip install -r requirements.txt
And, in this case, to have requirements.txt similar to:
flask==1.0.2
google-cloud-vision==0.36.0
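A minimal sketch of that workflow on the command line (app.py stands in for whatever your Flask entry point is):
python3 -m venv .venv                # or: virtualenv .venv
source .venv/bin/activate            # on Windows: .venv\Scripts\activate
pip install -r requirements.txt      # installs the pinned flask and google-cloud-vision
python app.py                        # run the script with the same interpreter every time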

Use a filesystem directory instead of pypi behind a corporate firewall?

I am working on a product with a large number of python dependencies within a corporation that does not permit servers to contact external machines. Any attempt to circumvent this rule would be judged harshly.
The application is deployed via a batch-script (it's 32 bit windows) into a virtualenv. This batch script (ideally) should do nothing more than
# Precondition: Source code has been checked-out into myprog/src
cd myprog/src
setup.py install # <-- fails because of dependencies
myprog.exe
The problem comes with managing the dependencies - since it's impossible for the server to connect to the outside world my only solution is to have the script easy_install each of the dependencies before the setup starts, something like this:
cd myproc/deps/windows32
easy_install foo-1.2.3.egg
easy_install bar-2.3.4.egg
easy_install baz-3.4.5.egg <-- works but is annoying/wrong
cd ../../myprog/src
setup.py install
myprog.exe
What I'd like to do is make it so that the setup.py script knows where to fetch its dependencies from. Ideally this should be set as a command-line argument or environment variable, so that I don't hard-code the location of the dependencies into the project.
Ideally I'd like all of the eggs to be part of a 'distributions' directory: This can be on a network drive, shared on a web-server or possibly even be deployed to a local folder on each of the servers.
Can this be done?
I think what you are looking for are these pip options: --no-index and --find-links:
--no-index
--find-links /my/local/archives
--find-links http://some.archives.com/archives
Docs are here.
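A sketch of how those options fit together (the paths mirror the examples above, and requirements.txt is assumed to list your dependencies):
# Install only from the shared 'distributions' directory, never contacting PyPI:
pip install --no-index --find-links /my/local/archives -r requirements.txt
# The directory can also be served over HTTP on the internal network:
pip install --no-index --find-links http://some.archives.com/archives -r requirements.txt
# pip also reads these as environment variables, which avoids hard-coding the location:
export PIP_NO_INDEX=1
export PIP_FIND_LINKS=/my/local/archives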

how to pip uninstall with virtualenv on heroku cedar stack?

I tried to uninstall a module on heroku with:
heroku run bin/python bin/pip uninstall whatever
Pip shows the module in the /app tree, then claims to have uninstalled the module, but running the same command again shows it installed in the same location in the /app tree.
Is there a way to get pip uninstall to succeed?
heroku run instantiates a new dyno and runs the specified command in that dyno only. Dynos are ephemeral, which is why the results of the pip uninstall don't stick.
Updated 2013-09-30: the current way to clear the virtualenv seems to be to specify a different Python runtime version in runtime.txt, as stated on GitHub and in Heroku's Dev Center reference.
Be aware that Heroku currently "only endorses and supports the use of Python 2.7.4 and 3.3.2" so unless your application supports both Python 2.7.4 and 3.3.2, you may want to test it with the runtime you'll want to switch to (currently available at http://envy-versions.s3.amazonaws.com/$PYTHON_VERSION.tar.bz2, though it shouldn't be an issue to switch e.g. between 2.7.4 and 2.7.3 in most cases).
Thanks @Jesse for your up-to-date answer and to the commenters who made me aware of the issue.
This was up-to-date in ~November 2012 (I haven't updated the linked buildpack since; my pull request was closed and the CLEAN_VIRTUALENV feature was dropped at some point by the official buildpack):
As David explained, you cannot pip uninstall one package but you can purge and reinstall the whole virtualenv. Use the user-env-compile lab feature with the CLEAN_VIRTUALENV option to purge the virtualenv:
heroku labs:enable user-env-compile
heroku config:add CLEAN_VIRTUALENV=true
Currently this won't work because there is a bug. You'll need to use my fork of the buildpack until this gets fixed upstream (the pull request was closed):
heroku config:add BUILDPACK_URL=https://github.com/blaze33/heroku-buildpack-python.git
Now push your new code and you'll notice that the whole virtualenv gets reinstalled.
Andrey's answer no longer works as of March 23, 2012. The new-style virtualenv commit moved the virtualenv from /app to /app/.heroku/venv, but the purge branch wasn't updated to catch up, so you end up with a virtualenv that is not in PYTHONHOME.
To avoid reinstalling everything after each push, disable the option:
heroku labs:disable user-env-compile
heroku config:remove CLEAN_VIRTUALENV BUILDPACK_URL
There is now a simpler way to clear the pip cache. Just change the runtime environment, for example from 'python-2.7.3' to 'python-2.7.2', or vice versa.
To do this, add a file called runtime.txt to the root of your repository that contains just the runtime string (as shown above).
For this to work you need to have turned on the Heroku labs user-env-compile feature. See https://devcenter.heroku.com/articles/labs-user-env-compile
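A sketch of the toggle in practice (the version strings are the ones from this answer; the heroku remote name is assumed):
echo "python-2.7.2" > runtime.txt     # or python-2.7.3, whichever you are not currently on
git add runtime.txt
git commit -m "Toggle Python runtime to force a clean virtualenv rebuild"
git push heroku master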
By default virtualenv is cached between deploys.
To avoid caching of packages you can run:
heroku config:add BUILDPACK_URL=git@github.com:heroku/heroku-buildpack-python.git#purge
That way everything will be built from scratch after you push some changes. To re-enable caching, just remove the BUILDPACK_URL config variable.
Now to uninstall specific package(s):
Remove the corresponding record(s) from the requirements.txt;
Commit and push the changes.
Thanks to Lincoln from Heroku Support Team for the clarifications.
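Concretely, that flow looks something like this ('whatever' is the package name from the question; the heroku remote name is assumed):
sed -i '/^whatever/d' requirements.txt    # drop the package from the requirements
git add requirements.txt
git commit -m "Remove whatever from requirements"
git push heroku master                    # with the purge buildpack enabled, everything is rebuilt without it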
I've created some fabfile recipes for Maxime's and Jesse's answers that allow re-installing the requirements with one fab command: https://gist.github.com/littlepea/5096814 (look at the docstrings for explanations and examples).
For Maxime's answer I've created a task 'heroku_clean' (or 'hc'); it looks something like this:
fab heroku_clean
Or using an alias and specifying a heroku app:
fab hc:app=myapp
For Jesse's answer I've created a task 'heroku_runtime' (or 'hr'); it sets the Heroku Python runtime and commits runtime.txt (creating it if it doesn't exist):
fab heroku_runtime:2.7.2
If the runtime version is not passed, it will just toggle between 2.7.2 and 2.7.3, so the easiest way to change and commit the runtime is:
fab hr
Then you can just deploy (push to the heroku remote) your app and the virtualenv will be rebuilt. I've also added a 'heroku_deploy' task ('hd') that I use for the Heroku push and scale, which can also be used together with the 'heroku_runtime' task. This is my preferred method of deploying and rebuilding the virtualenv: everything happens in one command, and I can choose when to rebuild it. I don't like doing it every time, as Maxime's answer suggests, because it can take a long time:
fab hd:runtime=yes
This is an equivalent of:
fab heroku_runtime
fab heroku_deploy
