I've got a need to create and ship conda envs that list packages which need to remain private. It would be especially handy to list dependencies using a URL to a (company-internal) GitLab instance.
Is there a way to register dependencies with conda using a repo URL? Alternatively, is there some other way to include Python packages you have a source distribution for, but that cannot be hosted on a regular channel?
Thanks.
If you know beforehand what needs to remain private, you can ship direct-reference eggs, use scoped index-urls and extra-index-urls, or point at private sources in the conda build metadata, like here:
# requirements.txt
gevent
publicthing==1.2
someother==0.1
# private packages
file://package/egg/here
-e git+ssh://priv.gitlab.some.org/some/privpack.git#egg=privpack
--extra-index-url https://build.priv.gitlab.some.org/some/pypi/simple
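Such a requirements file is then consumed as usual; pip honors the direct references, the editable VCS lines, and the extra index inline:
pip install -r requirements.txt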
I'd guess "private" here means sdist/bdist build artifacts (tars, eggs, wheels) at a URI/URL only accessible on a local network.
In other words, where the package is hosted should be indicator enough to label something as "private": the build artifacts either are or are not available through some availability mechanism (network location, building locally, shipped binaries, etc.).
Using pypi/pip:
https://pip.readthedocs.io/en/1.1/requirements.html#requirements-file-format
Conda meta.yaml build info:
source:
  - url: https://build.priv.gitlab.some.org/some/pypi/simple/privpack/a.tar.bz2
    folder: stuff
  - url: https://build.priv.gitlab.some.org/some/pypi/simple/privpack/b.tar.bz2
    folder: stuff
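For context, here's how such a source section could sit inside a minimal recipe (a sketch only; the package name, version, and build script are hypothetical):
# meta.yaml (sketch)
package:
  name: privpack
  version: "0.1"
source:
  - url: https://build.priv.gitlab.some.org/some/pypi/simple/privpack/a.tar.bz2
    folder: stuff
build:
  script: python -m pip install ./stuff --no-deps
requirements:
  host:
    - python
    - pip
  run:
    - python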
https://conda.io/docs/user-guide/tasks/build-packages/define-metadata.html
Examples:
https://github.com/conda/conda-recipes
https://github.com/conda/conda-recipes/blob/c2eb600f8545cd21aa9e50a8bb8a81df7fd3c915/r-packages/r-yaml/meta.yaml#L10
https://github.com/conda/conda-recipes/blob/a796713805ac8eceed191c0cb475b51f4d00718c/python/pyserial/meta.yaml#L5
https://conda.io/docs/user-guide/tasks/build-packages/define-metadata.html#source-from-git
https://conda.io/docs/user-guide/tasks/build-packages/define-metadata.html#source-from-a-local-path
Related:
https://docs.anaconda.com/anaconda-repository/admin-guide/install/config/config-client#kerberos-configuration
https://docs.anaconda.com/anaconda-repository/admin-guide/install/config/kerberos-example
https://docs.anaconda.com/anaconda-repository/admin-guide/install/config/config-client#pip-configuration
https://pip.readthedocs.io/en/1.1/requirements.html#git
Related
One of our repositories relies on another first-party one. Because we're in the middle of a migration from (a privately hosted) GitLab to Azure, some of our dependencies aren't available in GitLab, which is where the problem comes up.
Our pyproject.toml file has this poetry group:
# pyproject.toml
[tool.poetry.group.cli.dependencies]
cli-parser = { path = "../cli-parser" }
In the GitLab CI, this cannot resolve, so we want to run the pipelines without this dependency. No code that actually relies on this library is run, and no files from it are imported, which is why we factored it out into a separate Poetry group. In the .gitlab-ci.yml, that looks like this:
# .gitlab-ci.yml
install-poetry-requirements:
  stage: install
  script:
    - /opt/poetry/bin/poetry --version
    - /opt/poetry/bin/poetry install --without cli --sync
As shown, Poetry is instructed to omit the cli dependency group. However, it still crashes on it:
# Gitlab CI logs
$ /opt/poetry/bin/poetry --version
Poetry (version 1.2.2)
$ /opt/poetry/bin/poetry install --without cli --sync
Directory ../cli-parser does not exist
If I comment out the cli-parser line in pyproject.toml, it will install successfully (and the pipeline passes), but we cannot do that because we need it in production.
I can't find another way to tell poetry to omit this library. Is there something I missed, or is there a workaround?
Good, Permanent Solution
As finswimmer mentioned in a comment, Poetry 1.4 should handle this perfectly well. If you're reading this after Poetry 1.4 has been released, the code in the question should work as-is.
Hacky, Bad, Temporary Solution
Since the original problem was in gitlab CI pipelines, I used a workaround there. Right in front of the install command, I used the following command:
sed -i '/cli-parser/d' pyproject.toml
This modifies the project's pyproject.toml in-place to remove the line that has my module, which prevents Poetry from ever parsing the dependency.
See the sed man page for more information on how this works.
Keep in mind that if your pipeline has any permanent effects, like turning your package into an installable wheel or publishing build artifacts used elsewhere, this WILL break your setup.
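Another angle worth trying (a sketch; note that on Poetry 1.2.x the path dependency may still be checked during locking, so this may or may not help before 1.4) is to mark the group as optional, so a plain poetry install skips it by default:
# pyproject.toml
[tool.poetry.group.cli]
optional = true

[tool.poetry.group.cli.dependencies]
cli-parser = { path = "../cli-parser" }
Optional groups are only installed when requested explicitly, e.g. with poetry install --with cli.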
I just came across this from a project on GitHub
pip install colorama # file:///home/conda/feedstock_root/build_artifacts/colorama_1602866480661/work
What does # do?
Assuming it decides the path to install to, I tried using an arbitrary path and it didn't work like that.
Also, why would we want to do so?
Also, what is the significance of file:///?
Here is the link to the project
https://github.com/sstzal/DFRF/blob/main/requirements.txt
Thanks for your attention
This notation with a # is specified here in the "Direct references" section of PEP 440.
The part after the # is a direct reference to a location where the project can be installed from (i.e. the source, not the destination).
It can be URL to a file archive (sdist or wheel), or a VCS repository (git, for example).
The file:// notation is meant for the (local) filesystem protocol (as opposed to the https:// protocol for example).
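For illustration, a couple of direct-reference forms pip accepts (the package names and URLs here are hypothetical):
# PEP 440 direct reference to a local wheel
pip install "privpack @ file:///opt/wheels/privpack-1.0-py3-none-any.whl"
# VCS direct reference; the #egg= fragment names the project
pip install "git+https://gitlab.example.com/some/privpack.git#egg=privpack"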
I use CDK to deploy a Lambda function that uses several Python modules.
But I got the following error at deployment.
Unzipped size must be smaller than 262144000 bytes (Service: AWSLambdaInternal; Status Code: 400; Error Code: InvalidParameterValueException;
I have looked at the following other questions related to this issue.
question1
question2
But they focus on serverless.yaml and don't solve my problem.
Is there any way around this problem?
Here is my app.py for CDK.
from aws_cdk import (
    aws_events as events,
    aws_lambda as lam,
    core,
)

class MyStack(core.Stack):
    def __init__(self, app: core.App, id: str) -> None:
        super().__init__(app, id)

        layer = lam.LayerVersion(
            self, "MyLayer",
            code=lam.AssetCode.from_asset('./lib'),
        )

        makeQFn = lam.Function(
            self, "Singleton",
            function_name='makeQ',
            code=lam.AssetCode.from_asset('./code'),
            handler="makeQ.main",
            timeout=core.Duration.seconds(300),
            layers=[layer],
            runtime=lam.Runtime.PYTHON_3_7,
        )

app = core.App()
MyStack(app, "MS")
app.synth()
In the ./lib directory, I put Python modules like this:
python -m pip install numpy -t lib/python
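The python subfolder is there because Lambda unpacks layers under /opt and /opt/python is on sys.path for the Python runtimes, so ./lib ends up looking roughly like:
lib/
  python/
    numpy/
    ...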
Edit: Better method!
Original:
There is now an experimental package, aws_lambda_python_alpha, which you can use to automatically bundle packages listed in a requirements.txt, but I'm unfortunately still running into the same size issue. I'm thinking now of trying layers.
For anyone curious, here's a sample of bundling using aws_lambda_python_alpha:
from aws_cdk import aws_lambda as _lambda
from aws_cdk import aws_lambda_python_alpha as _lambda_python

self.prediction_lambda = _lambda_python.PythonFunction(
    scope=self,
    id="PredictionLambda",
    # entry points to the directory
    entry="lambda_funcs/APILambda",
    # index is the file name
    index="API_lambda.py",
    # handler is the function entry point name in the lambda.py file
    handler="handler",
    runtime=_lambda.Runtime.PYTHON_3_9,
    # name of function on AWS
    function_name="ExampleAPILambda",
)
I would suggest checking out the aws-cdk-lambda-asset project, which will help bundle internal project dependencies stored in a requirements.txt file. How this works is, it installs dependencies specified in the requirements file to a local folder, then bundles it up in a zip file which is then used for CDK deployment.
For non-Linux environments like Windows/Mac, it will install the dependencies in a Docker image, so first ensure that you have Docker installed and running on your system.
Note that the above code seems to use poetry, which is a dependency management tool. I have no idea what poetry does or why it is used in lieu of a setup.py file. Therefore, I've created a slight modification in a gist here, in case this is of interest; this will install all local dependencies using a regular pip install command instead of the poetry tool, which I'm not too familiar with.
Thanks a lot.
In my case, the issue is solved just by removing all __pycache__ in the local modules before the deployment.
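For reference, a small sketch of that cleanup, assuming the modules live under ./lib:

# delete every __pycache__ directory before packaging the asset
import pathlib
import shutil

for cache_dir in list(pathlib.Path("lib").rglob("__pycache__")):
    shutil.rmtree(cache_dir)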
I hope the situation improves so that we only have to upload a requirements.txt instead of preparing all the modules locally.
Nope, there is no way around that limit in a single setup.
What you have to do instead is install your dependencies into multiple zips that become multiple layers.
Basically, install several dependencies into a python folder, then zip that folder into something like integration_layer. Clear the python folder, install the next set, and name it something else, like data_manipulation.
Then you have two layers in CDK (using aws_lambda.LayerVersion) and add those layers to each Lambda, as in the sketch below. You'll have to break the layers up to be small enough.
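A sketch of what this looks like in CDK, using the same v1-style API as the question (layer names and asset paths are hypothetical; each asset directory contains a zip-ready python/ folder with one slice of the dependencies):

from aws_cdk import aws_lambda as lam

# one layer per dependency slice, each kept under the size limit
integration_layer = lam.LayerVersion(
    self, "IntegrationLayer",
    code=lam.AssetCode.from_asset("./layers/integration_layer"),
)
data_layer = lam.LayerVersion(
    self, "DataManipulationLayer",
    code=lam.AssetCode.from_asset("./layers/data_manipulation"),
)

fn = lam.Function(
    self, "Singleton",
    code=lam.AssetCode.from_asset("./code"),
    handler="makeQ.main",
    runtime=lam.Runtime.PYTHON_3_7,
    layers=[integration_layer, data_layer],  # attach both layers
)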
You can use a makefile to generate the layers automatically, and then tie the makefile, cdk deploy, and some cleanup together inside a bash script.
Note: you are still limited on space with layers, and limited to 5 layers per function. If your dependencies outgrow that, then look into Elastic File System. You can install dependencies there, mount that single EFS onto any Lambda, and reference the packages through PYTHONPATH manipulation built into the EFS-Lambda connection, as sketched below.
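Inside the handler, the EFS-hosted packages then become importable with a path tweak, for example (a sketch; the mount point /mnt/efs is hypothetical, and the dependencies are assumed to have been installed there with pip install -t /mnt/efs/python):

import sys

# make the EFS-hosted packages visible to the import system
sys.path.append("/mnt/efs/python")

import numpy  # now resolved from EFS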
Basic makefile idea (not exact for CDK, but still close)
(and if you never have used one, a good tutorial on how to use a makefile)
CDK documentation for Layers (use AssetCode.from_asset for the location)
AWS dev blog on EFS for Lambdas
I've just started using the ABlog plugin for sphinx to create a static-site blog.
Is it easy to change ablog deploy to deploy to a different location,
e.g. ../username.github.io/ instead of ./username.github.io/?
I have my ABlog project under source control in a git repository. Creating my username.github.io inside the current ABlog project creates a repo inside a repo, which causes errors (also, I don't want to store the built site along with the ABlog repository, although I could add a .gitignore).
Is it easy to change ablog deploy to deploy to a different location,
e.g. ../username.github.io/ instead of ./username.github.io/?
For ABlog ≥ 0.8.0, yes
For ablog-0.8.0 and above, you can use the -p option to specify a github repo location other than the default (<location of conf.py>/<your username>.github.io):
ablog deploy -p /the/path/for/your/local/github/pages/repo
i.e., in your case
ablog deploy -p ../username.github.io/
How to install the most recent ABlog version
Until version 0.8.0 is available on PyPI, you can tell pip to install ablog directly from git:
pip install git+https://github.com/abakan/ablog.git
For Ablog < 0.8.0, no
For versions prior to 0.8.0, the old version of this answer applies:
With the current implementation of the ABlog-internal function ablog_deploy, the location of the target repository cannot be changed: the string gitdir (holding the path where the local repository will be created) is set to <confdir>/<github_pages option>.github.io, but the github_pages option is also used to choose the remote repository (https://github.com/abakan/ablog/blob/0ed765d95a23ad7dce48c755773ac60dd08cf319/ablog/commands.py#L338), so passing something other than the GitHub account name will make the process fail.
Manipulating confdir would be difficult and would result in the configuration file not being found, and probably a bunch of other side effects.
However, if you're willing to modify ABlog's source code, it would not be hard to adapt the assignment of gitdir as you see fit (maybe introducing another option) to produce the desired effect. (E.g., make it use confdir if your new option hasn't been set, and have it use your new option instead if that option has been set.)
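For instance, the assignment could be adapted along these lines (a purely hypothetical sketch of a patch to ablog/commands.py; repo_dir is an invented option name):

# use the new option when set, otherwise keep the old behaviour
if repo_dir is not None:
    gitdir = os.path.abspath(repo_dir)
else:
    gitdir = os.path.join(confdir, '{}.github.io'.format(github_pages))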
I am working on a product with a large number of python dependencies within a corporation that does not permit servers to contact external machines. Any attempt to circumvent this rule would be judged harshly.
The application is deployed via a batch-script (it's 32 bit windows) into a virtualenv. This batch script (ideally) should do nothing more than
# Precondition: Source code has been checked-out into myprog/src
cd myprog/src
setup.py install # <-- fails because of dependencies
myprog.exe
The problem comes with managing the dependencies: since it's impossible for the server to connect to the outside world, my only solution is to have the script easy_install each of the dependencies before the setup starts, something like this:
cd myproc/deps/windows32
easy_install foo-1.2.3.egg
easy_install bar-2.3.4.egg
easy_install baz-3.4.5.egg <-- works but is annoying/wrong
cd ../../myprog/src
setup.py install
myprog.exe
What I'd like to do is make it so that the setup.py script knows where to fetch its dependencies from. Ideally this should be set as a command-line argument or an environment variable; that way I'm not hard-coding the location of the dependencies into the project.
Ideally I'd like all of the eggs to be part of a 'distributions' directory: this can be on a network drive, shared on a web server, or possibly even deployed to a local folder on each of the servers.
Can this be done?
I think what you are looking for are these pip options: --no-index and --find-links:
--no-index
--find-links /my/local/archives
--find-links http://some.archives.com/archives
Docs are here.
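For example, the deployment script could point pip at the local distributions directory (paths are hypothetical):

pip install --no-index --find-links=../deps/windows32 -r requirements.txt

With --no-index, pip never contacts PyPI; --find-links makes it resolve everything from the given directory or URL.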