I am working on consolidating a set of RPM packages into a new, more organized set of Yum repositories. I have already repackaged a subset of them by hand and uploaded them into the repositories, but I have a much larger set that either build and package automatically, or have newer versions that are available from third party sources.
I need to be able to, given a package name (and optionally, a list of repository ids), programmatically check and see if it is already available, and if not, upload it into the repository. I have played around with repoquery and yum search, but neither seem sufficiently scriptable for my purposes.
I have a similar requirement for one of my scripts. I use the repoquery command to check and see if a particular package/version exists in the remote repository.
Using the command below, you can easily see whether a particular package (and all of its versions) exists.
repoquery --repoid=<myrepository_id> --qf="package|%{name}|%{version}|%{release}|%{arch}" <packagename_of_interest>
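As a rough sketch of how that check can be scripted (the upload step is a placeholder for however you publish packages into your repository, e.g. copying the RPM into place and re-running createrepo):

#!/bin/bash
# check_and_upload.sh <package_name> [repoid ...]
pkg="$1"; shift

# Build --repoid arguments from the optional list of repository ids.
repoargs=()
for id in "$@"; do
    repoargs+=("--repoid=$id")
done

# repoquery prints one line per matching package/version; empty output
# means the package is not present in the queried repositories.
found=$(repoquery "${repoargs[@]}" --qf="%{name}|%{version}|%{release}|%{arch}" "$pkg")

if [ -z "$found" ]; then
    echo "$pkg is not in the queried repositories - uploading"
    # Placeholder: replace with your actual publish step, e.g.
    # cp "${pkg}"*.rpm /var/www/html/myrepo/ && createrepo --update /var/www/html/myrepo/
else
    echo "$pkg is already available:"
    echo "$found"
fi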
I searched a bit but could not find a clear answer.
The goal is to have two pip indexes: one private index, which should take first priority, and the standard PyPI. The priority is there to prevent the security risk of code injection.
Say I have a library named lib, and I configure index_url = http://my_private_pypi_repo and extra_index_url = https://pypi.org/simple
If I pip install lib and lib exists in both indexes, which index gets priority? Which one will it be installed from?
Also, if I pip install lib==0.0.2 but lib only exists in my private index at version 0.0.1, is pip going to look at PyPI as well?
And what is a good way to ensure that certain libraries are only ever fetched from the private index if they exist there, and are never looked up on PyPI?
The short answer is: there is no prioritization and you probably should avoid using --extra-index-url entirely.
This is asked and answered here: https://github.com/pypa/pip/issues/5045#issuecomment-369521345
Question:
I have this in my pip.conf:
[global]
index-url = https://myregistry-xyz.com
extra-index-url = https://pypi.python.org/pypi
Let's assume packageX exists in both registries and I run pip install packageX.
I expect pip to install packageX from https://myregistry-xyz.com, but pip will use https://pypi.python.org/pypi instead.
If I switch the values for index-url and extra-index-url I get the same result. pypi is always prioritized.
Answer:
Packages are expected to be unique up to name and version, so two wheels with the same package name and version are treated as indistinguishable by pip. This is a deliberate feature of the package metadata, and not likely to change.
I would also recommend reading this discussion: https://discuss.python.org/t/dependency-notation-including-the-index-url/5659
Quite a lot of things are addressed in that discussion, some of which are clearly out of scope for this question, but it is all very informative anyway.
The key takeaway for you should be this:
Pip does not really prioritize one index over the other in theory. In practice, because of a coincidence in the way things are implemented in code, it might be that one is always checked first, but it is not a behavior you should rely on.
And what is a good way to ensure that certain libraries are only ever fetched from the private index if they exist there, and are never looked up on PyPI?
You should set up and curate your own package index (devpi, pydist, jfrog artifactory, sonatype nexus, etc.) and use it exclusively, meaning: never use --extra-index-url. This is the only way you can have exact control over what gets downloaded. This custom repository might function mostly as a proxy for the public PyPI, except for a couple of dependencies.
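For example, a minimal client-side pip.conf pointing exclusively at such an index might look like this (the URL is a placeholder for wherever your devpi/Nexus/Artifactory instance serves its PEP 503 "simple" API):

[global]
index-url = https://pypi.internal.example.com/simple/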
Related:
pip: selecting index url based on package name?
The title of this question feels a bit like an instance of the XY problem[1]. If you elaborate more on what you want to achieve and what your constraints are, we may be able to give you a better answer.
That said, sinoroc's suggestion to curate your own package index and use only that is a good one. A few other ideas also come to mind:
Update: It turns out pip may install distributions other than those in the constraints file, so this method should probably be considered insecure. Additionally, hashes are somewhat broken on recent releases of pip.
Using a constraints file with hashes. This file can be generated using pip-tools, like pip-compile --generate-hashes, assuming you have documented your dependencies in a file named requirements.in. You can then install packages like pip install -c requirements.txt some_package (a sketch of this workflow follows this list of ideas).
Pro: What may be installed is documented alongside your code in your VCS.
Con: Controlling what is downloaded the first time is either tricky or laborious.
Con: Hash checking can be slow.
Con: You run into issues more frequently than when not using hashes. Some can be worked around, others cannot; it is for instance not possible to combine constraints like -e file:// with hashes.
Use an alternative packaging tool like pipenv. It works similarly to the previous suggestion.
Pro: Easy to use
Con: Harder to integrate into your workflow if it does not fit naturally.
Curate packages locally. Packages and dependencies can be downloaded like pip download --dest some_dir some_package and installed like pip install --no-index --find-links some_dir some_package.
Pro: What may be installed can be documented alongside your code, if you track the artifacts in VCS e.g. git lfs.
Con: Either all packages are downloaded or none are.
Use a hermetic build system. I know Bazel advertises this as a feature; not sure about others like Pants and Buck.
Pro: May be the ultimate solution if you want control over your builds.
Con: Does not integrate well with open source python ecosystem afaik.
Con: A lot of overhead.
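Here is a minimal sketch of the constraints-with-hashes workflow from the first idea above, assuming pip-tools is installed and your direct dependencies are listed in requirements.in (package names are placeholders):

# Pin every transitive dependency and record a hash for each distribution:
pip-compile --generate-hashes --output-file requirements.txt requirements.in

# Install one package against those pins:
pip install -c requirements.txt some_package

# Or install the whole pinned set with hash checking enforced:
pip install --require-hashes -r requirements.txt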
1: https://en.wikipedia.org/wiki/XY_problem
I'd like to write a bash script that will be executed on new users' Linux machines. The script should make sure the machine is ready to compile and run several Fortran and Python scripts. My intent was to check for GCC, then check for gfortran within GCC, and use Homebrew to install the gfortran formula if it is not present. Similarly with Python, I would use pip to install python3 if Python isn't up to date.
I would like some advice before I begin. What are some best practices for checking for programs and installing them through a behind-the-scenes bash script? What are things to be careful about? I realize this question is vague, but literally any advice is greatly appreciated.
A possible different approach to consider.
If the packages that you want are available for installation via the standard package manager for the Linux distribution that you are using, you could build a metapackage which, when installed, will force the installation of all of the components. (A metapackage is one which does not directly contain any software itself but which lists a number of dependencies that the package manager will need to satisfy when installing it.)
If you also set up a repository on an http server containing your metapackage, and configure the package manager on the clients to point to it, then if you later decide to update the list of packages contained in your standard software suite across all the client machines, all you need to do is to publish a new version of your metapackage to the repository, and have the client machines rerun their operating system's software updater. For the individual software components listed in the metapackage, the software update will pick up any updates to these in the ordinary way, provided that the dependency information in the metapackage does not specify an exact version (you could make the metapackage depend on >= some version number).
This is from experience where this approach has worked well in setting up and maintaining a common software platform across a large number of machines (including nodes of a compute cluster) -- although in cases where a required software component is not available in standard package repositories, it is then necessary to build a package for it locally and publish that to your repository; this can then be added as a requirement in your metapackage.
The above is in general terms, but taking the specific example on which this experience is based: if the client machines are running CentOS, you would create a meta-RPM with dependencies on packages in standard repositories (e.g. base and EPEL) plus any other RPMs that you need to build locally, publish it to the repository on the http server (using createrepo to construct the necessary repository metadata), create a config file in /etc/yum.repos.d on the clients to point to your repository, and run yum install <your_meta_package>. Any subsequent yum update on the clients will pick up any changes that you push to your repository. The http server itself is simply serving static content, so it does not need special configuration.
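As a rough sketch of that workflow (the hostname, paths and metapackage name are made up for illustration):

# On the http server: drop the meta-RPM and any locally built RPMs into the
# served directory, then (re)generate the repository metadata:
createrepo --update /var/www/html/myrepo/

# On each client: point yum at the repository...
cat > /etc/yum.repos.d/myrepo.repo <<'EOF'
[myrepo]
name=My software platform
baseurl=http://repo.example.com/myrepo/
enabled=1
gpgcheck=0
EOF

# ...and install the metapackage; yum resolves and installs all dependencies:
yum install my-platform-meta

# Later, after pushing a new meta-RPM to the server, clients pick it up with:
yum update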
For Ubuntu etc, you could do similarly but with deb packages and APT.
One thing worth mentioning is that the above is aimed at a rather homogeneous environment, where the same operating system is in use on all the client machines. If you want to cater for a more heterogeneous environment, then an alternative approach would be to use conda. You could similarly maintain a conda channel containing a metapackage.
I am tasked to create and use a local PIP repo.
(the reason being that we'll be using Python 2.7 for at least one more year, and we fear packages or older versions being removed)
I am looking at bandersnatch and it is not clear to me whether it is an online mirroring tool which I need to run as a service, or whether it can be used to pull down a one-off copy.
I'd prefer the second option (I don't want to complicate the system unnecessarily), and would be satisfied with running an update, say, daily or even weekly.
An alternative approach would be to download only the packages and versions we actually use by looking at the requirements.txt file, but this would require running an update every time a developer wants to add or update a package.
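In concrete terms, that alternative would look something like this (the mirror directory is made up for illustration):

# On a machine with internet access: fetch exactly the pinned requirements
# into a local directory.
pip download -r requirements.txt -d /srv/pip-mirror/

# On developer machines or CI: install from that directory only, never PyPI.
pip install --no-index --find-links=/srv/pip-mirror/ -r requirements.txt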
A way to create a local Python package repository is through Sonatype Nexus. With Nexus you can create several kinds of repos:
Hosted repo (your own internal repo)
Proxy repo (a proxy of another repo)
Group repo (groups and priority-sorts a list of hosted and proxied repos)
For example, you can create a group repo with the following lookup order:
- First, search for the package in your own repo
- If it does not exist there, search for it in the global public repo.
It is transparent to your app.
https://help.sonatype.com/repomanager3/formats/pypi-repositories
There is also a Docker image if you want one: https://hub.docker.com/r/sonatype/nexus3
I have used it before for different purposes and found it very mature and complete.
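For a quick trial, the image can be started as documented and pip pointed at a PyPI group repository; the repository name pypi-group below is something you would create yourself in the Nexus UI, so treat the URL as an assumption about your own setup:

# Start Nexus 3 with its data kept in a named volume:
docker run -d -p 8081:8081 --name nexus -v nexus-data:/nexus-data sonatype/nexus3

# Then, in pip.conf on the clients:
# [global]
# index-url = http://localhost:8081/repository/pypi-group/simple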
There is also a script that generates a simple repository with the N most recent versions of the 4000 most used packages on PyPI. The advantage is that it can hold multiple versions, as PyPI does. https://gist.github.com/harisankar-krishna-swamy/cac5d1e6c1ae074b39286c1336bff63d
I'm trying to find a way to install a Python package together with its docs.
I have to use this on machines that have no connection to the internet, so online help is not a solution for me. Similar questions already posted here say that this is not possible. Do you see any way to make this easier than what I'm currently doing:
downloading the source archive
extracting the docs folder
running sphinx
launching the index file from a browser (firefox et al.)
Any ideas?
P.S. I'm very new to Python, so maybe I'm missing something... And I'm using Windows (virtual) machines...
Edit:
I'm talking about two possible ways to install a package:
installing the package via easy_install (or some other way unknown to me) on a machine while I'm online, and then copying the resulting changes to my installation over to the target machine
downloading the source package (containing sphinx-compatible docs) and installing the package on the target machine offline
But in either case I do not know a way to install the package such that the supplied documentation is installed together with the module!
You might know that there is a folder for the docs: <python-folder>/Doc, which will contain only python278.chm after installation of Python 2.7.8 on Windows. So I expect that this folder would also contain the docs for a newly installed package. This would avoid looking at docs for a different package version on the internet, as well as my specific machine setup problems.
Most packages I'm currently using are supplied with documentation generated with sphinx, and their source package contains all the files necessary to generate the docs offline.
So what I'm looking for is some CLI argument for a package installer, as is common for unix/linux based package managers. I would expect something like:
easy_install a_package --with-html-docs.
Here are some scenarios:
packages have documentation included within the zip/tar
packages that have a -docs package to download/install separately
packages that have buildable documentation
packages that only have online documentation
packages with no documentation other than internal.
packages with no documentation anywhere.
The sneaky trick that you can use for options 1 & 3 is to download the package as a tar or zip and then use easy_install archive_name on the target machine; this will install the package from the zip or tar file, including (I believe) any documentation. You will find that some packages have unmet dependencies - those should give an error on the easy_install run mentioning what is missing - you will need to get those and use the same trick.
A couple of things that are very handy: virtualenv will let you have a library-free version of Python running so you can get the requirements, and pip -d <dir> will download packages without installing them, storing them in <dir>.
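A rough sketch of that trick with modern tooling (somepackage and its version number are placeholders; pip download replaces the old pip install -d/--download option):

# On the online machine: grab source archives without installing anything.
pip download --no-binary :all: -d ./pkgs somepackage

# Copy ./pkgs to the target machine and install straight from the archive:
easy_install ./pkgs/somepackage-1.2.3.tar.gz

# If the sdist ships sphinx sources, unpack it and build the docs locally:
tar xf ./pkgs/somepackage-1.2.3.tar.gz
cd somepackage-1.2.3/docs
make html    # or: sphinx-build -b html . _build/html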
You should be able to use the same trick for option 2.
With packages that only have on-line documentation you could look to see if there is a downloadable version or could scrape the web pages and use a tool like pandoc to convert to something useful.
In scenario 5 I would suggest raising a ticket on the package stating that the lack of accessible documentation makes it virtually unusable, and running sphinx on it yourself.
In scenario 6 I suggest raising the ticket but leaving out "virtually", and avoiding the use of that package on the basis that, if it has no documentation, it probably has a lot of other problems as well - if you are a package author feeling slandered reading this, then you should be feeling ashamed instead.
Mirror/Cache PyPI
Another possibility is to have a Linux box, or VM, initially outside of your firewall, running a caching or mirroring service such as pypiserver. Install the required packages through it to populate the cache, then move it (or its cache, to another pip server) inside the firewall; you can then use pip with the documented settings to do all your installs inside the firewall. See also the answer here.
I have a framework for a site that I want to use in multiple projects but I don't want to submit my framework to PyPi. Is there anyway I can tell my setup.py to install the framework from a specific location?
Here is my current setup.py
from setuptools import setup

setup(
    name='Website',
    version='0.2.1',
    install_requires=[
        'boto>=2.6',
        'fabric>=1.4',
        'lepl>=5.1',
        'pygeoip>=0.2.4',
        'pylibmc>=1.2.3',
        'pymongo>=2.2',
        'pyyaml>=3.1',
        'requests>=0.12',
        'slimit>=0.7.4',
        'thrift>=0.8.0',
        'tornado>=2.3',
    ],
)
Those are actually all the dependencies of my framework, so if I could include the framework itself somehow, I could list only the framework here.
It looks like all of your requirements are public (on PyPI), and you don't need specific versions of them, just "new enough". In 2016, when you can count on everyone having a recent-ish version of pip, there's really nothing to do. If you just pip install . from the source directory or pip install git+https://url/to/package or similar, it will just pull the latest versions of the dependencies off the net. The fact that your package isn't on PyPI won't stop pip from finding its dependencies there.
Or, if you want to stash them all locally, you can set up a local PyPI index. Although in that case, it probably would be simpler to push your package to that same local index, and install it from there.
If you need anything more complicated, a requirements file can take care of that for you.
In particular, if you need to distribute the package to other people in your organization who may not have your team's local index set up, or for some reason you can't set up a local index in the first place, you can put all the necessary information in the requirements file--or, if it's more appropriate, on the command line used to install your package (which even works if you're stuck with easy_install or ancient versions of pip).
The documentation gives full details, and this blog post explains it very nicely, but the short version is this:
If you have a local PyPI index, provide --extra-index-url=http://my.server/path/to/my/pypi/.
If you've got an HTTP server that you can drop the packages on, and you can enable the "auto index directory contents" option in your server, just provide --find-links=http://my.server/path/to/my/packages/.
If you want to use local files (or SMB/AFP/etc. file sharing), create a trivial HTML file with nothing but links to all local packages, and provide --find-links=file:///path/to/my/index.html.
Again, these can go on the command line of a "to install this package, run this" (or a curl | sh install script), but usually you just want to put them in a requirements file. If so, make sure to use only one value per option (e.g., if you want to add two extra indexes, add two --extra-index-url params) and put each one on its own line.
A requirements file also lets you specify specific versions of each package, so you can be sure people are deploying with the same code you developed and tested with, which is often useful in these kinds of situations.
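For example, a requirements file for the setup above might look like this (the index URL and the pinned versions are placeholders; note that each option sits on its own line):

--extra-index-url=http://my.server/path/to/my/pypi/

boto==2.6.0
fabric==1.4.3
tornado==2.3
Website==0.2.1

Installing with pip install -r requirements.txt then pulls Website from your index and the rest from PyPI, at exactly the versions listed.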