I have a framework for a site that I want to use in multiple projects, but I don't want to submit the framework to PyPI. Is there any way I can tell my setup.py to install the framework from a specific location?
Here is my current setup.py
from setuptools import setup

setup(
    name='Website',
    version='0.2.1',
    install_requires=[
        'boto>=2.6',
        'fabric>=1.4',
        'lepl>=5.1',
        'pygeoip>=0.2.4',
        'pylibmc>=1.2.3',
        'pymongo>=2.2',
        'pyyaml>=3.1',
        'requests>=0.12',
        'slimit>=0.7.4',
        'thrift>=0.8.0',
        'tornado>=2.3',
    ],
)
Those are actually all dependencies of my framework, so if I could include the framework somehow, it would be the only thing listed here.
It looks like all of your requirements are public (on PyPI), and you don't need specific versions of them, just "new enough". In 2016, when you can count on everyone having a recent-ish version of pip, there's really nothing to do. If you just pip install . from the source directory or pip install git+https://url/to/package or similar, it will just pull the latest versions of the dependencies off the net. The fact that your package isn't on PyPI won't stop pip from finding its dependencies there.
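For example, once your framework is packaged on its own, the site's setup.py can depend just on it and let the framework declare boto, fabric, etc. itself (a minimal sketch; the name myframework and its repository URL are hypothetical):
from setuptools import setup

setup(
    name='Website',
    version='0.2.1',
    install_requires=[
        'myframework>=0.1',
    ],
)
You would then install the framework with something like pip install git+https://bitbucket.org/youruser/myframework.git before pip install ., or point pip at wherever the framework lives using the options below.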
Or, if you want to stash them all locally, you can set up a local PyPI index. Although in that case, it probably would be simpler to push your package to that same local index, and install it from there.
If you need anything more complicated, a requirements file can take care of that for you.
In particular, if you need to distribute the package to other people in your organization who may not have your team's local index set up, or for some reason you can't set up a local index in the first place, you can put all the necessary information in the requirements file--or, if it's more appropriate, on the command line used to install your package (which even works if you're stuck with easy_install or ancient versions of pip).
The documentation gives full details, and this blog post explains it very nicely, but the short version is this:
If you have a local PyPI index, provide --extra-index-url=http://my.server/path/to/my/pypi/.
If you've got an HTTP server that you can drop the packages on, and you can enable the "auto index directory contents" option in your server, just provide --find-links=http://my.server/path/to/my/packages/.
If you want to use local files (or SMB/AFP/etc. file sharing), create a trivial HTML file with nothing but links to all local packages, and provide --find-links=file:///path/to/my/index.html.
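As a sketch of that last option, the index page only needs bare links to the package archives sitting next to it (file names are placeholders):
<!-- /path/to/my/index.html -->
<html><body>
<a href="myframework-0.2.1.tar.gz">myframework-0.2.1.tar.gz</a>
<a href="pymongo-2.2.tar.gz">pymongo-2.2.tar.gz</a>
</body></html>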
Again, these can go on the command line of a "to install this package, run this" (or a curl | sh install script), but usually you just want to put them in a requirements file. If so, make sure to use only one value per option (e.g., if you want to add two extra indexes, add two --extra-index-url params) and put each one on its own line.
A requirements file also lets you specify specific versions of each package, so you can be sure people are deploying with the same code you developed and tested with, which is often useful in these kinds of situations.
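A sketch of such a requirements file (server paths and version numbers are made up):
--extra-index-url=http://my.server/path/to/my/pypi/
--find-links=http://my.server/path/to/my/packages/
myframework==0.2.1
requests==0.14.2
tornado==2.4.1
which would then be installed with pip install -r requirements.txt.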
Related
Is there any easy way to delete no-longer-used packages from a requirements file?
I wrote a bash script for this task, but it doesn't work as I expected, because some packages are not imported under their PyPI project names. For example, the
dj-database-url
package is used as
dj_database_url
My project has many packages in its own requirements file, so searching for them one by one is too messy, error-prone and takes too much time. As far as I have searched, IDEs don't have this feature yet.
You can use Code Inspection in PyCharm.
Delete the contents of your requirements.txt but keep the empty file.
Load your project in PyCharm, then go to Code -> Inspect Code....
Choose Whole project option in dialog and click OK.
In the inspection results panel, locate the Package requirements section under Python (note that this section will be shown only if there is a requirements.txt or setup.py file).
The section will contain one of the following messages:
Package requirement '<package>' is not satisfied if there is any package that is listed in requirements.txt but not used in any .py file.
Package '<package>' is not listed in project requirements if there is any package that is used in .py files, but not listed in requirements.txt.
You are interested in the second inspection.
You can add all used packages to requirements.txt by right-clicking the Package requirements section and selecting Apply Fix 'Add requirements '<package>' to requirements.txt'. Note that it will show only one package name, but it will actually add all used packages to requirements.txt if invoked on the whole section.
If you prefer, you can add them one by one: right-click the inspection corresponding to a certain package, choose Apply Fix 'Add requirements '<package>' to requirements.txt', and repeat for each inspection of this kind.
After that you can create a clean virtual environment and install the packages from the new requirements.txt.
Also note that PyCharm has import optimisation feature, see Optimize imports.... It can be useful to use this feature before any other steps listed above.
The best bet is to use a fresh Python venv/virtual-env with no packages, or only those you definitely know you need. Test your package, installing missing packages with pip as you hit problems (which should be quite quick for most software), then use the pip freeze command to list the packages you really need. Better yet, you could use pip wheel to create a wheel containing the packages.
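A minimal sketch of that workflow (environment and file names are just placeholders):
python -m venv fresh-env
source fresh-env/bin/activate      # on Windows: fresh-env\Scripts\activate
pip install .                      # install your package, adding missing deps as you hit errors
# run your tests here
pip freeze > requirements.txt      # the packages you actually ended up needing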
The other approach would be to:
Use pylint to check each file for unused imports and delete them (you should be doing this anyway),
Run your tests to make sure that it was right,
Use a tool like snakefood or snakefood3 to generate your new list of dependencies
Note that for any dependency checking to work well it is advisable to avoid conditional imports and imports within functions.
Also note that, to be sure you have everything, it is a good idea to build a new venv/virtual-env, install from your dependencies list, and then re-test your code.
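For the pylint step, a sketch of restricting the run to just the unused-import check (the package path is a placeholder):
pip install pylint
pylint --disable=all --enable=unused-import your_package/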
You can find obsolete dependencies by using deptry, a command line utility that checks for various issues with a project's dependencies, such as obsolete, missing or transitive dependencies.
Add it to your project with
pip install deptry
and then run
deptry .
Example output:
-----------------------------------------------------
The project contains obsolete dependencies:
Flask
scikit-learn
scipy
Consider removing them from your projects dependencies. If a package is used for development purposes, you should add
it to your development dependencies instead.
-----------------------------------------------------
Note that for the best results, you should be using a virtual environment for your project, see e.g. here.
Disclaimer: I am the author of deptry.
In PyCharm, go to Tools -> Sync Python Requirements. There's a 'Remove unused requirements' checkbox.
I've had success with pip-check-reqs.
With the command pip-extra-reqs your_directory it will check for all unused dependencies in your_directory.
Install it with pip install pip-check-reqs.
I searched a bit but could not find a clear answer.
The goal is to have two pip indexes: a private index, which should take first priority, and the standard PyPI. The priority is there to prevent the security risk of code injection.
Say I have a library named lib, and I configure index_url = http://my_private_pypi_repo and extra_index_url = https://pypi.org/simple.
If I pip install lib and lib exists in both indexes, which index gets priority? Where is it going to be installed from?
Also, if I pip install lib==0.0.2 but lib exists in my private index only at version 0.0.1, is it going to look at PyPI as well?
And what is a good way to be in control, so that certain libraries will only be fetched from the private index if they exist there, and will not be looked for on PyPI?
The short answer is: there is no prioritization and you probably should avoid using --extra-index-url entirely.
This is asked and answered here: https://github.com/pypa/pip/issues/5045#issuecomment-369521345
Question:
I have this in my pip.conf:
[global]
index-url = https://myregistry-xyz.com
extra-index-url = https://pypi.python.org/pypi
Let's assume packageX exists in both registries and I run pip install packageX.
I expect pip to install packageX from https://myregistry-xyz.com, but pip will use https://pypi.python.org/pypi instead.
If I switch the values for index-url and extra-index-url I get the same result. pypi is always prioritized.
Answer:
Packages are expected to be unique up to name and version, so two wheels with the same package name and version are treated as indistinguishable by pip. This is a deliberate feature of the package metadata, and not likely to change.
I would also recommend reading this discussion: https://discuss.python.org/t/dependency-notation-including-the-index-url/5659
There are quite a lot of things addressed in this discussion, some of which are clearly out of scope for this question, but everything is very informative anyway.
In there you should find the key takeaway for you:
Pip does not really prioritize one index over the other in theory. In practice, because of a coincidence in the way things are implemented in code, it might be that one is always checked first, but it is not a behavior you should rely on.
And what is a good way to be in control, so that certain libraries will only be fetched from the private index if they exist there, and will not be looked for on PyPI?
You should set up and curate your own package index (devpi, pydist, jfrog artifactory, sonatype nexus, etc.) and use it exclusively, meaning: never use --extra-index-url. This is the only way you can have exact control over what gets downloaded. This custom repository might function mostly as a proxy for the public PyPI, except for a couple of dependencies.
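A sketch of the corresponding pip configuration (the index URL is a placeholder for your own devpi/Artifactory/Nexus instance):
# in your pip.conf
[global]
index-url = https://my-private-index.example.com/simple/
# deliberately no extra-index-url at all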
Related:
pip: selecting index url based on package name?
The title of this question feels a bit like an instance of the XY problem[1]. If you would elaborate more on what you want to achieve and what your constraints are, we may be able to give you a better answer.
That said, sinoroc's suggestion to curate your own package index and use only that is a good one. A few other ideas also come to mind:
Update: it turns out pip may run distributions other than those in the constraints file, so this method should probably be considered insecure. Additionally, hashes are kind of broken on recent releases of pip.
Using a constraints file with hashes. This file can be generated using pip-tools like pip-compile --generate-hashes assuming you have documented your dependencies in a file named requirements.in. You can then install packages like pip install -c requirements.txt some_package.
Pro: What may be installed is documented alongside your code in your VCS.
Con: Controlling what is downloaded the first time is either tricky or laborious.
Con: Hash checking can be slow.
Con: You run into issues more frequently than when not using hashes. Some can be worked around, others cannot; it is for instance not possible to combine constraints like -e file://... with hashes.
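A sketch of that workflow with pip-tools (file and package names are placeholders):
pip install pip-tools
pip-compile --generate-hashes requirements.in   # writes requirements.txt with pinned versions and hashes
pip install -c requirements.txt some_package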
Use an alternative packaging tool like pipenv. It works similarly to the previous suggestion.
Pro: Easy to use
Con: Harder to integrate into your workflow if it does not fit naturally.
Curate packages locally. Packages and dependencies can be downloaded like pip download --dest some_dir some_package and installed like pip install --no-index --find-links some_dir.
Pro: What may be installed can be documented alongside your code, if you track the artifacts in VCS e.g. git lfs.
Con: Either all packages are downloaded or none are.
Use a hermetic build system. I know bazel advertises this as a feature, not sure about others like pants and buck.
Pro: May be the ultimate solution if you want control over your builds.
Con: Does not integrate well with open source python ecosystem afaik.
Con: A lot of overhead.
[1] https://en.wikipedia.org/wiki/XY_problem
Using GitHub's .gitignore, I was able to filter out some files and directories. However, there's a few things that left me a little bit confused:
GitHub's .gitignore did not include /bin and /share created by venv. I assumed they should be ignored by git, however, as the user is meant to build the virtual environment themselves.
Pip generated a pip-selfcheck.json file, which seemed mostly like clutter. I assume it usually does this, and I just haven't seen the file before because it's been placed with my global pip.
pyvenv.cfg is what I really can't make any sense of, though. On one hand, it specifies the Python version, which ought to be needed by others who want to use the project. On the other hand, it also specifies home = /usr/bin, which, while probably correct on a lot of Linux distributions, won't necessarily apply to all systems.
Are there any other files/directories I missed? Are there any stricter guidelines for how to structure a project and what to include?
Although venv is a very useful tool, you should not assume (unless you have good reason to do so) that everyone who looks at your repository uses it. Avoid committing any files used only by venv; these are not strictly necessary to be able to run your code and they are confusing to people who don't use venv.
The only configuration file you need to include in your repository is the requirements.txt file generated by pip freeze > requirements.txt which lists package dependencies. You can then add a note in your readme instructing users to install these dependencies with the command pip install -r requirements.txt. It would also be a good idea to specify the required version of Python in your readme.
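As a sketch, the relevant .gitignore entries could simply exclude the environment directory and the pip cache file mentioned above (the directory name venv/ is an assumption; use whatever you named yours):
# virtual environment created by each user locally
venv/
# pip's self-check file
pip-selfcheck.json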
I have a number of python "script suites" (as I call them) which I would like to make easy to install for my colleagues. I have looked into pip, and that seems really nice, but under that model (as I understand it) I would have to submit a static version and update it on every change.
As it happens I am going to be adding and changing a lot of stuff in my script suites along the way, and whenever someone installs it, I would like them to get the newest version. With pip, that means that on every commit to my repository, I will also have to re-submit a package to the PyPI index. That's a lot of unnecessary work.
Is there any way to provide an easy cross-platform installation (via pip or otherwise) which pulls the files directly from my github repo?
I'm not sure if I understand your problem entirely, but you might want to use pip's editable installs.[1]
Here's a brief artificial example; let's suppose you use git as your VCS.
git clone url_to_myrepo.git path/to/local_repository
pip install [--user] -e path/to/local_repository
The installation of the package will reflect the state of your local repository. Therefore there is no need to reinstall the package with pip when the remote repository gets updated. Whenever you pull changes to your local repository, the installation will be up-to-date as well.
[1] http://pip.readthedocs.org/en/latest/reference/pip_install.html#editable-installs
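If your colleagues should instead install straight from the remote repository (so they always get whatever is currently on your default branch), pip can also install from a VCS URL; a sketch, with a hypothetical GitHub URL:
pip install git+https://github.com/youruser/your-script-suite.git
# re-run with --upgrade (or --force-reinstall) later to pick up new commits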
I'm developing a test django site, which I keep in a bitbucket repository in order to be able to deploy it easily on a remote server and possibly share development with a friend. I use hg for version control.
The site depends on a 3rd party app (django-registration), which I needed to customize for my site, so I forked the original app and created a 2nd repository for it (the idea being that this way I can keep up with updates to the original, which wouldn't be possible if I just pasted the code into my main site, plus add my own custom code). (You can see some more details in this question.)
My question is, how do I specify requirements on my setup.py file so that when I install my django site I get the latest version of my fork for the 3rd party app (I use distribute rather than setuptools in case that makes a difference)?
I have tried this:
install_requires = ['django', 'django-registration'],
dependency_links = ['https://myuser#bitbucket.org/myuser/django-registration#egg=django_registration']
but this gets me the latest named version on the original trunk (so not even the tip version)
Using a pip requirements file however works well:
hg+https://myuser#bitbucket.org/myuser/django-registration#egg=django-registration
gets me the latest version from my fork.
Is there a way to get this same behaviour directly from the setup.py file, without having to first install the code for the site and then run pip install -r requirements.txt?
This question is very informative, but it seems to suggest I should depend on version 'dev' of the 3rd party package, which doesn't work (I guess there would have to be a specific version tagged as dev for that).
Also, I'm a complete newbie in packaging / distribute / setuptools, so don't hold back on spelling out the steps :)
Maybe I should change the setup.py file in my fork of the 3rd party app and make sure it mentions a version number. Generally, I'm curious to know what a source distribution is, as opposed to simply having my code in a public repository, what a binary distribution would be in my case (an egg file?), and whether that would be any more practical for me when deploying remotely / having my friend deploy on his PC. I would also like to know how to tag a version in my repository for setup.py to refer to it - is it simply a version control tag (hg in my case)? Feel free to comment on any details you think are important for the starter packager :)
Thanks!
put this:
dependency_links=['https://bitbucket.org/abraneo/django-registration/get/tip.tar.gz#egg=django-registration']
In dependency_links you have to pass a download URL like that one.
"abraneo" is a guy who forked this project too; replace his name with yours.