Installing a package to a specific relative directory with pip

I have several packages that I need to install in a directory like this:
../site-packages/mynamespace/packages
Mainly, this is for historical reasons so imports don't break.
There are several such packages and we would need to be able to pick and choose which package we need based on the needs of the project.
So, this all needs to happen in the requirements.txt file, so something like this in the requirements.txt
requests
celery
beautifulsoup4
path/to/mypackage1
path/to/mypackage3
path/to/mypackage5
(for example)
And then I need
path/to/mypackage1
path/to/mypackage3
path/to/mypackage5
to be installed to:
../site-packages/mynamespace/packages
Accordingly, mypackage1, mypackage3, and mypackage5 would all need to be importable like this:
from mynamespace.packages import mypackage1 #(and 3 and 5)
So far, I've tried creating a site.cfg file to go along with setup.py, but I think pip might be creating its own install values and bypassing site.cfg. The file looks like this:
[install]
install-base=$HOME
install-purelib=Lib\site-packages\mynamespace\packages
Also, each environment runs in a virtual environment.
I thought of having two separate requirements files to run, but this isn't workable. I need to have all the packages in a single requirements.txt file that is invoked simply by pip install -r requirements.txt
I thought of having them install directly to site-packages and then adding the imports to mynamespace.packages.__init__.py, but this also isn't workable. The packages need to be physically located under site-packages/mynamespace/packages.
Furthermore, --target will not work for me because A) it effectively requires an absolute path (relative paths work, but they are resolved against the user's current directory), and B) it violates the above requirement of a single requirements.txt file (otherwise requests, etc. would all go to the other directory).
I'm thinking there must be some way to hack site.cfg.
Ideally, if I could add --target=relative_path within the requirements.txt, that would solve my problem, but I don't think this is possible. Only a few options are allowed within a requirements file (-e being one), but not -t. Even so, the path would have to be relative to the site-packages directory used for the install, not relative to where the user is located.
Also, this needs to work in all operating systems.
Thank you
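For the "relative to site-packages" part of the question, the directory can at least be computed from the running interpreter rather than from the current working directory. The following is only a minimal sketch: it assumes the standard-library sysconfig "purelib" path is the site-packages directory of the active virtual environment, and the helper name namespace_target is hypothetical. It computes the directory; it does not make pip install there.

```python
import sysconfig
from pathlib import Path


def namespace_target(*parts, purelib=None):
    """Return <site-packages>/<parts...> for the running interpreter.

    `purelib` may be passed explicitly (useful for testing); by default it
    comes from sysconfig, which reflects the active virtual environment.
    """
    base = Path(purelib if purelib is not None else sysconfig.get_paths()["purelib"])
    return base.joinpath(*parts)


if __name__ == "__main__":
    # Layout from the question: site-packages/mynamespace/packages
    print(namespace_target("mynamespace", "packages"))
```

Because the path is built with pathlib, the same code works on Windows, macOS, and Linux. A wrapper script could pass the result to pip's --target option, although that step would still sit outside requirements.txt.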

Related

Python code checker for modules included in requirements.txt but unused? [duplicate]

Is there an easy way to delete packages that are no longer used from a requirements file?
I wrote a bash script for this task, but it doesn't work as I expected, because some packages are imported under a name that differs from their PyPI project name. For example, the
dj-database-url
package is used as
dj_database_url
My project has many packages in its requirements file, so searching for them one by one is messy, error-prone, and takes too much time. As far as I can tell, IDEs don't have this feature yet.
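The name mismatch described above (dj-database-url versus dj_database_url) can be looked up rather than guessed. This is a rough sketch, assuming Python 3.10 or newer, where importlib.metadata.packages_distributions() maps top-level import names to the installed distributions that provide them. The helper names normalize and import_names_by_project are hypothetical, and only packages installed in the current environment are covered.

```python
import importlib.metadata
import re


def normalize(name):
    """Normalize a project name as described in PEP 503."""
    return re.sub(r"[-_.]+", "-", name).lower()


def import_names_by_project(mapping=None):
    """Invert {import name: [projects]} into {normalized project: {import names}}."""
    if mapping is None:
        # Requires Python 3.10+; reflects the current environment only.
        mapping = importlib.metadata.packages_distributions()
    result = {}
    for import_name, projects in mapping.items():
        for project in projects:
            result.setdefault(normalize(project), set()).add(import_name)
    return result


if __name__ == "__main__":
    names = import_names_by_project()
    print(names.get(normalize("dj-database-url"), "not installed"))
```

With such a mapping, a script can search the source tree for the import name instead of the PyPI project name.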

You can use Code Inspection in PyCharm.
Delete the contents of your requirements.txt but keep the empty file.
Load your project in PyCharm and go to Code -> Inspect code....
Choose Whole project option in dialog and click OK.
In the inspection results panel, locate the Package requirements section under Python (note that this section is shown only if there is a requirements.txt or setup.py file).
The section will contain one of the following messages:
Package requirement '<package>' is not satisfied if there is any package that is listed in requirements.txt but not used in any .py file.
Package '<package>' is not listed in project requirements if there is any package that is used in .py files, but not listed in requirements.txt.
You are interested in the second inspection.
You can add all used packages to requirements.txt by right clicking the Package requirements section and selecting Apply Fix 'Add requirements '<package>' to requirements.txt'. Note that it will show only one package name, but it will actually add all used packages to requirements.txt if called for section.
If you want, you can add them one by one, just right click the inspection corresponding to certain package and choose Apply Fix 'Add requirements '<package>' to requirements.txt', repeat for each inspection of this kind.
After that you can create clean virtual environment and install packages from new requirements.txt.
Also note that PyCharm has an import optimisation feature, see Optimize imports.... It can be useful to run it before any of the other steps listed above.

The best bet is to use a fresh Python venv/virtualenv with no packages, or only those you definitely know you need, and test your package there, installing missing packages with pip as you hit problems (this should be quite quick for most software). Then use the pip freeze command to list the packages you really need. Better still, you could use pip wheel to build wheels for those packages.
The other approach would be to:
Use pylint to check each file for unused imports and delete them, (you should be doing this anyway),
Run your tests to make sure that it was right,
Use a tool like snakefood or snakefood3 to generate your new list of dependencies
Note that for any dependency checking to work well, it is advisable to avoid conditional imports and imports inside functions.
Also note that, to be sure you have everything, it is a good idea to build a new venv/virtualenv, install from your dependencies list, and then re-test your code.

You can find obsolete dependencies by using deptry, a command line utility that checks for various issues with a project's dependencies, such as obsolete, missing or transitive dependencies.
Add it to your project with
pip install deptry
and then run
deptry .
Example output:
-----------------------------------------------------
The project contains obsolete dependencies:
Flask
scikit-learn
scipy
Consider removing them from your projects dependencies. If a package is used for development purposes, you should add
it to your development dependencies instead.
-----------------------------------------------------
Note that for the best results, you should be using a virtual environment for your project.
Disclaimer: I am the author of deptry.

In pycharm go to Tools -> Sync Python Requirements. There's a 'Remove unused requirements' checkbox.

I've used with success pip-check-reqs.
With command pip-extra-reqs your_directory it will check for all unused dependencies in your_directory
Install it with pip install pip-check-reqs.

Moving Anaconda installation from one user account to another

I apologize if this is not the correct site for this. If it is not, please let me know.
Here's some background on what I am attempting. We are working on a series of chat bots that will go into production. Each of them will run in an Anaconda environment. However, our setup uses tensorflow, which needs gcc to be compiled, and compliance has banned compilers from production. In addition, compliance rules also frown on us using pip or conda install in production.
As a way to get around this, I'm trying to tar the Anaconda 3 folder and move it into prod, with all dependencies already compiled and installed. However, the accounts in the two environments have different names, so this requires me to go into the bin folder (at the very least; I'm sure I will need to change them in the lib and pkgs folders as well) and use sed -i to change the hard-coded paths from \home\<dev account>\anaconda to \home\<prod account>\anaconda. While this seems to work, it's also a good way to mangle my installation.
My questions are as follows:
Is there any good way to transfer anaconda from one user to another, without having to use sed -i on these paths? I've already read that Anaconda itself does not support this, but I would like your input.
Is there any way for me to install Anaconda in dev so that the scripts in it are either hard-coded to use the production account name in their paths, or use ~?
If I must continue to use sed, is there anything critical I should be aware of? For example, when I use grep <dev account> *, I see some files listed as binary file matches. Do I need to do anything special to change these?
And once again, I am well aware that I should just create a new Anaconda installation on the production machine, but that is simply not an option.
Edit:
So far, I've changed the conda.sh and conda.csh files in /etc, as well as the conda, activate, and deactivate files in the root bin. As a result, I'm able to activate and deactivate my environment on the new user account. I've also changed the files in the bin folder under the bot environment. Right now, I'm trying to train the bot to test whether this works, but it keeps failing and stating that a custom action does not exist in the list. I don't think that is related to this, though.
Edit2:
I've confirmed that the error I was getting was not related to this. To get the bot to work properly with a ported version of Anaconda, all I had to do was change the conda.sh and conda.csh files in /etc so their paths to python use ~, do the same for the activate and deactivate files in /bin, and change the shebang line in the conda file in /bin to use the actual account name. This leaves every other file in /bin and lib still using the old account name in their shebang lines and in other variables that use the path, and yet the bots work as expected. By all rights, I don't think this should work, but it does.
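As an illustration of the shebang change described in Edit2, here is a small Python sketch used in place of sed -i. It touches only the first line of a script, and only if that line is a shebang, so other occurrences of the old path are left alone. The function name rewrite_shebang and the example account paths are hypothetical.

```python
def rewrite_shebang(script_text, old_prefix, new_prefix):
    """Replace old_prefix with new_prefix in the shebang line only."""
    first, sep, rest = script_text.partition("\n")
    if first.startswith("#!") and old_prefix in first:
        first = first.replace(old_prefix, new_prefix, 1)
    return first + sep + rest


if __name__ == "__main__":
    # Hypothetical script contents and account names, for illustration only.
    script = "#!/home/devuser/anaconda3/bin/python\nimport sys\n"
    print(rewrite_shebang(script, "/home/devuser", "/home/produser"))
```

Restricting the edit to the first line is a deliberate choice here: it is narrower than a global sed substitution and therefore less likely to mangle unrelated content.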

Anaconda is touchy about path names. They're obviously inserted into scripts, but they may be inserted into binaries as well. Some approaches that come to mind are:
Use Docker images in production. When building the image:
Install compilers as needed.
Build your stuff.
Uninstall the compilers and other stuff not needed at runtime.
Squash the image into a single layer.
This makes sure that the uninstalled stuff is actually gone.
Install Anaconda into the directory \home\<prod account>\anaconda on the development or build systems as well. Even though accounts are different, there should be a way to create a user-writeable directory in the same location.
Even better: Install Anaconda into a directory \opt\anaconda in all environments. Or some other directory that does not contain a username.
If you cannot get a directory outside of the user home, negotiate for a symlink or junction (mklink.exe /d or /j) at a fixed path \opt\anaconda that points into the user home.
If necessary, play it from the QA angle: Differing directory paths in production, as compared to all other environments, introduce a risk for bugs that can only be detected and reproduced in production. The QA or operations team should mandate that all applications use fixed paths everywhere, rather than make an exception for yours ;-)
Build inside a Docker container using directory \home\<prod account>\anaconda, then export an archive and run on the production system without Docker.
It's generally a good idea to build inside a reproducible Docker environment, even if you can get a fixed path without an account name in it.
Bundle your whole application as a pre-compiled Anaconda package, so that it can be installed without compilers.
That doesn't really address your problem though, because even conda install is frowned upon in production. But it could simplify building Docker images without squashing.
I've been building Anaconda environments inside Docker and running them on bare metal in production, too. But we've always made sure that the paths are identical across environments. I found mangling the paths too scary to even try. Life became much simpler when we switched to Docker images everywhere. But if you have to keep using sed... Good luck :-)

This is probably what you need: pip2pi.
Note that it only works for pip-compatible packages.
As I understand it, you need to move your whole setup, already compiled, as .tar.gz files. Here are a few things you could try:
Create a requirements.txt. These packages can help :
a. pipreqs
$ pipreqs /home/project/location
Successfully saved requirements file in /home/project/location/requirements.txt
b. snakefood.
Then, install pip2pi
$ pip install pip2pi
$ pip2tgz packages/ foo==1.2
...
$ ls packages/
foo-1.2.tar.gz
bar-0.8.tar.gz
pip2tgz passes package arguments directly to pip, so packages can be specified in any format that pip recognises:
$ cat requirements.txt
foo==1.2
http://example.com/baz-0.3.tar.gz
$ pip2tgz packages/ -r requirements.txt bam-2.3/
...
$ ls packages/
foo-1.2.tar.gz
bar-0.8.tar.gz
baz-0.3.tar.gz
bam-2.3.tar.gz
After getting all .tar.gz files, .tar.gz files can be turned into PyPI-compatible "simple" package index using the dir2pi command:
$ ls packages/
bar-0.8.tar.gz
baz-0.3.tar.gz
foo-1.2.tar.gz
$ dir2pi packages/
$ find packages/
packages/
packages/bar-0.8.tar.gz
packages/baz-0.3.tar.gz
packages/foo-1.2.tar.gz
packages/simple
packages/simple/bar
packages/simple/bar/bar-0.8.tar.gz
packages/simple/baz
packages/simple/baz/baz-0.3.tar.gz
packages/simple/foo
packages/simple/foo/foo-1.2.tar.gz

"but they may be inserted into binaries as well"
I can confirm that some packages hard-code the absolute path (including the username) into the compiled binary. But if you restrict the usernames to the same length, you can apply sed to both binary and text files, and almost everything will work.
On the other hand, if you copy the entire folder and use sed to replace usernames on only text files, you can run most of the installed packages. However, operations involving run-time compilation might fail, one example is installing a new package that requires compilation during installation.
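The same-length restriction matters because a binary file contains fixed offsets, so a replacement must not change the file's size. The idea can be sketched in Python instead of sed as follows; the function name replace_same_length and the account names are hypothetical, and this is an illustration of the constraint, not a tested relocation tool.

```python
def replace_same_length(data, old, new):
    """Replace `old` with `new` in a bytes object without changing its length."""
    if len(old) != len(new):
        raise ValueError("old and new must have the same length")
    return data.replace(old, new)


if __name__ == "__main__":
    # Made-up binary content with an embedded path, for illustration only.
    blob = b"\x7fELF\x00/home/devacct/anaconda\x00more-bytes"
    patched = replace_same_length(blob, b"/home/devacct/", b"/home/prdacct/")
    print(len(blob) == len(patched), patched)
```

Refusing mismatched lengths up front is safer than discovering a corrupted binary later.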

When working with a venv virtual environment, which files should I be committing to my git repository?

Using GitHub's .gitignore, I was able to filter out some files and directories. However, there are a few things that left me a little bit confused:
GitHub's .gitignore did not include /bin and /share created by venv. I assumed they should be ignored by git, however, as the user is meant to build the virtual environment themselves.
Pip generated a pip-selfcheck.json file, which seemed mostly like clutter. I assume it usually does this, and I just haven't seen the file before because it's been placed with my global pip.
pyvenv.cfg is what I really can't make any sense of, though. On one hand, it specifies the Python version, which ought to be needed by others who want to use the project. On the other hand, it also specifies home = /usr/bin, which, while probably correct on a lot of Linux distributions, won't necessarily apply to all systems.
Are there any other files/directories I missed? Are there any stricter guidelines for how to structure a project and what to include?

Although venv is a very useful tool, you should not assume (unless you have good reason to do so) that everyone who looks at your repository uses it. Avoid committing any files used only by venv; these are not strictly necessary to be able to run your code and they are confusing to people who don't use venv.
The only configuration file you need to include in your repository is the requirements.txt file generated by pip freeze > requirements.txt which lists package dependencies. You can then add a note in your readme instructing users to install these dependencies with the command pip install -r requirements.txt. It would also be a good idea to specify the required version of Python in your readme.
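To see why pyvenv.cfg belongs to the machine rather than to the project, it helps to look at what it stores. The sketch below parses its simple key = value lines; the function name parse_pyvenv_cfg is hypothetical and the sample contents are illustrative, with values that will differ on each system.

```python
def parse_pyvenv_cfg(text):
    """Parse 'key = value' lines from a pyvenv.cfg file into a dict."""
    settings = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


# Illustrative contents only; real values depend on the machine that
# created the virtual environment.
EXAMPLE = """home = /usr/bin
include-system-site-packages = false
version = 3.6.5
"""

if __name__ == "__main__":
    print(parse_pyvenv_cfg(EXAMPLE))
```

The home entry points at the directory of the base interpreter on the machine where the environment was created, which is why the file is regenerated with the environment rather than shared.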

Difference between adding path to PYTHONPATH and installing your own module

I'm working on a python project that contains a number of routines I use repeatedly. Instead of rewriting code all the time, I just want to update my package and import it; however, it's nowhere near done and is constantly changing. I host the package on a repo so that colleagues on various machines (UNIX + Windows) can pull it into their local repos and use it.
It sounds like I have two options: either I can keep installing the package after every change, or I can just add the folder to my system's path. If I change the package, does it need to be reinstalled? I'm using this blog post as inspiration, but the author there doesn't address the issue of a continuously changing package structure, so I'm not sure how to deal with this.
Also, if I wanted to split the project into multiple files and bundle it as a package, at what level in the directory structure does PYTHONPATH need to point? To the main project directory, or to the sample/ directory?
README.rst
LICENSE
setup.py
requirements.txt
sample/__init__.py
sample/core.py
sample/helpers.py
docs/conf.py
docs/index.rst
tests/test_basic.py
tests/test_advanced.py
In this example, I want to be able to just import the package itself and call the modules within it like this:
import sample
arg = sample.helpers.foo()
out = sample.core.bar(arg)
return out
Where helpers contains a function called foo and core contains a function called bar.

PYTHONPATH is a valid way of doing this, but in my (personal) opinion it's more useful if you have a whole different place where you keep your Python packages, like /opt/pythonpkgs or so.
For projects where I want it to be installed and also I have to keep developing, I use develop instead of install in setup.py:
When installing the package, don't do:
python setup.py install
Rather, do:
python setup.py develop
What this does is create a symlink/shortcut (I believe it's called an egg-link in Python) in the Python libs directory (where packages are installed) that points to your module's directory. Because it's only a link, whenever you change a Python file, the change is picked up the next time you import that file.
Note: With this approach, if you delete the repository/directory you ran this command from, the package will cease to exist (as it's only a shortcut).
The equivalent in pip is -e (for editable):
pip install -e .
Instead of:
pip install .
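On the question of which directory level the path should point to: the entry on PYTHONPATH (or sys.path) is the directory that contains sample/, not sample/ itself. The sketch below demonstrates this with a throwaway copy of the layout from the question. The function bodies written into helpers.py and core.py are made up for the demonstration, and build_sample_project and run_demo are hypothetical names.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def build_sample_project(root):
    """Create a tiny project root containing a `sample` package."""
    pkg = Path(root) / "sample"
    pkg.mkdir()
    (pkg / "__init__.py").write_text("")
    (pkg / "helpers.py").write_text("def foo():\n    return 21\n")
    (pkg / "core.py").write_text("def bar(arg):\n    return arg * 2\n")
    return Path(root)


def run_demo():
    """Import the package via its parent directory and call foo/bar."""
    with tempfile.TemporaryDirectory() as tmp:
        project_root = build_sample_project(tmp)
        # Add the directory that CONTAINS sample/, not sample/ itself.
        sys.path.insert(0, str(project_root))
        importlib.invalidate_caches()
        try:
            helpers = importlib.import_module("sample.helpers")
            core = importlib.import_module("sample.core")
            return core.bar(helpers.foo())
        finally:
            # Undo the path change and forget the temporary modules.
            sys.path.remove(str(project_root))
            for name in ("sample", "sample.helpers", "sample.core"):
                sys.modules.pop(name, None)


if __name__ == "__main__":
    print(run_demo())
```

Note that with an empty __init__.py, a bare import sample does not load the submodules; either import sample.helpers and sample.core explicitly, or import them inside sample/__init__.py.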

how to check what packages are used to include in a requirements.txt file?

I'm starting to dive into Python, but I'm a bit confused about how the requirements.txt file works. How do I know what to include in it?
For example, the current project I'm working on, I only installed Flask. So do I just add only flask to that file? Or are there other packages that I don't know about - if so is there a way to find out (e.g display a full list)?

You could run pip to get the list of requirements for your project.
pip freeze > requirements.txt
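For a quick look at what is installed from inside Python, the standard-library importlib.metadata module can produce a similar name==version listing. This is a sketch and not a replacement for pip freeze; the function name installed_requirements is hypothetical.

```python
from importlib.metadata import distributions


def installed_requirements():
    """Return sorted 'name==version' strings for the current environment."""
    lines = set()
    for dist in distributions():
        name = dist.metadata.get("Name")
        version = dist.metadata.get("Version")
        if name and version:  # skip entries with incomplete metadata
            lines.add(f"{name}=={version}")
    return sorted(lines, key=str.lower)


if __name__ == "__main__":
    print("\n".join(installed_requirements()))
```

Like pip freeze, this lists everything in the environment, including packages that were pulled in as dependencies. After installing only Flask, for example, the list also contains Flask's own dependencies such as Werkzeug and Jinja2.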

You could just "grep" the Python source files in your project for "import " to get an exhaustive list of packages you use. Remove the obvious ones that are part of the standard library, like datetime or whatever, and the rest are what you might include in requirements.txt.
I don't know of a more "automatic" way to do it; another way might be to set up a clean virtualenv or other sandboxed install of Python with no extra packages, and try installing your software in there using only your requirements.txt.
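The "grep for import" idea can be made a little more robust by parsing the source with the standard-library ast module instead of matching text. The following is a minimal sketch: the helper names are hypothetical, and the standard-library filter relies on sys.stdlib_module_names, which exists only on Python 3.10 and newer (on older versions nothing is filtered).

```python
import ast
import sys


def top_level_imports(source):
    """Return the set of top-level module names imported by `source`."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            found.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            # level == 0 skips relative imports such as "from . import x".
            found.add(node.module.split(".")[0])
    return found


def third_party_imports(source):
    """Drop standard-library names where the interpreter can identify them."""
    stdlib = getattr(sys, "stdlib_module_names", frozenset())
    return top_level_imports(source) - set(stdlib)


if __name__ == "__main__":
    code = "import datetime\nfrom flask import Flask\n"
    print(sorted(third_party_imports(code)))
```

The result contains import names, which do not always match the PyPI project name (bs4 is provided by beautifulsoup4, for instance), so the list still needs a manual check before it goes into requirements.txt.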
