I generally don't bother to install Python modules. I use web2py, and just dump them in the modules folder and let it take care of the local imports. That has always seemed like the most straightforward way of doing things: handling dependencies at the system-wide level never felt right, and I never felt like messing with virtual envs.
On one of my other questions, the answerer said:
Generally, the best practice for 3rd party modules is to install them
via pip or easy_install (preferably in a virtualenv), if they're
available on PyPI, rather than copying them somewhere onto your
PYTHONPATH. ... [because that] runs the install scripts hooks
necessary to install executable scripts, build C extensions, etc.,
that isn't done by just copying in a module.
I don't fully understand this. I always thought it was more of a preference, but is it true that it's better practice to install 3rd party modules, and am I potentially causing problems by not doing that? Does using a framework like web2py make a difference?
It depends on the module and what you want to use it for. Some packages come with useful command line tools, which may only be available to you if you install them appropriately.
Conversely, if you're writing code which is to be distributed to environments you don't have much control over, you often have to keep a copy of the code locally within your project, as the target environment may not have the package... web projects often fall into this category, depending on your serving environment, of course.
I want to know if I can ship a Python script with a folder in the same directory containing all the assets of a Python module, so that anyone who wants to use it would not have to pip install the module, because it would be imported from that directory.
Yes, you can, but it doesn't mean that you should.
First, ask yourself who is supposed to use that code.
If you plan to give it to consumers, it would be a good idea to use a tool like py2exe and create an executable file that bundles all the modules and does not allow the code to be changed.
If you plan to share it with another developer, you might want to look into virtual environments and a requirements.txt file.
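A minimal sketch of that workflow, assuming Python 3.3+ with the built-in venv module (the directory and file names are just the usual conventions):
$ python -m venv .venv                    # create an isolated environment for the project
$ . .venv/bin/activate                    # activate it in the current shell
$ pip install -r requirements.txt         # install the dependencies the project pins
$ pip freeze > requirements.txt           # or, going the other direction, record what you installed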
There are multiple reasons why bundling modules with your code is a bad idea:
It is harder to update the modules later, at least without updating the whole project.
It uses more space in version control, which can create issues on huge projects with hundreds of modules and branches.
It might be illegal, as some licenses specifically forbid including their code in your source code.
The pip install of some modules does different things depending on the operating system version or the packages already installed, so the copies that work on your machine might be suboptimal on someone else's machine, and in some cases might not work at all.
And probably more reasons that I can't think of right now.
The only situation where I saw this being unavoidable was when a module didn't support the Python implementation the application was running on. The module was modified, and its source was put under a lib folder with the rest of the libraries.
I think you can add the directory containing the Python modules to PYTHONPATH. Then anyone who wants to use those modules just needs to have this environment variable set.
https://docs.python.org/3/using/cmdline.html#envvar-PYTHONPATH
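For example, assuming the modules are bundled in a vendor directory next to the script (both names are hypothetical):
$ export PYTHONPATH=/path/to/project/vendor:$PYTHONPATH
$ python myscript.py                      # imports now also resolve against the vendor directory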
My goal is to distribute a Python package that has several other widely used Python packages as dependencies. My package depends on well-written, PyPI-indexed packages like pandas, scipy and numpy, and its setup.py specifies that certain versions or higher of these are needed, e.g. "numpy >= 1.5".
I found that it's immensely frustrating and nearly impossible for Unix-savvy users who are not experts in Python packaging (even if they know how to write Python) to install a package like mine, even when using what are supposed to be easy-to-use package managers. I am wondering if there is an alternative to this painful process that someone can offer, or if my experience just reflects the very difficult current state of Python packaging and distribution.
Suppose users download your package onto their system. Most will try to install it "naively", using something like:
$ python setup.py install
That is usually what comes up if you google instructions on installing Python packages. It will fail for the vast majority of users, since most do not have root access on their Unix/Linux servers. With more searching, they will discover the "--prefix" option and try:
$ python setup.py install --prefix=/some/local/dir
Since the users are not aware of the intricacies of Python packaging, they will pick an arbitrary directory as an argument to --prefix, e.g. "~/software/mypackage/". It will not be a cleanly curated directory where all other Python packages reside, because again, most users are not aware of these details. If they install another package "myotherpackage", they might pass it "~/software/myotherpackage", and you can imagine how down the road this will lead to frustrating hacking of PYTHONPATH and other complications.
Continuing with the installation process: although the "setup.py install --prefix" call appears to succeed, things will fail as soon as users try to use the package, since one of its dependencies (e.g. pandas, scipy or numpy) might be missing and no package manager was used to pull it in. They will try to install these packages individually. Even if that succeeds, the packages will inevitably not be on the PYTHONPATH, due to the non-standard directories given to "--prefix", and patient users will fiddle with their PYTHONPATH to make the dependencies visible.
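In practice that dance ends up looking something like this (assuming Python 2.7; the exact site-packages path depends on the interpreter version):
$ python setup.py install --prefix=$HOME/software/mypackage
$ export PYTHONPATH=$HOME/software/mypackage/lib/python2.7/site-packages:$PYTHONPATH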
At this stage, users might be told by a Python savvy friend that they should use a package manager like "easy_install", the mainstream manager, to install the software and have dependencies taken care of. After installing "easy_install", which might be difficult, they will try:
$ easy_install setup.py
This too will fail, since users again do not typically have permission to install software globally on production Unix servers. With more reading, they will learn about the "--user" option, and try:
$ easy_install setup.py --user
They will get the error:
usage: easy_install [options] requirement_or_url ...
or: easy_install --help
error: option --user not recognized
They will be extremely puzzled as to why their easy_install does not have the --user option when there are clearly pages online describing the option. They might try to upgrade their easy_install to the latest version and find that it still fails.
If they continue and consult a Python packaging expert, they will discover that there are two versions of easy_install, both named "easy_install" so as to maximize confusion: one is part of "distribute" and the other part of "setuptools". It happens that only the "distribute" version of easy_install supports "--user", while the vast majority of servers/sysadmins install setuptools' easy_install, so local installation will not be possible. Keep in mind that the distinction between "distribute" and "setuptools" is meaningless and hard to understand for people who are not experts in Python package management.
At this point, I would have lost 90% of even the most determined, savvy and patient users who try to install my software package -- and rightfully so! They wanted to install a piece of software that happened to be written in Python, not to become experts in state of the art Python package distribution, and this is far too confusing and complex. They will give up and be frustrated at the time wasted.
The tiny minority of users who continue on and ask more Python experts will be told that they ought to use pip/virtualenv instead of easy_install. Installing pip and virtualenv and figuring out how these tools work and how they are different from the conventional "python setup.py" or "easy_install" calls is in itself time consuming and difficult, and again too much to ask from users who just wanted to install a simple piece of Python software and use it. Even those who pursue this path will be confused as to whether whatever dependencies they installed with easy_install or setup.py install --prefix are still usable with pip/virtualenv or if everything needs to be reinstalled from scratch.
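For reference, the path they are being pointed at looks roughly like this, assuming pip and virtualenv can be bootstrapped at all (mypackage is a placeholder for the package in question):
$ virtualenv ~/myenv                      # create an isolated environment under the home directory
$ . ~/myenv/bin/activate
$ pip install mypackage                   # resolves numpy, scipy and pandas from PyPI as dependencies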
This problem is exacerbated if one or more of the packages in question depends on a different version of Python than the default one. The difficulty of ensuring that your Python package manager uses the Python version you want it to, and that the required dependencies are installed in the relevant Python 2.x directory and not Python 2.y, will be so endlessly frustrating to users that they will certainly give up at that stage.
Is there a simpler way to install Python software that doesn't require users to delve into all of these technical details of Python packages, paths and locations? For example, I am not a big Java user, but I do use some Java tools occasionally, and don't recall ever having to worry about X and Y dependencies of the Java software I was installing, and I have no clue how Java package managing works (and I'm happy that I don't -- I just wanted to use a tool that happened to be written in Java.) My recollection is that if you download a Jar, you just get it and it tends to work.
Is there an equivalent for Python? A way to distribute software in a way that doesn't depend on users having to chase down all these dependencies and versions? A way to perhaps compile all the relevant packages into something self-contained that can just be downloaded and used as a binary?
I would like to emphasize that this frustration happens even with the narrow goal of distributing a package to savvy Unix users, which makes the problem simpler by not worrying about cross platform issues, etc. I assume that the users are Unix savvy, and might even know Python, but just aren't aware (and don't want to be made aware) about the ins and outs of Python packaging and the myriad of internal complications/rivalries of different package managers. A disturbing feature of this issue is that it happens even when all of your Python package dependencies are well-known, well-written and well-maintained Pypi-available packages like Pandas, Scipy and Numpy. It's not like I was relying on some obscure dependencies that are not properly formed packages: rather, I was using the most mainstream packages that many might rely on.
Any help or advice on this will be greatly appreciated. I think Python is a great language with great libraries, but I find it virtually impossible to distribute the software I write in it (once it has dependencies) in a way that is easy for people to install locally and just run. I would like to clarify that the software I'm writing is not a Python library for programmatic use, but software that has executable scripts that users run as individual programs. Thanks.
We also develop software projects that depend on numpy, scipy and other PyPI packages. Hands down, the best tool currently available out there for managing remote installations is zc.buildout. It is very easy to use. You download a bootstrapping script from their website and distribute that with your package. You write a "local deployment" file, normally called buildout.cfg, that explains how to install the package locally. You ship both the bootstrap.py file and buildout.cfg with your package; we use the MANIFEST.in file in our Python packages to force the embedding of these two files into the zip or tarballs distributed on PyPI. When the user unpacks it, they should execute two commands:
$ python bootstrap.py # this will download zc.buildout and setuptools
$ ./bin/buildout # this will build and **locally** install your package + deps
The package is compiled and all dependencies are installed locally, which means that the user installing your package doesn't even need root privileges, which is an added feature. The scripts are (normally) placed under ./bin, so the user can just execute them after that. zc.buildout uses setuptools for interaction with PyPI so everything you expect works out of the box.
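For illustration, a minimal buildout.cfg for such a package might look roughly like this (mypackage is a placeholder; zc.recipe.egg is the stock recipe for installing eggs and generating the scripts under ./bin):
[buildout]
parts = scripts

[scripts]
recipe = zc.recipe.egg
eggs = mypackage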
You can extend zc.buildout quite easily if all that power is not enough: you create so-called "recipes" that can help the user create extra configuration files, download other things from the net or instantiate custom programs. The zc.buildout website contains a video tutorial that explains in detail how to use buildout and how to extend it. Our project Bob makes extensive use of buildout for distributing packages for scientific usage. If you would like, please visit the following page that contains detailed instructions for our developers on how they can set up their Python packages so other people can build and install them locally using zc.buildout.
We're currently working to make it easier for users to get started installing Python software in a platform independent manner (in particular see https://python-packaging-user-guide.readthedocs.org/en/latest/future.html and http://www.python.org/dev/peps/pep-0453/)
For right now, the problem of two competing versions of easy_install has been resolved, with the competing fork "distribute" being merged back into the setuptools main line of development.
The best currently available advice on cross-platform distribution and installation of Python software is captured here: https://packaging.python.org/
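In practical terms, that advice boils down to something like the following for end users (mypackage is a placeholder; a virtualenv works just as well as --user):
$ pip install --user mypackage            # installs the package and its PyPI dependencies under the user site directory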
I am trying to define a process for migrating django projects from my development server to my production server using git, and it's driving me crazy that distutils installs python modules system-wide. I've read the documentation but unless I'm missing something it seems to be mostly about how to change the installation directory. I need to be able to use different versions of the same module in different projects running on the same server, and deploy projects from git without having to download and install dependencies.
tl;dr: I need to know how to install python modules, using distutils, into my project's source tree for version control without compromising other projects using different versions of the same module.
I'm new to python, so I apologize in advance if this is common knowledge.
Besides the already mentioned virtualenv, which is a good option but has the potential drawback of requiring a third-party module, distutils itself has options to install modules into arbitrary locations. In particular, there is the home scheme, which allows you to "build and maintain a personal stash of Python modules". It's described in the Python documentation set here.
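A quick sketch of the home scheme (the vendor path is only an example; pure-Python modules land under <home>/lib/python):
$ python setup.py install --home=$HOME/myproject/vendor
$ export PYTHONPATH=$HOME/myproject/vendor/lib/python:$PYTHONPATH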
Perhaps you are looking for virtualenv. It will allow you to install packages into a separate virtual Python "root".
For completeness' sake: virtualenvwrapper makes everyday work with virtualenv a lot quicker and simpler once you are working on multiple projects and/or on multiple development platforms at the same time.
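A quick sketch of the virtualenvwrapper workflow (project and module names are placeholders; virtualenvwrapper.sh has to be sourced in your shell startup first):
$ pip install virtualenvwrapper
$ mkvirtualenv projectA                   # create and activate an environment just for project A
$ pip install somemodule==1.2             # hypothetical pinned version used only by this project
$ deactivate
$ workon projectA                         # come back to the same environment later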
If you are looking for something akin to npm or yarn of the JavaScript world or composer of the PHP world, then you may want to look at pipenv (not to be confused with pip). Here's a guide to get you started.
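A minimal sketch of pipenv usage (requests and myscript.py are just examples):
$ pip install --user pipenv
$ pipenv install requests                 # creates a Pipfile/Pipfile.lock and a dedicated virtualenv
$ pipenv run python myscript.py           # run the script inside that environment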
Alternatively there is also Poetry, which some people say is even better, but I haven't used it yet.
Is there anything equivalent or close in terms of functionality to Python's virtualenv, but for Perl?
I've done some development in Python, and the possibility of having non-system versions of modules installed in a separate environment without creating any mess is a huge advantage. Now I have to work on a new project in Perl, and I'm looking for something like virtualenv, but for Perl. Can you suggest any Perl equivalent or replacement for Python's virtualenv?
I'm trying to set up X different sets of non-system Perl packages for Y different applications to be deployed. Even worse, these applications may require different versions of the same package, so each of them may need to be installed in a separate module/library environment. You may want to do this manually for X < Y < 3, but you should not do this manually for 10 > Y > X.
Ideally what I'm looking should work like this:
perl virtualenv.pl my_environment
. my_environment/bin/activate
wget http://.../foo-0.1.tar.gz
tar -xzf foo-0.1.tar.gz ; cd foo-0.1
perl Makefile.PL
make install # <-- package foo-0.1 gets installed inside my_environment
perl -MCPAN -e 'install Bar' # <-- now package Bar with all its deps gets installed inside my_environment
There's a tool called local::lib that wraps up all of the work for you, much like virtualenv. It will:
Set up @INC in the process where it's used.
Set PERL5LIB and other such things for child processes.
Set the right variables to convince CPAN, MakeMaker, Module::Build, etc. to install libraries and store configuration in a local directory.
Set PATH so that installed binaries can be found.
Print environment variables to stdout when used from the commandline so that you can put eval $(perl -Mlocal::lib)
in your .profile and then mostly forget about it.
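A minimal sketch of that setup, assuming local::lib's default ~/perl5 target (Some::Module is a placeholder):
$ cpan local::lib                         # install local::lib itself (one of several ways)
$ eval $(perl -Mlocal::lib)               # sets PERL5LIB, PATH, etc. to point at ~/perl5
$ cpan Some::Module                       # subsequent installs now land under ~/perl5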
I've used schroot for this purpose. It is a bit heavier than virtualenv but you can be sure that nothing will leak in that shouldn't.
Schroot manages a chroot environment for you, but mounts your home directory in the chroot so it appears like a normal shell session, just using the binaries and libraries in the chroot.
I think it may be Debian/Ubuntu only, though.
After setting up the schroot, your script above would look like
schroot -c my_perl_dev
wget ...
See http://www.debian-administration.org/articles/566 for an interesting article about it
Also check out perl-virtualenv; it seems to be a wrapper around local::lib, as suggested by Hobbs, but it creates a bin/activate and bin/deactivate so you can use it just like the Python tool.
I've been using it quite successfully for a month or so without realising it wasn't as standard as perhaps it should be.
It makes it a lot easier to set up a working virtualenv for Perl: while local::lib will tell you which variables you need to set, perl-virtualenv creates an activate script which does it for you.
While investigating, I discovered this and some other pages (this one is too old and misses new technologies, this reddit post is a slight misdirect).
The problem with perlbrew and plenv is that they seem to be replacements for pyenv, not virtualenv. As noted here pyenv is for managing python versions, virtualenv is for managing per-project module versions. So, yes, in some ways similar to local::lib, but with better usability.
I've not seen a proper answer to this question yet, but from what I've read, it looks like the best solution is something along the lines of:
Perl version management: plenv/perlbrew (with most people favouring the more contemporary bash-based plenv over the Perl-based perlbrew, from what I can see)
Module version management: Carton (sketched below)
Module installation: cpan (well, cpanminus anyway, YMMV)
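A rough sketch of the Carton part, assuming a cpanfile listing the dependencies (myscript.pl is hypothetical; ./local is Carton's default install target):
$ cpanm Carton                            # install Carton itself
$ carton install                          # reads cpanfile and installs everything under ./local
$ carton exec perl myscript.pl            # runs the script with ./local on @INC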
To be honest, this is not an ideal set-up, although I'm still learning, so it may yet prove superior. It just doesn't feel right. It certainly isn't a like-for-like replacement for virtualenv.
There are a couple of posts I've found saying "it is possible" but neither has gone any further.
I am not sure whether this is the same as that virtualenv thing you are talking about, but have a look at the @INC special variable in the perlvar manpage.
Programs can modify which directories they check for libraries with use lib. This lib directory can be relative to the current directory. Libraries from these directories will be used before system libraries, as they are placed at the beginning of the @INC array.
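For example, from the command line (the lib path and script name are hypothetical):
$ perl -I/path/to/project/lib myscript.pl          # -I prepends the directory to @INC, ahead of the system paths
$ PERL5LIB=/path/to/project/lib perl myscript.pl   # the environment variable achieves much the same thing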
I believe cpan can also install libraries to specific directories. Granted, cpan draws from the CPAN site in order to install things, so this may not be the best option.
It looks like you just need to use the INSTALL_BASE configuration for Makefile.PL (or the --install_base option for Build.PL). What exactly do you need the solution to do for you? It sounds like you just need to get the installed modules into the right place. You've presented your problem as an XY Problem, specifying what you think the solution is rather than letting us help you with your task.
See How do I keep my own module/library directory? in perlfaq8, for instance.
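Concretely, that looks something like this (the extlib path and app.pl are just examples):
$ perl Makefile.PL INSTALL_BASE=/path/to/app/extlib
$ make && make install                              # modules end up under /path/to/app/extlib/lib/perl5
$ PERL5LIB=/path/to/app/extlib/lib/perl5 perl app.pl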
If you are downloading modules from CPAN, the latest cpan command (in App::Cpan) has a -j switch to allow you to choose alternate CPAN.pm configuration files. In those configuration files you can set the CPAN.pm options to install wherever you like.
Based on your clarification, it sounds like local::lib might work for you in single, simple cases, but I do this for industrial-strength deployments where I set up custom, private CPANs per application and install directly from those custom CPANs. See my MyCPAN::App::DPAN module, for instance. From that, I use custom CPAN.pm configs that analyze their environment and set the proper values so that each application can install everything in a directory just for that application.
You might also consider distributing your application as a Task::. You install it like any other Perl module, but dependencies share that same setup (i.e. INSTALL_BASE).
What I do is start the CPAN shell (cpan) and install my own Perl 5.10 from it (I believe the command is install perl-5.10). This will ask for various configuration settings; I make sure to make it point to paths under /usr/local (or some other installation location other than the default).
Then I put its resulting location in my executable $PATH before the standard perl, and use its CPAN shell to install the modules I need (usually, a lot).
My Perl scripts all start with the line
#!/usr/bin/env perl
Never had a problem with this approach.
I am a member of a team that is about to launch a beta of a python (Django specifically) based web site and accompanying suite of backend tools. The team itself has doubled in size from 2 to 4 over the past few weeks and we expect continued growth for the next couple of months at least. One issue that has started to plague us is getting everyone up to speed in terms of getting their development environment configured and having all the right eggs installed, etc.
I'm looking for ways to simplify this process and make it less error prone. Both zc.buildout and virtualenv look like they would be good tools for addressing this problem but both seem to concentrate primarily on the python-specific issues. We have a couple of small subprojects in other languages (Java and Ruby specifically) as well as numerous python extensions that have to be compiled natively (lxml, MySQL drivers, etc). In fact, one of the biggest thorns in our side has been getting some of these extensions compiled against appropriate versions of the shared libraries so as to avoid segfaults, malloc errors and all sorts of similar issues. It doesn't help that out of 4 people we have 4 different development environments -- 1 leopard on ppc, 1 leopard on intel, 1 ubuntu and 1 windows.
Ultimately what would be ideal would be something that works roughly like this, from the dos/unix prompt:
$ git clone [repository url]
...
$ python setup-env.py
...
that then does what zc.buildout/virtualenv does (copy/symlink the Python interpreter, provide a clean space to install eggs), then installs all required eggs, including any native shared-library dependencies, installs the Ruby project, the Java project, etc.
Obviously this would be useful for both getting development environments up as well as deploying on staging/production servers.
Ideally I would like for the tool that accomplishes this to be written in/extensible via python, since that is (and always will be) the lingua franca of our team, but I am open to solutions in other languages.
So, my question then is: does anyone have any suggestions for better alternatives or any experiences they can share using one of these solutions to handle larger/broader install bases?
Setuptools may be capable of more of what you're looking for than you realize -- if you need a custom version of lxml to work correctly on MacOS X, for instance, you can put a URL to an appropriate egg inside your setup.py and have setuptools download and install that inside your developers' environments as necessary; it also can be told to download and install a specific version of a dependency from revision control.
That said, I'd lean towards using a scriptably generated virtual environment. It's pretty straightforward to build a kickstart file which installs whichever packages you depend on and then boot virtual machines (or production hardware!) against it, with puppet or similar software doing other administration (adding users, setting up services [where's your database come from?], etc). This comes in particularly handy when your production environment includes multiple machines -- just script the generation of multiple VMs within their handy little sandboxed subnet (I use libvirt+kvm for this; while kvm isn't available on all the platforms you have developers working on, qemu certainly is, or you can do as I do and have a small number of beefy VM hosts shared by multiple developers).
This gets you out of the headaches of supporting N platforms -- you only have a single virtual platform to support -- and means that your deployment process, as defined by the kickstart file and puppet code used for setup, is source-controlled and run through your QA and review processes just like everything else.
I always create a develop.py file at the top level of the project, along with a packages directory containing all of the .tar.gz files from PyPI that I want to install, and an unpacked copy of virtualenv that is ready to run right from there. All of this goes into version control. Every developer can simply check out the trunk, run develop.py, and a few moments later have a virtual environment ready to use that includes all of our dependencies at exactly the versions the other developers are using. And it works even if PyPI is down, which is very helpful at this point in that service's history.
Basically, you're looking for a cross-platform software/package installer (along the lines of apt-get/yum/etc.). I'm not sure something like that exists.
An alternative might be specifying the list of packages that need to be installed via the OS-specific package management system such as Fink or DarwinPorts for Mac OS X and having a script that sets up the build environment for the in-house code?
I have continued to research this issue since I posted the question. It looks like there are some attempts to address some of the needs I outlined, e.g. Minitage and Puppet which take different approaches but both may accomplish what I want -- although Minitage does not explicitly state that it supports Windows. Lacking any better options I will try to make either one of these or just extensive customized use of zc.buildout work for our needs, but I still feel like there must be better options out there.
You might consider creating virtual machine appliances with whatever production OS you are running, and all of the software dependencies pre-built. Code can be edited either remotely, or with a shared folder. It worked pretty well for me in a past life that had a fairly complicated development environment.
Puppet doesn't (easily) support the Win32 world either. If you're looking for a deployment mechanism and not just a "dev setup" tool, you might consider looking into ControlTier (http://open.controltier.com/), which has an open-source cross-platform solution.
Beyond that you're looking at "enterprise" software such as BladeLogic or OpsWare and typically an outrageous pricetag for the functionality offered (my opinion, obviously).
A lot of folks have been aggressively using a combination of Puppet and Capistrano (even non-rails developers) for deployment automation tools to pretty good effect. Downside, again, is that it's expecting a somewhat homogeneous environment.