I am a Python beginner and am running Python on a Mac. I started to read up on pipenv and virtualenv and checked how my system was configured. It turns out that I have a Godzillian amount of packages installed under my system's Python environment. From what I understand of pipenv and virtualenv, this is exactly what you don't want to happen.
So now I am looking to correct this and apply best practices by keeping all dependencies in project folders, but before I do this, my question: is it smart to "clean out" the system environment and host everything in project folders? Or should I keep the system environment as it is and just start adding dependencies into newly created project folders? I don't care about disk space; I do care about dependency conflicts.
You absolutely do not want to clear out your system environment. The system version of Python is used by your operating system, and if you damage it, who knows what havoc you'll cause.
Ideally, you want to leave your system's version of python alone, and then have your own, separate version of python which you actually use for development, where you control the packages etc.
A popular way to install a separate version of Python and manage packages is Anaconda: https://www.anaconda.com/, so I'd suggest looking into that.
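For example, a minimal conda workflow might look like this (the environment name "myproject" and the packages are just an illustration; older conda versions use "source activate" instead of "conda activate"):
conda create -n myproject python=3
conda activate myproject
conda install numpy pandas
Everything installed this way lives inside that environment and leaves the system Python untouched.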
Related
I'm coming to Python from R, where package management is really simple. As I learn Python (through a lot of googling), I see recommendations about virtual environments (which I never set up) and about conda over pip for package management (I went with pip because it seemed easier).
So now I've got a bunch of libraries installed globally, Spyder is broken after a routine Ubuntu update (this problem, I think, which is solved using conda), and I am contemplating starting from scratch. But I know that lots of other system programs depend on Python, so I can't just nuke it and start over like I would with R.
So:
Is my instinct correct? Should I "start fresh" with my python environment?
How do I do this in a way that won't disrupt other processes on my machine?
I know that python 2.7 exists on my machine, and I assume that it does something consequential. I use python 3 for manipulating and analyzing data. I think that this is relevant, but I am not sure how.
So I'm doing a bit of Python development work right now, and I was wondering if it's possible to "clone" my entire development environment, specifically the Python interpreter and all the libraries I have installed, to my laptop. I currently use GitHub to store and sync my files across machines, and I use Sublime Text as my main code editor, so I can just install it on both machines by hand. But I don't want to have to hunt down and re-install every library and its dependencies on the new machine, because I don't remember everything I might've installed, and doing it by hand might not get me everything I need.
My first guess would be to just copy/paste the Python folder from my main PC to my laptop, but I have no idea how to synchronize it so that updates and changes made to one side can be brought over to the other without hassle.
How do more experienced programmers/developers handle working on large projects across multiple machines?
What I'd do is keep a virtualenv for each project on each machine, check a requirements.txt file into the Git repository, and run
source /path/to/virtualenv/bin/activate
pip install -r /path/to/project/requirements.txt
each time you add or change a library.
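To generate or update that requirements.txt in the first place, the usual pattern (run from inside the activated virtualenv) is:
pip freeze > /path/to/project/requirements.txt
On the other machine you create a fresh virtualenv, activate it, and run the pip install -r step above to get the same set of libraries.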
I have looked into other Python module distribution questions. My need is a bit different (I think! I am a Python newbie+).
I have a bunch of Python scripts that I need to execute on remote machines. Here is what the target environment looks like:
The machines will have the base Python runtime installed
I will have an SSH account; I can log in or execute commands remotely using ssh
I can copy files (scp) into my home directory
I am NOT allowed to install anything on the machines; the machines may not even have access to the Internet
my scripts may use some 'exotic' Python modules -- most likely they won't be present on the target machine
after the audit, my home directory will be nuked from the machine (leave no trace)
So what I would like to do is:
copy a directory structure of Python scripts + modules to the remote machine (say into /home/audituser/scripts, with modules copied into /home/audituser/scripts/python_lib)
then execute a script (say /home/audituser/scripts/myscript.py). This script will need to resolve all the modules it uses from the 'python_lib' subdirectory.
Is this possible? Or is there a better way of doing this? I guess what I am looking for is a way to 'relocate' the third-party modules into the scripts directory.
thanks in advance!
Are the remote machines the same as each other? And, if so, can you set up a development machine that's effectively the same as the remote machines?
If so, virtualenv makes this almost trivial. Create a virtualenv on your dev machine, use the virtualenv copy of pip to install any third-party modules into it, build your script within it, then just copy that entire environment to each remote machine.
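As a rough sketch of that workflow (the environment name, module name, and host are only placeholders):
# on the dev machine
virtualenv auditenv
auditenv/bin/pip install some-exotic-module
# copy the environment and your scripts to the target machine
scp -r auditenv scripts audituser@remote-host:/home/audituser/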
There are three things that make it potentially non-trivial:
If the remote machines don't (and can't) have virtualenv installed, you need to do one of the following:
In many cases, copying a --relocatable environment over just works. See the documentation section on "Making Environments Relocatable".
You can always bundle virtualenv itself, and pip install --user virtualenv (and, if they don't even have pip, a few steps before that) on each machine; a sketch of this follows below, after these caveats. This will leave the user account in a permanently-changed state. (But fortunately, your user account is going to be nuked, so who cares?)
You can write your own manual bootstrapping. See the section on "Creating Your Own Bootstrap Scripts".
By default, you get a lot more than you need—the Python executable, the standard library, etc.
If the machines aren't identical, this may not work, or at least might not be as efficient.
Even if they are, you're still often making your bundle orders of magnitude bigger.
See the documentation sections on Using Virtualenv without bin/python, --system-site-packages, and possibly bootstrapping.
If any of the Python modules you're installing also need C libraries (e.g., libxml2 for lxml), virtualenv doesn't help with that. In fact, you will need the C libraries to be almost exactly the same (same path, compatible version).
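For the bundled-virtualenv option, the bootstrap might look roughly like this (paths are illustrative and assume a Linux-style --user install location):
# on the remote machine, which has pip but not virtualenv
pip install --user virtualenv
~/.local/bin/virtualenv ~/audit_env
source ~/audit_env/bin/activate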
Three other alternatives:
If your needs are simple enough (or the least-simple parts involve things that virtualenv doesn't help with, like installing libxml2), it may be easier to just bundle the .egg/.tgz/whatever files for the third-party modules and write a script that does a pip install --user on each one (see the sketch after these alternatives), and then you're done.
Just because you don't need a full app-distribution system doesn't mean you can't use one. py2app, py2exe, cx_freeze, etc. aren't all that complicated, especially in simple cases, and having a click-and-go executable to copy around is even easier than having an explicit environment.
zc.buildout is an amazingly flexible and manageable tool that can do the equivalent of any of the three alternatives. The main downside is that there's a much, much steeper learning curve.
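For the first alternative, an offline install from bundled package files could look like this (the vendor directory and module name are hypothetical):
# packages were copied into /home/audituser/scripts/vendor beforehand
pip install --user --no-index --find-links=/home/audituser/scripts/vendor some-exotic-module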
You can use virtualenv to create a self-contained environment for your project. This can house your own script, as well as any dependency libraries. Then you can make the env relocatable (--relocatable), and sync it over to the target machine, activate it, and run your scripts.
If these machines do have network access (not internet, but just local network), you can also place the virtualenv on a shared location and activate from there.
It looks something like this:
virtualenv --no-site-packages portable_proj
cd portable_proj/
source bin/activate
# install some deps
pip install xyz
virtualenv --relocatable .
Now portable_proj can be distributed to other machines.
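Getting it onto the target machine and running it there could then look like this (host and paths are just placeholders, and it assumes your script was copied into the environment directory):
rsync -a portable_proj/ audituser@remote-host:/home/audituser/portable_proj/
ssh audituser@remote-host
source /home/audituser/portable_proj/bin/activate
python /home/audituser/portable_proj/myscript.py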
I have reinstalled my operating system (moved from windows XP to Windows 7).
I have reinstalled Python 2.7.
But I had a lot of packages installed in my old environment.
(Django, SciPy, Jinja2, matplotlib, NumPy, NetworkX, to name just a few)
I still have my old Python installation lying around on a data partition, so I wondered if I can just copy/paste the old Python library folders onto the new installation?
Or do i need to reinstall every package?
Do the packages keep any information in registry, system variables or similar?
Does it depend on the package?
This is the point where you need to be able to lay out your project properly, and there are special tools for that.
Normally, Python packages do not do weird things like touching the registry (unless they are packaged via an MSI installer). The problems start with packages that contain C extensions: moving to another OS version, or from a 32-bit to a 64-bit architecture, will require recompiling/rebuilding those. So it would be much better to reinstall all packages on the new system, as described below.
Your demands may vary, but you definitely have to choose a way of building your environment. If you don't have, and don't plan to have, a large variety of projects, you may consider the first approach below; the second approach is better suited to setting up development environments for different projects or for different versions of the same project.
Global environment (your Python installation in your system along with installed packages).
Here you can consider using pip. In this case your project can have a requirements file containing all the packages your project needs. Basically, a requirements file is a text file listing package names (as on PyPI) and their versions.
Isolated environment. It can be achieved using special tools or specially organized path.
This is where pip can be gracefully combined with virtualenv. This way is highly recommended by a lot of developers (I should note that Python 3.3, which will be released soon, includes virtual environment support in the standard library). This approach involves creating a virtual environment with its own Python interpreter and its own set of installed packages.
Another popular tool for achieving an isolated environment is buildout. It lays out your project source and dependencies in one path, so you achieve the same effect that virtualenv provides. The great advantage of buildout is that it's built on the idea of pluggable recipes (pieces of code implementing various common project deployment tasks), and there are hundreds of stable and reliable recipes available on the Internet.
Both virtualenv and buildout remove the headache of installing dependencies and solve the problem of different versions of the same package coexisting on a single machine.
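As a small sketch of the pip-plus-virtualenv way (the package pins in the requirements file are only an example):
# requirements.txt might contain lines like "Django==1.4" and "numpy==1.6.2"
virtualenv env
source env/bin/activate
pip install -r requirements.txt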
Choose your destiny...
The short answer to this question is "no", since packages can execute arbitrary code on installation and do whatever the heck they want wherever they want on your system.
Just reinstall all of them.
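If the old installation on the data partition is still runnable, one way to at least capture the list of what was installed (a sketch, assuming pip exists in the old environment and that it lives at D:\old_python, which is just a placeholder path):
D:\old_python\Scripts\pip.exe freeze > old_packages.txt
then, on the fresh installation:
pip install -r old_packages.txt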
I'm developing on Snow Leopard and going through the various how-tos to get the MySQLdb package installed and working (an uphill battle). Things are a mess, and I'd like to regain confidence with a fresh, clean, as-close-to-factory-as-possible install of Python 2.6.
What folders should I clean out?
What should I run?
What symbolic links should I destroy or create?
One thing you should not do is try to remove or change any of the Apple-supplied python files or links: they are in /usr/bin and /System/Library/Frameworks/Python.framework. These are part of OS X and managed by Apple. It is fine to clean up any unnecessary packages you have installed for that Python. They are in /Library/Python. If you installed a python.org Python and want to remove it, most of the files are in /Library/Frameworks/Python.framework. See here for complete instructions on how to remove them. And anything you installed into /usr/local is fair game.
Using virtualenvs is a fine idea but it's slightly less important on OS X where the concept of framework builds makes it easier to support multiple Python versions than on some other platforms.
The bigger issue, especially when trying to use MySQL with Python, is getting all of the necessary non-Python libraries installed and built properly, which is non-trivial given the variety of options available on OS X. For instance, depending on which Python instance and which OS X level you are running, you may need 32-bit or 64-bit (or possibly both) versions of things like the MySQL client libraries and the MySQLdb adapter. For that reason, I highly recommend using a complete solution from MacPorts. That way you have a good chance of getting all the right components built compatibly - and easily.
If necessary, install the base MacPorts as described on the MacPorts website then:
$ sudo port selfupdate
$ sudo port install py26-mysql
and that will pull in and build everything you need and make it available in /opt/local/bin. There are also plenty of other ports available, for instance:
$ sudo port install py26-virtualenv
Virtualenv might still work for you. Install it, then create virtual python environments with the --no-site-packages option. This won't clean up your base system, but should allow you to develop in pretty good isolation from the base system.
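For instance (the environment name is arbitrary, and building MySQLdb will still need the MySQL client libraries to be present):
virtualenv --no-site-packages ~/envs/mysqldev
source ~/envs/mysqldev/bin/activate
pip install MySQL-python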
My experience doing development on MacOSX is that the directories for libraries and installation tools are just different enough to cause a lot of problems that you end up having to fix by hand. Eventually, your computer becomes a sketchy wasteland of files and folders duplicated all over the place in an effort to solve these problems. A lot of hand-tuned configuration files, too. The thought of getting my environment set up again from scratch gives me the chills.
Then, when it's time to deploy, you've got to do it over again in reverse (unless you're deploying to an XServe, which is unlikely).
Learn from my mistake: set up a Linux VM and do your development there. At least, run your development "server" there, even if you edit the code files on your Mac.
When doing a "port selfupdate", rsync times out against rsync.macports.org. There are mirror sites available to use.