set up new development environment with virtualenv and pip in one command

set up new development environment with virtualenv and pip in one command - python

We want to automate the setup of new development environments.
There is a project. We use the name scheme coreapp_c1 (short for customer1)
The project is small. It contains configuration (settings.py (django)) and a requirements.txt
We use these steps:
create virtualenv
pip install -e git... #egg=coreapp_c1
pip install -r src/coreapp-c1/requirements.txt
Unfortunately there are some other steps which need to be done:
Create a postgres-DB, insert an init-script, create a rabbitMQ queue, ... which can't be done as normal user.
What is the most phytonic way to do the stuff which needs to be done as root?

Automating the setting-up of development environments in Python as well as other languages is getting a lot of attention recently. As a result, there is a variety of solutions available depending on your requirements.
Generally, there is a trade-off involved with these solutions: The more isolation a given solution provides, the more overhead it has. Following is a non-exclusive list of stuff that I have tried and found useful. This list is in ascending order of isolation and therefore overhead:
1) Fabric
"It provides a basic suite of operations for executing local or remote shell commands
(normally or via sudo) and uploading/downloading files, as well as auxiliary
functionality such as prompting the running user for input, or aborting execution."
2) cookiecutter
"A command-line utility that creates projects from cookiecutters (project templates).
E.g. Python package projects, jQuery plugin projects."
3) docker
"An open source project to pack, ship and run any application as a lightweight container."
4) Vagrant
"Create and configure lightweight, reproducible, and portable development environments."
I have included Vagrant for completeness. It is essentially a tool for programmatic creation of virtual machines and therefore lies at the bottom of the list. As you have mentioned, if you are not too excited about the overhead of the VM with its own OS and API stack etc. you should go for one of the first three options - all of which are Pythonic to some extent in being DRY etc.
Personally, based on the limited perspective that I can glean from your problem statement, I would look at cookiecutter.

Vagrant is what you need.
Once created, developlemt environment is easily distribute among developers.
Another suggestion that I used for a long time - create binary packages rpm/deb for your applications.
There are pros and cons for both approaches and moreover they can be applied in conjunction.
In my opinion there is no Pythonic way to do the stuff. Fabric could be handy but it have another application area.

Related

How to periodically update the dependencies in a production Python environment?

I am responsible for maintaining a Python environment that is frequently deployed into production. To ensure that the library's dependencies work well together, I use pip to install all the required Python packages, following best practices I pip freeze all of my dependencies into a requirements.txt file.
The benefit of this approach is that I have a very stable environment that is unlikely to break due to a package issue. However, the drawback is that my environment is static while these packages are constantly releasing new versions that could improve performance and most importantly fix vulnerabilities.
I am wondering what the industry standard is for keeping packages up to date in an easy way, perhaps even in an automated way that detects new updates and tests for any potential issues. For instance, apt has a simple apt-get update and apt-get upgrade command to keep your packages constantly updated and be aware of it.
The obvious answer is to just update all packages to the latest official version. My concern is that this can potentially break some dependencies between packages and cause the environment to break.
Can anyone suggest a similar solution for keeping Python packages up to date while ensuring stability?

Well, I can't say what the industry standard is but one of the ways I could think of to update the packages periodically would be as such:
Create a requirements.txt file that contains the packages to update.
Create a python script that contains bash commands to update the packages in the requirements.txt.
Can use pip freeze to update the requirements.txt
Create a cron job or use the python schedule library to trigger a periodic update
You can refer, as an example for
cron job on mac, to: https://www.jcchouinard.com/python-automation-with-cron-on-mac/
schedule in python, to: https://www.geeksforgeeks.org/python-schedule-library/
The above can solve the update issue but may not solve the compatibility issue.

The importance is not so much current but to not have vulnerabilities that may harm customers and your company in any way.
Industristandard is to use Snyk, Safety, Deptrack, Sonatype Lifecycle etc to monitor package/module vulnerabilities. And a Continuous-Integration(CI) system that support this like docker containers in kubernetes.
Deploy the exact version you are running in prod and run vulnerability scans.
(All CI systems make this easy)
(it is important to do on running code because the requirements.txt most likely never show all installed that will be installed)
The vulnerability-scanners (all except owast deptrack) costs money if you like to be current, but they in its simplest form just give you a more refined answer to:
https://cve.mitre.org/cgi-bin/cvekey.cgi?keyword=python
You can make your own system based on the safety source
https://github.com/pyupio/safety and the CVE list above and make it fail if any of the versions are found in your code.
When it comes to requirements.txt listings we always use >= in favor of == and deploy every night in test-systems and run test-on them, if they fail we manually have to go though errors and check and pin the releases.

Generally you build Docker images or similar. (There are other VM solutions, but Docker seems to be the most common/popular.) Have apt-get update, apt-get install and pip install -r requirements.txt as part of the Dockerfile, and integrate it into your CI/CD pipeline using GitHub or similar
That builds the image from scratch using a well-defined process. You then run the VM, including all automated unit and integration tests and flag any problems for the Devs. It's then their job to figure out whether they broke something, whether they need a particular version of some dependency, etc.
Once the automated tests pass, the image binary (including its "frozen" dependencies) gets pushed to a place where humans can interact with it and look for anything weird. Typically there are several such environments variously called things like "test", "integration", and "production". The names, numbers, and details vary from place to place - but that triplet is fairly common. They're typically defined as a sequence and gating between them is done manually. This might be a typical spec:
Development - This is the first environment where code gets pushed after the automated testing passes. Failures and weird idiosyncrasies are common. If local developers need to replicate a problem or experiment with a patch that can't be handled on their local machines, they put the code here. Images graduate from Development when the basic manual and automated "smoke tests" all pass and the Dev team agrees the code is stable.
Integration - This environment belongs to QA and dedicated software testers. They may have their own set of scripts to run, which may be automated, manual or ad hoc. This is also a good environment for load testing, or for internal red team attacks on the security, or for testing / exploration by internal trusted users who are willing to risk the occasional crash in order to exercise the newest up-and-coming features. QA flags any problems or oddities and sends issues back to the devs. Images graduate from Integration when QA agrees the code is production-quality.
Production - This environment is running somewhat older code since everything has to go through Dev & QA before reaching it, but it's likely to be quite robust. The binary images here are the only ones that are accessible to the outside world, and the only ones carefully monitored by the SOC.
Since VM images are only compiled once, just prior to sending the code to Dev, all three environments should have the same issues. The problem then becomes getting code through the environments in sequence and doing all the necessary checks in each before the security patches become too outdated... or before the customers get tired of waiting for the new features / bug-fixes.
There are a lot of details in the process, and many questions & answers on StackOverflow's sister site for networking administration

How to have python libraries already installed in python project?

I am working on a python project that requires a few libraries. This project will further be shared with other people.
The problem I have is that I can't use the usual pip install 'library' as part of my code because the project could be shared with offline computers and the work proxy could block the download.
So what I first thought of was installing .whl files and running pip install 'my_file.whl' but this is limited since some .whl files work on some computers but not on others, so this couldn't be the solution of my problem.
I tried sharing my project with another project and i had an error with a .whl file working on one computer but not the other.
What I am looking for is to have all the libraries I need to be already downloaded before sharing my project. So that when the project is shared, the peers can launch it without needing to download the libraries.
Is this possible or is there something else that can solve my problem ?

There are different approaches to the issue here, depending on what the constraints are:
1. Defined Online Dependencies
It is a good practice to define the dependencies of your project (not only when shared). Python offers different methods for this.
In this scenario every developer has access to a pypi repository via the network. Usually the official main mirrors (i.e. via internet). New packages need to be pulled individually from here, whenever there are changes.
Repository (internet) access is only needed when pulling new packages.
Below the most common ones:
1.1 requirements.txt
The requirements.txt is a plain text list of required packages and versions, e.g.
# requirements.txt
matplotlib==3.6.2
numpy==1.23.5
scipy==1.9.3
When you check this in along with your source code, users can freely decide how to install it. The mosty simple (and most convoluted way) is to install it in the base python environment via
pip install -r requirements.txt
You can even automatically generate such a file, if you lost track with pipreqs. The result is usually very good. However, a manual cleanup afterwards is recommended.
Benefits:
Package dependency is clear
Installation is a one line task
Downsides:
Possible conflicts with multiple projects
Not sure that everyone has the exact same version if flexibility is allowed (default)
1.2 Pipenv
There is a nice and almost complete Answer to Pipenv. Also the Pipenv documentation itself is very good.
In a nutshell: Pipenv allows you to have virtual environments. Thus, version conflicts from different projects are gone for good. Also, the Pipfile used to define such an environment allows seperation of production and development dependencies.
Users now only need to run the following commands in the folder with the source code:
pip install pipenv # only needed first time
pipenv install
And then, to activate the virtual environment:
pipenv shell
Benefits:
Seperation between projects
Seperation of development/testing and production packages
Everyone uses the exact same version of the packages
Configuration is flexible but easy
Downsides:
Users need to activate the environment
1.3 conda environment
If you are using anaconda, a conda environment definition can be also shared as a configuration file. See this SO answer for details.
This scenario is as the pipenv one, but with anaconda as package manager. It is recommended not to mix pip and conda.
1.4 setup.py
When you are anyway implementing a library, you want to have a look on how to configure the dependencies via the setup.py file.
2. Defined local dependencies
In this scenario the developpers do not have access to the internet. (E.g. they are "air-gapped" in a special network where they cannot communicate to the outside world. In this case all the scenarios from 1. can still be used. But now we need to setup our own mirror/proxy. There are good guides (and even comlplete of the shelf software) out there, depending on the scenario (above) you want to use. Examples are:
Local Pypi mirror [Commercial solution]
Anaconda behind company proxy
Benefits:
Users don't need internet access
Packages on the local proxy can be trusted (cannot be corrupted / deleted anymore)
The clean and flexible scenarios from above can be used for setup
Downsides:
Network connection to the proxy is still required
Maintenance of the proxy is extra effort
3. Turn key environments
Last, but not least, there are solutions to share the complete and installed environment between users/computers.
3.1 Copy virtual-env folders
If (and only if) all users (are forced to) use an identical setup (OS, install paths, uses paths, libraries, LOCALS, ...) then one can copy the virtual environments for pipenv (1.2) or conda (1.3) between PCs.
These "pre-compiled" environments are very fragile, as a sall change can cause the setup to malfunction. So this is really not recommended.
Benefits:
Can be shared between users without network (e.g. USB stick)
Downsides:
Very fragile
3.2 Virtualisation
The cleanest way to support this is some kind of virtualisation technique (virtual machine, docker container, etc.).
Install python and the dependencies needed and share the complete container.
Benefits:
Users can just use the provided container
Downsides:
Complex setup
Complex maintenance
Virtualisation layer needed
Code and environment may become convoluted
Note: This answer is compiled from the summary of (mostly my) comments

Is there a good reason for setting up virtualenv for python in Docker containers?

Almost all python tutorials suggest that virutalenv be setup as step one to maintain consistency . In working with Docker containers, why or why not should this standard be maintained?

If you intend to run only one version on the container and it is the container's system version, there's no technical reason to use virtualenv in a container. But there could still be non-technical reasons. For example, if your team is used to finding python libraries in ~/some-env, or understands the virtualenv structure better than the container's libs, then you may want to keep using virtualenv anyway.
On the "cons" side, virtualenv on top of an existing system python may make your images slightly larger, too.

When using docker it makes sense to adopt the microservices concept. With microservices each microservice is aligned with a specific business function, and only defines the operations necessary to that business function. This means that each application runs in one or more separate docker images with their specific dependencies (python modules). This makes the use of virtualenv unnecessary.

Tool (or combination of tools) for reproducible environments in Python

I used to be a java developer and we used tools like ant or maven to manage our development/testing/UAT environments in a standardized way. This allowed us to handle library dependencies, setting OS variables, compiling, deploying, running unit tests, and all the required tasks. Also, the scripts generated guaranteed that all the environments were almost equally configured, and all the task were performed in the same way by all the members of the team.
I'm starting to work in Python now and I'd like your advice in which tools should I use to accomplish the same as described for java.

virtualenv to create a contained virtual environment (prevent different versions of Python or Python packages from stomping on each other). There is increasing buzz from people moving to this tool. The author is the same as the older working-env.py mentioned by Aaron.
pip to install packages inside a virtualenv. The traditional is easy_install as answered by S. Lott, but pip works better with virtualenv. easy_install still has features not found in pip though.
scons as a build tool, although you won't need this if you stay purely Python.
Fabric paste, or paver for deployment.
buildbot for continuous integration.
Bazaar, mercurial, or git for version control.
Nose as an extension for unit testing.
PyFit for FIT testing.

I also work with both java and python.
For python development the maven equivalent is setuptools (http://peak.telecommunity.com/DevCenter/setuptools). For web application development I use this in combination with paster (http://pythonpaste.org/) for the deployment process

Other than easy_install?
For our Linux servers, we use easy_install and yum.
For our Windows development laptops, we use easy_install and a few MSI's for some projects.
Most of the Python libraries we use are source-only, so we can use the same distribution on all boxes. If we could have a network shared device, we'd put them all there. Sadly, our infrastructure is kind of scattered, so we have to either move .TAR files around or redo the installs to rebuild the environments.
In a few cases (e.g., PIL), we have to recompile and check the version numbers.

You will want easy_setup to get the eggs (roughly what Maven calls an artifact).
For setting up your environment, have a look at working-env.py
Python is not compiled but you can put all files for a project in an egg. This is done with setuptools
For CI, check this answer.

We would be remiss not to also mention Paver, which was created by Kevin Dangoor of TurboGears fame. The project is still in alpha, but it appears very promising. A snippet from the project page:
Paver is a Python-based build/distribution/deployment scripting tool along the lines of Make or Rake. What makes Paver unique is its integration with commonly used Python libraries. Common tasks that were easy before remain easy. More importantly, dealing with your applications specific needs and requirements is now much easier.

I do exactly this with a combination of setuptools and Hudson. I know Hudson is a java app, but it can run Python stuff just fine.

You might want to check our Devenv. It allows you to standardize the build environments for development, QA and UAT. It's free as in "free beer".
HTH

Are there any other good alternatives to zc.buildout and/or virtualenv for installing non-python dependencies?

I am a member of a team that is about to launch a beta of a python (Django specifically) based web site and accompanying suite of backend tools. The team itself has doubled in size from 2 to 4 over the past few weeks and we expect continued growth for the next couple of months at least. One issue that has started to plague us is getting everyone up to speed in terms of getting their development environment configured and having all the right eggs installed, etc.
I'm looking for ways to simplify this process and make it less error prone. Both zc.buildout and virtualenv look like they would be good tools for addressing this problem but both seem to concentrate primarily on the python-specific issues. We have a couple of small subprojects in other languages (Java and Ruby specifically) as well as numerous python extensions that have to be compiled natively (lxml, MySQL drivers, etc). In fact, one of the biggest thorns in our side has been getting some of these extensions compiled against appropriate versions of the shared libraries so as to avoid segfaults, malloc errors and all sorts of similar issues. It doesn't help that out of 4 people we have 4 different development environments -- 1 leopard on ppc, 1 leopard on intel, 1 ubuntu and 1 windows.
Ultimately what would be ideal would be something that works roughly like this, from the dos/unix prompt:
$ git clone [repository url]
...
$ python setup-env.py
...
that then does what zc.buildout/virtualenv does (copy/symlink the python interpreter, provide a clean space to install eggs) then installs all required eggs, including installing any native shared library dependencies, installs the ruby project, the java project, etc.
Obviously this would be useful for both getting development environments up as well as deploying on staging/production servers.
Ideally I would like for the tool that accomplishes this to be written in/extensible via python, since that is (and always will be) the lingua franca of our team, but I am open to solutions in other languages.
So, my question then is: does anyone have any suggestions for better alternatives or any experiences they can share using one of these solutions to handle larger/broader install bases?

Setuptools may be capable of more of what you're looking for than you realize -- if you need a custom version of lxml to work correctly on MacOS X, for instance, you can put a URL to an appropriate egg inside your setup.py and have setuptools download and install that inside your developers' environments as necessary; it also can be told to download and install a specific version of a dependency from revision control.
That said, I'd lean towards using a scriptably generated virtual environment. It's pretty straightforward to build a kickstart file which installs whichever packages you depend on and then boot virtual machines (or production hardware!) against it, with puppet or similar software doing other administration (adding users, setting up services [where's your database come from?], etc). This comes in particularly handy when your production environment includes multiple machines -- just script the generation of multiple VMs within their handy little sandboxed subnet (I use libvirt+kvm for this; while kvm isn't available on all the platforms you have developers working on, qemu certainly is, or you can do as I do and have a small number of beefy VM hosts shared by multiple developers).
This gets you out of the headaches of supporting N platforms -- you only have a single virtual platform to support -- and means that your deployment process, as defined by the kickstart file and puppet code used for setup, is source-controlled and run through your QA and review processes just like everything else.

I always create a develop.py file at the top level of the project, and have also a packages directory with all of the .tar.gz files from PyPI that I want to install, and also included an unpacked copy of virtualenv that is ready to run right from that file. All of this goes into version control. Every developer can simply check out the trunk, run develop.py, and a few moments later will have a virtual environment ready to use that includes all of our dependencies at exactly the versions the other developers are using. And it works even if PyPI is down, which is very helpful at this point in that service's history.

Basically, you're looking for a cross-platform software/package installer (on the lines of apt-get/yum/etc.) I'm not sure something like that exists?
An alternative might be specifying the list of packages that need to be installed via the OS-specific package management system such as Fink or DarwinPorts for Mac OS X and having a script that sets up the build environment for the in-house code?

I have continued to research this issue since I posted the question. It looks like there are some attempts to address some of the needs I outlined, e.g. Minitage and Puppet which take different approaches but both may accomplish what I want -- although Minitage does not explicitly state that it supports Windows. Lacking any better options I will try to make either one of these or just extensive customized use of zc.buildout work for our needs, but I still feel like there must be better options out there.

You might consider creating virtual machine appliances with whatever production OS you are running, and all of the software dependencies pre-built. Code can be edited either remotely, or with a shared folder. It worked pretty well for me in a past life that had a fairly complicated development environment.

Puppet doesn't (easily) support the Win32 world either. If you're looking for a deployment mechanism and not just a "dev setup" tool, you might consider looking into ControlTier (http://open.controltier.com/) which has a open-source cross-platform solution.
Beyond that you're looking at "enterprise" software such as BladeLogic or OpsWare and typically an outrageous pricetag for the functionality offered (my opinion, obviously).
A lot of folks have been aggressively using a combination of Puppet and Capistrano (even non-rails developers) for deployment automation tools to pretty good effect. Downside, again, is that it's expecting a somewhat homogeneous environment.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.