advice needed: jupyter dashboard, backwards compatibility and deployment - python

I (a scientist rather than a software developer) have developed an application using Jupyter Dashboards and would like to make sure colleagues (with no programming skills) can use it in the future. However, Jupyter Dashboards are incompatible with the newest Jupyter versions. We run Windows on all of our desktop computers and cannot install software at will, but have to use portable apps like Anaconda Python. For example, Anaconda Navigator cannot modify the start entry after installation because that requires admin rights. Furthermore, the firewall blocks conda update.
I thought of two solutions:
1) The least complicated (for me)
Provide a .yaml file for the Anaconda environment and a tutorial on how to install Anaconda and activate the required environment. Problem: the company firewall does not allow Anaconda to install packages. I can install it, log into my private WLAN and circumvent that, but that is not an option for everyone. I would have to somehow deploy the specific Anaconda environment offline (see the sketch after these two options). I prefer this solution because it seems simpler and less error-prone.
2) Using Docker
There are Docker images available. We do have a local PC on which I could install Docker and set up everything. Problem: if a new PC is installed, someone else would have to do all that, and honestly I doubt anyone would. We have an IT department, but that request is well outside the usual procedures and would require special attention and human resources, as well as a lot of emails and calls to the IT service line.
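For option 1, one possible way to deploy the environment offline is conda-pack, which archives a fully resolved environment on a machine that does have internet access so it can be unpacked on the offline desktops. A minimal sketch using its Python API (the environment name and output path are placeholders, and conda-pack itself would have to be installed once while you still have connectivity):

# pack_env.py - sketch; run on a machine where the environment can be created normally
import conda_pack

# Archive the named conda environment into a relocatable tarball
conda_pack.pack(name="dashboard-env", output="dashboard-env.tar.gz")

The tarball can then be copied to the target PC, extracted into a folder and activated with the scripts it contains; conda-pack's conda-unpack step afterwards fixes up the hard-coded paths.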
I would appreciate any advice or ideas on how to make sure, in the simplest way possible, that my work can be used by other scientists with minimal effort.

Out of what you mentioned, I prefer the Docker approach. It allows you to define a well-controlled environment with a relatively easy setup for new users. Note that Docker has some quirks when running on Windows and can sometimes cause weird issues (containers running out of space out of the blue, pathing issues [if running on Docker Toolbox]) and such.
It is slightly more complicated to set up (than a YAML file), but as a tradeoff you are much less dependent on every single machine's/network's specifics.
If your workplace has an IT department and your team is supposed to share the work, I'd suggest asking them to create a cloud (intranet) Jupyter server so your team has centralized access to the Jupyter infrastructure.
In my company we have an even more complex approach: an intranet copy of Google Colab. That would be the best approach if you can push your IT department that far.
Good luck!!

Related

How to periodically update the dependencies in a production Python environment?

I am responsible for maintaining a Python environment that is frequently deployed into production. To ensure that the library's dependencies work well together, I use pip to install all the required Python packages and, following best practices, I pip freeze all of my dependencies into a requirements.txt file.
The benefit of this approach is that I have a very stable environment that is unlikely to break due to a package issue. However, the drawback is that my environment is static while these packages are constantly releasing new versions that could improve performance and most importantly fix vulnerabilities.
I am wondering what the industry standard is for keeping packages up to date in an easy way, perhaps even in an automated way that detects new updates and tests for any potential issues. For instance, apt has simple apt-get update and apt-get upgrade commands to keep your packages constantly updated and keep you aware of it.
The obvious answer is to just update all packages to the latest official version. My concern is that this can potentially break some dependencies between packages and leave the environment in a broken state.
Can anyone suggest a similar solution for keeping Python packages up to date while ensuring stability?
Well, I can't say what the industry standard is, but one way I can think of to update the packages periodically would be this:
Create a requirements.txt file that contains the packages to update.
Create a Python script that runs the shell commands to update the packages in requirements.txt.
Use pip freeze to update requirements.txt.
Create a cron job or use the Python schedule library to trigger a periodic update (see the sketch below).
As examples, you can refer to:
cron jobs on macOS: https://www.jcchouinard.com/python-automation-with-cron-on-mac/
the schedule library in Python: https://www.geeksforgeeks.org/python-schedule-library/
The above can solve the update issue but may not solve the compatibility issue.
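As a rough illustration, a minimal sketch of such a script using the schedule library (the file names, package list and daily time are assumptions; here the top-level packages to update are listed unpinned in packages.txt, and the fully pinned result is frozen into requirements.txt):

# update_requirements.py - sketch; assumes pip and the schedule library are installed
import subprocess
import sys
import time

import schedule

def update_packages():
    # Upgrade the top-level packages listed (unpinned) in packages.txt
    subprocess.check_call([sys.executable, "-m", "pip", "install",
                           "--upgrade", "-r", "packages.txt"])
    # Freeze the resulting, fully pinned set into requirements.txt
    frozen = subprocess.check_output([sys.executable, "-m", "pip", "freeze"])
    with open("requirements.txt", "wb") as f:
        f.write(frozen)

schedule.every().day.at("02:00").do(update_packages)

while True:
    schedule.run_pending()
    time.sleep(60)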
What matters is not so much being current as not having vulnerabilities that may harm customers and your company in any way.
The industry standard is to use Snyk, Safety, OWASP Dependency-Track, Sonatype Lifecycle, etc. to monitor package/module vulnerabilities, together with a continuous integration (CI) system that supports this, such as Docker containers in Kubernetes.
Deploy the exact version you are running in prod and run vulnerability scans.
(All CI systems make this easy)
(It is important to do this on the running code, because requirements.txt most likely never shows everything that will actually be installed.)
The vulnerability scanners (all except OWASP Dependency-Track) cost money if you want to stay current, but in their simplest form they just give you a more refined answer to:
https://cve.mitre.org/cgi-bin/cvekey.cgi?keyword=python
You can make your own system based on the safety source
https://github.com/pyupio/safety and the CVE list above, and make it fail if any of the vulnerable versions are found in your code (see the sketch below).
When it comes to requirements.txt listings, we always use >= in favor of == and deploy every night to test systems and run the tests on them; if they fail, we manually have to go through the errors, check them and pin the releases.
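For example, a minimal sketch of a CI step that fails the build when safety reports known-vulnerable pins (this uses the older safety check command-line form; treat the exact flags as an assumption to verify against the version you install):

# check_vulnerabilities.py - sketch; exits non-zero so the CI job fails on findings
import subprocess
import sys

result = subprocess.run(
    ["safety", "check", "-r", "requirements.txt", "--full-report"],
    capture_output=True, text=True,
)
print(result.stdout)

if result.returncode != 0:
    # safety exits with a non-zero status when vulnerable versions are found
    sys.exit("Vulnerable dependencies found - failing the build")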
Generally you build Docker images or similar. (There are other VM solutions, but Docker seems to be the most common/popular.) Have apt-get update, apt-get install and pip install -r requirements.txt as part of the Dockerfile, and integrate it into your CI/CD pipeline using GitHub or similar.
That builds the image from scratch using a well-defined process. You then run the VM, including all automated unit and integration tests, and flag any problems for the devs. It's then their job to figure out whether they broke something, whether they need a particular version of some dependency, etc.
Once the automated tests pass, the image binary (including its "frozen" dependencies) gets pushed to a place where humans can interact with it and look for anything weird. Typically there are several such environments variously called things like "test", "integration", and "production". The names, numbers, and details vary from place to place - but that triplet is fairly common. They're typically defined as a sequence and gating between them is done manually. This might be a typical spec:
Development - This is the first environment where code gets pushed after the automated testing passes. Failures and weird idiosyncrasies are common. If local developers need to replicate a problem or experiment with a patch that can't be handled on their local machines, they put the code here. Images graduate from Development when the basic manual and automated "smoke tests" all pass and the Dev team agrees the code is stable.
Integration - This environment belongs to QA and dedicated software testers. They may have their own set of scripts to run, which may be automated, manual or ad hoc. This is also a good environment for load testing, or for internal red team attacks on the security, or for testing / exploration by internal trusted users who are willing to risk the occasional crash in order to exercise the newest up-and-coming features. QA flags any problems or oddities and sends issues back to the devs. Images graduate from Integration when QA agrees the code is production-quality.
Production - This environment is running somewhat older code since everything has to go through Dev & QA before reaching it, but it's likely to be quite robust. The binary images here are the only ones that are accessible to the outside world, and the only ones carefully monitored by the SOC.
Since VM images are only built once, just prior to sending the code to Dev, all three environments should have the same issues. The problem then becomes getting code through the environments in sequence and doing all the necessary checks in each before the security patches become too outdated... or before the customers get tired of waiting for the new features / bug-fixes.
There are a lot of details in the process, and many questions & answers on Stack Overflow's sister site for network administration.
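As an illustration of the build-and-test step, a minimal sketch using the Docker SDK for Python (the image tag, Dockerfile location and test command are assumptions; most teams express this directly in their CI configuration instead):

# build_and_test.py - sketch of a CI step; assumes `pip install docker` and a Dockerfile in the current directory
import docker

client = docker.from_env()

# Build the candidate image from the local Dockerfile
image, build_logs = client.images.build(path=".", tag="myapp:candidate")

# Run the test suite inside the freshly built image; raises ContainerError if the tests fail
output = client.containers.run("myapp:candidate", "pytest -q", remove=True)
print(output.decode())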

How to manage Conda, Python, and dependencies within GitHub?

I am seeking some assistance in software project management. I want to take my project and store it remotely in GitHub, so I can clone the environment and work on it across multiple offices, each with a desktop PC, without moving a laptop back and forth. My knowledge and experience in software project management are a bit old, so as much detail as possible is always welcome.
I have Python code (several classes in corresponding class files), with the main Python file executed from a Linux terminal, and the code calls a number of external programs:
Gromacs
Modeller
Several Python packages (new or with specific versions)
I've installed all the external programs/dependencies within a Conda environment.
I am the only person working on the project, but as mentioned above, I will need access to the working and development environment across several machines (not all at once). Is there any way I can "wrap" everything up, conda environment, code, etc., add it to a GitHub repository and clone it as needed on each machine?
I would appreciate people's thoughts on how to manage this software project.

Alternative install for python libraries?

I am working on a project at my school which requires me to install a couple of Python libraries. We have been blocked from using the CMD on the school computers, which has proved problematic, as this is the only way I know of that allows me to easily install any libraries. I also cannot see the location of the Python install; I believe it is on some server which we do not have access to (although I am not confident in the matter). I believe each student is allocated space on the server, so I can install some things from the internet (however, quite a few things are blocked). Most of my class has overcome this by using their own laptops; however, this isn't an option for me. If need be, I can try to install anything at home and bring it into school on a memory stick. The school computers use Windows 10 and Python 3.6. If there's any more information you may need, please say. Thank you in advance.
If you can install PyCharm, it has its own terminal/cmd through which you can install your Python libraries. But the important thing is that you must have permission to do it since, as you said, you are on the school's shared server.
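A different workaround that sometimes helps when the command prompt is blocked but you can still run Python scripts (for example from IDLE): invoke pip through the interpreter itself. A minimal sketch (the package name is just an example, and this still only works if the network and your permissions allow the install):

# install_library.py - sketch; run it however you are able to run Python scripts
import subprocess
import sys

def install(package):
    # Call pip via the current interpreter; --user installs into your own profile
    subprocess.check_call([sys.executable, "-m", "pip", "install", "--user", package])

install("requests")  # example package; replace with whatever you need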

Do I really need to use virtualenv with Django?

This may well stink of newbie but...
I'm working on my first Django project, and am reading a lot that using virtualenv is considered best practice. I understand that virtualenv sandboxes all my Python dependencies, but I just don't know if this is necessary if I'm working in sandboxed VMs anyway. I'm developing in Vagrant, won't be using these VMs for anything else, and I'll be deploying to a VM server that will only have this Django project on it. Is it possible that in the future further Django apps in this project will require different dependencies and so need to be in different virtualenvs? (Not sure if it works like that, tbh.)
Am I just showing my inexperience and shortsightedness?
I would always recommend you use a virtualenv as a matter of course. There is almost no overhead in doing so, and it just makes things easier. In conjunction with virtualenvwrapper you can easily just type workon myproject to activate and cd to your virtualenv in one go. You avoid any issues with having to use sudo to install things, as well as any possible version incompatibilities with system-installed packages. There's just no reason not to, really.
I don't have any knowledge on Vagrant but I use virtualenvs for my Django projects. I would recommend it for anyone.
With that said, if you're only going to be using one Django project on a virtual machine you don't need to use a virtualenv. I haven't come across a situation where apps in the same project have conflicting dependencies. This could be a problem if you have multiple projects on the same machine however.
There are many benefits of working with a virtual environment on your development machine:
You can switch to any version of any supported module to check for issues.
Your project runs in a separate environment without conflicting with your system-wide modules and settings.
Testing is easy.
Multiple versions of the same project can co-exist.
If you develop multiple projects with different Django versions, virtualenv is simply a must; there is no other way (not that I know of). You will feel like you're in heaven with virtualenv once you have experienced dependency hell. Even if you develop only one project, I would recommend coding inside a virtualenv; you never know what comes next. Back in the day, my old laptop was almost crashing because of so many dependency problems; after I discovered virtualenv, it became like a brand new laptop in my eyes.
No, in your case, you don't need to bother with virtualenv. Since you're using a dedicated virtual machine it's just a layer of complexity you, as a noob, don't really need.
Virtualenv is pretty simple, in concept and usage, so you'll layer it on simply enough when the need arises. But, imho, there is added value in learning how a python installation is truly laid out before adding indirection. When you hit a problem that it can solve, then go for it. But for now, keep it simple: don't bother.

Are there any other good alternatives to zc.buildout and/or virtualenv for installing non-python dependencies?

I am a member of a team that is about to launch a beta of a python (Django specifically) based web site and accompanying suite of backend tools. The team itself has doubled in size from 2 to 4 over the past few weeks and we expect continued growth for the next couple of months at least. One issue that has started to plague us is getting everyone up to speed in terms of getting their development environment configured and having all the right eggs installed, etc.
I'm looking for ways to simplify this process and make it less error-prone. Both zc.buildout and virtualenv look like they would be good tools for addressing this problem, but both seem to concentrate primarily on the Python-specific issues. We have a couple of small subprojects in other languages (Java and Ruby specifically) as well as numerous Python extensions that have to be compiled natively (lxml, MySQL drivers, etc.). In fact, one of the biggest thorns in our side has been getting some of these extensions compiled against appropriate versions of the shared libraries so as to avoid segfaults, malloc errors and all sorts of similar issues. It doesn't help that out of 4 people we have 4 different development environments -- one Leopard on PPC, one Leopard on Intel, one Ubuntu and one Windows.
Ultimately what would be ideal would be something that works roughly like this, from the dos/unix prompt:
$ git clone [repository url]
...
$ python setup-env.py
...
that then does what zc.buildout/virtualenv does (copy/symlink the Python interpreter, provide a clean space to install eggs), then installs all required eggs, including any native shared library dependencies, installs the Ruby project, the Java project, etc.
Obviously this would be useful for both getting development environments up as well as deploying on staging/production servers.
Ideally I would like for the tool that accomplishes this to be written in/extensible via python, since that is (and always will be) the lingua franca of our team, but I am open to solutions in other languages.
So, my question then is: does anyone have any suggestions for better alternatives or any experiences they can share using one of these solutions to handle larger/broader install bases?
Setuptools may be capable of more of what you're looking for than you realize -- if you need a custom version of lxml to work correctly on MacOS X, for instance, you can put a URL to an appropriate egg inside your setup.py and have setuptools download and install that inside your developers' environments as necessary; it also can be told to download and install a specific version of a dependency from revision control.
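As a rough sketch of what that can look like in setup.py (the project name, version pins and egg URL are made-up placeholders):

# setup.py - sketch only; uses setuptools' dependency_links to point at a platform-specific egg
from setuptools import setup, find_packages

setup(
    name="ourproject",
    version="0.1",
    packages=find_packages(),
    install_requires=[
        "lxml==2.2",            # pinned native extension
        "MySQL-python==1.2.3",  # pinned database driver
    ],
    dependency_links=[
        # URL of a pre-built egg for a specific platform (placeholder)
        "http://internal.example.com/eggs/lxml-2.2-py2.5-macosx-10.5-ppc.egg",
    ],
)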
That said, I'd lean towards using a scriptably generated virtual environment. It's pretty straightforward to build a kickstart file which installs whichever packages you depend on and then boot virtual machines (or production hardware!) against it, with Puppet or similar software doing other administration (adding users, setting up services [where does your database come from?], etc.). This comes in particularly handy when your production environment includes multiple machines -- just script the generation of multiple VMs within their handy little sandboxed subnet (I use libvirt+kvm for this; while kvm isn't available on all the platforms you have developers working on, qemu certainly is, or you can do as I do and have a small number of beefy VM hosts shared by multiple developers).
This gets you out of the headaches of supporting N platforms -- you only have a single virtual platform to support -- and means that your deployment process, as defined by the kickstart file and puppet code used for setup, is source-controlled and run through your QA and review processes just like everything else.
I always create a develop.py file at the top level of the project, and also have a packages directory with all of the .tar.gz files from PyPI that I want to install, plus an unpacked copy of virtualenv that is ready to run right from that directory. All of this goes into version control. Every developer can simply check out the trunk, run develop.py, and a few moments later will have a virtual environment ready to use that includes all of our dependencies at exactly the versions the other developers are using. And it works even if PyPI is down, which is very helpful at this point in that service's history.
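A minimal sketch of what such a develop.py can look like (the directory layout, and the assumption that the bundled virtualenv installs pip into the new environment, are specific to this setup):

# develop.py - sketch; assumes virtualenv/ (unpacked) and packages/ (pinned .tar.gz files) are in version control
import os
import subprocess
import sys

ROOT = os.path.dirname(os.path.abspath(__file__))
ENV = os.path.join(ROOT, "env")
PACKAGES = os.path.join(ROOT, "packages")

# Create the virtual environment using the bundled copy of virtualenv
subprocess.check_call([sys.executable, os.path.join(ROOT, "virtualenv", "virtualenv.py"), ENV])

# Install every local tarball, resolving dependencies only from the packages directory
pip = os.path.join(ENV, "Scripts" if os.name == "nt" else "bin", "pip")
for name in sorted(os.listdir(PACKAGES)):
    if name.endswith(".tar.gz"):
        subprocess.check_call([pip, "install", "--no-index",
                               "--find-links", PACKAGES,
                               os.path.join(PACKAGES, name)])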
Basically, you're looking for a cross-platform software/package installer (along the lines of apt-get/yum/etc.). I'm not sure something like that exists.
An alternative might be specifying the list of packages that need to be installed via the OS-specific package management system such as Fink or DarwinPorts for Mac OS X and having a script that sets up the build environment for the in-house code?
I have continued to research this issue since I posted the question. It looks like there are some attempts to address some of the needs I outlined, e.g. Minitage and Puppet which take different approaches but both may accomplish what I want -- although Minitage does not explicitly state that it supports Windows. Lacking any better options I will try to make either one of these or just extensive customized use of zc.buildout work for our needs, but I still feel like there must be better options out there.
You might consider creating virtual machine appliances with whatever production OS you are running, and all of the software dependencies pre-built. Code can be edited either remotely, or with a shared folder. It worked pretty well for me in a past life that had a fairly complicated development environment.
Puppet doesn't (easily) support the Win32 world either. If you're looking for a deployment mechanism and not just a "dev setup" tool, you might consider looking into ControlTier (http://open.controltier.com/), which has an open-source cross-platform solution.
Beyond that you're looking at "enterprise" software such as BladeLogic or OpsWare, typically with an outrageous price tag for the functionality offered (my opinion, obviously).
A lot of folks have been aggressively using a combination of Puppet and Capistrano (even non-Rails developers) as deployment automation tools, to pretty good effect. The downside, again, is that it expects a somewhat homogeneous environment.
