I have access to a set of cloud machines. Each of these machines is responsible for specific tasks and has a set of tools for those tasks.
These tools are updated weekly with new functions, and all of them are implemented in Python.
The problem is that each time I have to upload my code to all of these machines. I want a common place for the tools that all the VMs can use. How can I do that?
My initial idea is to just mount a service like Dropbox on every VM. However, I don't know if this is the correct approach for the problem.
Could you please give some suggestions?
Assuming you want to maintain performance, you probably still want to keep the tools on the machines that actually use them. In other words, whatever you are doing will probably run slower if it has to reach some off-machine location to get any tool it needs.
If what you are looking for is a way to manage and distribute your tool updates to multiple machines more easily, you could store all your tools in a repository (SVN, Git, or even a homemade one) and have a script on your machines that runs every day (or hour, or whatever you require) to update the tools to the latest release.
Ideally you want each update to include only the changes since the last one, and most distributed version control systems support this automatically.
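For illustration, here is a minimal sketch of such an update script, assuming the tools live in a Git repository already cloned at a hypothetical path like /opt/tools; it could be run from cron at whatever interval you need.

    #!/usr/bin/env python3
    """Pull the latest version of the shared tools repository (illustrative paths)."""
    import subprocess
    import sys

    TOOLS_DIR = "/opt/tools"  # assumed location of the cloned tools repo

    def update_tools():
        # 'git pull' only transfers the changes made since the last update
        result = subprocess.run(
            ["git", "-C", TOOLS_DIR, "pull", "--ff-only"],
            capture_output=True, text=True,
        )
        if result.returncode != 0:
            print(result.stderr, file=sys.stderr)
            sys.exit(result.returncode)
        print(result.stdout.strip())

    if __name__ == "__main__":
        update_tools()

A crontab entry such as 0 * * * * /usr/bin/python3 /opt/tools/update_tools.py would then refresh every machine hourly.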
The Current State:
I have a non-negligible number of microservices written in Python.
Each microservice has its own YAML configuration file that lives in its Git repo. We use Dynaconf to read the configuration.
The Problem:
At first it was fine: the configurations were relatively small and easy to maintain. As time went by, the configurations grew larger. It became annoying to change them, and it is bad that they are scattered across different Git repos, i.e. not centralized.
I want to use "Externalized Configuration" so that all the configurations are maintained in a single repo and each microservice reads its own portion on startup. I have heard about Spring Boot, but it seems to be way too much, and apart from that, the pip libraries for it seem to be in beta: new and unreliable...
Is there another recommendation for this particular use case? Or should I proceed with Spring Boot?
You can use Microconfig.IO to manage your configuration with powerful templating and distribution. It's language agnostic as long as your configs are in YAML or properties format.
I think you could create a default config file for each service and use Dynaconf's external storage feature.
One possible solution would be to build a simple system for managing these variables on top of Redis.
And the Dynaconf CLI allows you to make changes on the fly.
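As a minimal sketch of the first part of that idea (file names are illustrative, and the Redis loader would be enabled separately per Dynaconf's documentation), each service could ship a small default YAML file and let Dynaconf layer further sources and environment variables on top of it:

    # settings.py - shared bootstrap for one microservice (names are assumptions)
    from dynaconf import Dynaconf

    settings = Dynaconf(
        settings_files=["default.yaml", "settings.yaml"],  # defaults first, overrides second
        environments=True,        # allow [default]/[production]-style sections
        envvar_prefix="MYAPP",    # e.g. MYAPP_TIMEOUT=30 overrides the files at runtime
    )

    print(settings.TIMEOUT)  # assumes TIMEOUT is defined in one of the sources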
I have some machines available in our office that I want to deploy Python scripts to.
My idea is to have one central machine that manages the deployment of the Python scripts, with each node communicating with that central machine to pick up new scripts, etc.
There is no dependency amongst the scripts; they can just run (scrape) and store the results locally.
I am not sure where to start with this. I have some ideas on writing apps that do this automatically, but I just can't imagine I am the only one trying to do this.
G.
A couple of things to note: you may not even need a management node. I have run calculations both with and without one. Depending on your cluster, it may honestly be easier to just qsub a shell script; there is usually capacity in a cluster for this.
Here is a list of other alternatives to IPython, but Peter Sutton's recommendation of IPython is a great one.
http://mpi4py.scipy.org/docs/usrman/intro.html
If you're still looking for a long-term solution (or so this answer can serve people who come here later), here is my suggestion. I am assuming you want a solution not for a one-off task but for long-term management (correct me if I got that wrong).
Use a configuration management tool. For your case (a single management node, merely deploying .py scripts),
I'd recommend using Ansible.
With Ansible you start with a controller node, create an inventory (a list of hosts/machines), and share the controller's public key with all the nodes/machines.
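For example, a bare-bones playbook that copies the scripts from the controller to every node might look roughly like this (the group name, paths, and file names are assumptions):

    # deploy_scripts.yml - minimal sketch, not a complete Ansible setup
    - hosts: workers                       # group defined in your inventory file
      tasks:
        - name: Copy the Python scripts to each node
          copy:
            src: ./scripts/                # directory on the controller
            dest: /opt/scraper/scripts/    # target path on every worker
            mode: "0755"

You would run it with ansible-playbook -i inventory deploy_scripts.yml.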
Here are a few plain-English blog posts to get you started:
How to install Ansible.
Getting started with Ansible.
Disclaimer: I am the author of above blog posts.
I have several small Python libraries that I wrote with stuff that I find myself wanting over and over again. I think most programmers have something similar. I want to use these libraries from a variety of different machines so I've started keeping this stuff in my DropBox. However, I'd like to be able to use my code on machines on which I can't install DropBox or other cloud storage applications, even in portable form. I can just download the files every time one of them changes (DropBox can provide me a URL for each file in my Public folder), which is only a moderate nuisance. But--and I admit this is a longshot--is there a solution out there that will let me tell Python to load a library from my DropBox via http?
BTW, I'd like to add the whole remote folder to my sys.path, but getting a URL for a folder is complicated, so I'm going to try to walk before I run by starting with individual files.
Yes, it's possible. I think you want the combination of two previous questions:
How to download a file in python over HTTP
How to dynamically load a library in python
So your task basically breaks down into writing a little bit of glue code: download the URL via the first bullet, write it to a local file, and then import that file using the second bullet.
So that's how you'd do that.
BUT - please keep in mind that dynamically downloading and executing code has many potential security pitfalls. Will you be doing this over a secure connection? Who else has the ability to manipulate that URL? There are a bunch of security issues inherent in downloading and executing code on the fly. I would ask you to consider going about your solution in a different way, but I'm giving you the answer you're asking for.
As a simple security check, you can establish a known-good hash for your file and then refuse to import any file whose hash isn't on the list of known-good hashes. This makes it a pain to update your modules, but gives you a little bit of extra safety.
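Putting those pieces together, a rough sketch might look like this (the URL, file name, and hash are placeholders); it downloads the module over HTTPS, refuses to load it unless its SHA-256 digest is on an allow-list, and then imports it with importlib:

    import hashlib
    import importlib.util
    import urllib.request

    # Placeholder URL and digest - replace with your own file and its known-good hash.
    MODULE_URL = "https://example.com/mylib.py"
    KNOWN_GOOD_SHA256 = {"0123456789abcdef"}  # allow-list of trusted digests
    LOCAL_PATH = "mylib.py"

    def fetch_and_import(url, local_path, module_name):
        data = urllib.request.urlopen(url).read()   # download the file
        digest = hashlib.sha256(data).hexdigest()
        if digest not in KNOWN_GOOD_SHA256:
            raise RuntimeError("Refusing to import %s: unexpected hash %s" % (url, digest))
        with open(local_path, "wb") as f:
            f.write(data)                           # write it to a local file
        spec = importlib.util.spec_from_file_location(module_name, local_path)
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)             # execute and import the module
        return module

    mylib = fetch_and_import(MODULE_URL, LOCAL_PATH, "mylib")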
Don't use Dropbox as revision control.
Pick a real solution like Git.
Set up access to the Git repository on one of your servers.
Clone the repository to your worker machines and check out master.
Create a develop branch where you put every change you make.
Test the changes, and when you consider any of them stable, merge it into master.
On your worker machines, set up a cron job that periodically pulls the master branch of the repository (and possibly restarts some Python processes, because simply importing the same module again won't make the interpreter pick up the changes; imported modules are cached). A sketch of such a job is shown below.
Enjoy your automatically updated workers :)
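As a concrete sketch of that cron job (the repository path and service name are made up), a small Python script can pull master and then restart the worker process so the fresh modules are actually loaded:

    #!/usr/bin/env python3
    """Cron-driven updater: pull master, then restart the worker (illustrative names)."""
    import subprocess

    REPO_DIR = "/srv/worker"       # assumed clone location on the worker machine
    SERVICE = "worker.service"     # assumed systemd unit running the Python process

    # Fetch and fast-forward to the latest master
    subprocess.run(["git", "-C", REPO_DIR, "pull", "--ff-only", "origin", "master"], check=True)

    # Restart so the interpreter re-imports the updated modules
    subprocess.run(["systemctl", "restart", SERVICE], check=True)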
Don't feel bad - it happens that even experienced software developers run into the XY problem.
I have a directory of Python programs, classes and packages that I currently distribute to 5 servers. It seems I'm continually going to be adding more servers, and right now I'm just doing a basic rsync from my local box to the servers.
What would a better approach be for distributing code across n servers?
thanks
I use Mercurial with Fabric to deploy all the source code. Fabric is written in Python, so it'll be easy for you to get started. Updating the production service is as simple as fab production deploy, which ends up doing something like this:
Shut down all the services and put an "Upgrade in Progress" page.
Update the source code directory.
Run all migrations.
Start up all services.
It's pretty awesome seeing this all happen automatically.
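For reference, a fabfile along those lines might look roughly like this (classic Fabric 1.x style; the host names and commands are assumptions):

    # fabfile.py - rough sketch; adapt the hosts and commands to your own setup
    from fabric.api import cd, env, run

    def production():
        env.hosts = ["app1.example.com", "app2.example.com"]

    def deploy():
        run("service myapp stop")              # 1. shut down the services
        with cd("/srv/myapp"):
            run("hg pull && hg update")        # 2. update the source code directory
            run("python manage.py migrate")    # 3. run all migrations
        run("service myapp start")             # 4. start the services back up

Running fab production deploy then executes deploy() against every host listed in production().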
First, make sure to keep all code under revision control (if you're not already doing that), so that you can check out new versions of the code from a repository instead of having to copy it to the servers from your workstation.
With revision control in place you can use a tool such as Capistrano to automatically check out the code on each server without having to log in to each machine and do a manual checkout.
With such a setup, deploying a new version to all servers can be as simple as running
$ cap deploy
from your local machine.
While I also use version control to do this, another approach you might consider is to package up the source using whatever package management your host systems use (for example RPMs or dpkgs), and set up the systems to use a custom repository. Then an "apt-get upgrade" or "yum update" will update the software on the systems, and you could use something like "mussh" to run the stop/update/start commands on all the hosts.
Ideally, you'd push it to a "testing" repository first and have your staging systems install it; once that testing is signed off, you can move the package to the production repository.
It's very similar to the recommendations of using fabric or version control in general, just another alternative which may suit some people better.
The downside to using packages is that you're probably using version control anyway, and you do have to manage version numbers of these packages. I do this using revision tags within my version control, so I could just as easily do an "svn update" or similar on the destination systems.
In either case, you may need to consider the migration from one version to the next. If a user loads a page that references other elements, you do the update, and those elements go away, what do they see? You may want to handle this either in your deployment scripting or in your code: first push out a version that contains the new page but keeps the old referenced elements, deploy that, and only then remove the old elements in a later deploy.
That way users won't see broken elements within the page.
I have a web service to which users upload Python scripts that are run on a server. Those scripts process files that are on the server, and I want them to be able to see only a certain hierarchy of the server's filesystem (ideally a temporary folder into which I copy the files I want processed and the scripts).
The server will ultimately be a linux based one but if a solution is also possible on Windows it would be nice to know how.
What I thought of is creating a user with restricted access to folders of the FS - ultimately only the folder containing the scripts and files - and launching the Python interpreter as this user.
Can someone give me a better alternative? Relying only on this makes me feel insecure; I would like a real sandboxing or virtual FS feature where I could safely run untrusted code.
Either a chroot jail or a higher-order security mechanism such as SELinux can be used to restrict access to specific resources.
You are probably best to use a virtual machine like VirtualBox or VMware (perhaps even creating one per user/session).
That will allow you some control over other resources such as memory and network as well as disk.
The only python that I know of that has such features built in is the one on Google App Engine. That may be a workable alternative for you too.
This is inherently insecure software. By letting users upload scripts you are introducing a remote code execution vulnerability. You have more to worry about than just modified files: what's stopping the Python script from accessing the network or other resources?
To solve this problem you need to use a sandbox. To better harden the system you can use a layered security approach.
The first layer, and the most important one, is a Python sandbox: user-supplied scripts are executed inside it, which gives you the fine-grained limitations that you need. Then the entire Python app should run within its own dedicated chroot. I highly recommend the grsecurity kernel patches, which improve the strength of any chroot; for instance, a grsecurity chroot cannot be broken out of unless the attacker can rip a hole into kernel land, which is very difficult to do these days. Make sure your kernel is up to date.
The end result is that you are limiting the resources an attacker's script can reach. Layers are a proven approach to security, as long as the layers are different enough that the same attack won't break both of them. You want to isolate the script from the rest of the system as much as possible; any resource that is shared is also a path for an attacker.
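As one small illustration of the layered-limits idea (this is not a complete sandbox on its own, and the limits, timeout, and directory scheme are assumptions), the service can at least run each uploaded script in its own scratch directory with hard CPU and memory caps:

    import resource
    import subprocess
    import tempfile

    def run_untrusted(script_path, timeout=30):
        """Run an uploaded script with CPU/memory caps in its own scratch directory.

        This is only one layer: in production you would also run it as a dedicated
        unprivileged user inside a hardened chroot, as described above.
        """
        workdir = tempfile.mkdtemp(prefix="job-")   # per-job scratch directory

        def limit_resources():
            # Applied in the child just before exec: cap CPU seconds and address space.
            resource.setrlimit(resource.RLIMIT_CPU, (10, 10))
            resource.setrlimit(resource.RLIMIT_AS, (256 * 1024 * 1024,) * 2)

        return subprocess.run(
            ["python3", script_path],
            cwd=workdir,
            preexec_fn=limit_resources,             # POSIX only
            capture_output=True,
            text=True,
            timeout=timeout,
        )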