Flask App Dispatching: Multiple uWSGI instances or dispatching to a single instance - python

I'm working on a Flask application in which we will have multiple clients (10-20), each with their own configuration (for the DB, client-specific settings, etc.). Each client will have a subdomain, like www.client1.myapp.com, www.client2.myapp.com. I'm using uWSGI as the middleware and nginx as the proxy server.
There are a couple of ways to deploy this, I suppose: one would be to use application dispatching and a single uWSGI instance; another would be to run a separate uWSGI instance for each client and have nginx forward traffic to the right app based on subdomain. Does anyone know the pros and cons of each scenario? Just curious: how do applications like Jira handle this?
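(For context, the "application dispatching" option would look roughly like the sketch below: one WSGI entry point that picks a per-client Flask app from the Host header. This is a minimal sketch, not a full implementation; the domain and the create_app factory are placeholders.)

from werkzeug.wrappers import Response

class SubdomainDispatcher:
    # Dispatches each request to a per-client Flask app based on subdomain.
    def __init__(self, domain, create_app):
        self.domain = domain          # e.g. 'myapp.com'
        self.create_app = create_app  # factory: client name -> Flask app
        self.instances = {}           # cache of already-created client apps

    def __call__(self, environ, start_response):
        host = environ.get('HTTP_HOST', '').split(':')[0]
        if not host.endswith(self.domain):
            return Response('Unknown client', status=404)(environ, start_response)
        subdomain = host[:-len(self.domain)].rstrip('.')
        if subdomain not in self.instances:
            self.instances[subdomain] = self.create_app(subdomain)
        return self.instances[subdomain](environ, start_response)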

I would recommend having multiple instances, forwarded to by nginx. I'm doing something similar with a PHP application, and it works very well.
The reason for, and benefit of, doing it this way is that you can keep everything completely separate, and if one client's setup goes screwy, you can re-instance it and there's no problem for anyone else. Also, no user, even one who manages to break the application-level security, can access any other user's data.
I keep all clients on their own databases (one MySQL instance, multiple DBs), so I can do a complete sqldump per client, or, for another application which uses SQLite rather than MySQL, simply copy the .sqlite database file for backup.
Going this way also means you can easily set up a 'test' version of a client's site as well as a live one, and then swap which one is actually live just by changing your nginx settings. Say, for upgrades: you upgrade the testing one, check it's all OK, then swap. (Also, for some applications, the client may like having their own 'testing' version, which they can break to their heart's content, knowing that they (or you) can re-instance it in moments without harming their 'real' data.)
Going with application dispatching, you cannot easily get nginx to serve separate client upload directories without having a separate nginx config per client (and if you're doing that, then why not go for individual uWSGI instances anyway?). Likewise for individual SSL certificates (if you want those...).
Each subdomain (or entirely separate domain, for some clients) has its own logging, so if a certain client is being DoS'd or otherwise attacked, it's easy to see which one.
You can set up file-system-level size quotas per user, so that if one client starts uploading GBs of video, your server doesn't get filled up as well.
The way I'm working is using Ansible to provision and set up the server how I want it, with the client-specific details kept in a separate host_vars file. So my inventory is:
[servers]
myapp.com #or whatever...
[application_clients]
apples
pears
elephants
[application_clients:vars]
ansible_host=myapp.com
then host_vars/apples (host_vars files are YAML):
url: apples.myapp.com
db_user: apples
db_pass: secret
Then, in the provisioning, I set up two new users and one group for each client. For instance: apples and web.apples as the two users, and the group simply apples (which both users are in).
This way, all the application code is owned by the apples user, but the PHP-FPM instance (or uWSGI instance in your case) is run by web.apples. The permissions of all the code are rwXr-X---, and the permissions of the uploads & static directories are rwXrwXr-X. Nginx runs as its own user, so it can access ONLY the upload/static directories, which it can serve as straight files. Any private files which you want served by the uWSGI app can easily be set up that way. The web user can read and execute the code, but cannot edit it. The actual client user can read and write the code, but isn't normally used, except for updates, installing plugins, etc.
I can give out an SFTP user to a client, chroot'd to their uploads directory, if they want to upload outside of the application interface.
Using Ansible, or another provisioning system, means there's very little work needed to create a new client setup, and if a client (for whatever reason) wants to move to their own server, it's just a couple of lines to change in the provisioning details and a re-run of the scripts. It also means I can keep a development server installed with the exact same provisioning as the main server, and a backup Amazon instance on standby which is ready to take over if ever I need it to.
I realise this doesn't exactly answer your question about the pros and cons of each approach, but it may be helpful anyway. Multiple instances of uWSGI, or of any other WSGI server (mainly I use Waitress, but there are plenty of good ones), are very simple to set up and, if done logically with a good provisioning system, easy to administer.

Related

How to protect database connections from rogue developers?

If you're building a system that has three databases - dev, testing and live, and you write a module to connect to the database, how do you make sure that a developer can not connect to the live database?
For example, you have different database clusters which are identified by variables. How do you stop the developer from simply replacing the "dev" variable with the "live" one?
Some potential solutions:
Use an environment variable (but couldn't the developer just change the environment variable on their local machine?)
Use some sort of config file which you then replace with a script when deploying to production. This makes sense and is quite simple...but is it the way these things are normally done?
The assumptions in this question are:
The database connection module is just part of the codebase, so any developer can see it and use it, and potentially change it. (Is this in itself bad?)
It would be great to know how to approach this issue. The stack is a Python server connecting to a Cassandra DB cluster, with the cluster changing depending on whether the environment is dev, testing or live.
Many thanks in advance.
The two most common solutions for this are (normally used together):
Firewall: production servers should have strict access rules so that, apart from intra-cluster communication which might be unrestricted, all other external channels need to be pre-approved; for example, only IPs assigned to the devops team or the DBAs can even attempt to reach the machines.
Credentials: most frameworks support some form of application.properties/application.yaml/application.conf. Spring Boot, for example, can read a file named application.properties placed in the same folder as the jar and use those values to override the ones bundled into the jar. That way you can override the database user and password in the production environment, to which the dev should not have access.
So:
The dev has access through the firewall to the dev server, and also has its user/password, so he can experiment and develop with no problems. Depending on the organisation, though, he might use a local database, so this may not apply.
When you go up to test/preprod/prod, the application should be configured to read the connection details from a file or from startup parameters, so that the administrator or the devops team can change them per environment. This is also important so that you don't have the same credentials across all the DBs.
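As a minimal sketch of the second point in Python (since the stack here is a Python server talking to Cassandra): read the connection details from a file that only exists on the target machine and is deployed by ops, never committed to the codebase. The path and key names below are hypothetical.

import json
from cassandra.cluster import Cluster
from cassandra.auth import PlainTextAuthProvider

# /etc/myapp/database.json is written per environment by the devops team;
# developers never see the production copy.
with open('/etc/myapp/database.json') as f:
    cfg = json.load(f)

auth = PlainTextAuthProvider(username=cfg['user'], password=cfg['password'])
cluster = Cluster(cfg['hosts'], auth_provider=auth)  # e.g. ['10.0.0.5', '10.0.0.6']
session = cluster.connect(cfg['keyspace'])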
For specific info regarding authentication on Cassandra, you can start here: docs
Hope this helped,
Cheers!

Deployment of django app - Scaling, Static Files, Servers

I am working my way towards deploying my Django app on a Linux server. I have encountered various problems along the way and want someone to clarify them for me. I searched for days, but the answers I found were either too broad or strayed from the topic.
1) According to the Django docs, it is inefficient to serve static files locally. Does that mean that all the static files, including HTML, CSS and JS files, should be hosted on another server?
2) I have an AWS S3 bucket hosting all my media files. Can I use that same bucket (or create a new bucket) to host my static files, if the answer above is yes? If so, is that efficient?
3) According to my searching, in order for Django to scale horizontally it should be a stateless app. Does that mean I also have to host my database somewhere other than my own Linux server?
1) It is completely fine to host your static files on the same server as your Django application; however, to serve those files you should use a web server such as nginx or Apache. Django was not designed to serve static data in a production environment; nginx and Apache, on the other hand, do a great job at it.
2) You can definitely host your static and media files in an S3 bucket. This will scale a lot better than hosting them on a single server, as they're provided by a separate entity: no matter how many application servers you're running behind a load balancer, all of them will request static files from the same source. To make it more efficient you can even configure AWS CloudFront, which is Amazon's CDN (content delivery network).
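A hedged sketch of what that looks like in settings.py using the django-storages package (the bucket name is a placeholder):

# pip install django-storages boto3
INSTALLED_APPS += ['storages']

STATICFILES_STORAGE = 'storages.backends.s3boto3.S3Boto3Storage'
DEFAULT_FILE_STORAGE = 'storages.backends.s3boto3.S3Boto3Storage'  # media files too
AWS_STORAGE_BUCKET_NAME = 'myapp-assets'                           # placeholder bucket
AWS_S3_CUSTOM_DOMAIN = '%s.s3.amazonaws.com' % AWS_STORAGE_BUCKET_NAME
STATIC_URL = 'https://%s/static/' % AWS_S3_CUSTOM_DOMAIN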
3) Ideally, your database should be hosted on a separate server. Databases are heavy on resources, so hosting your database on the same server as your application may lead to slowdowns and sometimes outright crashes. Scaling horizontally, you'd be connecting a lot of application servers to a single database instance, effectively increasing the load on that server.
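On the Django side, pointing at a separate database box is only a settings change; a sketch, with the host name and credentials as placeholders:

DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.postgresql',
        'NAME': 'myapp',
        'USER': 'myapp',
        'PASSWORD': 'secret',
        'HOST': 'db.internal.example.com',  # the dedicated database server
        'PORT': '5432',
    }
}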
All of the points above are relative to your use case and resources. If the application you are running doesn't deal with heavy traffic - say, a few hundred hits a day - and your server has an adequate amount of resources (RAM, CPU, storage), it's acceptable to run everything off a single box.
However, if you're planning to accept tens of thousands of connections every day, it's better to separate the responsibilities for optimum scalability. Not only does it make your application more efficient and responsive, but it also makes your life easier in the long run when you need to scale further (database clustering, nearline content delivery, etc.).
TL;DR: you can run everything off a single server if it's beefy enough but in the long run it'll make your life harder.

How do I deploy this app for my job: EC2, Elastic Beanstalk, something else entirely?

I'm tasked with creating a web app (I think?) for my job that will track something in our system. It'll be an internal tool that staff uses to keep track of the status of one of the things we do. It should look like Trello, with cards that drag from step to step. That frontend exists, but my job is to make the system update when the cards are dragged. This requires using an API in Python and isn't that complicated to grab from/update. I have no idea how to put all of this together. My job is almost completely nontechnical and there's no one internally who knows what I'm doing except for me. I'm in so over my head here and have no idea where to begin. Is this something I should deploy on Elastic Beanstalk? EC2? How do I tie this together and put it somewhere?
Are you trying to pull in live data from Trello or from your company's own internal project management tool?
An EC2 instance might be useful, but honestly it may be completely unnecessary if your company has its own servers. EC2 is basically a collection of rental computers to help with scaling. I have never used Beanstalk, so my input would be useless there.
From what I can gather from the question, you could have a Python script running to pull from the API and make the changes without an EC2 instance.
First thing you should do is gather as much information as possible about what the end product should look like. From your question, I have the feeling that you have only a vague idea of what the stakeholders want. Don't be afraid to ask for more clarification on an unclear task. It's better to spend 30 minutes discussing and taking notes than to show the end product after a month and realize that's not what your boss/team wanted.
Questions I would ask
Who is going to be using this app? (technical or non-technical person)
For what purpose is this being developed?
Does it need to be on the web or can it be used locally?
How many users need to have access to this application?
Are we handling sensitive information with this application?
Will this need to be augmented with other functionality at some point?
This is just a sample of what I would ask, during the conversation with the stakeholder a lot more will pop up for sure.
What I think you have to do
You need to make a monitoring system for the tasks that need to be done by your development team (like a Kanban)
What I think you already have
A frontend with cards that are draggable into each bin. I also assume that you can create a new card and delete one in the frontend. The frontend is most likely written in React, Angular or Vue.js. You might also have no frontend framework (a mix of jQuery and vanilla JS), but usually frontend developers end up picking a framework of some sort to help development.
A backend API in Python (in Flask or with Django REST framework, most likely) that communicates with an SQL database like PostgreSQL or a document database like MongoDB.
I'm making a lot of assumptions here, but your aim should be to understand the technology you will be working with in order to check which hosting would work best. For instance, if the database that is set up is a MySQL database, you might have trouble with some hosting providers.
What I think you are missing
Currently the frontend and the backend don't communicate with each other. When you drag a card, the change won't persist if you refresh the page. Also, all of this is sitting on your computer and cannot be used by anyone on your staff. You first need to connect the frontend with the backend so that the application has persistence, then deploy the application somewhere so that it is reachable by your staff.
What I would do is first work locally to make sure that the persistence layer is working. This implies having the API server, the frontend server and the database server running simultaneously on your computer during development. You should then fetch data from the API to find out which cards are in the database, and create them visually in your frontend at the right spot.
Dropping a card in a new spot after dragging it should trigger a POST request to your API server to update the status of that particular card (look at the documentation of your API to check what you need to send); a sketch of the receiving side follows below.
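If the backend happens to be Flask, the endpoint receiving that POST might look roughly like this; the route, field names and the in-memory store are all assumptions standing in for the real API and database.

from flask import Flask, request, jsonify

app = Flask(__name__)
CARDS = {}  # stand-in for the real database table

@app.route('/api/cards/<card_id>/move', methods=['POST'])
def move_card(card_id):
    payload = request.get_json()
    CARDS[card_id] = payload['column']  # persist the card's new column/status
    return jsonify({'id': card_id, 'column': payload['column']})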
The server should send back an updated version of the card's status if the POST request was successful, so your application should then just redraw the card at the right spot (it won't make a visible difference, since the card is already at the right spot, and your frontend framework most likely won't act on the change since the state hasn't changed). That's all I would do for that part.
I would then move to the deployment phase to make sure that whatever works locally still works online. I would use Heroku to start with, instead of jumping directly to AWS. Heroku is a service built on top of AWS which manages a lot of AWS's complexity for you. This is great for prototyping, and it means that when your stuff is ready you can migrate to AWS easily, confident that a setup exists to make your app work. You might also be tied to your company's servers, which is another thing I would ask the stakeholders about (i.e. where can I put this application and where can't I?).
The flow for a frontend + API + database application on Heroku is usually as follows. You create a GitHub repo for your frontend (make it private) and create an app on Heroku that watches this repository for changes; it will re-deploy the application for you, at a specific subdomain of Heroku's hosting, whenever it sees a change. You will need to configure a Procfile that tells Heroku what to do with your application type. This is where you need to double-check which frontend you are using, since that might change the Procfile needed. It's most likely a Node.js-based frontend (React, Angular or Vue), so head over here for the documentation on how to put that online.
You will also need to make a repo for the backend, separate from the frontend; these two entities are distinct and only communicate through HTTP requests (frontend->backend) and JSON (backend->frontend). Follow the same idea as with the frontend to deploy; head over here.
Once you have these two online, you need to create a database on Heroku. This is done by adding a datastore to your API; head over here. There is some framework-specific configuration you need to do to make the API talk to an online database, which you will find in the framework's documentation. The database could also already be up and living on your own server; if that's the case, you just need to configure your online backend to talk to that particular database at a particular address.
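On Heroku, the attached datastore typically shows up as a DATABASE_URL environment variable, so the framework-specific configuration is often a one-liner; a sketch, assuming Flask with SQLAlchemy:

import os
from flask import Flask

app = Flask(__name__)
# Heroku injects DATABASE_URL when a Postgres add-on is attached. Note that
# newer SQLAlchemy versions expect the 'postgresql://' scheme rather than
# the 'postgres://' scheme Heroku historically used.
app.config['SQLALCHEMY_DATABASE_URI'] = os.environ['DATABASE_URL']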
Once all of the above is done, re-test your application to check that you get the same behaviour as before. This is a usable MVP; however, there is no security layer. Anyone with the right URL could fetch your frontend and start messing around with your data.
There is more engineering that needs to be done to make this a viable end product. Which leads to my final remark: why are you not using a product like Trello, Jira, or even GitHub Projects? If it is to save money by not paying for a subscription, I think you should factor in the cost of developing, securing and maintaining this application.
Hope it helps!
One simple option is Heroku, for deploying both your API and your frontend application.

django - split project to two servers (due to Cloudflare) without duplicating code

I currently have a Django project running on a server behind Cloudflare. However, a number of the apps contain functionality that requires synchronizing data with certain web services. This is a security risk, because these web services may reveal the IP address of my server, and I need a solution to prevent that.
So far I have come up with two alternatives: using a proxy, or splitting the project across two servers: one responsible for responding to requests through Cloudflare, and one responsible for synchronizing data with other web services. The IP address of the latter server would be exposed to the public; however, attacks on this server would not take the website offline. I prefer the second solution, because it also splits the load between two servers.
The problem is that I do not know how to do this with Django without duplicating code. I know I can re-use apps, but for most of them I only need, for instance, the models and the serializers, not the views etc. How should I solve this? What is the best approach to take? Also, what would be appropriate naming for the two servers? Thanks
This sounds like a single project that is being split as part of the deployment strategy. So it makes sense to use just a single codebase for it, rather than splitting it into two projects. If that's the case, re-use is a non-issue since both servers are using the same code.
To support multiple deployments, then, you just create two settings files and load the appropriate one on the appropriate server. The degree to which they are different is up to you. If you want to support different views you can use different ROOT_URLCONF settings. The INSTALLED_APPS can be different, and so forth.
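A minimal sketch of that layout (module and app names are placeholders): a shared base settings module, plus one thin settings module per server role.

# myproject/settings/sync.py - loaded only on the synchronization server
from .base import *  # shared settings: DATABASES, middleware, common apps, etc.

ROOT_URLCONF = 'myproject.urls_sync'                   # this role's URL set
INSTALLED_APPS = INSTALLED_APPS + ['myproject.sync']   # sync-only app

# Each server then starts with its own module, e.g.:
#   DJANGO_SETTINGS_MODULE=myproject.settings.sync gunicorn myproject.wsgi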

how can i redirect a user to a different server with python?

I'm trying to write a script which will monitor packets (using pypcap) and redirect certain URLs/IPs to something I choose. I know I could just edit the hosts file, but that won't work because I'm not an admin.
I'm thinking that CGI might be useful, but this one has really got me confused.
EDIT:
sorry if it sounded malicious or like a MITM attack. The reason I need this is that I have an (old) application which grabs a page from a site, but the domain has changed recently, causing the application to no longer function. I didn't write the application, so I can't just change the domain it accesses.
I basically need to accomplish what can be done by editing the hosts file without having access to it.
pypcap needs administrative rights, so that is not an option.
And you don't have access to the PC's internals, to the source code, or to the webserver.
There are a few options left:
Modify the host name in the application's files with a hex editor and disassembler.
Modify the loaded application in memory with Cheat Engine and other memory tools.
Run the application in a virtual environment which can modify OS API calls. A modified Wine might be able to do this.
Modify the request between the PC and the webserver with a (transparent) proxy or a modified router.
If the application supports the use of proxies, the easiest solution might be to set up a local Squid with a redirector.
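A Squid redirector is just a helper program that reads request URLs on stdin and writes the (possibly rewritten) URL back on stdout, wired in via the url_rewrite_program directive in squid.conf. A minimal Python sketch of the classic protocol, with the domains as placeholders:

import sys

OLD, NEW = 'old-domain.example', 'new-domain.example'  # placeholder domains

for line in sys.stdin:
    parts = line.split()           # Squid sends: URL client/fqdn ident method
    if not parts:
        continue
    url = parts[0]
    if OLD in url:
        sys.stdout.write(url.replace(OLD, NEW) + '\n')  # rewrite the request
    else:
        sys.stdout.write('\n')     # blank line tells Squid to leave it alone
    sys.stdout.flush()             # Squid expects one unbuffered reply per line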
