Measure performance and time used by Python web server

Measure performance and time used by Python web server - python

I'm running a website using Bottle and Sqlite, which are both lightweight, on a small private hosting (using Linux).
I'm starting to have more and more users each day.
How to measure if both of them are :
running smoothly, using only < 10% of CPU / RAM of the hosting computer
running with a medium usage, this is a premature sign meaning I should think about using something more production-ready for multi-users than Sqlite
running with a very high CPU or RAM usage, meaning that a pipe is going to break soon ... if I don't modify something !

To simply measure CPU and memory use, you can log into the machine and run top or htop.
I assume, though, that you don't simply want to measure, but that you also want monitoring and alerts. In that case you'll need to run some sort of monitoring framework; which one to choose is a broader question. One popular choice that you might take a look at is Monit.

Related

Can a serverless architecture support high memory needs?

The challenge is to run a set of data processing and data science scripts that consume more memory than expected.
Here are my requirements:
Running 10-15 Python 3.5 scripts via Cron Scheduler
These different 10-15 scripts each take somewhere between 10 seconds to 20 minutes to complete
They run on different hours of the day, some of them run every 10 minute while some run once a day
Each script logs what it has done so that I can take a look at it later if something goes wrong
Some of the scripts sends e-mails to me and to my team mates
None of the scripts have an HTTP/web server component; they all run on Cron schedules and not user-facing
All the scripts' code is fed from my Github repository; when scripts wake up, they first do a git pull origin master and then start executing. That means, pushing to master causes all scripts to be on the latest version.
Here is what I currently have:
Currently I am using 3 Digital Ocean servers (droplets) for these scripts
Some of the scripts require a huge amount of memory (I get segmentation fault in droplets with less than 4GB of memory)
I am willing to introduce a new script that might require even larger memory (the new script currently faults in a 4GB droplet)
The setup of the droplets are relatively easy (thanks to Python venv) but not to the point of executing a single command to spin off a new droplet and set it up
Having a full dedicated 8GB / 16B droplet for my new script sounds a bit inefficient and expensive.
What would be a more efficient way to handle this?

What would be a more efficient way to handle this?
I'll answer in three parts:
Options to reduce memory consumption
Minimalistic architecture for serverless computing
How to get there
(I) Reducing Memory Consumption
Some handle large loads of data
If you find the scripts use more memory than you expect, the only way to reduce the memory requirements is to
understand which parts of the scripts drive memory consumption
refactor the scripts to use less memory
Typical issues that drive memory consumption are:
using the wrong data structure - e.g. if you have numerical data it is usually better to load the data into a numpy array as opposed to a Python array. If you create a lot of objects of custom classes, it can help to use __slots__
loading too much data into memory at once - e.g. if the processing can be split into several parts independent of each other, it may be more efficient to only load as much data as one part needs, then use a loop to process all the parts.
hold object references that are no longer needed - e.g. in the course of processing you create objects to represent or process some part of the data. If the script keeps a reference to such an object, it won't get released until the end of the program. One way around this is to use weak references, another is to use del to dereference objects explicitely. Sometimes it also helps to call the garbage collector.
using an offline algorithm when there is an online version (for machine learning) - e.g. some of scikit's algorithms provide a version for incremental learning such as LinearRegression => SGDRegressior or LogisticRegression => SGDClassifier
some do minor data science tasks
Some algorithms do require large amounts of memory. If using an online algorithm for incremental learning is not an option, the next best strategy is to use a service that only charges for the actual computation time/memory usage. That's what is typically referred to as serverless computing - you don't need to manage the servers (droplets) yourself.
The good news is that in principle the provider you use, Digital Ocean, provides a model that only charges for resources actually used. However this is not really serverless: it is still your task to create, start, stop and delete the droplets to actually benefit. Unless this process is fully automated, the fun factor is a bit low ;-)
(II) Minimalstic Architecture for Serverless Computing
a full dedicated 8GB / 16B droplet for my new script sounds a bit inefficient and expensive
Since your scripts run only occassionally / on a schedule, your droplet does not need to run or even exist all the time. So you could set this is up the following way:
Create a schedulding droplet. This can be of a small size. It's only purpose is to run a scheduler and to create a new droplet when a script is due, then submit the task for execution on this new worker droplet. The worker droplet can be of the specific size to accommodate the script, i.e. every script can have a droplet of whatever size it requires.
Create a generic worker. This is the program that runs upon creation of a new droplet by the scheduler. It receives as input the URL to the git repository where the actual script to be run is stored, and a location to store results. It then checks out the code from the repository, runs the scripts and stores the results.
Once the script has finished, the scheduler deletes the worker droplet.
With this approach there are still fully dedicated droplets for each script, but they only cost money while the script runs.
(III) How to get there
One option is to build an architecture as described above, which would essentially be an implementation of a minimalistic architecture for serverless computing. There are several Python libraries to interact with the Digital Ocean API. You could also use libcloud as a generic multi-provider cloud API to make it easy(ier) to switch providers later on.
Perhaps the better alternative before building yourself is to evaluate one of the existing open source serverless options. An extensive curated list is provided by the good fellows at awesome-serverless. Note at the time of writing this, many of the open source projects are still in their early stages, the more mature options are commerical.
As always with engineering decisions, there is a trade-off between the time/cost required to build or host yourself v.s. the cost of using a readily-available commercial service. Ultimately that's a decision only you can take.

How to manage many highspeed client-server autonomus connections and remote programme execution

Here is an interesting problem. I am building an application in AI, that is due to be fundamentally agnostic to computational architectures(hopefully to include mobile). We have built a python application with a machine learning and deep-learning data structure that connects to despaired systems, and combines data with the new systems usually running a windows app/sometimes mac.
We have been working with net-workers and pen testers to determine the best approach to maintaining multiple client-server connections with asynchronicity and a heavy load, and maintain a positive security posture.
Here are the Upside's:
1. Data from the producers is very fast.
2. Programmatically we have increased the ability to load-balance.
The downsides:
1. The controllers / are not very fast. Here we are using languages (Node.js, go-Lang, cherrypy, rabbitmq) to speed up the connections and multi/Hyper thread. (Raspberry pi). Our goal is to use minimal frameworks pizero, raspberry, Arduino, to install and use our tech in an "instance" (See Tony Stark in avengers, connect the chips to the rail and create a command center).
2.point machines are not always quick, AND networking can suck. Using some fabric (with some hopes and dreams) we are nearly to the point of being able to deploy a system in seconds. Currently, data is slow on the back-end, and DB's are not helping that case. Some of our clients would like to test, and we are not interested in making a massive distro. With the amount of data, we only hope to achieve a "lightweight" client application that will leverage (dissimilar) technologies.
as we connect to "permanent" despaired systems we will pass our computational loads, and use for cold storage.
-We are having issues with application persistence / Surviving reboot
-With the multi-client control, commands are getting dropped.(some get them 2 times +)
-We are running many scripts and programs that get, write, read, put. We have them spread between multiple languages with all different uses and requirements.
- After the application is loaded updates are not always making it to each unit.
Thank you to all whom reply

Limit CPU usage of a Python Bottle webserver

I'm using Python + Bottle as a webserver.
As I use the production server for many other websites, I don't want Python + Bottle to eat 70% of the CPU for example.
How is it possible to limit the CPU usage of a Python Bottle webserver?
I was thinking about using resource.setrlimit, but is this a good way to do it?
With which syntax should we use resource.setrlimit to set the limit to 20% of the CPU for example?

Step 1
You should ask yourself whether resource optimization is really necessary. If you're certain that specific application consumes too much resources than it could or should then go to step 2.
Step 2
When your application consumes too much resources, first thing you should do is try to identify bottlenecks in it and see if they can be optimized away. Python has various tools that can help you (code profilers, PyPy etc.) If there's nothing you can do in that regard, go to step 3.
Step 3
If you absolutely must limit process resources, keep in mind that:
OS has a very sophisticated scheduling mechanisms that do their best to ensure each running process gets a fair share of CPU time. Even on processor overload, things will still run fine (up to some point).
if you deliberately limit CPU time of one of your processes it may respond slowly or not at all due to network timeouts.
My answer to this question is - reduce static priority of your server if you think it may starve other services but then your server may suffer from starvation when processors are overload. Using nice utility would be my choice as it won't litter your code.

Python/WSGI: Dynamically spin up/down server worker processes across installations

The setup
Our setup is unique in the following ways:
we have a large number of distinct Django installations on a single server.
each of these has its own code base, and even runs as a separate linux user. (Currently implemented using Apache mod_wsgi, each installation configured with a small number of threads (2-5) behind a nginx proxy).
each of these installations have a significant memory footprint (20 - 200 MB)
these installations are "web apps" - they are not exposed to the general web, and will be used by a limited nr. of users (1 - 100).
traffic is expected to be in (small) bursts per-installation. I.e. if a certain installation becomes used, a number of follow up requests are to be expected for that installation (but not others).
As each of these processes has the potential to rack up anywhere between 20 and 200 MB of memory, the total memory footprint of the Django processes is "too large". I.e. it quickly exceeds the available physical memory on the server, leading to extensive swapping.
I see 2 specific problems with the current setup:
We're leaving the guessing of which installation needs to be in physical mememory to the OS. It would seem to me that we can do better. Specifically, an installation that currently gets more traffic would be better off with a larger number of ready workers. Also: installations that get no traffic for extensive amounts of time could even do with 0 ready workers as we can deal with the 1-2s for the initial request as long as follow-up requests are fast enough. A specific reason I think we can be "smarter than the OS": after a server restart on a slow day the server is much more responsive (difference is so great it can be observed w/ the naked eye). This would suggest to me that the overhead of presumably swapped processes is significant even if they have not currenlty activily serving requests for a full day.
Some requests have larger memory needs than others. A process that has once dealt with one such a request has claimed the memory from the OS, but due to framentation will likely not be able to return it. It would be worthwhile to be able to retire memory-hogs. (Currenlty we simply have a retart-after-n-requests configured on Apache, but this is not specifically triggered after the fragmentation).
The question:
My idea for a solution would be to have the main server spin up/down workers per installation depending on the needs per installation in terms of traffic. Further niceties:
* configure some general system constraints, i.e. once the server becomes busy be less generous in spinning up processes
* restart memory hogs.
There are many python (WSGI) servers available. Which of them would (easily) allow for such a setup. And what are good pointers for that?

See if uWSGI works for you. I don't think there is something more flexible.
You can have it spawn and kill workers dynamically, set max memory usage etc. Or you might come with better ideas after reading their docs.

Python script load testing web page

I want to do a test load for a web page. I want to do it in python with multiple threads.
First POST request would login user (set cookies).
Then I need to know how many users doing the same POST request simultaneously can server take.
So I'm thinking about spawning threads in which requests would be made in loop.
I have a couple of questions:
1. Is it possible to run 1000 - 1500 requests at the same time CPU wise? I mean wouldn't it slow down the system so it's not reliable anymore?
2. What about the bandwidth limitations? How good the channel should be for this test to be reliable?
Server on which test site is hosted is Amazon EC2 script would be run from another server(Amazon too).
Thanks!

cPython does not take advantage from multiple cores when running multiple threads. It means, that basically, You will only have one core doing the testing job.
There are dedicated tools to do what You want to do. Let me suggest two:
FunkLoad is a functional and load web tester, written in Python, whose main use cases are:
Functional testing of web projects, and thus regression testing as well.
Performance testing: by loading the web application and monitoring
your servers it helps you to pinpoint bottlenecks, giving a detailed
report of performance measurement.
Load testing tool to expose bugs that do not surface in cursory testing,
like volume testing or longevity testing.
Stress testing tool to overwhelm the web application resources and test
the application recoverability.
Writing web agents by scripting any web repetitive task, like checking if
a site is alive.
Tsung is an open-source multi-protocol distributed load testing tool
The purpose of Tsung is to simulate
users in order to test the scalability
and performance of IP based
client/server applications. You can
use it to do load and stress testing
of your servers. Many protocols have
been implemented and tested, and it
can be easily extended. WebDAV, LDAP
and MySQL support have been added
recently (experimental).
It can be distributed on several
client machines and is able to
simulate hundreds of thousands of
virtual users concurrently (or even
millions if you have enough hardware
...).
If You decide to write Your own tool, You will probably want to use Python's multiprocessing module as it would let You use multiple cores. You should also take a look on Twisted as it would let You easily handle multiple sockets while a limited number of threads. That would be much better than spawning a new thread for each socket.
You work with Amazon EC2, so I would recommend using Tsung. You can rent a dozen of multicore servers for a few hours and run some really heavy load tests with Tsung. It scales very well in this kind of configuration.
As for the bandwidth, it's usually not a problem, but it depends on the application. You will have to monitor all Your resources closely while performing a load test.

too many variables. 1000 at the same time... no. in the same second... possibly. bandwidth may well be the bottleneck. this is something best solved by experimentation.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.