Scaling tornado on AWS - python

I am running a Tornado client-side application on an AWS EC2 Linux t2.micro instance, which has 1 vCPU and 1 GiB of RAM. I have noticed that the application's performance slows after 75 simultaneous HTTP connections.
Considering that Tornado runs as a single process with a single thread (using an asynchronous event-loop architecture), I am wondering whether upgrading to an AWS t2.medium instance with 2 vCPUs would actually help.
In theory, can a single process with a single thread be run on two CPU's? Or is Amazon's vCPU not a real CPU and just a measurement of processing power?

Tornado supports running multiple Python processes to take advantage of a multi-CPU machine. As described in the documentation, you can use Tornado itself to fork those processes, or you can set up a load balancer that proxies requests to a number of processes started either manually or under a process manager like supervisor.
As for your second question, apparently an AWS vCPU is basically a single hyperthread spawned from a real processor core, which should in Python's case amount to the equivalent of a "real" CPU (but I'm far from being an expert on the topic).
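Tornado's documented forking pattern is `HTTPServer.start(0)`, which forks one worker per detected CPU. A minimal sketch (the port and handler are illustrative, not from the question):

```python
import tornado.httpserver
import tornado.ioloop
import tornado.web

class MainHandler(tornado.web.RequestHandler):
    def get(self):
        self.write("Hello from a forked worker\n")

def make_app():
    return tornado.web.Application([(r"/", MainHandler)])

def serve_forked(port=8888):
    # Bind before forking so every worker shares the same listening socket.
    server = tornado.httpserver.HTTPServer(make_app())
    server.bind(port)
    server.start(0)  # 0 forks one worker process per detected CPU/vCPU
    tornado.ioloop.IOLoop.current().start()  # blocks, serving requests
```

On a t2.medium with 2 vCPUs, `serve_forked()` would give you two event loops, each on its own (hyper)thread. The alternative from the answer is to run several single-process instances on different ports and balance across them with nginx or an ELB.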

Does Multithreading work on Google Cloud CPU instances?

A friend said he was able to get multithreading in Python (with Anaconda's default interpreter, presumably CPython) to utilise all cores on Google Cloud vCPUs. Ordinarily, however, multithreading in Python is limited to a single core on local machines.
Is this possible? Does this have something to do with the way vCPUs share memory? I assumed that a vCPU looks like a logical core to the OS and the same GIL restrictions would apply.
On Compute Engine, each virtual CPU (vCPU) is implemented as a single hardware hyper-thread on one of the available CPU Platforms. On Intel Xeon processors, Intel Hyper-Threading Technology allows multiple application threads to run on each physical processor core. You configure your Compute Engine virtual machine instances with one or more of these hyper-threads as vCPUs. The machine type specifies the number of vCPUs that your instance has.
You can identify the specific CPU platform for your instance using one of the following options:
See what CPU platforms are available in each of the available regions and zones.
Use the compute.instances.get method to obtain the CPU platform property for one of your existing instances.
On Linux instances, run cat /proc/cpuinfo.
If you want to change the CPU platform for your instance, you can specify a minimum CPU platform.
Yes, but it depends on the VM type. I think you will want the general-purpose, high-memory, compute-intensive, or A2 machine types.
From GCP Docs(2021-08-17) "By default, Compute Engine enables simultaneous multithreading (SMT) on all virtual machine (VM) instances. With SMT enabled, a single physical CPU core can run 2 virtual CPUs (vCPUs), each as separate threads."
Then, when creating a new VM, you can set the number of threads per core:
gcloud beta compute instances create VM_NAME \
--zone=ZONE \
--machine-type=MACHINE_TYPE \
--threads-per-core=THREADS_PER_CORE
"THREADS_PER_CORE: the number of threads per physical core. Current processors support 2 threads per core for SMT, which is enabled by default. To disable SMT, set to 1."
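Regardless of how many vCPUs or hyper-threads the VM exposes, CPython's GIL still serializes pure-Python CPU-bound threads; only separate processes scale. The asker's assumption is easy to check with a sketch like this:

```python
import time
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor

def burn(n):
    # Pure-Python CPU-bound loop; a thread running this holds the GIL.
    total = 0
    for i in range(n):
        total += i * i
    return total

def timed(executor_cls, n_tasks=4, n=1_000_000):
    start = time.perf_counter()
    with executor_cls(max_workers=n_tasks) as ex:
        list(ex.map(burn, [n] * n_tasks))
    return time.perf_counter() - start

if __name__ == "__main__":
    t_threads = timed(ThreadPoolExecutor)
    t_procs = timed(ProcessPoolExecutor)
    # On a multi-core (or multi-vCPU) machine, processes typically finish
    # much faster: each worker has its own interpreter and its own GIL.
    print(f"threads: {t_threads:.2f}s  processes: {t_procs:.2f}s")
```

If the friend really saw all cores busy with threads, the work was most likely happening in C extensions (e.g. NumPy) that release the GIL, not in Python bytecode.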

How do I get my application server CPU to 100%?

I have a dedicated application server that does analytics.
I'm running on 2CPU, 8GB RAM machine.
I have two same applications running like below.
python do_analytics.py &
python do_analytics.py &
However, my CPU is below 20%. Can I run more processes to make full use of my CPU? Will it speed up or my single processes will run slower now since I only have 2 CPU?
Thanks.
The fact that your CPU usage is below 20% means your CPU can take more load, so yes, you can run more processes.
Will it speed up or my single processes will run slower now since I only have 2 CPU?
It depends on what your application is doing. If the analytics logic mostly uses CPU and memory, you need not worry. But if more processes mean more disk access or contention for a shared resource, then running more of them may reduce overall performance.
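Instead of launching copies of the script by hand, one sketch is to let multiprocessing size the worker pool to the machine (here `do_analytics` is a stand-in for the asker's real per-chunk work):

```python
import multiprocessing as mp

def do_analytics(chunk_id):
    # Stand-in for the real CPU-bound analytics on one slice of the data.
    return sum(i * i for i in range(100_000)) + chunk_id

if __name__ == "__main__":
    n_workers = mp.cpu_count()  # one worker per (v)CPU, 2 on this machine
    with mp.Pool(n_workers) as pool:
        results = pool.map(do_analytics, range(n_workers))
    print(results)
```

With two CPU-bound workers on a 2-CPU box you should see close to 100% utilisation; adding more workers than CPUs only helps if the tasks spend time waiting on disk or network.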

running python code on distributed cluster

I need to run some numpy computation on 5000 files in parallel using python. I have the sequential single machine version implemented already. What would be the easiest way to run the code in parallel (say using an ec2 cluster)? Should I write my own task scheduler and job distribution code?
You can have a look at the pscheduler Python module. It lets you queue up your jobs and runs as many of them concurrently as there are available CPU cores. The program can also scale up and submit your jobs to remote machines, but that requires all the remote machines to share a filesystem via NFS.
I'll be happy to help you further.
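Before setting up a cluster, it may be worth checking whether one many-core machine is enough: 5000 independent files parallelize trivially with the standard library. A sketch, where `process_file` is a placeholder for the existing per-file NumPy computation:

```python
from concurrent.futures import ProcessPoolExecutor
from pathlib import Path

def process_file(path):
    # Placeholder: load the file and run the existing NumPy computation.
    return path, len(Path(path).read_bytes())

if __name__ == "__main__":
    files = sorted(str(p) for p in Path("data").glob("*.npy"))
    with ProcessPoolExecutor() as ex:  # defaults to one worker per CPU
        for path, result in ex.map(process_file, files):
            print(path, result)
```

If one machine is not enough, the same function can be handed to a cluster framework; only when none of those fit would writing a custom scheduler be worth it.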

Long running tasks in Pyramid web app

I need to run some tasks in background of web app (checking the code out, etc) without blocking the views.
The twist on the typical Queue/Celery scenario is that I have to ensure the tasks complete, surviving even a web app crash or restart, whatever their final result.
I was thinking about recording the parameters for a multiprocessing.Pool in a database and restarting all the incomplete tasks at webapp restart. It's doable, but I'm wondering if there's a simpler or more cost-effective approach?
UPDATE: Why not Celery itself? I have used Celery in some projects and it's really a great solution, but for this task it's on the heavy side: it requires a separate server, messaging infrastructure, etc., while all I need is to spawn a few processes/threads, do some work in them (git clone ..., svn co ...) and check whether they succeeded or failed. Another issue is that I need the solution to be as small as possible, since I have to make it follow elaborate corporate guidelines and procedures; the administrative and bureaucratic overhead of getting Celery on board is something I'd prefer to avoid if I can.
I would suggest you use Celery.
Celery does not require its own server, you can have a worker running on the same machine. You can also have a "poor man's queue" using an SQL database instead of a "real" queue/messaging server such as RabbitMQ - this setup would look very much like what you're describing, only with a separate process doing the long-running tasks.
The problem with starting long-running tasks from the webserver process is that in the production environment the web "workers" are normally managed by the webserver - multiple workers can be spawned or killed at any time. The viability of your approach would highly depend on the web server you're using and its configuration. Also, with multiple workers each trying to do a task you may have some concurrency issues.
Apart from Celery, another option is to look at UWSGI's spooler subsystem, especially if you're already using UWSGI.
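If Celery really is off the table, the asker's record-and-restart idea can be sketched with the standard library, assuming a small SQLite table of task commands (the table name and columns are illustrative):

```python
import sqlite3
import subprocess

def init_db(path="tasks.db"):
    con = sqlite3.connect(path)
    con.execute(
        "CREATE TABLE IF NOT EXISTS tasks "
        "(id INTEGER PRIMARY KEY, cmd TEXT, done INTEGER DEFAULT 0)"
    )
    con.commit()
    return con

def enqueue(con, cmd):
    # Record the task *before* starting it, so a crash can't lose it.
    con.execute("INSERT INTO tasks (cmd) VALUES (?)", (cmd,))
    con.commit()

def run_pending(con):
    # Call this on every app start: anything not marked done is re-run.
    pending = list(con.execute("SELECT id, cmd FROM tasks WHERE done = 0"))
    for task_id, cmd in pending:
        if subprocess.run(cmd, shell=True).returncode == 0:
            con.execute("UPDATE tasks SET done = 1 WHERE id = ?", (task_id,))
            con.commit()
```

The important property is that the row is committed before the work starts, so a crash mid-task leaves it marked incomplete and it is retried on restart. Note the caveat from the answer above: tasks must be idempotent, and with multiple web workers you would need row locking to avoid two workers picking up the same task.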

Python WSGI deployment on Windows for CPU-bound application

What options do I have for the deployment of a CPU bound Python-WSGI application on Windows?
The application benefits greatly from multiple CPUs (image manipulation/encoding) but the GIL prevents it from using them.
My understanding is:
mod_wsgi has no support for WSGIDaemonProcess on Windows, and Apache itself runs only a single process there
all fork-based solutions (flup, spawning, gunicorn) work only on Unix
Are there any other deployment options I'm missing?
PS: I asked that on serverfault but someone suggested to ask here.
I have successfully used isapi-wsgi to deploy WSGI web apps on Windows IIS (I assume that since you're deploying on Windows, IIS is an option).
Create an IIS Application Pool to host your application in and configure it as a Web Garden (Properties | Performance | Maximum number of worker processes).
Disclaimer: I have never used this feature myself (I have always used the default App Pool configuration, where Max. number of worker processes is 1). But it is my understanding that this will spin up more processes to handle requests.
It would be a bit of a mess, but you could use the subprocess module to fire off worker processes yourself. I'm pretty sure that Popen.wait() and/or Popen.communicate() ought to release the GIL. You've still got the process creation overhead though, so you might not gain a lot/anything over standard CGI.
Another option is to have separate server/worker processes running the whole time and use some form of IPC, although this isn't going to be an easy option. Have a look at the multiprocessing module, and potentially also Pyro.
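A minimal sketch of the multiprocessing route: keep the WSGI worker single-threaded and hand the CPU-bound step to a process pool (`encode_image` is a stand-in for the real image work, not an API from the question):

```python
from multiprocessing import Pool

def encode_image(data):
    # Stand-in for the real CPU-bound image manipulation/encoding.
    return bytes(reversed(data))

pool = None  # created once at startup, see below

def app(environ, start_response):
    # The WSGI worker stays responsive; the heavy step runs in a
    # separate process, so the GIL is not the bottleneck.
    body = environ["wsgi.input"].read()
    result = pool.apply(encode_image, (body,))
    start_response("200 OK", [("Content-Type", "application/octet-stream")])
    return [result]

if __name__ == "__main__":
    # On Windows, the pool must be created under the __main__ guard,
    # because child processes re-import this module.
    with Pool() as pool:
        print(pool.apply(encode_image, (b"abc",)))  # prints b'cba'
    # Any WSGI server can then host `app`, e.g.:
    # from wsgiref.simple_server import make_server
    # make_server("127.0.0.1", 8000, app).serve_forever()
```

The process-creation cost is paid once at startup rather than per request, which is the main advantage over firing off a subprocess per request as suggested above.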
