Does Multithreading work on Google Cloud CPU instances?


A friend said he was able to get multithreading in Python (with Anaconda's default interpreter, presumably CPython) to utilise all cores on Google Cloud vCPUs. Ordinarily, however, multithreading in Python is limited to a single core on local machines.
Is this possible? Does this have something to do with the way vCPUs share memory? I assumed that a vCPU looks like a logical core to the OS and the same GIL restrictions would apply.

On Compute Engine, each virtual CPU (vCPU) is implemented as a single hardware hyper-thread on one of the available CPU Platforms. On Intel Xeon processors, Intel Hyper-Threading Technology allows multiple application threads to run on each physical processor core. You configure your Compute Engine virtual machine instances with one or more of these hyper-threads as vCPUs. The machine type specifies the number of vCPUs that your instance has.
You can identify the specific CPU platform for your instance using one of the following options:
See what CPU platforms are available in each of the available regions and zones.
Use the compute.instances.get method to obtain the CPU platform property for one of your existing instances.
On Linux instances, run cat /proc/cpuinfo.
If you want to change the CPU platform for your instance, you can specify a minimum CPU platform.
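
As a complement to cat /proc/cpuinfo, here is a minimal sketch of checking from inside Python how many logical CPUs (vCPUs) the interpreter can see. The helper name logical_cpus is made up for illustration; os.cpu_count and os.sched_getaffinity are standard library calls, and the latter only exists on some platforms such as Linux, hence the hasattr check.

```python
import os


def logical_cpus():
    """Return (visible, usable) logical CPU counts for this process."""
    # Logical CPUs reported by the OS; on a VM these are the vCPUs.
    visible = os.cpu_count() or 1
    if hasattr(os, "sched_getaffinity"):
        # CPUs this process is actually allowed to run on (Linux and similar).
        usable = len(os.sched_getaffinity(0))
    else:
        usable = visible
    return visible, usable


if __name__ == "__main__":
    visible, usable = logical_cpus()
    print(f"visible logical CPUs: {visible}, usable by this process: {usable}")
```

Each vCPU shows up here as one logical CPU, which is the same view the OS scheduler has.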

Yes. But it depends on the VM type. I think you will want general purpose, high memory, compute intensive, or A2 types.
From the GCP docs (2021-08-17): "By default, Compute Engine enables simultaneous multithreading (SMT) on all virtual machine (VM) instances. With SMT enabled, a single physical CPU core can run 2 virtual CPUs (vCPUs), each as separate threads."
Then, when you are creating a new VM, you can set the number of threads per core:
gcloud beta compute instances create VM_NAME \
    --zone=ZONE \
    --machine-type=MACHINE_TYPE \
    --threads-per-core=THREADS_PER_CORE
"THREADS_PER_CORE: the number of threads per physical core. Current processors support 2 threads per core for SMT, which is enabled by default. To disable SMT, set to 1."
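
Note that hardware threads (SMT) and Python threads are different things. In standard CPython builds the GIL lets only one thread execute Python bytecode at a time, however many vCPUs the VM has, so pure-Python CPU-bound work usually needs multiple processes to use several cores. Below is a rough sketch for checking this on any machine; the function names cpu_bound and run and the job size are made up for illustration, and the timings depend on the instance.

```python
import os
import time
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def cpu_bound(n):
    """Pure-Python busy work: sum of squares below n."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def run(executor_cls, jobs, workers):
    """Map cpu_bound over jobs with the given executor; return (results, seconds)."""
    start = time.perf_counter()
    with executor_cls(max_workers=workers) as pool:
        results = list(pool.map(cpu_bound, jobs))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    workers = os.cpu_count() or 1
    jobs = [2_000_000] * workers  # one job per logical CPU
    _, thread_secs = run(ThreadPoolExecutor, jobs, workers)
    _, process_secs = run(ProcessPoolExecutor, jobs, workers)
    print(f"{workers} workers: threads {thread_secs:.2f}s, processes {process_secs:.2f}s")
```

If the friend's code spent most of its time inside libraries such as NumPy that release the GIL or run their own native threads, all cores could appear busy even with Python threads, which may explain the observation.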

Related

How to apply GoogleColab stronger CPU and more RAM?

I use Google Colab to test data structures such as a chained hash map, a probing hash map, an AVL tree, a red-black tree, and a splay tree, all written in Python. I store a very large dataset of key-value pairs in these structures, roughly on the scale of a small Wikipedia, to time various operations, so the scripts use a lot of memory. Google Colab offers approximately 12 GB of RAM, which is not enough for me: the scripts need about 20-30 GB, so the session often fails with a message that the program exceeded the 12 GB limit, and then restarts. I also have some Python scripts that run recursive algorithms, which are heavy on CPU as well as RAM. When I run these algorithms with 20,000+ recursive calls, Google Colab often fails and restarts. I know that Google Colab uses two cores of an Intel Xeon CPU, but how do I request more CPU cores from Google?

You cannot upgrade the GPU or CPU, but you can increase the RAM from 12 GB to 25 GB by crashing the session with any non-terminating loop that exhausts memory:
l = []
# Grow the list forever until the session runs out of RAM and crashes.
while True:
    l.append('nothing')

There is no way to request more CPU/RAM from Google Colaboratory at this point, sorry.
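
On the recursion part of the question: if "20000+ recursion" refers to call depth, the restarts may not be a CPU shortage at all. CPython's default recursion limit is typically 1000, and raising it far enough with sys.setrecursionlimit can overflow the C stack and kill the interpreter. A common workaround is to replace the recursion with an explicit stack. The sketch below assumes a simple binary tree; Node, build_chain and height_iterative are made-up names for illustration, not the asker's code.

```python
class Node:
    """Minimal binary tree node (illustrative only)."""

    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right


def build_chain(n):
    """Build a degenerate tree of n nodes where each node has only a right child."""
    root = None
    for key in range(n, 0, -1):
        root = Node(key, right=root)
    return root


def height_iterative(root):
    """Return the tree height in nodes using an explicit stack instead of recursion."""
    best = 0
    stack = [(root, 1)] if root is not None else []
    while stack:
        node, depth = stack.pop()
        best = max(best, depth)
        if node.left is not None:
            stack.append((node.left, depth + 1))
        if node.right is not None:
            stack.append((node.right, depth + 1))
    return best


if __name__ == "__main__":
    # A 50,000-deep chain would exceed the default recursion limit if walked recursively.
    print(height_iterative(build_chain(50_000)))
```

The explicit stack lives on the heap, so its depth is bounded by available RAM rather than by the interpreter's call stack.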

Google Colab Pro recently launched for $9.99 a month (Feb. 2020). Users in the US can get higher resource limits and more frequent access to better resources.
Q&A from the signup page is below:
What kinds of GPUs are available in Colab Pro?
With Colab Pro you get priority access to our fastest GPUs. For example, you may get access to T4 and P100 GPUs at times when non-subscribers get K80s. You also get priority access to TPUs. There are still usage limits in Colab Pro, though, and the types of GPUs and TPUs available in Colab Pro may vary over time.
In the free version of Colab there is very limited access to faster GPUs, and usage limits are much lower than they are in Colab Pro.
How long can notebooks run in Colab Pro?
With Colab Pro your notebooks can stay connected for up to 24 hours, and idle timeouts are relatively lenient. Durations are not guaranteed, though, and idle timeouts may sometimes vary.
In the free version of Colab notebooks can run for at most 12 hours, and idle timeouts are much stricter than in Colab Pro.
How much memory is available in Colab Pro?
With Colab Pro you get priority access to high-memory VMs. These VMs generally have double the memory of standard Colab VMs, and twice as many CPUs. You will be able to access a notebook setting to enable high-memory VMs once you are subscribed. Additionally, you may sometimes be automatically assigned a high-memory VM when Colab detects that you are likely to need it. Resources are not guaranteed, though, and there are usage limits for high memory VMs.
In the free version of Colab the high-memory preference is not available, and users are rarely automatically assigned high memory VMs.

For a paid, high-capability solution, you may want to try Google Cloud Datalab instead.

Scaling tornado on AWS

I am running a Tornado client-side application on AWS EC2 with a Linux t2.micro instance, which includes 1 vCPU and 1 GiB of RAM. I have noticed that the application's performance and speed drop after 75 simultaneous HTTP connections.
Considering that Tornado runs in a single process with a single thread (using an asynchronous event-loop architecture), I am wondering if upgrading to an AWS t2.medium instance with 2 vCPUs would actually help.
In theory, can a single process with a single thread run on two CPUs? Or is Amazon's vCPU not a real CPU and just a measurement of processing power?

Tornado supports running multiple Python processes to take advantage of a multi-CPU machine. As described in the documentation, you can use Tornado itself to fork those processes, or you can set up a load balancer in front of a number of processes started either manually or with a process manager such as supervisor.
As for your second question, apparently an AWS vCPU is basically a single hyperthread spawned from a real processor core, which should in Python's case amount to the equivalent of a "real" CPU (but I'm far from being an expert on the topic).

How to use multiple GPU on each replica in Distributed Tensorflow Inception V3?

The code here shows how to set up each replica with a single tower that uses one GPU.
The way I currently use all the GPUs on a worker machine is to start as many workers as there are GPUs; the workers then communicate with each other as if they were not on the same machine. That is slower than it would be if I could start one worker that controls more than one GPU.
I'm wondering if there is a way to change this code slightly to make use of multiple GPUs on one machine, like that example.

Using a higher-level library will help. Slim has utilities to deploy a model on multiple local GPUs.

How do I get my application server CPU to 100%?

I have a dedicated application server that does analytics.
I'm running on a machine with 2 CPUs and 8 GB of RAM.
I have two instances of the same application running, like below.
python do_analytics.py &
python do_analytics.py &
However, my CPU usage is below 20%. Can I run more processes to make full use of my CPU? Will it speed things up, or will each process run slower now since I only have 2 CPUs?
Thanks.

The fact that your CPU usage is below 20% means that your CPU can take more load. So yes, you can run more processes.
Will it speed things up, or will each process run slower now since I only have 2 CPUs?
It depends on what else your application is doing. If most of the analytics logic just uses processing power and memory, you need not worry. But if more processes mean more disk access or more contention for a shared resource, then running more processes may reduce overall performance.
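
One rough way to tell which case applies is to compare CPU time with wall-clock time for a representative piece of work. A ratio near 1 suggests the work is CPU-bound, while a ratio near 0 suggests it mostly waits on disk, network, or sleep. This is only a sketch: cpu_fraction and busy are made-up helpers, and the workload inside do_analytics.py would have to be substituted for them.

```python
import time


def cpu_fraction(func, *args, **kwargs):
    """Run func once and return (CPU time used by this process) / (wall-clock time)."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    func(*args, **kwargs)
    cpu_used = time.process_time() - cpu_start
    wall_used = time.perf_counter() - wall_start
    return cpu_used / wall_used if wall_used > 0 else 0.0


def busy(n=2_000_000):
    """Stand-in for CPU-bound work."""
    return sum(i * i for i in range(n))


if __name__ == "__main__":
    print(f"sleeping (waits, like I/O): {cpu_fraction(time.sleep, 0.5):.2f}")
    print(f"busy loop (CPU-bound):      {cpu_fraction(busy):.2f}")
```

If the analytics job scores low here, adding processes will mostly add contention for whatever it is waiting on, not CPU throughput.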

Parallel Computing with Python on a queued cluster

There are lots of different modules for threading/parallelizing python. Dispy and pp/ParallelPython seem especially popular. It looks like these are all designed for a single interface (e.g. desktop) which has many cores/processors. Is there a module which works on massively parallel architectures which are run by queue systems (specifically: SLURM)?

The most widely used parallel framework on large compute clusters for scientific and technical applications is MPI. The corresponding Python package is mpi4py, which is distributed separately from SciPy.
MPI offers a high-level API for creating parallel software that communicates over the network using messages: remote process creation, data scatter/gather, reductions, etc. Implementations can take advantage of fast, low-latency networks if present. It works with common cluster managers, including Slurm.
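
For work that needs no communication between tasks, a lighter alternative to MPI (and not a replacement for it) is to let each Slurm task pick its own slice of the input. The sketch below assumes the script is launched with srun, which sets the SLURM_PROCID and SLURM_NTASKS environment variables for each task; the function name my_share is made up for illustration.

```python
import os


def my_share(items, env=None):
    """Return the subset of items this Slurm task should process.

    Outside Slurm (variables unset) the single process gets everything.
    """
    env = os.environ if env is None else env
    rank = int(env.get("SLURM_PROCID", 0))  # index of this task
    size = int(env.get("SLURM_NTASKS", 1))  # total number of tasks
    # Round-robin split: task r takes items r, r + size, r + 2*size, ...
    return items[rank::size]


if __name__ == "__main__":
    work = list(range(20))
    print(f"this task handles: {my_share(work)}")
```

Once tasks need to exchange data (scatter/gather, reductions), this approach stops being enough and MPI becomes the better fit.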

Via the ParallelPython main page:
"PP is a python module which provides mechanism for parallel execution of python code on SMP (systems with multiple processors or cores) and clusters (computers connected via network)."
