I have Python code which runs fast under low stress but slows down as the stress increases, yet when I look at Task Manager it is only using 2% CPU and very little RAM. Is there a way to unlock the CPU and use all resources from Python, or would it run faster as an executable file?
I use Google Colab to test data structures written in Python (chain-hashmap, probe-hashmap, AVL-tree, red-black-tree, splay-tree). I store a very large dataset of key-value pairs in these structures to measure the running time of some operations; its scale is like a small Wikipedia, so running these scripts uses a lot of memory (RAM). Google Colab offers approximately 12 GB of RAM, which is not enough for me: the scripts need about 20-30 GB, so Colab often raises an exception saying the program has exceeded the 12 GB limit, and the runtime restarts.

I also have some Python scripts that run recursive algorithms, which, as everyone knows, use a lot of CPU (as well as RAM). When I run them with 20000+ levels of recursion, Colab often fails and restarts. I know Colab provides two cores of an Intel Xeon CPU, but how do I request more CPU cores from Google?
You cannot upgrade the GPU or CPU, but you can increase the RAM from 12 GB to 25 GB just by crashing the session with any non-terminating while loop:
l = []
while 1:                 # keep allocating until the session runs out of memory
    l.append('nothing')  # after the crash, Colab offers a higher-RAM runtime
There is no way to request more CPU/RAM from Google Colaboratory at this point, sorry.
Google Colab Pro recently launched for $9.99 a month (Feb. 2020). Users in the US can get higher resource limits and more frequent access to better resources.
Q&A from the signup page is below:
What kinds of GPUs are available in Colab Pro?
With Colab Pro you get priority access to our fastest GPUs. For example, you may get access to T4 and P100 GPUs at times when non-subscribers get K80s. You also get priority access to TPUs. There are still usage limits in Colab Pro, though, and the types of GPUs and TPUs available in Colab Pro may vary over time.
In the free version of Colab there is very limited access to faster GPUs, and usage limits are much lower than they are in Colab Pro.
How long can notebooks run in Colab Pro?
With Colab Pro your notebooks can stay connected for up to 24 hours, and idle timeouts are relatively lenient. Durations are not guaranteed, though, and idle timeouts may sometimes vary.
In the free version of Colab notebooks can run for at most 12 hours, and idle timeouts are much stricter than in Colab Pro.
How much memory is available in Colab Pro?
With Colab Pro you get priority access to high-memory VMs. These VMs generally have double the memory of standard Colab VMs, and twice as many CPUs. You will be able to access a notebook setting to enable high-memory VMs once you are subscribed. Additionally, you may sometimes be automatically assigned a high-memory VM when Colab detects that you are likely to need it. Resources are not guaranteed, though, and there are usage limits for high memory VMs.
In the free version of Colab the high-memory preference is not available, and users are rarely automatically assigned high memory VMs.
For a paid, high-capability solution, you may want to try Google Cloud Datalab instead.
I am running a Tornado client-side application on AWS EC2, on a Linux t2.micro instance with 1 vCPU and 1 GiB of RAM. I have noticed that the application's performance and speed degrade after 75 simultaneous HTTP connections.
Considering that Tornado runs in a single process and thread (using an asynchronous event-loop architecture), I am wondering whether upgrading to an AWS t2.medium instance with 2 vCPUs would actually help.
In theory, can a single process with a single thread run on two CPUs? Or is Amazon's vCPU not a real CPU, just a measure of processing power?
Tornado supports running multiple Python processes to take advantage of a multi-CPU machine. As described in the documentation, you can use Tornado itself to fork those processes, or you can set up a load balancer in front of a number of processes started either manually or with a process manager such as supervisor.
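As a rough sketch of the forking approach from the Tornado documentation (the port and the trivial handler are placeholders, not your application):

import tornado.httpserver
import tornado.ioloop
import tornado.netutil
import tornado.process
import tornado.web

class MainHandler(tornado.web.RequestHandler):
    def get(self):
        self.write("hello")

app = tornado.web.Application([(r"/", MainHandler)])

sockets = tornado.netutil.bind_sockets(8888)  # bind once in the parent process
tornado.process.fork_processes(0)             # 0 forks one child per CPU core
server = tornado.httpserver.HTTPServer(app)
server.add_sockets(sockets)                   # each child serves the shared sockets
tornado.ioloop.IOLoop.current().start()

Each forked child runs its own event loop, so any state that must be shared across them has to live outside the processes.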
As for your second question, an AWS vCPU is apparently a single hyperthread of a real processor core, which in Python's case should amount to the equivalent of a "real" CPU (but I'm far from being an expert on the topic).
I need to run some numpy computations on 5000 files in parallel using Python. I already have the sequential single-machine version implemented. What would be the easiest way to run the code in parallel (say, using an EC2 cluster)? Should I write my own task scheduler and job-distribution code?
You can have a look at the pscheduler Python module. It allows you to queue up your jobs and run them, with the number of concurrent processes depending on the available CPU cores. The program can easily scale up and submit your jobs to remote machines, but that would require all your remote machines to use NFS.
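If you just want to saturate the cores of a single machine before moving to a cluster, the standard library gives you the same queue-and-run behaviour; a minimal sketch with concurrent.futures (process_file and the glob pattern are placeholders, not part of pscheduler):

import glob
from concurrent.futures import ProcessPoolExecutor

import numpy as np

def process_file(path):
    data = np.load(path)             # placeholder: load one input file
    return path, float(data.sum())   # placeholder: your real computation goes here

if __name__ == "__main__":
    files = glob.glob("inputs/*.npy")    # placeholder pattern for the 5000 files
    # one worker per CPU core by default; files are queued and drained in parallel
    with ProcessPoolExecutor() as pool:
        for path, result in pool.map(process_file, files):
            print(path, result)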
I'll be happy to help you further.
I'm looking at using inotify to watch about 200,000 directories for new files. When a file is created, the watching script will process it and then remove it. Because it is part of a more complex system with many processes, I want to benchmark this and get system performance statistics on CPU, memory, disk, etc. while the tests are run.
I'm planning on running the inotify script as a daemon and having a second script generate test files in several of the directories (randomly selected before the test).
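For illustration, the generator I have in mind is roughly this (the directory list, file count, and payload size below are placeholders):

import os
import random
import uuid

# placeholder: in the real test these would be sampled from the 200,000 watched directories
target_dirs = random.sample([f"/srv/watched/dir{i}" for i in range(200000)], 50)

def generate_files(n_files=1000, size_bytes=4096):
    for _ in range(n_files):
        directory = random.choice(target_dirs)
        path = os.path.join(directory, uuid.uuid4().hex)
        with open(path, "wb") as fh:
            fh.write(os.urandom(size_bytes))  # dummy payload for the watcher to pick up

if __name__ == "__main__":
    generate_files()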
I'm after suggestions for the best way to benchmark the performance of something like this, especially the impact it has on the Linux server it's running on.
I would try and remove as many other processes as possible in order to get a repeatable benchmark. For example, I would set up a separate, dedicated server with an NFS mount to the directories. This server would only run inotify and the Python script. For simple server measurements, I would use top or ps to monitor CPU and memory.
The real test is how quickly your script "drains" the directories, which depends entirely on your process. You could profile the script and see where it's spending the time.
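If you would rather capture the CPU and memory numbers programmatically than watch top, a small sampler built on the psutil package works; a rough sketch (the daemon's PID, the interval, and the duration are assumptions you would adjust):

import csv
import time

import psutil  # third-party: pip install psutil

def sample(pid, interval=1.0, duration=300, out_path="bench.csv"):
    proc = psutil.Process(pid)  # the inotify daemon's PID
    with open(out_path, "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["time", "proc_cpu_pct", "proc_rss_mb", "sys_cpu_pct", "sys_mem_pct"])
        end = time.time() + duration
        while time.time() < end:
            writer.writerow([
                time.time(),
                proc.cpu_percent(interval=None),    # process CPU since the last call
                proc.memory_info().rss / 1e6,       # resident set size in MB
                psutil.cpu_percent(interval=None),  # system-wide CPU
                psutil.virtual_memory().percent,    # system-wide memory use
            ])
            time.sleep(interval)

Disk activity can be logged the same way with psutil.disk_io_counters() if you also want to see the write/delete load the test generates.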