How can I optimize nested for loops in Python? - python

I'm trying to draw a pattern in a 100x100 window using John Zelle’s graphics module, and I am sure that the nested for loop is not the most efficient way to do it.
Any thoughts on how I can optimize the nested for loop? Here is the code:
def Penultimatedigitdesign(x, y, win, color):
    for Y in range(y, y + 100, 40):
        for X in range(x + 20, x + 100, 40):
            drawFourcircleInSqureTF(win, X, Y, "white", color)
        for X in range(x, x + 100, 40):
            drawFourcircleInSqureTF(win, X, Y, color, "white")
    for Y in range(y + 20, y + 100, 40):
        for X in range(x + 20, x + 100, 40):
            drawFourcircleInSqureTF(win, X, Y, color, "white")
        for X in range(x, x + 100, 40):
            drawFourcircleInSqureTF(win, X, Y, "white", color)

In general, the strategy for working efficiently over a multi-dimensional space is to vectorize it. In other words, structure your starting state as a matrix, apply a function to the whole matrix at once, and let the GPU (or optimized array code) give you the output. In Python, NumPy is the key library for this.
In graphics programming, people write "shaders," which are basically functions that operate on a single pixel. These can be applied to a matrix (i.e., all the pixels on the screen), so that the GPU, which is optimized for this type of work, can carry them out in parallel.
As far as how to specifically implement this for your use case, and for whatever graphics backend you are targeting, I can't say, because there aren't enough details. If this is just a learning exercise, iterating over each pixel and solving the problem procedurally is probably OK.
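For illustration, here is a minimal sketch of what vectorizing a grid computation looks like with NumPy. The grid size and the circle test are made up, and Zelle's graphics module draws shapes one at a time rather than operating on a pixel matrix, so this only shows the matrix-at-once style of thinking.

import numpy as np

# Build coordinate matrices for a 100x100 grid, then evaluate a
# "shader-like" test for every cell at once, with no Python-level loop.
side = 100
ys, xs = np.mgrid[0:side, 0:side]
inside = (xs - side / 2) ** 2 + (ys - side / 2) ** 2 < 40 ** 2

# 'inside' is a 100x100 boolean matrix; NumPy computed it element-wise.
print(inside.sum(), "cells fall inside the circle")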

Related

Python: replicate Matlab's generateMesh function

I am in the process of converting a codebase from Matlab to Python and noticed that generateMesh gets called on some polygons before carrying out a finite element analysis.
What I need to get as an output is a list of all the elements and nodes, with their respective coordinates. I don't need any GUI, just the output nodes and elements information.
The best solution I came across is something like this done with gmsh. I know gmsh is a pretty big library and I am afraid it might be a little too much for my needs. Is there any other package you'd suggest?
Triangular meshes are fine for the moment, but I would like the package to support tetrahedral meshes as well in case it's needed in the future.
Thank you
edit: I forgot to mention that I am only dealing with 2D geometries for now, hence the triangular elements; tetrahedral elements would only come into play if I move to 3D later.
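For what it's worth, the gmsh-based route mentioned above would look roughly like the sketch below; the rectangle is only a placeholder geometry, and the calls should be checked against the gmsh Python API documentation for your version.

import gmsh

gmsh.initialize()
gmsh.model.add("demo")

# Placeholder 2D geometry: a unit square (the real polygons would go here).
gmsh.model.occ.addRectangle(0, 0, 0, 1, 1)
gmsh.model.occ.synchronize()

# Generate a 2D (triangular) mesh.
gmsh.model.mesh.generate(2)

# Node tags and their flattened x, y, z coordinates.
node_tags, node_coords, _ = gmsh.model.mesh.getNodes()

# Element types, element tags, and the node tags that make up each element.
elem_types, elem_tags, elem_node_tags = gmsh.model.mesh.getElements(dim=2)

gmsh.finalize()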

How(/if) to use dask to transpose distributed 3D numpy arrays?

My problem is to perform 3 matrix multiplications on a 3D numpy array A too large to fit in a single processor. In tensorial form I want A_ijk B_km C_jn D_ip (B, C, and D can all fit in memory). I want to know if dask is appropriate for this task (or if another tool might be more suited).
I believe the best approach is to split this operation into the individual multiplications and make sure that each one is local. This link has a really useful diagram that summarises what I'm talking about: http://www.2decomp.org/1d_mode.html.
In more detail: First, to do A_ijk B_km, I should distribute A over the first two axes, and perform the matrix multiplication over each pencil locally (the first step in the diagram).
Then, I need to transpose the array, making the j axis local to each processor (and splitting over the k (now m) axis), to then perform the next multiplication. (So going from the first to the second step in the diagram). This is where I wonder if dask could help.
I'm aware that this can be done in principle using mpi4py, but the steps are pretty non-trivial, whereas dask arrays have helpful rechunk and transpose methods, which feel relevant to this application.
Does this seem like something well-suited to dask?
If not, is anyone aware of any python libraries that can perform these steps? I know that fftw has routines for doing just this, but I don't know how to write the C-code necessary, or how to get it to interface with python and numpy.
Thanks for any help.
For anyone else in the future: mpi4py does have the equivalent of a transpose, but it's called Alltoall/Alltoallv. It's not explained in the mpi4py documentation or tutorial; I found out about it in another tutorial: https://info.gwdg.de/wiki/doku.php?id=wiki:hpc:mpi4py.
Dask implements einsum, which may be what you are after, and there is, of course, matmul if you want to write out the operation. So long as your large matrix A is a Dask array with reasonable chunk sizes, Dask will parcel out your work without running out of memory.
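A minimal sketch of that approach (the array sizes and chunking are made up for illustration, and kept small so it actually runs):

import numpy as np
import dask.array as da

# Hypothetical sizes, chosen small for a quick demo; a real A would use
# chunks sized so each block fits comfortably in memory on one worker.
A = da.random.random((60, 60, 60), chunks=(30, 30, 60))  # A_ijk
B = np.random.random((60, 5))   # B_km
C = np.random.random((60, 6))   # C_jn
D = np.random.random((60, 7))   # D_ip

# The whole contraction A_ijk B_km C_jn D_ip -> result_mnp in one call;
# Dask plans the chunk-wise multiplications and reshuffles itself.
result = da.einsum("ijk,km,jn,ip->mnp", A, B, C, D)
print(result.compute().shape)  # (5, 6, 7)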

Seam Carving compute cost in one loop in python

How do I compute the energy cost for seam carving in one loop iterating through the rows, in Python?
Seam Carving Wiki
Like the dynamic programming approach in the wiki, for each cell I need the minimum cost (min_cost) over the three possible cells in the previous row, and I need to store the cost and the path.
It is very slow using two loops, so does anyone know how to make it more efficient?
You can use numba.jit to (possibly) speed up calculations, provided you respect the correct typing. There is no way to avoid two loops in dynamic programming; however, you can take a look at improved seam carving (which also yields better results in general):
https://github.com/axu2/improved-seam-carving
https://medium.com/@avik.das/improved-seam-carving-with-forward-energy-88ba84dab7e
from numba import jit

@jit
def calc_seam(img):
    ...
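As a rough sketch of what the jitted dynamic-programming pass might look like (the function and variable names are mine, not from the linked articles), assuming energy is a 2D array of per-pixel energies:

import numpy as np
from numba import njit

@njit
def cumulative_cost(energy):
    # Classic seam-carving DP: each cell adds the cheapest of the three
    # cells above it. Two loops remain, but numba compiles them to machine code.
    rows, cols = energy.shape
    cost = energy.copy()
    for i in range(1, rows):
        for j in range(cols):
            lo = max(j - 1, 0)
            hi = min(j + 1, cols - 1)
            best = cost[i - 1, lo]
            if cost[i - 1, j] < best:
                best = cost[i - 1, j]
            if cost[i - 1, hi] < best:
                best = cost[i - 1, hi]
            cost[i, j] = energy[i, j] + best
    return cost

# Example usage with random energies:
# print(cumulative_cost(np.random.rand(480, 640))[-1].min())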

Python slow on for-loops and hundreds of attribute lookups. Use Numba?

I am working on a simple showcase SPH (smoothed particle hydrodynamics, not relevant here though) implementation in Python. The code works, but the execution is kind of sluggish. I often have to compare individual particles with a certain number of neighbours. In an earlier implementation I kept all particle positions and all distances-to-each-existing-particle in large numpy arrays; up to a certain point this was pretty fast, but it was not visually pleasing and scaled as n**2. Now I want it clean and simple with classes + kdTree to speed up the neighbour search.
This all happens in my global Simulation class. Additionally, there's a class called "particle" that contains all the per-particle information. I create hundreds of instances beforehand and loop through them.
def calculate_density(self):
    # Using scipy's advanced nearest-neighbour search magic
    tree = scipy.spatial.KDTree(self.particle_positions)
    # Here we go... loop through all existing particles and set attributes.
    for particle in self.my_particles:
        # Get the indexes of the nearest neighbours
        particle.index_neighbours = tree.query_ball_point(particle.position, self.h, p=2)
        # Now loop through the list of neighbours and perform some additional math
        particle.density = 0
        for neighbour in particle.index_neighbours:
            r = np.linalg.norm(particle.position - self.my_particles[neighbour].position)
            particle.density += particle.mass * (315 / (64 * math.pi * self.h**9)) * (self.h**2 - r**2)**3
I timed 0.2717630863189697 s for only 216 particles.
Now I wonder: what can I do to speed it up?
Most tools online, like Numba, show how they speed up math-heavy individual functions, and I don't know which to choose. As a side note, I cannot even get Numba to work in this case; I get a long error message, and I had hoped it would be as simple as slapping "@jit" in front of the function.
I know it's the loops with the attribute calls that crush my performance anyway, not the math or the neighbour search. Sadly, I am a novice to programming and I liked the clean approach I got to work here :( Any thoughts?
These kinds of loop-intensive calculations are slow in Python. In these cases, the first thing you want to do is see whether you can vectorize the operations and get rid of the loops. Then the actual calculations are done in C or Fortran libraries and you get a lot of speed-up. If you can do it, this is usually the way to go, since it is much easier to maintain your code.
Some operations, however, are just inherently loop-intensive. In those cases Cython will help you a lot; you can usually expect a 60X+ speed-up when you cythonize your loop. I have also had similar experiences with numba: when my function became complicated, it failed to make it faster, so usually I just use Cython.
Coding in Cython is not too bad, and much easier than actually coding in C, because you can access numpy arrays easily via memoryviews. Another advantage is that it is pretty easy to parallelize the loop with OpenMP, which can give you an additional 4X+ speed-up (depending, of course, on the number of cores in your machine), so your code can end up hundreds of times faster.
One issue is that to get optimal speed you have to remove all the Python calls inside your loop, which means you cannot call numpy/scipy functions there. So you would have to convert the tree.query_ball_point and np.linalg.norm parts to Cython for optimal speed.
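As an illustration of the vectorization route (rather than the Cython one), here is a rough sketch of the density loop pushed into NumPy/SciPy array operations; the argument names are my own, and it is an outline under those assumptions, not a drop-in replacement for the class-based code above.

import numpy as np
import scipy.spatial

def calculate_density_vectorized(positions, masses, h):
    # positions: (n, 3) array of particle positions, masses: (n,) array, h: smoothing length.
    tree = scipy.spatial.cKDTree(positions)
    norm = 315.0 / (64.0 * np.pi * h**9)
    densities = np.zeros(len(positions))
    # One Python-level loop over particles remains, but the per-neighbour
    # distance and kernel math is done as whole-array operations.
    for i, neighbours in enumerate(tree.query_ball_point(positions, h, p=2)):
        r2 = np.sum((positions[neighbours] - positions[i]) ** 2, axis=1)
        densities[i] = masses[i] * norm * np.sum((h**2 - r2) ** 3)
    return densities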

Efficient way of manipulating a disk subsection of square array

I am currently coding numerical simulations of a lattice for a physics project.
I am used to manipulating square subsections of lattices where a variable or degree of freedom sits at each site of the lattice by using a 2d array.
Now, I would like to move to more general subsections, specifically circular ones, as in the picture below:
Only the red area would be dynamical (evolve in time), so the outside of it would not need to be stored in memory.
I am wondering if you know of an efficient and somewhat natural container for holding such objects.
My only idea is to store the whole thing plus a flat boolean array telling me, for each site, whether it is inside or outside, but that could become a huge waste of computational time later.
PS: I will code this in Python and/or C++.
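To make the boolean-mask idea concrete, here is a minimal Python/NumPy sketch (with made-up names and sizes) that keeps only the interior sites in a flat array and scatters them back to a 2D view when needed:

import numpy as np

side = 100                                     # lattice side length (made up)
ys, xs = np.mgrid[0:side, 0:side]
inside = (xs - side / 2) ** 2 + (ys - side / 2) ** 2 < (side / 3) ** 2   # disk mask

# Store only the dynamical sites as a flat 1D array, plus their coordinates.
site_coords = np.argwhere(inside)              # (n_sites, 2) index pairs
values = np.zeros(len(site_coords))            # one degree of freedom per site

# A time step touches only the interior sites, never the full square.
values += np.random.normal(size=values.shape)

# If a full 2D view is ever needed (e.g., for plotting), scatter back:
field = np.full((side, side), np.nan)
field[inside] = values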
