Vectorizing loop that uses past result - python

I'm using Python 3.8. I'm trying to stop using loops and instead use vectorization to speed up my code. I'm not too sure how to vectorize an equation that uses the result from the step before.
I know how to do basic vectorization, like changing this:

for i in range(5):
    j = i * 2

into this:

i = np.arange(5)
j = i * 2
but how would I translate something like this, that uses the index from the previous step into a vectorized equation?
j = 0
for i in range(1, 5):
    k = i * 2 + j
    j = i

If a value in the vector depends on previous components, this is not possible to fully parallelise. Depending on the particular operation, however, you can make the algorithm more efficient by using other operations in a smart way: for instance, here the cumulative sum is used to do better than a naive for loop. And here you have another alternative (and a benchmark, although it is done in MATLAB).
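The linked examples are not reproduced here, but as a small illustration of both ideas (the variable names follow the question, the rest is my own sketch): the loop in the question only needs the previous value of i, so a shifted copy of the index array reproduces it, while a genuine running-total dependency maps onto np.cumsum.

import numpy as np

# The loop in the question only needs the *previous* i, so a shifted copy
# of the index array reproduces it without a Python-level loop.
i = np.arange(1, 5)
j = np.concatenate(([0], i[:-1]))   # previous i, with the initial j = 0
k = i * 2 + j                       # [2, 5, 8, 11], same as the loop

# A true running dependency (e.g. a running total) maps onto np.cumsum.
running = np.cumsum(np.arange(1, 5))   # [1, 3, 6, 10]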

Related

Any (fast) way to check a function is constant/almost constant?

I ran into this problem when adopting the lazy object pattern: at some point one of the user-supplied functions might be a constant function. I want to check whether the function is constant before feeding it into the loop.
My current solution is a somewhat ugly workaround using np.allclose:
def is_constant(func, arr):
    return np.allclose(func(arr), func(arr[0]))
You can also use checks along the lines of comparing the maximum and minimum of the output, which can work slightly faster.
But I was wondering if there is any faster way to do that, since the above still evaluates the function over a fairly large array.
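One option (a sketch, assuming func is vectorized over arrays, as the snippet above already assumes; probe is an arbitrary sample size I introduced) is to evaluate the function on a small probe of points first and only fall back to the full array when the probe looks constant:

import numpy as np

def is_constant(func, arr, probe=16):
    # Evaluate on a small probe first; a non-constant function usually
    # reveals itself here, so we avoid the full-array evaluation.
    arr = np.asarray(arr)
    ref = func(arr.flat[0])
    sample = arr.ravel()[:probe]
    if not np.allclose(func(sample), ref):
        return False          # cheap early exit
    # The probe looked constant, so pay for the full check once.
    return np.allclose(func(arr), ref)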

Seam Carving compute cost in one loop in python

How can I compute the energy cost for seam carving in Python with a single loop iterating over the rows?
Seam Carving Wiki
Like the dynamic programming approach on the wiki, I need the minimum cost among the three possible cells in the row above, and I need to store both the cost and the path.
It is very slow using two loops, so does anyone know how to make it more efficient?
You can use numba.jit to (possibly) speed up the calculations, provided you respect the correct typing. There is no way to avoid two loops in dynamic programming, but you can take a look at improved seam carving (which also tends to yield better results in general):
https://github.com/axu2/improved-seam-carving
https://medium.com/@avik.das/improved-seam-carving-with-forward-energy-88ba84dab7e
from numba import jit

@jit
def calc_seam(img):
    ...
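If "one loop" means one Python-level loop, the column loop can be pushed into NumPy while the row recurrence stays sequential. A rough sketch (the function and array names are my own, not from the question): each row's cumulative cost is computed from the previous row with whole-row operations, and the step taken into each cell is recorded so the minimal seam can be backtracked.

import numpy as np

def seam_cost(energy):
    # energy: 2D array of per-pixel energies; returns the cumulative cost
    # table plus the step (-1, 0, +1) taken into each cell, so the minimal
    # vertical seam can be backtracked from cost[-1].argmin().
    rows, cols = energy.shape
    cost = energy.astype(float)
    steps = np.zeros((rows, cols), dtype=np.int64)
    for r in range(1, rows):
        prev = cost[r - 1]
        # candidate predecessors: upper-left, directly above, upper-right
        left = np.concatenate(([np.inf], prev[:-1]))
        right = np.concatenate((prev[1:], [np.inf]))
        candidates = np.vstack((left, prev, right))
        best = candidates.argmin(axis=0)          # 0 = left, 1 = up, 2 = right
        steps[r] = best - 1
        cost[r] += candidates[best, np.arange(cols)]
    return cost, steps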

Python slow on for-loops and hundreds of attribute lookups. Use Numba?

I am working on a simple showcase SPH (smoothed particle hydrodynamics, not relevant here though) implementation in Python. The code works, but the execution is rather sluggish. I often have to compare individual particles with a certain number of neighbours. In an earlier implementation I kept all particle positions and all distances-to-each-existing-particle in large numpy arrays -> up to a certain point this was pretty fast, but it was visually not pleasing and scales as n**2. Now I want it clean and simple with classes + a kdTree to speed up the neighbour search.
This all happens in my global Simulation class. Additionally there is a class called "particle" that contains all the individual information. I create hundreds of instances beforehand and loop through them.
import math
import numpy as np
import scipy.spatial

def calculate_density(self):
    # Using scipy's advanced nearest-neighbour search magic
    tree = scipy.spatial.KDTree(self.particle_positions)
    # Here we go... loop through all existing particles and set attributes.
    for particle in self.my_particles:
        # get the indices of the nearest neighbours
        particle.index_neighbours = tree.query_ball_point(particle.position, self.h, p=2)
        # now loop through the list of neighbours and perform some additional math
        particle.density = 0
        for neighbour in particle.index_neighbours:
            r = np.linalg.norm(particle.position - self.my_particles[neighbour].position)
            particle.density += particle.mass * (315 / (64 * math.pi * self.h**9)) * (self.h**2 - r**2)**3
I timed 0.2717630863189697s for only 216 particles.
Now I wonder: what can I do to speed it up?
Most tools online like Numba show how they speed up math-heavy individual functions. I don't know which to choose. On a side note, I cannot even get Numba to work in this case; I get a long error message, and I had hoped it would be as simple as slapping @jit in front of the function.
I know it is the loops with the attribute lookups that crush my performance anyway, not the math or the neighbour search. Sadly I am a novice at programming, and I liked the clean approach I got working here :( Any thoughts?
These kinds of loop-intensive calculations are slow in Python. In these cases, the first thing to do is to see whether you can vectorize the operations and get rid of the loops. The actual calculations are then done in C or Fortran libraries and you get a lot of speedup. When it is possible, this is usually the way to go, since the code is also much easier to maintain.
Some operations, however, are just inherently loop-intensive. In these cases Cython will help a lot - you can usually expect a 60x+ speedup when you cythonize a loop. I have had similar experiences with Numba: when my function becomes complicated, it fails to make it faster, so usually I just use Cython.
Coding in Cython is not too bad - much easier than actually coding in C, because you can access numpy arrays easily via memoryviews. Another advantage is that it is pretty easy to parallelize the loop with OpenMP, which can give you an additional 4x+ speedup (depending on the number of cores in your machine), so your code can end up hundreds of times faster.
One issue is that to get the optimal speed you have to remove all the Python calls inside your loop, which means you cannot call numpy/scipy functions there. So you would have to convert the tree.query_ball_point and np.linalg.norm parts to Cython for optimal speed.
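Before reaching for Cython, one middle ground (a sketch, not the asker's class layout; the array and function names are mine) is to keep the particle data in plain arrays and move the inner distance loop into NumPy, so only the outer loop over particles remains in Python:

import numpy as np
from scipy.spatial import cKDTree

def calculate_densities(positions, masses, h):
    # positions: (n, d) array of particle positions, masses: (n,) array,
    # h: smoothing length. Mirrors the kernel used in the question.
    tree = cKDTree(positions)
    coeff = 315.0 / (64.0 * np.pi * h**9)
    densities = np.zeros(len(positions))
    # one batched neighbour query instead of per-object attribute access
    neighbour_lists = tree.query_ball_point(positions, h, p=2)
    for i, neighbours in enumerate(neighbour_lists):
        # squared distances to all neighbours at once
        r2 = np.sum((positions[neighbours] - positions[i]) ** 2, axis=1)
        densities[i] = masses[i] * coeff * np.sum((h**2 - r2) ** 3)
    return densities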

Truly vectorized routines in python?

Are there really good methods in Python to vectorize operations on matrix-like data constructs/containers? What are the corresponding data constructs used?
(I have observed and read that pandas and numpy element-wise operations using vectorize or applymap (and this may also apply to apply / apply-along-axis for rows/columns) are not much of a speed improvement compared to for loops.
Given that, when trying to use them, you sometimes have to fiddle with the specifics of the datatypes, whereas that is usually a little easier in for loops, what are the benefits? Readability?)
Are there ways to achieve a performance gap similar to the one between for loops and vectorized operations in MATLAB?
(Note this is not to bash numpy or pandas; they are great, and whole-matrix operations are fine. It is just that element-wise operations become slow.)
EDIT to explain the context:
I was only wondering because I have received more than one answer mentioning the fact that apply and the like are actually similar to for loops. That is why I was wondering whether there are similar functions implemented in such a way that they perform better. The actual problems were varied; they just had to be element-wise, not "the sum, product, or whatever of the whole matrix". I did a lot of comparisons with differential outputs, sometimes based on other matrices, so I had to use complex functions for this. But since the matrices are huge and the implementation depended on for-loop-like mechanisms, in the end I felt that my program would not work well on a larger dataset. Hence my question. But I was not looking for a review, only knowledge.
You need to provide a specific example.
Normal per-element MATLAB or Python functions cannot be vectorized in general. The whole point of vectorizing, in both MATLAB and Python, is to off-load the operation onto the much faster underlying C or Fortran libraries that are designed to work on arrays of uniform data. This cannot be done on functions that operate on scalars, either in MATLAB or Python.
For functions that operate on arrays or matrices as a whole (such as mathematical operators, sum, square, etc), MATLAB and Python behave the same. In fact they use most of the same underlying C and Fortran libraries to do their calculations.
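A small illustration of that point (my own example, not from the answer): wrapping a scalar function with np.vectorize still calls Python once per element, while a whole-array expression runs in NumPy's compiled loops.

import numpy as np

x = np.random.rand(1_000_000)

# "Vectorized" scalar function: still one Python call per element,
# essentially a for loop in disguise.
slow = np.vectorize(lambda v: v * v + 1.0)(x)

# Whole-array expression: the work happens in NumPy's C loops.
fast = x * x + 1.0

assert np.allclose(slow, fast)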
So you need to show the actual operation you want to do, and then we can see if there is a way to vectorize it.
If it is working code and you just want to improve its performance, then the Code Review Stack Exchange site is probably a better choice.

How could I enhance the speed of this gamma-function and complex-number related code?

Context: I am trying to make a very simple gamma-function fractal plot using Python and sympy; initially a very simple version to understand how it works (two colours mapped based on whether counter is 0 or 1).
Basically the code (below) calls the gamma function and then makes some complex-number comparisons: it just checks that the complex number nextcomplex = gamma(mycomplex) is closer to 1+0i than the initial mycomplex. The final algorithm that produces the fractal is more elaborate than that, but the basic calculations are like these, so I need to speed up this simple code.
For small intervals it works fine and I can plot the values, but it is very slow for big intervals; it has been running for more than an hour now for a total of test_limitn x test_limitm = 1000x1000 elements.
(For instance, up to 100x100 it runs fine and I can plot the values and see a very basic fractal.)
My question is: how could I make the code faster? (e.g. other Python libraries, or functions much better suited to these comparisons, etc.)
from sympy import gamma, I, re, im, zoo

test_limitn = 1000
test_limitm = 1000

for m in range(-test_limitm, test_limitm):
    for n in range(-test_limitn, test_limitn):
        counter = 0
        mycomplex = m + (n * I)
        nextcomplex = gamma(mycomplex).evalf(1)
        if mycomplex != zoo and nextcomplex != zoo:
            absrenextcomplex = re(nextcomplex)
            absimnextcomplex = abs(im(nextcomplex))
            if (abs(n) > absimnextcomplex) and (abs(1 - m) > abs(1 - absrenextcomplex)):
                counter = 1
Any hint is very welcomed, thank you!
If you are only doing things numerically, you will be much better off using a numerical library like NumPy. SymPy is designed for symbolic calculations, and although it can perform numeric calculations, it isn't very fast at it.
Beyond that, numba may be able to improve the performance of your loops.
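For example (a sketch assuming a purely numerical gamma is acceptable; scipy.special.gamma accepts complex arrays, so the whole grid can be evaluated at once rather than point by point):

import numpy as np
from scipy.special import gamma   # numerical gamma, vectorized over arrays

test_limitn = 1000
test_limitm = 1000

# Build the whole complex grid at once instead of looping in Python.
m = np.arange(-test_limitm, test_limitm)
n = np.arange(-test_limitn, test_limitn)
M, N = np.meshgrid(m, n, indexing="ij")
grid = M + 1j * N

g = gamma(grid)            # evaluated on the full grid
finite = np.isfinite(g)    # poles show up as inf/nan, like zoo in sympy

# The same comparison as the original inner loop, applied element-wise.
counter = np.where(
    finite
    & (np.abs(N) > np.abs(g.imag))
    & (np.abs(1 - M) > np.abs(1 - g.real)),
    1,
    0,
)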
