Seam Carving compute cost in one loop in python - python

How to compute the energy cost for Seam Carving in one loop iterating through the rows for python?
Seam Carving Wiki
Like the Dynamic programming in wiki ,I need the min_cost on the last row for possible three cell, and store the cost and path.
And , it is very slow by using two loop , so anyone know how to make it more efficiently?

You can use numba.jit to (possibly) speed up calculations, provided you respect the correct typing. There is no way to avoid 2 loops in a dynamic programming, however you can take a look at improved seam carving (which also yields better results in general)
https://github.com/axu2/improved-seam-carving
https://medium.com/#avik.das/improved-seam-carving-with-forward-energy-88ba84dab7e
from numba import jit
#jit
def calc_seam(img):
...

Related

How can I optimize nested for loops in Python?

I'm trying to draw a pattern in a 100x100 window using John Zelle’s graphics module, and I am sure that the nested for loop is not the most efficient way to do it.
Any thoughts on how I can optimize the nested for loop? Here is the code:
def Penultimatedigitdesign(x,y,win,color):
for Y in range(y,y+100,40):
for X in range(x+20,x+100,40):
drawFourcircleInSqureTF(win,X,Y,"white",color)
for X in range(x,x+100,40):
drawFourcircleInSqureTF(win,X,Y,color,"white")
for Y in range(y+20,y+100,40):
for X in range(x+20,x+100,40):
drawFourcircleInSqureTF(win,X,Y,color,"white")
for X in range(x,x+100,40):
drawFourcircleInSqureTF(win,X,Y,"white",color)
In general, the strategy for navigating multi-dimensional space optimally is to vectorize it. In other words, structure your starting state as a matrix, apply a function to the matrix, and ask the GPU to give you the output. In Python, numpy is the key library that does this.
In graphics programming, people write "shaders," which are basically functions that operate on a signle pixel. These can be applied to a matrix (i.e., all the pixels on the screen), so that the GPU, which is optimized for this type of work, can carry it out in parallel.
As far as how to specifically implement this for your use case, and for whatever graphics backend you are targeting, I can't say, because there aren't enough details. If this is just a learning exercise, iterating over each pixel and solving the problem procedurally is probably OK.

Vectorizing loop that uses past result

I'm using Python 3.8. I'm trying to stop using loops and instead use vectorization to speed up my code. I'm not too sure how to vectorize an equation that uses the result from the step before.
I know how to do basic vectorization like change this:
for i in range(5):
j=i*2
into this
i=range(5)
j=i*2
but how would I translate something like this, that uses the index from the previous step into a vectorized equation?
j=0
for i in range(1,5):
k=i*2+j
j=i
If a value in the vector depends on previous components, this is not possible to fully parallelise. Depending on the particular operation, however, you can make the algorithm a bit more efficient by using other operations in a smart way: For instance, here the cummulative sum is used to make the computation a bit better than a naive for loop. And here you have another alternative (and a benchmark, although it's done in MATLAB)

Python slow on for-loops and hundreds of attribute lookups. Use Numba?

i am working on a simple showcase SPH (smoothed particle hydrodynamics, not relevant here though) implementation in python. The code works, but the execution is kinda sluggish. I often have to compare individual particles with a certain amount of neighbours. In an earlier implementation i kept all particle positions and all distances-to-each-existing-particle in large numpy arrays -> to a certain point this was pretty fast. But visually not pleasing and n**2. Now i want it clean and simple with classes + kdTree to speed up the neighbour search.
this all happens in my global Simulation-Class. Additionally there's a class called "particle" that contains all individual informations. i create hundreds of instances before and loop through them.
def calculate_density(self):
#Using scipys advanced nearest neighbour seach magic
tree = scipy.spatial.KDTree(self.particle_positions)
#here we go... loop through all existing particles. set attributes..
for particle in self.my_particles:
#get the indexes for the nearest neighbours
particle.index_neighbours = tree.query_ball_point(particle.position,self.h,p=2)
#now loop through the list of neighbours and perform some additional math
particle.density = 0
for neighbour in particle.index_neighbours:
r = np.linalg.norm(particle.position - self.my_particles[neighbour].position)
particle.density += particle.mass * (315/(64*math.pi*self.h**9)) *(self.h**2-r**2)**3
i timed 0.2717630863189697s for only 216 particles.
Now i wonder: what to do to speed it up?
Most tools online like "Numba" show how they speed up math-heavy individual functions. I dont know which to choose. On a sidenode, i cannot even get Numba to work in this case. I get a looong error message. And i hoped it is as simple as slapping "#jit" in front of it.
I know its the loops with the attribute calls that crush my performance anyway - not the math or the neighbour search. Sadly iam a novice to programming and i liked the clean approach i got to work here :( any thoughts?
These kind of loop-intensive calculations are slow in Python. In these cases, the first thing you want to do is to see if you can vectorize these operations and get rid of the loops. Then actual calculations will be done in C or Fortran libraries and you will get a lot of speed up. If you can do it usually this is the way to go, since it is much easier to maintain your code.
Some operations, however, are just inherently loop-intensive. In these cases using Cython will help you a lot - you can usually expect 60X+ speed up when you cythonize your loop. I also had similar experiences with numba - when my function becomes complicated, it failed to make it faster, so usually I just use Cython.
Coding in Cython is not too bad - much easier than actually code in C because you can access numpy arrays easily via memoryviews. Another advantage is that it is pretty easy to parallelize the loop with openMP, which can gives you additional 4X+ speedups (of course, depending on the number of cores you have in your machine), so your code can be hundreds times faster.
One issue is that to get the optimal speed, you have to remove all the python calls inside your loop, which means you cannot call numpy/scipy functions. So you have to convert tree.query_ball_point and np.linalg.norm part to Cython for optimal speed.

What could be wrong with this matrix multiplication benchmark (Matlab vs Numpy)?

Here's the code I wrote for comparing performance of numpy vs Matlab. It just measures the average time taken for matrix multiplication (1701x576 matrix M1 * 576x576 matrix M2).
Matlab version : (M1 is (1701x576) while M2 is (576x576) matrix)
function r = benchmark(M1,M2)
total_time=0;
for i=1:4
for j=1:1500
tic;
a=M1*M2;
tim=toc;
total_time =total_time+tim;
end
end
avg_time = total_time/4
r=avg_time
end
Python version :
def benchmark():
iters = range(1500)
for i in range(4):
for j in iters:
tic = time.time()
a=M1.dot(M2);
toc = time.time() - tic
t_time=t_time+toc;
return t_time/4
Matlab version takes almost ~18.2s , while Python takes ~19.3s . Ive repeated this test multiple times , and Matlab was always performing better than Python (even if smaller difference) in all cases . My understanding is Numpy uses efficient and compiled code for vector operations , and is supposed to be faster than Matlab.
Then , why could Matlab perform faster than Numpy ? The test was done on a 32 core machine .
Where did I go wrong ? or is this expected for Numpy to be slower than Matlab.
Are there ways to improve the performance for Python ?
Edit : Updated the matlab code to fix the loop index/return value error . The error was the result of me trying to edit the names in the snipper to make it presentable just before posting(a bad idea everytime :) ).
[edited to remove the mention of loops; that was my mistake]
Couple things--
First, the multicore nature of the machine doesn't really matter unless you're explicitly using those extra cores (or linking NumPy against a BLAS library that uses multiple cores -- thanks #ali_m). If you're not, it'll run about as fast on a 32-core machine as it will on a 4-core machine (assuming the clock speeds of the cores themselves are roughly equal).
Second, using purely off-the-shelf Matlab vs off-the-shelf NumPy, Matlab generally beats out NumPy. This is a very general statement, though; YMMV. Also, speaking of Matlab, there does indeed appear to be a bug in the looping indices.
Third, this may not be the best benchmark for performance; there may be some unseen caching issues taking place under the hood that aren't obvious. A better one would be to randomly generate the matrices on-the-fly in each iteration and multiply them, but even this could be problematic depending on the random number generator.
There are two major issues I can see with the test.
The first is that you are using global variable lookup in Python while you are using local variable lookup in MATLAB. Global variable lookup in Python is relatively slow. Making sure the variables are local like they are in MATLAB will affect the performance.
The second is that you are re-doing the same calculation over and over. MATLAB has a JIT for loops and numpy has a cache for calculations, both of which can reduce the time for repeated calculations.
So to make the comparisons more equal and reliable, you should create new, random matrices each time through the loop. This will prevent caching and JIT from messing up your results, and will make sure the variables are all local.
There is a bug in your Matlab code. It appears that you are using the same loop control variable in nested loops.
The outer loop actually only runs once.
Edit: The outer loop actually runs the correct number of times. The two loop control variables seem to be independent.

Running parallel iterations

I am trying to run a sort of simulations where there are fixed parameters i need to iterate on and find out the combinations which has the least cost.I am using python multiprocessing for this purpose but the time consumed is too high.Is there something wrong with my implementation?Or is there better solution.Thanks in advance
import multiprocessing
class Iters(object):
#parameters for iterations
iters['cwm']={'min':100,'max':130,'step':5}
iters['fx']={'min':1.45,'max':1.45,'step':0.01}
iters['lvt']={'min':106,'max':110,'step':1}
iters['lvw']={'min':9.2,'max':10,'step':0.1}
iters['lvk']={'min':3.3,'max':4.3,'step':0.1}
iters['hvw']={'min':1,'max':2,'step':0.1}
iters['lvh']={'min':6,'max':7,'step':1}
def run_mp(self):
mps=[]
m=multiprocessing.Manager()
q=m.list()
cmain=self.iters['cwm']['min']
while(cmain<=self.iters['cwm']['max']):
t2=multiprocessing.Process(target=mp_main,args=(cmain,iters,q))
mps.append(t2)
t2.start()
cmain=cmain+self.iters['cwm']['step']
for mp in mps:
mp.join()
r1=sorted(q,key=lambda x:x['costing'])
returning=[r1[0],r1[1],r1[2],r1[3],r1[4],r1[5],r1[6],r1[7],r1[8],r1[9],r1[10],r1[11],r1[12],r1[13],r1[14],r1[15],r1[16],r1[17],r1[18],r1[19]]
self.counter=len(q)
return returning
def mp_main(cmain,iters,q):
fmain=iters['fx']['min']
while(fmain<=iters['fx']['max']):
lvtmain=iters['lvt']['min']
while (lvtmain<=iters['lvt']['max']):
lvwmain=iters['lvw']['min']
while (lvwmain<=iters['lvw']['max']):
lvkmain=iters['lvk']['min']
while (lvkmain<=iters['lvk']['max']):
hvwmain=iters['hvw']['min']
while (hvwmain<=iters['hvw']['max']):
lvhmain=iters['lvh']['min']
while (lvhmain<=iters['lvh']['max']):
test={'cmain':cmain,'fmain':fmain,'lvtmain':lvtmain,'lvwmain':lvwmain,'lvkmain':lvkmain,'hvwmain':hvwmain,'lvhmain':lvhmain}
y=calculations(test,q)
lvhmain=lvhmain+iters['lvh']['step']
hvwmain=hvwmain+iters['hvw']['step']
lvkmain=lvkmain+iters['lvk']['step']
lvwmain=lvwmain+iters['lvw']['step']
lvtmain=lvtmain+iters['lvt']['step']
fmain=fmain+iters['fx']['step']
def calculations(test,que):
#perform huge number of calculations here
output={}
output['data']=test
output['costing']='foo'
que.append(output)
x=Iters()
x.run_thread()
From a theoretical standpoint:
You're iterating every possible combination of 6 different variables. Unless your search space is very small, or you wanted just a very rough solution, there's no way you'll get any meaningful results within reasonable time.
i need to iterate on and find out the combinations which has the least cost
This very much sounds like an optimization problem.
There are many different efficient ways of dealing with these problems, depending on the properties of the function you're trying to optimize. If it has a straighforward "shape" (it's injective), you can use a greedy algorithm such as hill climbing, or gradient descent. If it's more complex, you can try shotgun hill climbing.
There are a lot more complex algorithms, but these are the basic, and will probably help you a lot in this situation.
From a more practical programming standpoint:
You are using very large steps - so large, in fact, that you'll only probe the function 19,200. If this is what you want, it seems very feasible. In fact, if I comment the y=calculations(test,q), this returns instantly on my computer.
As you indicate, there's a "huge number of calculations" there - so maybe that is your real problem, and not the code you're asking for help with.
As to multiprocessing, my honest advise is to not use it until you already have your code executing reasonably fast. Unless you're running a supercomputing cluster (you're not programming a supercomputing cluster in python, are you??), parallel processing will get you speedups of 2-4x. That's absolutely negligible, compared to the gains you get by the kind of algorithmic changes I mentioned.
As an aside, I don't think I've ever seen that many nested loops in my life (excluding code jokes). If don't want to switch to another algorithm, you might want to consider using itertools.product together with numpy.arange

Categories

Resources