Alternative (faster) way to write 3 nested for loops in Python

How can I make this function faster? (I call it a lot of times and it could result in some speed improvements)
def vectorr(I, J, K):
    vect = []
    for k in range(0, K):
        for j in range(0, J):
            for i in range(0, I):
                vect.append([i, j, k])
    return vect

You can try to take a look at itertools.product
Equivalent to nested for-loops in a generator expression. For example,
product(A, B) returns the same as ((x,y) for x in A for y in B).
The nested loops cycle like an odometer with the rightmost element
advancing on every iteration. This pattern creates a lexicographic
ordering so that if the input’s iterables are sorted, the product
tuples are emitted in sorted order.
Also, there is no need to pass 0 when calling range(0, I) and so on; just use range(I).
So in your case it can be:
import itertools

def vectorr(I, J, K):
    return itertools.product(range(K), range(J), range(I))
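Note that itertools.product returns a lazy iterator of tuples rather than a list of lists, and with the argument order above the tuples come out as (k, j, i). If you need output identical to the original function, a small sketch (the name vectorr_lists is just illustrative) would be:
import itertools

def vectorr_lists(I, J, K):
    # same iteration order as the original function, with each entry as [i, j, k]
    return [[i, j, k] for k, j, i in itertools.product(range(K), range(J), range(I))]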

You said you want it to be faster. Let's use NumPy!
import numpy as np

def vectorr(I, J, K):
    arr = np.empty((I*J*K, 3), int)
    arr[:, 0] = np.tile(np.arange(I), J*K)
    arr[:, 1] = np.tile(np.repeat(np.arange(J), I), K)
    arr[:, 2] = np.repeat(np.arange(K), I*J)
    return arr
There may be even more elegant tweaks possible here, but that's a basic tiling that gives the same result (as a 2D array rather than a list of lists). The code for this is all implemented in C, so it's very, very fast; this may be important if the input values get large.
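As a quick sanity check (a hedged sketch, not from the original post; vectorr_py is just an illustrative name), the tiled version produces exactly the same entries in the same order as the pure-Python version:
import numpy as np

def vectorr_py(I, J, K):
    # the original nested-loop version as a comprehension
    return [[i, j, k] for k in range(K) for j in range(J) for i in range(I)]

assert vectorr(2, 3, 4).tolist() == vectorr_py(2, 3, 4)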

The other answers are more thorough and, in this specific case at least, better, but in general, if you're using Python 2, and for large values of I, J, or K, use xrange() instead of range(). xrange gives a generator-like object, instead of constructing a list, so you don't have to allocate memory for the entire list.
In Python 3, range works like Python 2's xrange.

import numpy

def vectorr(I, J, K):
    val = numpy.indices((I, J, K))
    val.shape = (3, -1)
    return val.transpose()  # or val.transpose().tolist()
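For example (a small illustrative check; note that with this approach k varies fastest, whereas the original list had i varying fastest):
print(vectorr(2, 2, 2))
# [[0 0 0]
#  [0 0 1]
#  [0 1 0]
#  [0 1 1]
#  [1 0 0]
#  [1 0 1]
#  [1 1 0]
#  [1 1 1]]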

Related

How can I optimize loops through a multidimensional array?

After gathering some 3D netCDF data on Python 3, I am in the process of looping through each x,y data point to calculate another variable. The calculation of this variable is dependent upon the z for a given x,y point. The code seems to be running correctly but is awfully slow; I am wondering if anyone has suggestions on how to optimize the code to have it run more quickly.
I've gone from a lengthier code that defined many intermediate variables to something rather bare bones, which is shown here. Even after trimming the code, it runs slowly (i.e., a few minutes for each i in the outer for loop).
for i in range(0, 217):
    print(i)
    for j in range(0, 301):
        for k in range(10, 30):
            if (data.variables[longvars[v][2]][0][k][i][j] - data.variables[longvars[v][3]][0][i][j]) <= 3000.0:
                break
        if (abs(data.variables[longvars[v][2]][0][k][i][j] - data.variables[longvars[v][3]][0][i][j]) - 3000.) \
                < (abs(data.variables[longvars[v][2]][0][k-1][i][j] - data.variables[longvars[v][3]][0][i][j]) - 3000.):
            lev = k
        else:
            lev = k - 1
        newd[i][j] = np.sqrt(((data.variables[longvars[v][0]][0][lev][i][j] - data.variables[longvars[v][4]][0][0][i][j])**2) + ((data.variables[longvars[v][1]][0][lev][i][j] - data.variables[longvars[v][5]][0][0][i][j])**2))
I imagine there may be a way to do this with another array that stores the correct z (k) level for each x,y (i,j) point, then runs the calculation over the entire array of data. However, I don't know that it would be any faster. I appreciate any help that folks can provide!
The logic looks sound, but we can optimize it a bit further using generators and comprehensions.
Let's isolate the inner logic into a function called findZValue.
def findZValue(v, i, j, variables, longvars, np):
Forgive me if I am reading this wrong, but it looks like you are trying to find the index of the value closest to 3000? If so, first we will make a generator that returns a tuple containing the index and the absolute value of "variable - variable - 3000":
def findZValue(v, i, j, variables, longvars, np):
    lev = ((k, abs(variables[longvars[v][2]][0][k][i][j] - variables[longvars[v][3]][0][i][j] - 3000)) for k in range(10, 30))
In order to get the value we want, we wrap the whole thing in a min function (with the key saying we want it sorted by the second value) and specify we want to get the index (i.e. the first value in the tuple returned by min):
def findZValue(v, i, j, variables, longvars, np):
    lev = min(((k, abs(variables[longvars[v][2]][0][k][i][j] - variables[longvars[v][3]][0][i][j] - 3000)) for k in range(10, 30)), key=lambda t: t[1])[0]
For the value put into "newd" it looks like you are taking the root of the sum of the squares (i.e. the norm or magnitude). Luckily numpy (which is what I assume "np" is) has a built-in method for computing the norm of an array: np.linalg.norm. All we have to do is put the two differences into an np.array and call it on them:
def findZValue(v, i, j, variables, longvars, np):
    lev = min(((k, abs(variables[longvars[v][2]][0][k][i][j] - variables[longvars[v][3]][0][i][j] - 3000)) for k in range(10, 30)), key=lambda t: t[1])[0]
    return np.linalg.norm(np.array([variables[longvars[v][0]][0][lev][i][j] - variables[longvars[v][4]][0][0][i][j], variables[longvars[v][1]][0][lev][i][j] - variables[longvars[v][5]][0][0][i][j]]))
Now we can put the entire loop into a nested comprehension:
newd = [[findZValue(v, i, j, data.variables, longvars, np) for j in range(301)] for i in range(217)]
def findZValue(v, i, j, variables, longvars, np):
    lev = min(((k, abs(variables[longvars[v][2]][0][k][i][j] - variables[longvars[v][3]][0][i][j] - 3000)) for k in range(10, 30)), key=lambda t: t[1])[0]
    return np.linalg.norm(np.array([variables[longvars[v][0]][0][lev][i][j] - variables[longvars[v][4]][0][0][i][j], variables[longvars[v][1]][0][lev][i][j] - variables[longvars[v][5]][0][0][i][j]]))
Using generators and comprehensions should speed things up over using for loops. But if you really want to crank things up we can use "multiprocessing". Specifically, a multiprocessing pool. In order to do so we will need to create a second function to handle each vector (this is due to restrictions on how multiprocessing pools work):
from multiprocessing import Pool

def findZValue(v, i, j, variables, longvars, np):
    lev = min(((k, abs(variables[longvars[v][2]][0][k][i][j] - variables[longvars[v][3]][0][i][j] - 3000)) for k in range(10, 30)), key=lambda t: t[1])[0]
    return np.linalg.norm(np.array([variables[longvars[v][0]][0][lev][i][j] - variables[longvars[v][4]][0][0][i][j], variables[longvars[v][1]][0][lev][i][j] - variables[longvars[v][5]][0][0][i][j]]))

def findZValuesForVector(vector):
    return [findZValue(*values) for values in vector]

with Pool(processes=4) as pool:
    newd = pool.map(findZValuesForVector, [[[v, i, j, data.variables, longvars, np] for j in range(301)] for i in range(217)])
You can alter the number of "processes" created for the pool to see what gives you the best results.
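For completeness, the idea from the question of precomputing the best k level for every (i, j) point can also be expressed with NumPy directly. The rough sketch below follows the same "closest to 3000" interpretation as above and assumes the relevant netCDF slices can be pulled into plain NumPy arrays; the names A, B, U0, U1, V0, V1 are placeholders and the shapes are only inferred from the loop bounds in the question, so treat it as untested:
import numpy as np

A  = np.asarray(data.variables[longvars[v][2]][0][10:30])  # assumed shape (20, 217, 301)
B  = np.asarray(data.variables[longvars[v][3]][0])         # assumed shape (217, 301)
U0 = np.asarray(data.variables[longvars[v][0]][0])
U1 = np.asarray(data.variables[longvars[v][1]][0])
V0 = np.asarray(data.variables[longvars[v][4]][0][0])      # assumed shape (217, 301)
V1 = np.asarray(data.variables[longvars[v][5]][0][0])

lev = 10 + np.argmin(np.abs(A - B - 3000.0), axis=0)       # best k for every (i, j)
ii, jj = np.indices(B.shape)
newd = np.sqrt((U0[lev, ii, jj] - V0) ** 2 + (U1[lev, ii, jj] - V1) ** 2)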

My code is very slow. How to optimize it? Python

def function_1(arr):
    return [j for i in range(len(arr)) for j in range(len(arr))
            if np.array(arr)[i] == np.sort(arr)[::-1][j]]
An array arr is given. For each position i, I need to find the index of the element arr[i] in the array sorted in descending order. All values of the arr array are distinct.
I have to write the function in one line. It works, but very slowly. I have to do this:
np.random.seed(42)
arr = function_1(np.random.uniform(size=1000000))
print(arr[7] + arr[42] + arr[445677] + arr[53422])
Please help to optimize the code.
You are repeatedly sorting and reversing the array, but the result of that operation is independent of the current value of i or j. The simple thing to do is to pre-compute that, then use its value in the list comprehension.
For that matter, range(len(arr)) can also be computed once.
Finally, arr is already an array; you don't need to make a copy each time through the i loop.
def function_1(arr):
    arr_sr = np.sort(arr)[::-1]
    r = range(len(arr))
    return [j for i in r for j in r if arr[i] == arr_sr[j]]
Fitting this into a single line becomes trickier. Aside from extremely artificial outside constraints, there is no reason to do so, but once Python 3.8 is released, assignment expressions will make it simpler. I think the following would be equivalent.
def function_1(arr):
    return [j for i in (r:=range(len(arr))) for j in r if arr[i] == (arr_sr:=np.sort(arr)[::-1])[j]]
Have a think about the steps that are going on in here:
[j
for i in range(len(arr))
for j in range(len(arr))
if np.array(arr)[i] == np.sort(arr)[::-1][j]
]
Suppose your array contains N elements.
You pick an i, N different times
You pick a j, N different times
Then for each (i,j) pair you are doing the final line.
That is, you're doing the final line N^2 times.
But in that final line, you're sorting an array containing N elements. That's an O(N log N) operation, so the complexity of your code is O(N^3 log N).
Try making a sorted copy of the array before your [... for i ... for j ...] is called. That'll reduce the time complexity to O(N^2 + N log N).
I think...
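For what it's worth, if the one-line constraint can be relaxed, the whole task (the descending-order rank of each element) can be done without the O(N^2) scan at all, using the standard NumPy argsort ranking idiom rather than the original comprehension. A sketch, assuming all values are distinct as stated (function_1_ranked is just an illustrative name):
import numpy as np

def function_1_ranked(arr):
    arr = np.asarray(arr)
    order = np.argsort(-arr)            # indices that sort arr in descending order
    ranks = np.empty(len(arr), dtype=int)
    ranks[order] = np.arange(len(arr))  # position of each element in that ordering
    return ranks.tolist()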

How to go through a double for loop randomly in python

Consider the following code:
for i in range(size-1):
    for j in range(i+1, size):
        print((i, j))
I need to go through this for-loop in a random fashion. I attempt to write a generator to do such a thing
def Neighborhood(size):
    for i in shuffle(range(size-1)):
        for j in shuffle(range(i+1), size):
            yield i, j

for i, j in Neighborhood(size):
    print((i, j))
However, shuffle cannot be applied to a range object. I do not know how to remedy the situation, and any help is much appreciated. I would prefer a solution that avoids converting range to a list, since I need speed. For example, size could be on the order of 30,000 and I will perform this loop around 30,000 times.
I also plan to escape the for loop early, so I want to avoid solutions that incorporate shuffle(list(range(size)))
You can use random.sample.
The advantage of using random.sample over random.shuffle is that it does not shuffle in place, so it can work directly on sequences like range. That means:
In Python 3.x you don't need to convert range() to a list
In Python 2.x, you can use xrange
The same code can work in Python 2.x and 3.x
Sample code:
from random import sample

n = 10
l1 = range(n)
for i in sample(l1, len(l1)):
    l2 = range(i, n)
    for j in sample(l2, len(l2)):
        print(i, j)
Edit :
As to why I put in this edit, go through the comments.
def Neighborhood(size):
    range1 = range(size-1)
    for i in sample(range1, len(range1)):
        range2 = range(i+1, size)
        for j in sample(range2, len(range2)):
            yield i, j
A simple way to go really random, not row-by-row:
import random

def Neighborhood(size):
    yielded = set()
    while True:
        i = random.randrange(size)
        j = random.randrange(size)
        if i < j and (i, j) not in yielded:
            yield i, j
            yielded.add((i, j))
Demo:
for i, j in Neighborhood(30000):
    print(i, j)
Prints something like:
2045 5990
224 5588
1577 16076
11498 15640
15219 28006
8066 10142
7856 8248
17830 26616
...
Note: I assume you're indeed going to "escape the for loop early". Then this won't have problems with slowing down due to pairs being produced repeatedly.
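For example, a minimal sketch of escaping early by taking only the first few pairs (the cutoff of 100 is arbitrary, just for illustration):
import itertools

for i, j in itertools.islice(Neighborhood(30000), 100):
    print(i, j)  # stops after 100 random pairs instead of looping forever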
I don't think you can randomly traverse an Iterator. You can predefine the shuffled lists, though
random iteration in Python
import random

L1 = list(range(size-1))
random.shuffle(L1)
for i in L1:
    L2 = list(range(i+1, size))
    random.shuffle(L2)
    for j in L2:
        print((i, j))
Of course, not optimal for large lists

Recursion to replace Looping n times

My question is quite similar to this one here:
Function with varying number of For Loops (python)
However, what I really want is for example:
def loop_rec(n):
    for i in range(n):
        for j in range(n):
            for k in range(n):
                # ... n loops
                # Do something with i, j and k,
                # such as i + j + k
                pass
The example in the link does not allow the index x to vary.
Something like the answer suggested in that question but to use the indices instead of just x.
def loop_rec(y, n):
    if n >= 1:
        for x in range(y):  # Not just x
            loop_rec(y, n - 1)
    else:
        whatever()
Thanks
For problems where you have to deal with multiple nested loops, the Python standard library provides a convenient tool called itertools.product.
In your particular case, all you have to do is wrap the range with itertools.product and specify how many nested loops via the repeat parameter. itertools.product performs the Cartesian product of its arguments.
from itertools import product

def loop_rec(y, n):
    for elems in product(range(y), repeat=n):
        # elems is an n-tuple: x, y, z, ... (n terms)
        # index it as elems[0], elems[1], ...
        # e.g. sum(elems)
        pass
Based on your requirements, you might want to use the entire tuple from each Cartesian product, index into it individually, or, if you know the loop depth, unpack it into named variables, as in the sketch below.
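For instance, with a known depth of 3, a small sketch of unpacking the tuple directly (loop_rec3 is just an illustrative name):
from itertools import product

def loop_rec3(y):
    for i, j, k in product(range(y), repeat=3):
        print(i + j + k)  # "do something with i, j and k", as in the question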
Thanks, but what if I wanted to change the range for each loop, i.e. i in range(n), j in range(n-1), k in range(n-2)?
Assuming you want to vary the range from m to n, i.e. range(n), range(n-1), range(n-2), ..., range(m), you can rewrite the product as product(*map(range, range(m, n))). Note that this expands to range(m), range(m+1), ..., range(n-1), i.e. the ranges appear in ascending order of their upper bound; a descending version is sketched below.
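To match the comment's example exactly (i in range(n), j in range(n-1), k in range(n-2)), a hedged sketch with descending ranges (depth and loop_dec are names introduced here, not from the question):
from itertools import product

def loop_dec(n, depth):
    # ranges are range(n), range(n-1), ..., range(n-depth+1)
    for elems in product(*(range(n - d) for d in range(depth))):
        print(sum(elems))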

combine two arrays and sort

Given two sorted arrays like the following:
a = array([1,2,4,5,6,8,9])
b = array([3,4,7,10])
I would like the output to be:
c = array([1,2,3,4,5,6,7,8,9,10])
or:
c = array([1,2,3,4,4,5,6,7,8,9,10])
I'm aware that I can do the following:
c = unique(concatenate((a, b)))
I'm just wondering if there is a faster way to do it as the arrays I'm dealing with have millions of elements.
Any idea is welcomed. Thanks
Since you use numpy, I doubt that bisect helps you at all... So instead I would suggest two smaller things:
Do not use np.sort; use the c.sort() method instead, which sorts the array in place and avoids the copy.
np.unique uses np.sort, which is not in place. So instead of using np.unique, do the logic by hand: first sort (in place), then do what np.unique does by hand (check its Python code too), i.e. flag = np.concatenate(([True], ar[1:] != ar[:-1])) and unique = ar[flag] (with ar being sorted). To be a bit better, you should probably do the flag operation in place itself, i.e. flag = np.ones(len(ar), dtype=bool) and then np.not_equal(ar[1:], ar[:-1], out=flag[1:]), which avoids basically one full copy of flag.
I am not sure about this one, but .sort has 3 different algorithms; since your arrays may be almost sorted already, changing the sorting method might make a speed difference.
This would make the full thing close to what you got (without doing a unique beforehand):
def insort(a, b, kind='mergesort'):
    # took mergesort as it seemed a tiny bit faster for my sorted large array try.
    c = np.concatenate((a, b))  # we still need to do this unfortunately.
    c.sort(kind=kind)
    flag = np.ones(len(c), dtype=bool)
    np.not_equal(c[1:], c[:-1], out=flag[1:])
    return c[flag]
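As a quick usage check with the arrays from the question (illustrative only):
import numpy as np

a = np.array([1, 2, 4, 5, 6, 8, 9])
b = np.array([3, 4, 7, 10])
print(insort(a, b))  # [ 1  2  3  4  5  6  7  8  9 10]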
Inserting elements into the middle of an array is a very inefficient operation as they're flat in memory, so you'll need to shift everything along whenever you insert another element. As a result, you probably don't want to use bisect. The complexity of doing so would be around O(N^2).
Your current approach is O(n*log(n)), so that's a lot better, but it's not perfect.
Inserting all the elements into a hash table (such as a set) is one option. That's going to take O(N) time to uniquify, but then you need to sort, which will take O(n*log(n)). Still not great.
The real O(N) solution involves allocating an array and then populating it one element at a time by taking the smaller head of your input lists, i.e. a merge. Unfortunately neither numpy nor Python seems to have such a thing. The solution may be to write one in Cython.
It would look vaguely like the following:
def foo(numpy.ndarray[int, ndim=1] out,
        numpy.ndarray[int, ndim=1] in1,
        numpy.ndarray[int, ndim=1] in2):
    cdef int i = 0
    cdef int j = 0
    cdef int k = 0
    while (i != len(in1)) or (j != len(in2)):
        # set out[k] to the smaller of in1[i] or in2[j],
        # then increment k and whichever of i or j was consumed
        if j == len(in2) or (i != len(in1) and in1[i] <= in2[j]):
            out[k] = in1[i]
            i += 1
        else:
            out[k] = in2[j]
            j += 1
        k += 1
When curious about timings, it's always best to just timeit. Below, i've listed a subset of the various methods and their timings:
import numpy as np
import timeit
import heapq

def insort(a, x, lo=0, hi=None):
    if hi is None: hi = len(a)
    while lo < hi:
        mid = (lo+hi)//2
        if x < a[mid]: hi = mid
        else: lo = mid+1
    return lo, np.insert(a, lo, [x])

size = 10000
a = np.array(range(size))
b = np.array(range(size))

def op(a, b):
    return np.unique(np.concatenate((a, b)))

def martijn(a, b):
    c = np.copy(a)
    lo = 0
    for i in b:
        lo, c = insort(c, i, lo)
    return c

def martijn2(a, b):
    c = np.zeros(len(a) + len(b), a.dtype)
    for i, v in enumerate(heapq.merge(a, b)):
        c[i] = v
    return c

def larsmans(a, b):
    return np.array(sorted(set(a) | set(b)))

def larsmans_mod(a, b):
    return np.array(set.union(set(a), b))

def sebastian(a, b, kind='mergesort'):
    # took mergesort as it seemed a tiny bit faster for my sorted large array try.
    c = np.concatenate((a, b))  # we still need to do this unfortunately.
    c.sort(kind=kind)
    flag = np.ones(len(c), dtype=bool)
    np.not_equal(c[1:], c[:-1], out=flag[1:])
    return c[flag]
Results:
martijn2 25.1079499722
OP 1.44831800461
larsmans 9.91507601738
larsmans_mod 5.87612199783
sebastian 3.50475311279e-05
My specific contribution here is larsmans_mod which avoids creating 2 sets -- it only creates 1 and in doing so cuts execution time nearly in half.
EDIT: removed martijn from the timings as it was too slow to compete. Also tested with slightly bigger (sorted) input arrays. I have not tested the outputs for correctness ...
In addition to the other answer on using bisect.insort, if you are not content with performance, you may try using the blist module with bisect. It should improve the performance.
Traditional list insertion complexity is O(n), while blist's complexity on insertion is O(log(n)).
Also, your arrays seem to be sorted. If so, you can use the merge function from the heapq module to exploit the fact that both arrays are presorted. This approach has the overhead of creating a new array in memory, but it is an option to consider, as its time complexity is O(n+m), while the solutions with insort have O(n*m) complexity (n elements * m insertions).
import heapq
a = [1,2,4,5,6,8,9]
b = [3,4,7,10]
it = heapq.merge(a,b) #iterator consisting of merged elements of a and b
L = list(it) #list made of it
print(L)
Output:
[1, 2, 3, 4, 4, 5, 6, 7, 8, 9, 10]
If you want to delete repeating values, you can use groupby:
import heapq
import itertools
a = [1,2,4,5,6,8,9]
b = [3,4,7,10]
it = heapq.merge(a,b) #iterator consisting of merged elements of a and b
it = (k for k,v in itertools.groupby(it))
L = list(it) #list made of it
print(L)
Output:
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
You could use the bisect module for such merges, merging the second python list into the first.
The bisect* functions work for numpy arrays but the insort* functions don't. It's easy enough to use the module source code to adapt the algorithm; it's quite basic:
from numpy import array, copy, insert

def insort(a, x, lo=0, hi=None):
    if hi is None: hi = len(a)
    while lo < hi:
        mid = (lo+hi)//2
        if x < a[mid]: hi = mid
        else: lo = mid+1
    return lo, insert(a, lo, [x])
a = array([1,2,4,5,6,8,9])
b = array([3,4,7,10])
c = copy(a)
lo = 0
for i in b:
    lo, c = insort(c, i, lo)
Not that the custom insort is really adding anything here; the default bisect.bisect works just fine too:
import bisect

c = copy(a)
lo = 0
for i in b:
    lo = bisect.bisect(c, i)
    c = insert(c, lo, i)
Using this adapted insort is much more efficient than a combine and sort. Because b is sorted as well, we can track the lo insertion point and search for the next point starting there instead of considering the whole array each loop.
If you don't need to preserve a, just operate directly on that array and save yourself the copy.
More efficient still: because both lists are sorted, we can use heapq.merge:
from numpy import zeros
import heapq

c = zeros(len(a) + len(b), a.dtype)
for i, v in enumerate(heapq.merge(a, b)):
    c[i] = v
Use the bisect module for this:
import bisect
from numpy import array, insert

a = array([1,2,4,5,6,8,9])
b = array([3,4,7,10])
for i in b:
    pos = bisect.bisect(a, i)
    a = insert(a, pos, i)
I can't test this right now, but it should work
The sortednp package implements an efficient merge of sorted numpy-arrays, just sorting the values, not making them unique:
import numpy as np
import sortednp
a = np.array([1,2,4,5,6,8,9])
b = np.array([3,4,7,10])
c = sortednp.merge(a, b)
I measured the times and compared them in this answer to a similar post where it outperforms numpy's mergesort (v1.17.4).
Seems like no one mentioned np.union1d. Currently it is a shortcut for unique(concatenate((ar1, ar2))), but it's a short name to remember and it has the potential to be optimized by the numpy developers since it's a library function. It performs very similarly to insort from seberg's accepted answer for large arrays. Here is my benchmark:
import numpy as np

def insort(a, b, kind='mergesort'):
    # took mergesort as it seemed a tiny bit faster for my sorted large array try.
    c = np.concatenate((a, b))  # we still need to do this unfortunately.
    c.sort(kind=kind)
    flag = np.ones(len(c), dtype=bool)
    np.not_equal(c[1:], c[:-1], out=flag[1:])
    return c[flag]
size = int(1e7)
a = np.random.randint(np.iinfo(np.int).min, np.iinfo(np.int).max, size)
b = np.random.randint(np.iinfo(np.int).min, np.iinfo(np.int).max, size)
np.testing.assert_array_equal(insort(a, b), np.union1d(a, b))
import timeit
repetitions = 20
print("insort: %.5fs" % (timeit.timeit("insort(a, b)", "from __main__ import a, b, insort", number=repetitions)/repetitions,))
print("union1d: %.5fs" % (timeit.timeit("np.union1d(a, b)", "from __main__ import a, b; import numpy as np", number=repetitions)/repetitions,))
Output on my machine:
insort: 1.69962s
union1d: 1.66338s
