Let's say I have arrays:
a = array((1,2,3,4,5))
indices = array((1,1,1,1))
and I perform the operation:
a[indices] += 1
the result is
array([1, 3, 3, 4, 5])
In other words, the duplicates in indices are effectively ignored (the increment is only applied once).
If I wanted the duplicates not to be ignored, resulting in:
array([1, 6, 3, 4, 5])
how would I go about this?
The example above is somewhat trivial; what follows is exactly what I am trying to do:
def inflate(self, pressure):
    faceforces = pressure * cross(self.verts[self.faces[:,1]] - self.verts[self.faces[:,0]],
                                  self.verts[self.faces[:,2]] - self.verts[self.faces[:,0]])
    self.verts[self.faces[:,0]] += faceforces
    self.verts[self.faces[:,1]] += faceforces
    self.verts[self.faces[:,2]] += faceforces

def constrain_lengths(self):
    vectors = self.verts[self.constraints[:,1]] - self.verts[self.constraints[:,0]]
    lengths = sqrt(sum(square(vectors), axis=1))
    correction = 0.5 * (vectors.T * (1 - (self.restlengths / lengths))).T
    self.verts[self.constraints[:,0]] += correction
    self.verts[self.constraints[:,1]] -= correction

def compute_normals(self):
    facenormals = cross(self.verts[self.faces[:,1]] - self.verts[self.faces[:,0]],
                        self.verts[self.faces[:,2]] - self.verts[self.faces[:,0]])
    self.normals.fill(0)
    self.normals[self.faces[:,0]] += facenormals
    self.normals[self.faces[:,1]] += facenormals
    self.normals[self.faces[:,2]] += facenormals
    lengths = sqrt(sum(square(self.normals), axis=1))
    self.normals = (self.normals.T / lengths).T
I've been getting some very buggy results because duplicates are ignored in my indexed assignment operations.
numpy's histogram function is effectively a scatter operation: it counts how many times each index occurs.
a += histogram(indices, bins=a.size, range=(0, a.size))[0]
You may need to take some care: because the indices are integers lying exactly on the bin edges, small rounding errors could put values in the wrong bucket. In that case use:
a += histogram(indices, bins=a.size, range=(-0.5, a.size-0.5))[0]
to get each index into the centre of each bin.
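For example, on the toy arrays from the question (a quick sketch; note that histogram only counts occurrences, so it adds 1 per repeated index rather than arbitrary values):

import numpy as np

a = np.array([1, 2, 3, 4, 5])
indices = np.array([1, 1, 1, 1])

# histogram counts how many times each index occurs; adding those counts
# to a is equivalent to a[indices] += 1 with repeats honoured
a += np.histogram(indices, bins=a.size, range=(-0.5, a.size - 0.5))[0]
print(a)  # [1 6 3 4 5]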
Update: this works. But I recommend using @Eelco Hoogendoorn's answer based on numpy.add.at.
Slightly late to the party, but seeing how commonly this operation is required, and the fact that it still does not seem to be part of standard numpy, I'll put my solution here for reference:
import numpy as np
from scipy.sparse import coo_matrix

def scatter(rowidx, vals, target):
    """compute target[rowidx] += vals, allowing for repeated values in rowidx"""
    rowidx = np.ravel(rowidx)
    vals = np.ravel(vals)
    cols = len(vals)
    data = np.ones(cols)
    colidx = np.arange(cols)
    rows = len(target)
    M = coo_matrix((data, (rowidx, colidx)), shape=(rows, cols))
    target += M * vals

def gather(idx, vals):
    """for symmetry with scatter"""
    return vals[idx]
A custom C routine in numpy could easily be twice as fast still (eliminating, for starters, the superfluous allocation of and multiplication by the array of ones), but this already makes a world of difference in performance compared with a Python loop.
Aside from performance considerations, using a scatter operation is stylistically much more in line with other numpy-vectorized code than mashing a few for-loops into your code.
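As a usage sketch on the toy arrays from the question (the target needs a float dtype here, since the sparse matrix-vector product returns floats):

import numpy as np

a = np.array([1., 2., 3., 4., 5.])
indices = np.array([1, 1, 1, 1])

scatter(indices, np.ones(len(indices)), a)  # a[indices] += 1, honouring repeats
print(a)  # [1. 6. 3. 4. 5.]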
Edit:
OK, forget about the above. As of the latest 1.8 release, scatter operations are now directly supported in numpy at optimal efficiency.
def scatter(idx, vals, target):
    """target[idx] += vals, but allowing for repeats in idx"""
    np.add.at(target, idx.ravel(), vals.ravel())
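Applied to the original example (a quick sketch):

import numpy as np

a = np.array([1, 2, 3, 4, 5])
indices = np.array([1, 1, 1, 1])

np.add.at(a, indices, 1)  # unbuffered add: every repeated index contributes
print(a)                  # [1 6 3 4 5]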
I don't know of a way to do it that is any faster than:
for face in self.faces[:,0]:
    self.verts[face] += faceforces
You could also make self.faces into an array of 3 dictionaries where the keys correspond to the face and the values to the number of times it needs to be added. You'd then get code like:
for face in self.faces[0]:
    self.verts[face] += self.faces[0][face] * faceforces
which might be faster. I do hope someone comes up with a better way, because I wanted to do this when trying to help someone speed up their code earlier today.
Related
I am new to the developer community and I was wondering whether there is any function or method you use to decide which algorithm has the best performance, and therefore should be used instead of any other.
For example:
I am using a decorator to measure how long my functions take to solve a problem, but I don't think that generalizes well, so I was wondering whether there is a general method or function you use to decide which algorithm to use.
Can you help me please?
As an example, I was using the time library to measure how long two independent functions take to count the negative numbers in an array:
import time

def time_it(func):
    def wrapper(*args, **kwargs):
        start = time.time()
        result = func(*args, **kwargs)
        end = time.time()
        print(func.__name__ + " took " + str((end - start) * 1000) + " milliseconds")
        return result
    return wrapper

array = [
    [-4, -3, -1, 1],
    [-2, -2, 1, 2],
    [-1, 1, 2, 3],
    [1, 2, 4, 5]
]

@time_it
def count_negatives(array):
    count = 0
    for i in array:
        for j in i:
            if j < 0:
                count += 1
    return count

@time_it
def count_neg(array):
    count = 0
    row = 0
    column = 0
    while row < len(array) and column < len(array[0]):
        if array[row][column] < 0:
            count += 1
            column += 1
        else:
            row += 1
            column = 0
    return count

print(count_negatives(array))
print(count_neg(array))
An algorithm's running time depends on the input it is given and on the operations it performs on that input (among other variables).
With enough sampling, you can plot graphs (I prefer the matplotlib library) and see which function handles your input best.
Keep in mind these will only be samples from your own computer, so the code may run faster or slower on other machines.
Here we can use the time_it decorator you've written, with a few small changes:
I would prefer time.perf_counter(), as it uses the highest-resolution clock available and therefore gives more accurate results than time.time().
The decorator will return the actual time taken, in milliseconds.
I'll also rename things so they are easier to follow, and remove the return value, since we only care about the timing here, not the count itself.
import time

def time_it(func):
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        end = time.perf_counter()
        return (end - start) * 1_000  # return is in milliseconds!
    return wrapper

@time_it
def count_negatives_v1(array):
    count = 0
    for i in array:
        for j in i:
            if j < 0:
                count += 1

@time_it
def count_negatives_v2(array):
    count = 0
    row = 0
    column = 0
    while row < len(array) and column < len(array[0]):
        if array[row][column] < 0:
            count += 1
            column += 1
        else:
            row += 1
            column = 0
We can now build a function that generates a list of lists containing random integers in any range we choose. I've chosen to generate between 500 and 1000 inner lists, where each inner list contains 50 numbers between -1000 and 1000:
import random

def generate_arrays(inner_lists_amount=(500, 1000), numbers=(-1_000, 1_000), inner_lists_length=50):
    inner_arrays_count = random.choice(range(*inner_lists_amount))
    return [list(random.choices(range(*numbers), k=inner_lists_length)) for _ in range(inner_arrays_count)]
This will generate inner_arrays_count inner lists, each containing 50 numbers between -1000 and 1000.
Then we pass each generated array to both of the functions you've written (v1 and v2) and record the results. The timings are our "y" values on the graph and the sample index is the "x" value. Here I've chosen a sample count of 1000, meaning generate_arrays is called 1000 times, its output is passed to v1 and v2, and the results for each method are saved in a separate "y" list:
import matplotlib.pyplot as plt

def build_graphs(sample_count=100):
    x = range(sample_count)
    y_v1 = []
    y_v2 = []
    for _ in range(sample_count):
        print(_)
        arrays = generate_arrays()
        y_v1.append(count_negatives_v1(arrays))
        y_v2.append(count_negatives_v2(arrays))
    plt.plot(x, y_v1, 'r')
    plt.plot(x, y_v2, 'g')
    plt.show()
Using matplotlib, we coloured the second method (v2) green and v1 red.
This gives results like the following plot:
Now, this is not 100% accurate and never will be, as it depends on a lot of things, such as:
PC memory
CPU clock rate sampling the time
and much more. It can be improved somewhat if, for each call of generate_arrays, we run several additional tests and take the average time for each specific array, because here we only measured once how long v1 and v2 take on each array. However, because the sample count is 1000, the results come out fairly consistent, as expected.
Note: this does not give the actual order of growth of the functions (big-O). If you want that, you can give them increasing amounts of data, plot the timings in Excel, and fit a trendline, choosing the function type whose R² value is closest to 100%.
More info using the openpyxl module
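If you'd rather stay in Python than export to Excel, here is a minimal sketch of the same idea: time one of the functions above on increasing input sizes and fit a line to the log-log data with numpy.polyfit. A slope near 1 suggests linear growth, near 2 quadratic, and so on. The estimate_order name and the chosen sizes are just for illustration; it reuses the time_it-decorated count_negatives_v1 from above.

import numpy as np

def estimate_order(func, sizes=(1_000, 2_000, 4_000, 8_000, 16_000)):
    # time func on inputs of increasing size and fit log(time) ~ slope * log(size)
    times = []
    for n in sizes:
        data = [list(np.random.randint(-1_000, 1_000, size=50)) for _ in range(n)]
        times.append(func(data))  # time_it-decorated functions return milliseconds
    slope, _ = np.polyfit(np.log(sizes), np.log(times), 1)
    return slope

print(estimate_order(count_negatives_v1))  # roughly 1: time grows linearly with the number of rows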
I am learning how to code and wondered how to take the mean without using a built-in function (I know the built-ins are optimized and should be used in real life; this is more of a thought experiment for myself).
For example, this works for vectors:
import numpy as np

def take_mean(arr):
    sum = 0
    for i in arr:
        sum += i
    mean = sum / np.size(arr)
    return mean
But, of course, if I try to pass a matrix, it already fails. Clearly, I can change the code to work for matrices by doing:
def take_mean(arr):
    sum = 0
    for i in arr:
        for j in i:
            sum += j
    mean = sum / np.size(arr)
    return mean
And this fails for vectors and any >=3 dimensional arrays.
So I'm wondering how I can sum over a n-dimensional array without using any built-in functions. Any tips on how to achieve this?
You can use a combination of recursion and a loop to achieve your objective without using any of numpy's methods.
import numpy as np

def find_mean_of_arrays(array):
    sum = 0
    for element in array:
        if isinstance(element, np.ndarray):
            sum += find_mean_of_arrays(element)
        else:
            sum += element
    return sum / len(array)
Recursion is a powerful tool and it makes code more elegant and readable. This is yet another example.
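A quick usage sketch; for arrays whose sub-arrays all have equal length, the mean of the sub-array means equals the overall mean, so the result matches numpy's own:

arr = np.array([[1, 2, 3], [4, 5, 6]])
print(find_mean_of_arrays(arr))  # 3.5
print(np.mean(arr))              # 3.5, for comparison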
Unless you need the mean across a specific axis, the shape of the array does not matter when computing the mean, which makes your first solution workable with a small change:
import numpy as np

def take_mean(arr):
    sum = 0
    for i in arr.reshape(-1):  # or arr.flatten()
        sum += i
    mean = sum / np.size(arr)
    return mean
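A quick sanity check against the built-in (sketch):

import numpy as np

arr = np.arange(24).reshape(2, 3, 4)
print(take_mean(arr), np.mean(arr))  # both print 11.5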
I really love data structures and algorithms.
I am working with an 80000 x 80000 matrix into which I insert data. I am using numpy, and my code looks like this:
import numpy as np

n = 80000
similarity = np.zeros((n, n), dtype='int8')

for i, photo_i in enumerate(photos):
    for j, photo_j in enumerate(photos[i:], start=i):  # start=i keeps j aligned with the column index
        similarity[i, j] = score(photo_i, photo_j)
    if i % 100 == 0:
        print(i)
This piece of code is taking too much time. The score function is O(1). I was wondering if there is a better way to do this. I want to fill and plot this matrix in as short a time as possible, but the way I am doing it has O(n^2) complexity.
Is there anything that could be optimized here, perhaps by using a different data structure?
I have already read similar questions on SO and they mention pytables. I will definitely try it, but I don't yet know how. Any suggestion is welcome.
Thanks in advance.
There's a bunch of different things you could do, which all revolve around avoiding the explicit for-loops, which are slow in Python, and delegating to C-level code (either using Python's underlying C runtime or numpy's builtin array creation methods).
Using fromfunction
Numpy has a built-in function for populating a matrix from a function taking coordinates: numpy.fromfunction. This might be faster since it does all the iteration and assignment in C instead of Python.
You'd have to supply it a score-by-coordinates function, e.g.:
def similarity_value(i, j, photos=photos):
    return score(photos[i], photos[j])

similarity = numpy.fromfunction(similarity_value, (n, n), dtype='int8')
The photos=photos in the function definition makes the photos array a local of the function and saves some time accessing it on each invocation; this is a common Python micro-optimization technique.
Note that this computes the similarity for the entire matrix instead of just a triangle. To fix this, you could do:
def similarity_value(i, j, photos=photos):
    return score(photos[i], photos[j]) if i < j else 0

similarity = numpy.fromfunction(similarity_value, (n, n), dtype='int8')
similarity += similarity.T  # fill in the other triangle from the transposed matrix
Using comprehensions
You could also try creating the similarity matrix from a generator comprehension (or even a list comprehension), again avoiding the explicit for-loops in favor of a comprehension which is faster, but sacrificing the triangle optimization:
similarity = numpy.fromiter((score(photo_i, photo_j)
                             for photo_i in photos
                             for photo_j in photos),
                            count=n * n, dtype='int8').reshape(n, n)
# or:
similarity = numpy.array([score(photo_i, photo_j)
                          for photo_i in photos
                          for photo_j in photos],
                         dtype='int8').reshape(n, n)
To re-introduce the triangle optimization, you could do something like:
similarity = numpy.array([score(photo_i, photo_j) if i < j else 0
                          for i, photo_i in enumerate(photos)
                          for j, photo_j in enumerate(photos)],
                         dtype='int8').reshape(n, n)
similarity += similarity.T
Using triu_indices to populate a triangle directly
Finally, you could use numpy.triu_indices to assign directly into the matrix's upper (and then lower) triangle:
similarity_values = [score(photo_i, photo_j)
                     for i, photo_i in enumerate(photos)
                     for photo_j in photos[i + 1:]]  # only computing values for the upper triangle

similarity = np.zeros((n, n), dtype='int8')
xs, ys = np.triu_indices(n, 1)
similarity[xs, ys] = similarity_values
similarity[ys, xs] = similarity_values
similarity[np.diag_indices(n)] = 1  # assuming score(x, x) == 1
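As a tiny self-contained sketch of this triu_indices pattern, with a dummy score function standing in for the real one (the names and data here are purely illustrative):

import numpy as np

def score(a, b):
    # dummy stand-in for the real scoring function
    return int(a == b)

photos = [1, 2, 1, 3]
n = len(photos)

similarity_values = [score(photos[i], photos[j])
                     for i in range(n)
                     for j in range(i + 1, n)]  # upper triangle, row by row

similarity = np.zeros((n, n), dtype='int8')
xs, ys = np.triu_indices(n, 1)
similarity[xs, ys] = similarity_values
similarity[ys, xs] = similarity_values
similarity[np.diag_indices(n)] = 1
print(similarity)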
This approach is inspired by this related question: https://codereview.stackexchange.com/questions/107094/create-symmetrical-matrix-from-list-of-values
I don't have a means of benchmarking which of these approaches would work best, but you could experiment and find out. Good luck!
In the snippet of Python code below, fun iterates through the array arr and counts the number of identical integers in two array sections for every section pair (the flat array simulates a matrix). This makes n*(n-1)/2*m comparisons in total, giving a time complexity of O(n^2).
Are there programming solutions or ways of reframing this problem that would yield equivalent results but have reduced time complexity?
# n > 500000, 0 < i < n, m = 100
# dim(arr) = n*m, 0 < arr[x] < 4294967311
arr = mp.RawArray(ctypes.c_uint, n*m)

def fun(i):
    for j in range(i-1, 0, -1):
        count = 0
        for k in range(0, m):
            count += (arr[i*m+k] == arr[j*m+k])
        if count/m > 0.7:
            return (i, j)
    return ()
arr is a shared memory array, therefore it's best kept read-only for simplicity and performance reasons.
arr is implemented as a 1D RawArray from multiprocessing. The reason for this is that it has by far the best performance according to my tests. Using a numpy 2D array, for example, like this:
arr = np.ctypeslib.as_array(mp.RawArray(ctypes.c_uint, n*m)).reshape(n,m)
would provide vectorization capabilities, but increases the total runtime by an order of magnitude: 250 s vs. 30 s for n = 1500, an increase of about 733%.
Since you can't change the array characteristics at all, I think you're stuck with O(n^2). numpy would gain some vectorization, but would change the access for others sharing the array. Start with the innermost operation:
for k in range(0, m):
    count += (arr[i][k] == arr[j][k])
Change this to a one-line assignment:
count = sum(arr[i][k] == arr[j][k] for k in range(m))
Now, if this is truly an array, rather than a list of lists, use the array package's vectorization to simplify the loops, one at a time:
count = sum(arr[i] == arr[j]) # results in a vector of counts
You can now return the j indices where count[j] / m > 0.7. Note that there's no real need to return i for each one: it's constant within the function, and the calling program already has the value. Your array package likely has a pair of vectorized indexing operations that can return those indices. If you're using numpy, those are easy enough to look up on this site.
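If numpy is available, a sketch of that last step could look like the following, assuming arr has been reshaped to a 2-D view of shape (n, m); the candidate_matches name and its signature are mine, purely for illustration:

import numpy as np

def candidate_matches(i, arr2d, m):
    # counts of equal elements between section i and every earlier section
    counts = (arr2d[:i] == arr2d[i]).sum(axis=1)
    return np.flatnonzero(counts / m > 0.7)  # all j < i above the threshold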
So after fiddling around some more, I was able to cut down the running time greatly with help from NumPy's vectorization and Numba's JIT compiler. Going back to the original code:
arr = mp.RawArray(ctypes.c_uint, n*m)

def fun(i):
    for j in range(i-1, 0, -1):
        count = 0
        for k in range(0, m):
            count += (arr[i*m+k] == arr[j*m+k])
        if count/m > 0.7:
            return (i, j)
    return ()
We can leave out the bottom return statement as well as dismiss the idea of using count entirely, leaving us with:
def fun(i):
    for j in range(i-1, 0, -1):
        if sum(arr[i*m+k] == arr[j*m+k] for k in range(m)) > 0.7*m:
            return (i, j)
Then, we change the array arr to a NumPy format:
np_arr = np.frombuffer(arr, dtype='int32').reshape(n, m)  # one row of length m per section
The important thing to note here is that we do not use a NumPy array as a shared-memory array to be written to from multiple processes, which avoids the overhead pitfall.
Finally, we apply Numba's decorator and rewrite the sum function in vector form so that it works with the new array:
import numba as nb

@nb.njit(fastmath=True, parallel=True)
def fun(i):
    for j in range(i-1, 0, -1):
        if np.sum(np_arr[i] == np_arr[j]) > 0.7*m:
            return (i, j)
This reduced the running time to 7.9s, which is definitely a victory for me.
I am trying to create a loop in Python with numpy that will give me a variable times with 5 numbers generated randomly between 0 and 20. However, I want there to be one condition: that none of the differences between two adjacent elements in that list are less than 1. What is the best way to achieve this? I tried the last two lines of code below, but they are most likely wrong.
for j in range(1, 6):
    times = np.random.rand(1, 5) * 20
    times.sort()
    print(times)
    da = np.diff(times)
    if da.sum < 1: break
For instance, for one iteration, this would not be good:
4.25230915 4.36463992 10.35915732 12.39446368 18.46893283
But something like this would be perfect:
1.47166904 6.85610453 10.81431629 12.10176092 15.53569052
Since you are using numpy, you might as well use the built-in functions for uniform random numbers.
import numpy as np

def uniform_min_range(a, b, n, min_dist):
    while True:
        x = np.sort(np.random.uniform(a, b, size=n))  # np.sort returns a sorted copy, so keep it
        if np.all(np.diff(x) >= min_dist):
            return x
It uses the same trial-and-error approach as the previous answer, so depending on the parameters the time to find a solution can be large.
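Usage would mirror the question's example (a sketch):

times = uniform_min_range(0, 20, 5, 1.0)
print(times)  # five sorted values in [0, 20) whose adjacent differences are all >= 1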
Use a hit and miss approach to guarantee uniform distribution. Here is a straight-Python implementation which should be tweakable for numpy:
import random

def randSpacedPoints(n, a, b, minDist):
    # draws n random numbers in [a,b]
    # with the property that their distance apart is >= minDist
    # uses a hit-miss approach
    while True:
        nums = [a + (b-a)*random.random() for i in range(n)]
        nums.sort()
        if all(nums[i] + minDist < nums[i+1] for i in range(n-1)):
            return nums
For example,
>>> randSpacedPoints(5,0,20,1)
[0.6681336968970486, 6.882374558960349, 9.73325447748434, 11.774594560239493, 16.009157676493903]
If there is no feasible solution this will hang in an infinite loop (so you might want to add a safety parameter which controls the number of trials).
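A minimal sketch of such a safety parameter (the max_tries name and the renamed function are my own additions, not part of the original answer):

import random

def rand_spaced_points_safe(n, a, b, minDist, max_tries=10_000):
    # same hit-and-miss idea, but give up after max_tries attempts
    for _ in range(max_tries):
        nums = sorted(a + (b - a) * random.random() for _ in range(n))
        if all(nums[i] + minDist < nums[i + 1] for i in range(n - 1)):
            return nums
    raise ValueError("no feasible configuration found within max_tries attempts")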