I have the following numpy arrays:
import numpy as np
y2 = np.array([[0.2,0.1,0.8,0.4],[0.4,0.2,0.5,0.1],[0.4,0.2,0.5,0.1]])
y1 = np.array([[1,0,0,0],[0,1,0,0],[0,0,0,1]])
What I am trying to do is get the rank position of each y1 label within the corresponding row of y2. To be clearer: y1 is the label data and y2 is the predicted data, and I want to see at which rank position the algorithm placed the true label.
I am doing the following:
counter = 0
indexes2 = []
indexes = np.where(y1)[1]
sorted_values = np.argsort(-y2)
for value in sorted_values:
    indexes2.append(np.where(value==indexes[counter])[0][0] + 1)
    counter += 1
b = np.array(indexes2)
The output is correct:
>>> b
array([3, 3, 4], dtype=int64)
But I am pretty sure there is a more elegant and more optimized way of doing this. Any hints?
Vectorize the nested loop
We could get rid of the loop by making use of broadcasting -
b = (sorted_values == indexes[:,None]).argmax(1)+1
Some Improvement
For performance, we could optimize the computation of indexes, like so -
indexes = y1.argmax(1)
Bigger Improvement
Additionally, we could optimize on sorted_values computation by avoiding the negation of y2, by doing -
sorted_values2 = np.argsort(y2)
Then, compute b using the broadcasted comparison as before, and subtract the argmax indices from the length of each row. In effect, this gives the descending order along each row used in the posted question, where argsort was applied to the negated array.
Thus, the final step would be -
b = y2.shape[1] - (sorted_values2 == indexes[:,None]).argmax(1)
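To check that the vectorized versions agree with the loop, here is a minimal self-contained sketch using the arrays from the question. The function names rank_loop, rank_desc and rank_asc are made up for this illustration. It assumes each row of y1 has exactly one nonzero entry and that the scores in a row of y2 are not tied at the label position (with ties, the two argsort variants may order equal scores differently).

```python
import numpy as np

y2 = np.array([[0.2, 0.1, 0.8, 0.4],
               [0.4, 0.2, 0.5, 0.1],
               [0.4, 0.2, 0.5, 0.1]])
y1 = np.array([[1, 0, 0, 0],
               [0, 1, 0, 0],
               [0, 0, 0, 1]])

def rank_loop(y1, y2):
    """Reference: the loop from the question (hypothetical helper name)."""
    indexes = np.where(y1)[1]
    sorted_values = np.argsort(-y2)
    return np.array([np.where(row == idx)[0][0] + 1
                     for row, idx in zip(sorted_values, indexes)])

def rank_desc(y1, y2):
    """Broadcasted comparison against the descending argsort."""
    indexes = y1.argmax(1)
    return (np.argsort(-y2) == indexes[:, None]).argmax(1) + 1

def rank_asc(y1, y2):
    """Same ranks from the ascending argsort, without negating y2."""
    indexes = y1.argmax(1)
    return y2.shape[1] - (np.argsort(y2) == indexes[:, None]).argmax(1)

b_loop = rank_loop(y1, y2)
b_desc = rank_desc(y1, y2)
b_asc = rank_asc(y1, y2)
print(b_loop, b_desc, b_asc)
```

All three calls should print the same 1-based ranks for this input.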
I am looking for an efficient way to do the following calculations on millions of arrays. For the values in each array, I want to calculate the mean of the values in the bin with most frequency as demonstrated below. Some of the arrays might contain nan values and other values are float. The loop for my actual data takes too long to finish.
import numpy as np
array = np.array([np.random.uniform(0, 10) for i in range(800,)])
# adding nan values
mask = np.random.choice([1, 0], array.shape, p=[.7, .3]).astype(bool)
array[mask] = np.nan
array = array.reshape(50, 16)
bin_values=np.linspace(0, 10, 21)
f = np.apply_along_axis(lambda a: np.histogram(a, bins=bin_values)[0], 1, array)
bin_start = np.apply_along_axis(lambda a: bin_values[np.argmax(a)], 1, f).reshape(array.shape[0], -1)
bin_end = bin_start + (abs(bin_values[1]-bin_values[0]))
values = np.zeros(array.shape[0])
for i in range(array.shape[0]):
    values[i] = np.nanmean(array[i][(array[i]>=bin_start[i])*(array[i]<bin_end[i])])
Also, when I run the above code I get three warnings. The first is 'RuntimeWarning: Mean of empty slice' for the line where I calculate the values variable. I added a condition to skip this line when a row contains only nan values, but the warning did not go away, and I was wondering what the reason is. The other two warnings come from the less and greater_equal comparisons, which makes sense to me since the values being compared might be nan.
The arrays that I want to run this algorithm on are independent, but I am already processing them with 12 separate scripts. Running the code in parallel would be an option, however, for now I am looking to improve the algorithm itself.
The reason that I am using lambda function is to run numpy.histogram over an axis since it seems the histogram function does not take an axis as an option. I was able to use a mask and remove the loop from the code. The code is 2 times faster now, but I think it still can be improved more.
I can explain what I want to do in more detail with an example, if that clarifies it. Imagine I have 36 numbers which are greater than 0 and smaller than 20. Also, I have bins of equal width 0.5 over the same interval (0.0_0.5, 0.5_1.0, 1.0_1.5, … , 19.5_20.0). If I put the 36 numbers into their corresponding bins, I want to know the mean of the numbers within the bin that contains the most numbers.
Please post your solution if you can think of a faster algorithm.
import numpy as np
# creating an array to test the algorithm
array = np.array([np.random.uniform(0, 10) for i in range(800,)])
# adding nan values
mask = np.random.choice([1, 0], array.shape, p=[.7, .3]).astype(bool)
array[mask] = np.nan
array = array.reshape(50, 16)
# the algorithm
bin_values=np.linspace(0, 10, 21)
# calculating the frequency of each bin
f = np.apply_along_axis(lambda a: np.histogram(a, bins=bin_values)[0], 1, array)
bin_start = np.apply_along_axis(lambda a: bin_values[np.argmax(a)], 1, f).reshape(array.shape[0], -1)
bin_end = bin_start + (abs(bin_values[1]-bin_values[0]))
# creating a mask to get the mean over the bin with maximum frequency
mask = (array>=bin_start) * (array<bin_end)
mask_nan = np.tile(np.nan, (mask.shape[0], mask.shape[1]))
mask_nan[mask] = 1
v = np.nanmean(array * mask_nan, axis = 1)
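One possible way to drop apply_along_axis entirely is to compute the bin index of every element at once with np.searchsorted and count per row with a one-hot comparison. The following is only a sketch under some assumptions: the function name mean_of_modal_bin is made up here, bins are treated as half-open [start, end) like the mask in the question (so a value exactly equal to the last edge is ignored, unlike np.histogram), and rows with no valid value return nan.

```python
import numpy as np

def mean_of_modal_bin(array, bin_values):
    """Row-wise mean of the values in each row's most populated bin."""
    n_bins = len(bin_values) - 1
    valid = ~np.isnan(array)
    # Bin index of every element: bin_values[i] <= x < bin_values[i + 1].
    idx = np.searchsorted(bin_values, array, side='right') - 1
    # NaNs and out-of-range values get a dummy index that matches no bin.
    in_range = valid & (idx >= 0) & (idx < n_bins)
    idx = np.where(in_range, idx, n_bins)
    # Per-row histogram via one-hot comparison: shape (rows, n_bins).
    counts = (idx[:, :, None] == np.arange(n_bins)).sum(axis=1)
    best = counts.argmax(axis=1)              # first most populated bin per row
    mask = idx == best[:, None]               # elements inside that bin
    total = np.where(mask, array, 0.0).sum(axis=1)
    n = mask.sum(axis=1)
    out = np.full(array.shape[0], np.nan)
    np.divide(total, n, out=out, where=n > 0) # rows with n == 0 stay nan
    return out

# Test data shaped like the question's (seeded for repeatability).
rng = np.random.default_rng(0)
array = rng.uniform(0, 10, size=(50, 16))
array[rng.random(array.shape) < 0.7] = np.nan
array[0] = np.nan                             # an all-nan row
bin_values = np.linspace(0, 10, 21)
v = mean_of_modal_bin(array, bin_values)
```

The one-hot intermediate has shape (rows, columns, bins), so for millions of rows it would probably have to be processed in chunks. No nanmean is called, so the 'Mean of empty slice' warning should not appear.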
I want to remove elements from a numpy vector that are closer to each other than a distance d. (I don't want any pair in the array or list with a distance between them smaller than d, but I don't want to remove such a pair completely either.)
For example, if my array is:
array([[0. ],
[0.9486833],
[1.8973666],
[2.8460498],
[0.9486833]], dtype=float32)
All I need is to remove either the element with index 1 or the one with index 4, not both of them.
I also need the indices of the elements of the original array that remain in the resulting one.
Since the original array is in tensorflow 2.0, I would be happier if conversion to numpy were not needed, unlike above. For speed, I also prefer not to use another package and to stay with numpy or scipy.
Thanks.
Here's a solution, using only a list. Note that this modifies the original list, so if you want to keep the original, copy.deepcopy it.
THRESHOLD = 0.1
def wrangle(l):
    for i in range(len(l)):
        for j in range(len(l)-1, i, -1):
            if abs(l[i] - l[j]) < THRESHOLD:
                l.pop(j)
using numpy:
import numpy as np
a = np.array([[0. ],
[0.9486833],
[1.8973666],
[2.8460498],
[0.9486833]])
threshold = 1.0
# The indices of the items smaller than a certain threshold, but larger than 0.
smaller_than = np.flatnonzero(np.logical_and(a < threshold, a > 0))
# Get the first index smaller than threshold
first_index = smaller_than[0]
# Recreate array without this index (bit cumbersome)
new_array = a[np.arange(len(a)) != first_index]
I'm pretty sure this is really easy to recreate in tensorflow, but I don't know how.
If your array is really only 1-d you can flatten it and do something like this:
a=tf.constant(np.array([[0. ],
[0.9486833],
[1.8973666],
[2.8460498],
[0.9486833]], dtype=np.float32))
d = 0.1
flat_a = tf.reshape(a,[-1]) # flatten
a1 = tf.expand_dims(flat_a, 1)
a2 = tf.expand_dims(flat_a, 0)
dist_map = tf.math.abs(a1-a2)
too_small = tf.cast(tf.math.less(dist_map, d), tf.int32)
# 1 at indices i,j if the distance between elements at i and j is less than d, 0 otherwise
upper_triangular_part = tf.linalg.band_part(too_small, 0, -1) - tf.linalg.band_part(too_small, 0,0)
remove = tf.reduce_sum(upper_triangular_part, axis=0)
remove = tf.cast(tf.math.greater(remove, 0), tf.float32)
# 1. at indices where the element should be removed, 0. otherwise
output = flat_a - remove * flat_a
You can access the indices through the remove tensor. If you need the extra dimension you can just use tf.expand_dims at the end of this.
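For reference, the same pairwise-distance idea can be sketched in plain numpy, returning the kept values together with their original indices. The helper name drop_close is made up for this illustration, and it follows the same rule as the tensorflow code above: an element is dropped if it lies within d of any earlier element. It builds an n-by-n distance matrix, so it is only reasonable for moderately sized vectors.

```python
import numpy as np

def drop_close(values, d):
    """Keep the first of any group of elements closer than d to each other.

    Returns (kept_values, kept_indices); hypothetical helper for illustration.
    """
    flat = np.asarray(values).reshape(-1)
    dist = np.abs(flat[:, None] - flat[None, :])   # pairwise distances
    # Entry [i, j] with i < j is True if j is too close to an earlier i.
    close_to_earlier = np.triu(dist < d, k=1).any(axis=0)
    keep = np.flatnonzero(~close_to_earlier)
    return flat[keep], keep

a = np.array([[0.       ],
              [0.9486833],
              [1.8973666],
              [2.8460498],
              [0.9486833]], dtype=np.float32)
kept, keep = drop_close(a, 0.1)
print(kept, keep)
```

Here the element at index 4 duplicates the one at index 1, so only index 4 is dropped and keep holds the surviving original indices.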
I'm currently trying to find an easy way to do the following operation to an N dimensional array in Python. For simplicity let's start with a 1 dimensional array of size 4.
X = np.array([1,2,3,4])
What I want to do is create a new array, call it Y, such that:
Y = np.array([[1,2,3,4],[2,3,4,1],[3,4,1,2],[4,1,2,3]])
So what I'm trying to do is create an array Y such that:
Y[:,i] = np.roll(X[:],-i, axis = 0)
I know how to do this using for loops, but I'm looking for a faster method of doing so. The actual array I'm trying to do this to is a 3 dimensional array, call it X. What I'm looking for is a way to find an array Y, such that:
Y[:,:,:,i,j,k] = np.roll(X[:,:,:],(-i,-j,-k),axis = (0,1,2))
I can do this using the itertools.product class using for loops, but this is quite slow. If anyone has a better way of doing this, please let me know. I also have CUPY installed with a GTX-970, so if there's a way of using CUDA to do this faster please let me know. If anyone wants some more context please let me know.
Here is my original code for computing the position space two point correlation function. The array x0 is an n by n by n real valued array representing a real scalar field. The function iterate(j,s) runs j iterations. Each iteration consists of generating a random float between -s and s for each lattice site. It then computes the change in the action dS and accepts the change with a probability of min(1, exp(-dS)).
def momentum(k,j,s):
    global Gxa
    Gx = numpy.zeros((n,n,t))
    for i1 in range(0,k):
        iterate(j,s)
        for i2,i3,i4 in itertools.product(range(0,n),range(0,n),range(0,n)):
            x1 = numpy.roll(numpy.roll(numpy.roll(x0, -i2, axis = 0),-i3, axis = 1),-i4,axis = 2)
            x2 = numpy.mean(numpy.multiply(x0,x1))
            Gx[i2,i3,i4] = x2
        Gxa = Gxa + Gx
    Gxa = Gxa/k
Approach #1
We can extend this idea to our 3D array case here. So, simply concatenate with sliced versions along the three dims and then use np.lib.stride_tricks.as_strided based scikit-image's view_as_windows to efficiently get the final output as the strided-view of the concatenated version, like so -
from skimage.util.shape import view_as_windows
X1 = np.concatenate((X,X[:,:,:-1]),axis=2)
X2 = np.concatenate((X1,X1[:,:-1,:]),axis=1)
X3 = np.concatenate((X2,X2[:-1,:,:]),axis=0)
out = view_as_windows(X3,X.shape)
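If scikit-image is not available, the same strided view can probably be obtained with numpy alone. This is a sketch that assumes NumPy 1.20 or later, where np.lib.stride_tricks.sliding_window_view exists; the small test array X is made up for the example.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

X = np.arange(2 * 3 * 4).reshape(2, 3, 4)      # small made-up test array

# Periodic extension along each axis, as in Approach #1.
X1 = np.concatenate((X, X[:, :, :-1]), axis=2)
X2 = np.concatenate((X1, X1[:, :-1, :]), axis=1)
X3 = np.concatenate((X2, X2[:-1, :, :]), axis=0)

# One window of X.shape per starting offset: shape (m, n, r, m, n, r).
out = sliding_window_view(X3, X.shape)
print(out.shape)
```

The result is a read-only view into X3, so nothing of size (m,n,r,m,n,r) is allocated. Since out[i,j,k,a,b,c] is X3[i+a, j+b, k+c], the view is symmetric in the two index triples, which is why out[:,:,:,i,j,k] and out[i,j,k] both equal the rolled array.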
Approach #2
For really large arrays, we might want to initialize the output array and then re-use X3 from earlier approach to assign with slicing it. This slicing process would be faster than the original-rolling. The implementation would be -
m,n,r = X.shape
Yout = np.empty((m,n,r,m,n,r),dtype=X.dtype)
for i in range(m):
    for j in range(n):
        for k in range(r):
            Yout[:,:,:,i,j,k] = X3[i:i+m,j:j+n,k:k+r]
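As a further variation, not part of the approaches above, the rolled copies can also be gathered in one indexing step with modular index arithmetic, which avoids both the concatenation and the Python loops. A minimal sketch, with a made-up test array and index names I, J, K chosen for this example:

```python
import numpy as np

X = np.arange(2 * 3 * 4).reshape(2, 3, 4)      # small made-up test array
m, n, r = X.shape

# I[a, i] = (a + i) % m is the source index for position a under shift -i.
I = (np.arange(m)[:, None] + np.arange(m)) % m
J = (np.arange(n)[:, None] + np.arange(n)) % n
K = (np.arange(r)[:, None] + np.arange(r)) % r

# Broadcast the three index tables to the 6-D output shape (m, n, r, m, n, r).
Y = X[I[:, None, None, :, None, None],
      J[None, :, None, None, :, None],
      K[None, None, :, None, None, :]]
print(Y.shape)
```

Like Yout, this produces a full writable copy, so memory use grows as (m*n*r) squared.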
I have a 3D numpy array like a = np.zeros((100,100, 20)). I want to perform an operation over every x,y position that involves all the elements over the z axis and the result is stored in an array like b = np.zeros((100,100)) on the same corresponding x,y position.
Right now I'm doing it using a for loop:
d_n = np.array([...]) # a parameter with the same shape as b
for (x,y), v in np.ndenumerate(b):
    C = a[x,y,:]
    ### calculate some_value using C
    minv = sys.maxint
    depth = -1
    C = a[x,y,:]
    for d in range(len(C)):
        e = 2.5 * float(math.pow(d_n[x,y] - d, 2)) + C[d] * 0.05
        if e < minv:
            minv = e
            depth = d
    some_value = depth
    if depth == -1:
        some_value = len(C) - 1
    ###
    b[x,y] = some_value
The problem now is that this operation is much slower than others done the pythonic way, e.g. c = b * b (I actually profiled this function and it's around 2 orders of magnitude slower than others using numpy built in functions and vectorized functions, over a similar number of elements)
How can I improve the performance of such kind of functions mapping a 3D array to a 2D one?
What is usually done in 3D images is to swap the Z axis to the first index:
>>> a = a.transpose((2,0,1))
>>> a.shape
(20, 100, 100)
And now you can easily iterate over the Z axis:
>>> for slice in a:
...     do something
The slice here will be each of the 100x100 slices of your 3D matrix. Additionally, transposing allows you to access each of the 2D slices directly by indexing the first axis. For example, a[10] will give you the 11th 2D 100x100 slice.
Bonus: If you store the data with the Z axis first to begin with, so that no transpose is needed (or convert the transposed array to a contiguous one using a = np.ascontiguousarray(a.transpose((2,0,1)))), access to your 2D slices will be faster since each slice is laid out contiguously in memory.
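The difference between the transposed view and the contiguous copy can be seen from the array flags. A small sketch (the variable names t and c are made up for the example):

```python
import numpy as np

a = np.zeros((100, 100, 20))
t = a.transpose((2, 0, 1))       # a view on a: no data is copied
c = np.ascontiguousarray(t)      # a copy stored with Z as the slowest axis

# Shape, contiguity of the whole array, contiguity of one 2D slice.
print(t.shape, t.flags['C_CONTIGUOUS'], t[10].flags['C_CONTIGUOUS'])
print(c.shape, c.flags['C_CONTIGUOUS'], c[10].flags['C_CONTIGUOUS'])
```

Both arrays have shape (20, 100, 100), but only in c is each 2D slice a single contiguous block of memory.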
Obviously you want to get rid of the explicit for loop, but I think whether this is possible depends on what calculation you are doing with C. As a simple example,
a = np.zeros((100,100, 20))
a[:,:] = np.linspace(1,20,20) # example data: 1,2,3,.., 20 as "z" for every "x","y"
b = np.sum(a[:,:]**2, axis=2)
will fill the 100 by 100 array b with the sum of the squared "z" values of a, that is 1+4+9+...+400 = 2870.
If your inner calculation is sufficiently complex, and not amenable to vectorization, then your iteration structure is fine and does not contribute significantly to the calculation time:
for (x,y), v in np.ndenumerate(b):
    C = a[x,y,:]
    ...
    for d in range(len(C)):
        ... # complex, not vectorizable calc
    ...
    b[x,y] = some_value
There doesn't appear to be a special structure in the 1st 2 dimensions, so you could just as well think of it as 2D mapping on to 1D, e.g. mapping a (N,20) array onto a (N,) array. That doesn't speed up anything, but may help highlight the essential structure of the problem.
One step is to focus on speeding up that C to some_value calculation. There are functions like cumsum and cumprod that help you do sequential calculations on a vector. cython is also a good tool.
A different approach is to see if you can perform that internal calculation over the N values all at once. In other words, if you must iterate, it is better to do so over the smallest dimension.
In a sense this is a non-answer. But without full knowledge of how you get some_value from C and d_n I don't think we can do more.
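To illustrate iterating over the smallest dimension, here is a sketch that loops over the 20 z-values only and keeps a running minimum for all (x,y) positions at once. It uses the e formula from the question; the random a and d_n are made-up test data, and best and depth are names introduced for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.random((100, 100, 20))                 # made-up test data
d_n = rng.uniform(0, 19, size=(100, 100))      # made-up parameter array

best = np.full(d_n.shape, np.inf)              # running minimum of e per (x, y)
depth = np.zeros(d_n.shape, dtype=int)         # d at which that minimum occurs

for d in range(a.shape[2]):                    # only 20 iterations
    e = 2.5 * (d_n - d) ** 2 + a[:, :, d] * 0.05
    better = e < best                          # strict <, so the first minimum wins
    best[better] = e[better]
    depth[better] = d
```

Each iteration works on a full 100x100 slice, so the Python-level loop runs 20 times instead of 10000.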
It looks like e can be calculated for all points at once:
# original, one (x, y, d) at a time:
#   e = 2.5 * float(math.pow(d_n[x,y] - d, 2)) + C[d] * 0.05
# vectorized over all x, y and d:
E = 2.5 * (d_n[...,None] - np.arange(a.shape[-1]))**2 + a * 0.05 # (100,100,20)
E.min(axis=-1) # smallest value along the last dimension
E.argmin(axis=-1) # index of where that min occurs
At first glance it looks like this E.argmin is the b value that you want (tweaked for some boundary conditions if needed).
I don't have realistic a and d_n arrays, but with simple test ones, this E.argmin(-1) matches your b, with a 66x speedup.
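A comparison of this kind can be reproduced with a sketch like the following. The function names depth_loop and depth_vectorized and the random test arrays are made up for the example, and sys.maxsize is used as a Python 3 stand-in for sys.maxint.

```python
import math
import sys
import numpy as np

def depth_loop(a, d_n):
    """The loop from the question, wrapped in a function."""
    b = np.zeros(a.shape[:2])
    for (x, y), _ in np.ndenumerate(b):
        C = a[x, y, :]
        minv = sys.maxsize          # Python 3 stand-in for sys.maxint
        depth = -1
        for d in range(len(C)):
            e = 2.5 * float(math.pow(d_n[x, y] - d, 2)) + C[d] * 0.05
            if e < minv:
                minv = e
                depth = d
        b[x, y] = depth if depth != -1 else len(C) - 1
    return b

def depth_vectorized(a, d_n):
    """Compute e for every (x, y, d) by broadcasting, then take the argmin."""
    E = 2.5 * (d_n[..., None] - np.arange(a.shape[-1])) ** 2 + a * 0.05
    return E.argmin(axis=-1)

rng = np.random.default_rng(1)
a = rng.random((6, 7, 20))                      # small made-up test data
d_n = rng.uniform(0, 19, size=(6, 7))
b_loop = depth_loop(a, d_n)
b_vec = depth_vectorized(a, d_n)
print(np.array_equal(b_loop, b_vec))
```

Both versions pick the first minimum along z, so they should agree unless two e values are equal to within rounding.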
How can I improve the performance of such kind of functions mapping a 3D array to a 2D one?
Many functions in Numpy are "reduction" functions*, for example sum, any, std, etc. If you supply an axis argument other than None to such a function it will reduce the dimension of the array over that axis. For your code you can use the argmin function, if you first calculate e in a vectorized way:
d = np.arange(a.shape[2])
e = 2.5 * (d_n[...,None] - d)**2 + a*0.05
b = np.argmin(e, axis=2)
The indexing with [...,None] is used to engage broadcasting. The values in e are floating point values, so it's a bit strange to compare to sys.maxint but there you go:
I, J = np.indices(b.shape)
b[e[I,J,b] >= sys.maxint] = a.shape[2] - 1
* Strictly speaking, a reduction function is of the form reduce(operator, sequence), so technically std and argmin are not reductions.
Given a matrix X with T rows and k columns:
T = 50
H = 10
k = 5
X = np.arange(T).reshape(T,1)*np.ones((T,k))
How to perform a rolling cumulative sum of X along the rows axis with lag H?
Xcum = np.zeros((T-H,k))
for t in range(H,T):
    Xcum[t-H,:] = np.sum( X[t-H:t,:], axis=0 )
Note: preferably avoid strides and convolution, and follow broadcasting/vectorization best practices.
Sounds like you want the following:
import scipy.signal
scipy.signal.convolve2d(X, np.ones((H,1)), mode='valid')
This of course uses convolve, but the question, as stated, is a convolution operation. Broadcasting would result in a much slower/memory intensive algorithm.
You are actually missing one last row in your rolling sum, this would be the correct output:
Xcum = np.zeros((T-H+1, k))
for t in range(H, T+1):
    Xcum[t-H, :] = np.sum(X[t-H:t, :], axis=0)
If you need to do this over an arbitrary axis with numpy only, the simplest will be to do a np.cumsum along that axis, then compute your results as a difference of two slices of that. With your sample array and axis:
temp = np.cumsum(X, axis=0)
Xcum = np.empty((T-H+1, k))
Xcum[0] = temp[H-1]
Xcum[1:] = temp[H:] - temp[:-H]
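The cumsum trick can be wrapped so that it works along any axis. The following is only a sketch; the function name rolling_sum is made up here. It prepends a zero slice to the cumulative sum so that every window, including the first, is a plain difference of two slices.

```python
import numpy as np

def rolling_sum(X, H, axis=0):
    """Sum over every window of H consecutive entries along `axis`."""
    c = np.cumsum(X, axis=axis)
    # Prepend a zero slice so that window t equals c[t + H] - c[t].
    zero = np.zeros_like(np.take(c, [0], axis=axis))
    c = np.concatenate((zero, c), axis=axis)
    n = X.shape[axis]
    hi = np.take(c, np.arange(H, n + 1), axis=axis)
    lo = np.take(c, np.arange(0, n - H + 1), axis=axis)
    return hi - lo

T, H, k = 50, 10, 5
X = np.arange(T).reshape(T, 1) * np.ones((T, k))
Xcum = rolling_sum(X, H)
print(Xcum.shape)
```

One caveat: for long floating-point arrays the difference of two large cumulative sums can accumulate rounding error, which summing each window directly avoids.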
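Another option is to use pandas and its rolling_sum function, which against all odds apparently works on 2D arrays just as you need it to (note that rolling_sum was deprecated in pandas 0.18 and later removed; in current versions the equivalent is pd.DataFrame(X).rolling(10).sum()):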
import pandas as pd
Xcum = pd.rolling_sum(X, 10)[9:] # first 9 entries are NaN
Here's a strided solution. I realize it's not what you want, but I wondered how it compares.
def foo2(X):
    temp = np.lib.stride_tricks.as_strided(X, shape=(H,T-H+1,k),
                                           strides=(k*8,)+X.strides)
    # return temp.sum(0)
    return np.einsum('ijk->jk', temp)
This times at 35 us, compared to 22 us for Jaime's cumsum solution. einsum is a bit faster than sum(0). temp uses X's data, so there's no memory penalty. But it is harder to understand.
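In newer NumPy versions the same windowed view can be built without computing strides by hand. A sketch, assuming NumPy 1.20 or later where sliding_window_view is available:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

T, H, k = 50, 10, 5
X = np.arange(T).reshape(T, 1) * np.ones((T, k))

# Read-only view of shape (T-H+1, k, H); the window axis is appended last.
windows = sliding_window_view(X, H, axis=0)
Xcum = windows.sum(axis=-1)
print(Xcum.shape)
```

As with as_strided, the view reuses X's data, so the only new allocation is the (T-H+1, k) result.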