I want to get the top N (maximal) args & values across an entire numpy matrix, as opposed to across a single dimension (rows / columns).
Example input (with N=3):
import numpy as np
mat = np.matrix([[9,8, 1, 2], [3, 7, 2, 5], [0, 3, 6, 2], [0, 2, 1, 5]])
print(mat)
[[9 8 1 2]
[3 7 2 5]
[0 3 6 2]
[0 2 1 5]]
Desired output: [9, 8, 7]
Since max isn't transitive across a single dimension, going by rows or columns doesn't work.
# by rows, no 8
np.squeeze(np.asarray(mat.max(1).reshape(-1)))[:3]
array([9, 7, 6])
# by cols, no 7
np.squeeze(np.asarray(mat.max(0)))[:3]
array([9, 8, 6])
I have code that works, but looks really clunky to me.
# reshape into single vector
mat_as_vector = np.squeeze(np.asarray(mat.reshape(-1)))
# get top 3 arg positions
top3_args = mat_as_vector.argsort()[::-1][:3]
# subset the reshaped matrix
top3_vals = mat_as_vector[top3_args]
print(top3_vals)
array([9, 8, 7])
Would appreciate any shorter way / more efficient way / magic numpy function to do this!
Using numpy.partition() is significantly faster than performing full sort for this purpose:
np.partition(np.asarray(mat), mat.size - N, axis=None)[-N:]
assuming N<=mat.size.
If you need the final result also be sorted (besides being top N), then you need to sort previous result (but presumably you will be sorting a smaller array than the original one):
np.sort(np.partition(np.asarray(mat), mat.size - N, axis=None)[-N:])
If you need the result sorted from largest to lowest, post-pend [::-1] to the previous command:
np.sort(np.partition(np.asarray(mat), mat.size - N, axis=None)[-N:])[::-1]
One way may be with flatten and sorted and slice top n values:
sorted(mat.flatten().tolist()[0], reverse=True)[:3]
Result:
[9, 8, 7]
The idea is from this answer: How to get indices of N maximum values in a numpy array?
import numpy as np
import heapq
mat = np.matrix([[9,8, 1, 2], [3, 7, 2, 5], [0, 3, 6, 2], [0, 2, 1, 5]])
ind = heapq.nlargest(3, range(mat.size), mat.take)
print(mat.take(ind).tolist()[0])
Output
[9, 8, 7]
Related
Question:
Create a array x of shape (n_row.n_col), having first n natural numbers.
N = 30, n_row= 6, n_col=5
Print elements, overlapping first two rows and last three columns.
Expected output:
[[2 3 4]
[7 8 9]]
My output:
[2 3 7 8]
My approach:
x = np.arange (n)
x= x.reshape(n_row,n_col)
a= np. intersect1d(x[0:2,],x[:,-3:-1])
print (a)
I couldn't think of anything else, please help
The overlap of row and column slices of the same array is just the combined slice
import numpy as np
x = np.arange(30).reshape(6, 5)
x[:2,-3:]
Output
array([[2, 3, 4],
[7, 8, 9]])
To compute the overlap by finding same elements is odd but possible
r, c = np.where(np.isin(x, np.intersect1d(x[:2], x[:,-3:])))
x[np.ix_(np.unique(r), np.unique(c))]
Output
array([[2, 3, 4],
[7, 8, 9]])
I think the answers are a bit convoluted...
Personally from the original question:
Question: Create a array x of shape (n_row.n_col), having first n natural numbers. N = 30, n_row= 6, n_col=5
Print elements, overlapping first two rows and last three columns.
I understand "sub-indexing":
N, n_rows, n_cols = 30, 6, 5
a = np.arange(N).reshape(n_rows, n_cols)
print(a[:2, -3:])
Output:
[[2, 3, 4],
[7, 8, 9]]
I want to split my numpy array into separate arrays. The separation must be based on the index. The split count is given by the user.
For example,
The input array: my_array=[1,2,3,4,5,6,7,8,9,10]
If user gives split count =2,
then, the split must be like
my_array1=[1,3,5,7,9]
my_array2=[2,4,6,8,10]
if user gives split count=3, then
the output array must be
my_array1=[1,4,7,10]
my_array2=[2,5,8]
my_array3=[3,6,9]
could anyone please explain, I did for split count 2 using even odd concept
for i in range(len(outputarray)):
if i%2==0:
even_array.append(outputarray[i])
else:
odd_array.append(outputarray[i])
I don't know how to do the split for variable counts like 3,4,5 based on the index.
You can use indexing by vector (aka fancy indexing) for it:
>>> a=np.array([1,2,3,4,5,6,7,8,9,10])
>>> n = 3
>>> [a[np.arange(i, len(a), n)] for i in range(n)]
[array([ 1, 4, 7, 10]), array([2, 5, 8]), array([3, 6, 9])]
Explanation
arange(i, len(a), n) generates an array of integers starting with i, spanning no longer than len(a) with step n. For example, for i = 0 it generates an array
>>> np.arange(0, 10, 3)
array([0, 3, 6, 9])
Now when you index an array with another array, you get the elements at the requested indices:
>>> a[[0, 3, 6, 9]]
array([ 1, 4, 7, 10])
These steps are repeated for i=1..2 resulting in the desired list of arrays.
Here is a python-only way of doing your task
def split_array(array, n=3):
arrays = [[] for _ in range(n)]
for x in range(n):
for i in range(n):
arrays[i] = [array[x] for x in range(len(array)) if x%n==i]
return arrays
Input:
my_array=[1,2,3,4,5,6,7,8,9,10]
print(split_array(my_array, n=3))
Output:
[[1, 4, 7, 10], [2, 5, 8], [3, 6, 9]]
I have three numpy arrays. The shape of the first is (413, 2), the shape of the second is (176, 2), and the shape of the third is (589,). If you'll notice, 413 + 176 = 589. What I want to accomplish is to use the 589 values of the third np array and make the first two arrays of shapes (413, 3) and (176, 3) respectively.
So, what I want is to take the values in the third np array and append them to the columns of the first and second np arrays. I can do the logic for applying to the first and then using the offset of the length of the first to continue appending to the second with the correct values. I suppose I could also combine np arrays 1 and 2, they are separated for a reason though because of my data preprocessing.
To put it visually if that helps, what I have is like this:
Array 1:
[[1 2]
[3 4]
[4 5]]
Array 2:
[[6 7]
[8 9]
[10 11]]
Array 3:
[1 2 3 4 5 6]
And what I want to have is:
Array 1:
[[1 2 1]
[3 4 2]
[4 5 3]]
Array 2:
[[6 7 4]
[8 9 5]
[10 11 6]]
I've tried using np.append, np.concatenate, and np.vstack but have not been able to achieve what I am looking for. I am relatively new to using numpy, and Python in general, so I imagine I am just using these tools incorrectly.
Many thanks for any help that can be offered! This is my first time asking a question here so if I did anything wrong or left anything out please let me know.
Split the third array using the length of array1, then horizontally stack them. You need to use either np.newaxis or array.reshape to change the dimensionality of the slice of array3.
import numpy as np
array1 = np.array(
[[1, 2],
[3, 4],
[4, 5]]
)
array2 = np.array(
[[6, 7],
[8, 9],
[10, 11]]
)
array3 = np.array([1, 2, 3, 4, 5, 6])
array13 = np.hstack([array1, array3[:len(array1), np.newaxis]])
array23 = np.hstack([array1, array3[len(array1):, np.newaxis]])
Outputs:
array13
array([[1, 2, 4],
[3, 4, 5],
[4, 5, 6]])
array23
array([[ 6, 7, 4],
[ 8, 9, 5],
[10, 11, 6]])
I haven't found a simple solution to move elements in a NumPy array.
Given an array, for example:
>>> A = np.arange(10).reshape(2,5)
>>> A
array([[0, 1, 2, 3, 4],
[5, 6, 7, 8, 9]])
and given the indexes of the elements (columns in this case) to move, for example [2,4], I want to move them to a certain position and the consecutive places, for example to p = 1, shifting the other elements to the right. The result should be the following:
array([[0, 2, 4, 1, 3],
[5, 7, 9, 6, 8]])
You can create a mask m for the sorting order. First we set the columns < p to -1, then the to be inserted columns to 0, the remaining columns remain at 1. The default sorting kind 'quicksort' is not stable, so to be safe we specify kind='stable' when using argsort to sort the mask and create a new array from that mask:
import numpy as np
A = np.arange(10).reshape(2,5)
p = 1
c = [2,4]
m = np.full(A.shape[1], 1)
m[:p] = -1 # leave up to position p as is
m[c] = 0 # insert columns c
print(A[:,m.argsort(kind='stable')])
#[[0 2 4 1 3]
# [5 7 9 6 8]]
Is it possible to calculate the mean of multiple arrays, when they may have different lengths? I am using numpy. So let's say I have:
numpy.array([[1, 2, 3, 4, 8], [3, 4, 5, 6, 0]])
numpy.array([[5, 6, 7, 8, 7, 8], [7, 8, 9, 10, 11, 12]])
numpy.array([[1, 2, 3, 4], [5, 6, 7, 8]])
Now I want to calculate the mean, but ignoring elements that are 'missing' (Naturally, I can not just append zeros as this would mess up the mean)
Is there a way to do this without iterating through the arrays?
PS. These arrays are all 2-D, but will always have the same amount of coordinates for that array. I.e. the 1st array is 5 and 5, 2nd is 6 and 6, 3rd is 4 and 4.
An example:
np.array([[1, 2], [3, 4]])
np.array([[1, 2, 3], [3, 4, 5]])
np.array([[7], [8]])
This must give
(1+1+7)/3 (2+2)/2 3/1
(3+3+8)/3 (4+4)/2 5/1
And graphically:
[1, 2] [1, 2, 3] [7]
[3, 4] [3, 4, 5] [8]
Now imagine that these 2-D arrays are placed on top of each other with coordinates overlapping contributing to that coordinate's mean.
I often needed this for plotting mean of performance curves with different lengths.
Solved it with simple function (based on answer of #unutbu):
def tolerant_mean(arrs):
lens = [len(i) for i in arrs]
arr = np.ma.empty((np.max(lens),len(arrs)))
arr.mask = True
for idx, l in enumerate(arrs):
arr[:len(l),idx] = l
return arr.mean(axis = -1), arr.std(axis=-1)
y, error = tolerant_mean(list_of_ys_diff_len)
ax.plot(np.arange(len(y))+1, y, color='green')
So applying that function to the list of above-plotted curves yields the following:
numpy.ma.mean allows you to compute the mean of non-masked array elements. However, to use numpy.ma.mean, you have to first combine your three numpy arrays into one masked array:
import numpy as np
x = np.array([[1, 2], [3, 4]])
y = np.array([[1, 2, 3], [3, 4, 5]])
z = np.array([[7], [8]])
arr = np.ma.empty((2,3,3))
arr.mask = True
arr[:x.shape[0],:x.shape[1],0] = x
arr[:y.shape[0],:y.shape[1],1] = y
arr[:z.shape[0],:z.shape[1],2] = z
print(arr.mean(axis = 2))
yields
[[3.0 2.0 3.0]
[4.66666666667 4.0 5.0]]
The below function also works by adding columns of arrays of different lengths:
def avgNestedLists(nested_vals):
"""
Averages a 2-D array and returns a 1-D array of all of the columns
averaged together, regardless of their dimensions.
"""
output = []
maximum = 0
for lst in nested_vals:
if len(lst) > maximum:
maximum = len(lst)
for index in range(maximum): # Go through each index of longest list
temp = []
for lst in nested_vals: # Go through each list
if index < len(lst): # If not an index error
temp.append(lst[index])
output.append(np.nanmean(temp))
return output
Going off of your first example:
avgNestedLists([[1, 2, 3, 4, 8], [5, 6, 7, 8, 7, 8], [1, 2, 3, 4]])
Outputs:
[2.3333333333333335,
3.3333333333333335,
4.333333333333333,
5.333333333333333,
7.5,
8.0]
The reason np.amax(nested_lst) or np.max(nested_lst) was not used in the beginning to find the max value is because it will return an array if the nested lists are of different sizes.
OP, I know you were looking for a non-iterative built-in solution, but the following really only takes 3 lines (2 if you combine transpose and means but then it just gets messy):
arrays = [
np.array([1,2], [3,4]),
np.array([1,2,3], [3,4,5]),
np.array([7], [8])
]
mean = lambda x: sum(x)/float(len(x))
transpose = [[item[i] for item in arrays] for i in range(len(arrays[0]))]
means = [[mean(j[i] for j in t if i < len(j)) for i in range(len(max(t, key = len)))] for t in transpose]
Outputs:
>>>means
[[3.0, 2.0, 3.0], [4.666666666666667, 4.0, 5.0]]