Deleting specific numbers from a (2,60) numpy array? - python

I have a numpy array with shape (2,60). Some of the numbers in the first row are 30 or more, and I want to keep only the columns for which the value in the first row is less than 30.
I tried array = array[array < 30] #but that doesn't work
An example of my array is
array = np.array([[30,40,12,12,10,2,30,40],[2,5,75,67,89,5,3,4]])
Expected output:
array = [[12 12 10 2]
[75 67 89 5]]

You are looking for this:
array[:,array[0]<30]
output:
array([[12, 12, 10, 2],
[75, 67, 89, 5]])
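To see why the original attempt fails, it can help to spell out the two kinds of masks. The sketch below uses the example array from the question; the names mask, filtered and flattened are only illustrative. A 1D boolean mask built from the first row selects whole columns, while a 2D mask over the full array picks individual elements and returns a flat 1D result.

```python
import numpy as np

array = np.array([[30, 40, 12, 12, 10, 2, 30, 40],
                  [2, 5, 75, 67, 89, 5, 3, 4]])

# 1D boolean mask: one True/False per column, based on the first row only
mask = array[0] < 30

# Keep every row (:) but only the columns where the mask is True
filtered = array[:, mask]
print(filtered)

# The original attempt uses a 2D mask, which selects single elements
# from both rows and therefore collapses the result to one dimension
flattened = array[array < 30]
print(flattened.shape)
```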

Related

Weighted resampling a numpy array

I have a 50 x 4 numpy array and I'd like to repeat the rows to make it a 500 x 4 array. The catch is that I cannot just repeat every row the same number of times along the 0th axis: I'd like more copies of the smaller rows and fewer copies of the bigger rows in the expanded array.
The input array has data that looks like this:
[1, 1, 16, 5]
[8, 10, 512, 10]
...
[448, 8192, 409600, 150]
Here, the initial rows are small and the last rows are large. But the scales for each column are different. i.e., 16 might be a very low value for column 2 but a high value for column 1
How can I achieve this using numpy or even lists?
The expected output would be an array of shape 500 x 4 where each row is taken from the input array and repeated some number of times.
[1, 1, 16, 5] # repeated thrice
[1, 1, 16, 5]
[1, 1, 16, 5]
[8, 10, 512, 10] # repeated twice
[8, 10, 512, 10]
...
[448, 8192, 409600, 150]
You can use np.repeat to repeat each row index an arbitrary number of times and then use the result as an index into the input array. (np.repeat also accepts an axis argument, so np.repeat(inputArr, counts, axis=0) produces the same array directly.) Here is an example:
# Example of random input
inputArr = np.random.randint(0, 1000, (50, 4))
# Example where row i is repeated counts[i] times, with counts = [2, 3, ..., 51]
counts = np.arange(2, 52)
# Actual computation
outputArr = inputArr[np.repeat(np.arange(inputArr.shape[0]), counts)]
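Note that counts = np.arange(2, 52) adds up to 1325 rows, not 500. One possible way to get exactly 500 rows while favouring the earlier (smaller) rows is sketched below. The linear weights, the largest-remainder rounding and all variable names are assumptions made for illustration; the question does not say how the weights should be chosen.

```python
import numpy as np

rng = np.random.default_rng(0)
input_arr = rng.integers(0, 1000, size=(50, 4))  # stand-in for the real data
target_rows = 500

# Hypothetical weights: the first row gets twice the weight of the last row
weights = np.linspace(2.0, 1.0, input_arr.shape[0])

# Ideal (fractional) number of copies per row, scaled to the target size
ideal = weights / weights.sum() * target_rows
counts = np.floor(ideal).astype(int)

# Rounding down loses a few rows; give them to the largest fractional parts
shortfall = int(target_rows - counts.sum())
top_up = np.argsort(ideal - counts)[::-1][:shortfall]
counts[top_up] += 1

# Repeat row i exactly counts[i] times along axis 0
output_arr = np.repeat(input_arr, counts, axis=0)
print(output_arr.shape)
```

Any other non-negative weight vector can be swapped in for np.linspace; only the rounding step is needed to make the total come out exact.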

Replace int values in 2D np.array with list of 3 values to make it 3D

I came across this problem while helping on this question, where the OP does some image processing. Regardless of whether there are other ways to do the whole thing, at one point I have a 2D np.array filled with integers. The integers are just mask values, each standing for an RGB color.
I have a dictionary with integers as keys and arrays of RGB colors as value. This is the mapping and the goal is to replace each int in the array with the colors.
I start with this array, in which all RGB arrays have already been replaced by integers, so it is now an array of shape (2,3) (originally it had shape (2,3,3)):
import numpy as np
arr = np.array([0,2,4,1,3,5]).reshape(2,3)
print(arr)
array([[0, 2, 4],
[1, 3, 5]])
Here is the dictionary (chosen numbers are just random for the example):
dic = {0 : [10,20,30], 1 : [12,22,32], 2 : [15,25,35], 3 : [40,50,60], 4 : [100,200,300], 5 : [250,350,450]}
replacing all these values with the arrays makes it an array with shape (2,3,3) like this:
array([[[ 10, 20, 30],
[ 15, 25, 35],
[100, 200, 300]],
[[ 12, 22, 32],
[ 40, 50, 60],
[250, 350, 450]]])
I looked into np.where because it seemed the most obvious choice to me, but I always got an error saying that the shapes are incorrect.
I don't know exactly where I'm stuck. When googling, I came across np.dstack and np.concatenate and read about changing the shape with np.newaxis / None, but I just can't get it done. Maybe I should create a new array with np.zeros_like and go from there.
Do I need to create something like a placeholder before I'm able to insert an array holding these 3 RGB values?
Since the array was created from the dictionary, every single key is in the array, so I thought about looping through the dict, checking for each key in the array, and replacing it with the corresponding value. Am I at least heading in the right direction, or does that lead nowhere?
Any help much appreciated!!!
We can create an array from the dictionary values by unpacking them and then index that array with arr, so that each integer selects the row holding its color. This relies on the keys being 0, 1, 2, ... in insertion order:
np.array([*dic.values()])[arr]
If the dictionary keys are not in sorted order, we can use np.argsort on the keys to build an index array that puts the values in key order. After reordering the array of dictionary values with it, the same indexing gives the result again, e.g.:
dic = {0: [10, 20, 30], 2: [15, 25, 35], 3: [40, 50, 60], 1: [12, 22, 32], 4: [100, 200, 300], 5: [250, 350, 450]}
sort_mask = np.array([*dic.keys()]).argsort()
# [0 3 1 2 4 5]
np.array([*dic.values()])[sort_mask][arr]
# [[[ 10 20 30]
# [ 15 25 35]
# [100 200 300]]
#
# [[ 12 22 32]
# [ 40 50 60]
# [250 350 450]]]
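Both variants assume that the keys are exactly 0, 1, ..., n-1. If the keys might have gaps, a small lookup table indexed by key is one alternative. This is only a sketch under the assumption that the keys are non-negative integers and not too large; the name lut is made up for illustration.

```python
import numpy as np

arr = np.array([0, 2, 4, 1, 3, 5]).reshape(2, 3)
dic = {0: [10, 20, 30], 2: [15, 25, 35], 3: [40, 50, 60],
       1: [12, 22, 32], 4: [100, 200, 300], 5: [250, 350, 450]}

keys = np.array(list(dic.keys()))
values = np.array(list(dic.values()))

# Lookup table with one row per possible key; unused keys stay at zero
lut = np.zeros((keys.max() + 1, values.shape[1]), dtype=values.dtype)
lut[keys] = values

# Indexing with the (2, 3) integer array yields a (2, 3, 3) array
result = lut[arr]
print(result.shape)
```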

select random indices from 2d array

I want to generate a random 2D array and select some (m) random indices whose values are then replaced by m predefined values.
For example, I want to generate a 4 by 4 matrix, then select 4 random indices and replace their values with the values [105,110,115,120].
random_matrix = np.random.randint(0,100,(4,4))
# array([[27, 20, 2, 8],
# [43, 88, 14, 63],
# [ 5, 55, 4, 72],
# [59, 49, 84, 96]])
Now, I want to randomly select 4 indices and alter their values from predefined p_array = [105,110,115,120]
I try to generate all the indices like this:
[
(i,j)
for i in range(len(random_matrix))
for j in range(len(random_matrix[i]))
]
But how do I select 4 random indices from this list and replace their values with those from the predefined p_array? I couldn't think of a solution, because I have to ensure that the 4 random indices are unique, and that is where I am badly stuck: plain random sampling doesn't give that guarantee.
Can we generate the random matrix and select the indices in a single shot? I need that because my current implementation gets slower and slower as m grows, and I have to ensure performance as well.
Do the following:
import numpy as np
# for reproducibility
np.random.seed(42)
rows, cols = 4, 4
p_array = np.array([105, 110, 115, 120])
# generate random matrix that will always include all the values from p_array
k = rows * cols - len(p_array)
random_matrix = np.concatenate((p_array, np.random.randint(0, 100, k)))
np.random.shuffle(random_matrix)
random_matrix = random_matrix.reshape((rows, cols))
print(random_matrix)
Output
[[115 33 54 27]
[ 3 27 16 69]
[ 33 24 81 105]
[ 62 110 94 120]]
UPDATE
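Assuming the same setup as before, you can also generate the random matrix while keeping track of where the p_array values end up. Start from the flat, unshuffled array, so that the p_array values occupy flat positions 0 to 3:
# flat array with the p_array values first (not yet shuffled or reshaped)
flat = np.concatenate((p_array, np.random.randint(0, 100, k)))
positions = np.random.permutation(np.arange(rows * cols))
random_matrix = flat[positions].reshape((rows, cols))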
print("random-matrix")
print("-------------")
print(random_matrix)
print("-------------")
# get indices in flat array
flat_indices = np.argwhere(np.isin(positions, np.arange(4))).flatten()
# get indices in matrix
matrix_indices = np.unravel_index(flat_indices, (rows, cols))
print("p_array-indices")
print("-------------")
print(matrix_indices)
# verify that indeed those are the values
print(random_matrix[matrix_indices])
Output
random-matrix
-------------
[[ 60 74 20 14]
[105 86 120 82]
[ 74 87 110 51]
[ 92 115 99 71]]
-------------
p_array-indices
-------------
(array([1, 1, 2, 3]), array([0, 2, 2, 1]))
[105 120 110 115]
You can do the following, using your suggested cross-product and random.sample:
import random
from itertools import product
pool = [*product(range(len(random_matrix)), range(len(random_matrix[0])))]
random_indices = random.sample(pool, 4)
# [(3, 1), (1, 2), (2, 0), (2, 3)]
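For larger matrices, building the full list of index pairs can be avoided. A possible numpy-only sketch: draw m unique flat positions without replacement, convert them to (row, column) pairs, and assign the predefined values. The use of np.random.default_rng and the variable names are choices made here for illustration, not part of the answers above.

```python
import numpy as np

rng = np.random.default_rng(42)
rows, cols = 4, 4
p_array = np.array([105, 110, 115, 120])

random_matrix = rng.integers(0, 100, size=(rows, cols))

# replace=False guarantees that the sampled flat positions are unique
flat_idx = rng.choice(rows * cols, size=len(p_array), replace=False)

# Convert flat positions into (row, column) index arrays
r, c = np.unravel_index(flat_idx, (rows, cols))

# Overwrite the selected cells with the predefined values
random_matrix[r, c] = p_array
print(random_matrix)
```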

Extract a block of rows from 2D numpy

I know this question might be trivial, but I am still learning. Given a 2D numpy array, I want to take a block of rows using slicing. For instance, from the following matrix, I want to extract only the first three rows, so from:
[[ 1 2 3 4]
[ 5 6 7 8]
[ 9 10 11 12]
[ 28 9 203 102]
[577 902 11 101]]
I want:
[[ 1 2 3 4]
[ 5 6 7 8]
[ 9 10 11 12]]
My code is still missing something. I appreciate any hint.
X = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [28, 9, 203, 102], [577, 902, 11, 101]]
X = np.array(X)
X_sliced = X[3,:]
print(X_sliced)
A 2D numpy array can be thought of as a nested list of lists: element 0 is the first row, element 1 is the second row, and so on (indexing is zero-based).
You can pull out a single row with x[n], where n is the index of the row you want.
You can pull out a range of rows with x[n:m], where n is the index of the first row and m is one past the index of the last row (the stop index is excluded).
If you leave out n or m and write x[n:] or x[:m], Python fills in the blank with the start or the end of the array. For example, x[n:] returns all rows from n to the end, and x[:m] returns all rows from the start up to, but not including, m.
You can accomplish what you want with x[:3], which is equivalent to x[0:3].
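Applied to the matrix from the question, the slicing rules look like this. It is a short sketch; the names first_three, single_row and middle are only illustrative.

```python
import numpy as np

X = np.array([[1, 2, 3, 4],
              [5, 6, 7, 8],
              [9, 10, 11, 12],
              [28, 9, 203, 102],
              [577, 902, 11, 101]])

# Rows 0, 1 and 2: the stop index 3 is excluded
first_three = X[:3]
print(first_three)

# X[3, :] from the question selects only the single row at index 3
single_row = X[3]
print(single_row)

# Rows 1, 2 and 3
middle = X[1:4]
print(middle)
```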

How to bin a 2D array in numpy?

I'm new to numpy and I have a 2D array of objects that I need to bin into a smaller matrix and then get a count of the number of objects in each bin to make a heatmap. I followed the answer on this thread to create the bins and do the counts for a simple array but I'm not sure how to extend it to 2 dimensions. Here's what I have so far:
data_matrix = numpy.ndarray((500,500),dtype=float)
# fill array with values.
bins = numpy.linspace(0,50,50)
digitized = numpy.digitize(data_matrix, bins)
binned_data = numpy.ndarray((50,50))
for i in range(0,len(bins)):
    for j in range(0,len(bins)):
        k = len(data_matrix[digitized == i:digitized == j]) # <- does not work
        binned_data[i:j] = k
P.S. the [digitized == i] notation on an array will return an array of binary values. I cannot find documentation on this notation anywhere. A link would be appreciated.
You can reshape the array to a four dimensional array that reflects the desired block structure, and then sum along both axes within each block. Example:
>>> a = np.arange(24).reshape(4, 6)
>>> a
array([[ 0, 1, 2, 3, 4, 5],
[ 6, 7, 8, 9, 10, 11],
[12, 13, 14, 15, 16, 17],
[18, 19, 20, 21, 22, 23]])
>>> a.reshape(2, 2, 2, 3).sum(3).sum(1)
array([[ 24, 42],
[ 96, 114]])
If a has the shape m, n, the reshape should have the form
a.reshape(m_bins, m // m_bins, n_bins, n // n_bins)
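The reshape pattern can be wrapped in a small helper. This is a minimal sketch that assumes both dimensions divide evenly into the requested number of bins; the function name bin_sum is made up for illustration.

```python
import numpy as np

def bin_sum(a, m_bins, n_bins):
    """Sum a 2D array over m_bins x n_bins equally sized blocks."""
    m, n = a.shape
    if m % m_bins or n % n_bins:
        raise ValueError("array shape must be divisible by the bin counts")
    # Axes 1 and 3 run over the rows and columns inside each block
    blocks = a.reshape(m_bins, m // m_bins, n_bins, n // n_bins)
    return blocks.sum(axis=(1, 3))

a = np.arange(24).reshape(4, 6)
binned = bin_sum(a, 2, 2)
print(binned)

# A 500 x 500 array reduced to 50 x 50 bins of 10 x 10 elements each
big = bin_sum(np.ones((500, 500)), 50, 50)
print(big.shape)
```

Passing a tuple to axis sums both in-block axes in one call and gives the same result as chaining .sum(3).sum(1).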
At first I was also going to suggest that you use np.histogram2d rather than reinventing the wheel, but then I realized that it would be overkill to use that and would need some hacking still.
If I understand correctly, you just want to sum over submatrices of your input. That's pretty easy to brute force: going over your output submatrix and summing up each subblock of your input:
import numpy as np
def submatsum(data,n,m):
    # return a matrix of shape (n,m)
    bs = data.shape[0]//n,data.shape[1]//m # block size summed over
    return np.reshape(np.array([np.sum(data[k1*bs[0]:(k1+1)*bs[0],k2*bs[1]:(k2+1)*bs[1]]) for k1 in range(n) for k2 in range(m)]),(n,m))
# set up dummy data
N,M = 4,6
data_matrix = np.reshape(np.arange(N*M),(N,M))
# set up size of 2x3-reduced matrix, assume congruity
n,m = N//2,M//3
reduced_matrix = submatsum(data_matrix,n,m)
# check output
print(data_matrix)
print(reduced_matrix)
This prints
print(data_matrix)
[[ 0 1 2 3 4 5]
[ 6 7 8 9 10 11]
[12 13 14 15 16 17]
[18 19 20 21 22 23]]
print(reduced_matrix)
[[ 24 42]
[ 96 114]]
which is indeed the result for summing up submatrices of shape (2,3).
Note that I'm using // for integer division to make sure it's python3-compatible, but in case of python2 you can just use / for division (due to the numbers involved being integers).
Another solution is to have a look at the binArray function in the comments here:
Binning a numpy array
To use your example :
data_matrix = numpy.ndarray((500,500),dtype=float)
binned_data = binArray(data_matrix, 0, 10, 10, np.sum)
binned_data = binArray(binned_data, 1, 10, 10, np.sum)
The result sums every 10x10 square in data_matrix (of size 500x500) to obtain a single value per square in binned_data (of size 50x50).
Hope this helps!
