I have a matrix in the following form:
import numpy as np
matrix = np.array([[-2,2,6,7,8],[-3,7,1,0,-2]])
I want to find the location of the column with the highest value in the first row, conditional on non-negative numbers in the second row. For example, in my case I want the algorithm to find the 4th column:
solution = np.array([7,0])
column_location = 3
I tried using numpy functions like np.min(), np.max() and np.take(), but I lose the location information when subsampling the matrix.
Simply:
nn = np.where(matrix[1] >= 0)[0]
ix = nn[matrix[0, nn].argmax()]
On your data:
>>> ix
3
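The matching column itself (the solution pair from the question) is then just:
>>> matrix[:, ix]
array([7, 0])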
Here's a sketch:
pos_inds = np.where(matrix[1, :] >= 0)[0] # column indices where the 2nd row is non-negative
max_ind = matrix[0, pos_inds].argmax() # argmax of the 1st row restricted to those columns
orig_max_ind = pos_inds[max_ind] # max index into the original array
print(orig_max_ind) # 3
print(matrix[:, orig_max_ind]) # [7, 0]
Here I use masked arrays, and I also handle the case where all the numbers in the second row are negative, in which case there is no solution:
import numpy as np
import numpy.ma as ma
from copy import deepcopy
min_int = -2147483648
matrix = np.array([[-2, 2, 6, 7, 8], [-3, 7, 1, 0, -2]])
# we keep the original matrix untouched
matrix_copy = deepcopy(matrix)
masked_array = ma.masked_less(matrix[1], 0)
matrix_copy[0][masked_array.mask] = min_int
column_location = np.argmax(matrix_copy[0])
if matrix_copy[0][column_location] == min_int:
    print("No solution")
else:
    solution = np.array([matrix[0][column_location], matrix[1][column_location]])
    print(solution) # np.array([7,0])
    print(column_location) # 3
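If you would rather avoid masked arrays, a minimal sketch of the same no-solution handling with plain boolean indexing (reusing the nn index array idea from the first answer) could look like this:
nn = np.where(matrix[1] >= 0)[0]
if nn.size == 0:
    print("No solution")  # every entry in the second row is negative
else:
    column_location = nn[matrix[0, nn].argmax()]
    print(matrix[:, column_location])  # [7 0]
    print(column_location)             # 3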
Consider a NumPy array of shape (8, 8).
My Question: What is the index (x,y) of the 50th element?
Note: For counting the elements go row-wise.
Example, in array A, where A = [[1, 5, 9], [3, 0, 2]] the 5th element would be '0'.
Can someone explain how to find the general solution for this and, what would be the solution for this specific problem?
You can use np.unravel_index to find the coordinates corresponding to the index of the flattened array. NumPy arrays are 0-indexed, so you have to adjust for this.
import numpy as np
a = np.arange(64).reshape(8,8)
np.unravel_index(50-1, a.shape)
Out:
(6, 1)
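Going the other way, np.ravel_multi_index maps the coordinates back to a flat index (adding 1 again to return to 1-based counting):
np.ravel_multi_index((6, 1), a.shape) + 1
Out:
50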
In a NumPy array a of shape (r, c) (just like a list of lists), the n-th element is
a[(n-1) // c][(n-1) % c],
assuming that n starts from 1 as in your example.
It has nothing to do with r. Thus, when r = c = 8 and n = 50, the above formula is exactly
a[6][1].
Let me show more using your example:
from numpy import *
a = array([[1, 5, 9], [3, 0, 2]])
r = len(a)
c = len(a[0])
print(f'(r, c) = ({r}, {c})')
print(f'Shape: {a.shape}')
for n in range(1, r * c + 1):
    print(f'Element {n}: {a[(n-1) // c][(n-1) % c]}')
Below is the result:
(r, c) = (2, 3)
Shape: (2, 3)
Element 1: 1
Element 2: 5
Element 3: 9
Element 4: 3
Element 5: 0
Element 6: 2
numpy.ndarray.flatten(a) returns a copy of the array a collapsed into one dimension. And please note that the counting starts from 0; therefore, in your example 0 is the 4th element and 1 is the 0th.
import numpy as np
arr = np.array([[1, 5, 9], [3, 0, 2]])
fourth_element = np.ndarray.flatten(arr)[4]
or
fourth_element = arr.flatten()[4]
The same approach works for the 8x8 matrix.
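For instance, assuming the 8x8 array simply holds the numbers 1 to 64 (my own example), the element the question calls the 50th sits at flat index 49:
import numpy as np
a = np.arange(1, 65).reshape(8, 8)
print(a.flatten()[49])                # 50
print(np.unravel_index(49, a.shape))  # (6, 1)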
First create an 8x8 2D numpy array using np.array and range, then reshape the created array to 8x8.
In the output you can check that the index of the 50th element is [6, 1].
import numpy as np
arr = np.array(range(1,(8*8)+1)).reshape(8,8)
print(arr[6,1])
output will be 50
Or you can do it in a generic way as well, with the help of the numpy where method.
import numpy as np
def getElementIndex(array: np.array, element):
    elementIndex = np.where(array==element)
    return f'[{elementIndex[0][0]},{elementIndex[1][0]}]'
def getXYOrderNumberArray(x:int, y:int):
    return np.array(range(1,(x*y)+1)).reshape(x,y)
arr = getXYOrderNumberArray(8,8)
print(getElementIndex(arr,50))
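For reference, np.where on a 2D array returns a tuple of (row indices, column indices), which is why the helper above indexes elementIndex[0][0] and elementIndex[1][0]; a quick check with the same 8x8 array:
rows, cols = np.where(arr == 50)
print(rows, cols)  # [6] [1]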
I am new to Python and am writing an application to identify matching images. I am using the dHash algorithm to compare the hashes of images. I have seen in a tutorial the following lines of code:
import cv2 as cv
import numpy as np
import sys
hashSize = 8
img = cv.imread("Resources\\test_image.jpg",0)
image_resized = cv.resize(img, (hashSize + 1, hashSize))
difference = image_resized[0:, 1:] > image_resized[0:, :-1]
hash_Value = sum([5**i for i, value in enumerate(difference.flatten())if value == True])
print(hash_Value)
The two lines I am referring to are the difference line and the hash_Value line. As far as I understand, the first line checks to see if the left pixel has a greater intensity than the right pixel. How does it do this for the whole array? There is no for loop over the array to check each index. As for the second line, I think it is checking to see if the value is true and, if it is, adding the value of 5^i to the sum and then assigning that to hash_Value.
I am new to Python and the syntax here is a little confusing. Can someone explain what the two lines above are doing? Is there a more readable way of writing this that will help me understand what the algorithm is doing and be more readable in future?
To break it down, the first line pixel_difference = image_resized[0:, 1:] > image_resized[0:, :-1] is basically doing the following:
import numpy as np # I assume you are using numpy.
# Suppose you have the following 2D arrays:
arr1 = np.array([ [1, 2, 3], [4, 5, 6], [7, 8, 9] ])
arr2 = np.array([ [3, 1, 2], [5, 5, 4], [7, 7, 7] ])
# pixel_difference = image_resized[0:, 1:] > image_resized[0:, :-1]
# can be written as following:
m, n = arr1.shape # This will assign m = 3, n = 3.
pixel_difference = np.ndarray(shape=(m, n-1), dtype=bool) # Initializes m x (n-1) matrix.
for row in range(m):
    for col in range(n-1):
        pixel_difference[row, col] = arr1[row, col+1] > arr2[row, col]
print(pixel_difference)
And the second line is doing this:
image_hash = 0
for i, value in enumerate(pixel_difference.flatten()):
    if value:
        image_hash += 5**i
print(image_hash)
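If readability is the main goal, one option (just a sketch; the dhash_value helper and the base parameter are names I am introducing, and the tutorial's choice of base 5 is kept as-is) is to wrap both steps in a small function built on np.flatnonzero:
import numpy as np

def dhash_value(image_resized, base=5):
    # True where a pixel is brighter than the pixel to its left
    difference = image_resized[:, 1:] > image_resized[:, :-1]
    # flat positions (row-major order) of the True entries
    positions = np.flatnonzero(difference)
    # sum base**i over those positions, using Python ints to avoid overflow
    return sum(base ** int(i) for i in positions)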
I want to generate a binary matrix of numbers with M rows and N columns. Each row must sum to <=p and >=q. In other words, each row must have at most p and at least q ones.
This is the code I have been using.
import numpy as np
def randbin(M, N, P):
    return np.random.choice([0, 1], size=(M, N), p=[P, 1 - P])
MyMatrix = randbin(200, 7, 0.5)
I noticed that some rows come out as all zeros (row 0 in my run, for example) and some rows as all ones. How can I modify this to get what I want? Is there an efficient way of achieving this?
You can generate a random number in [q, p] for each row and then set that many random ones in each row. If by efficient you mean vectorized, then yes, there is an efficient way. The trick is to simulate sampling without replacement along one axis but with replacement along the other. This can be done with np.argsort. You can select a variable number of indices by turning a random vector into a mask.
def randbin(m, n, p, q):
    # output array to assign ones into
    result = np.zeros((m, n), dtype=bool)
    # a random permutation of column indices in each row
    # (sampling without replacement within each row)
    col_ind = np.argsort(np.random.random(size=(m, n)), axis=1)
    # figure out how many ones to place in each row: between q and p inclusive
    count = np.random.randint(q, p + 1, size=(m, 1))
    # turn it into a mask over col_ind using a broadcasted comparison
    mask = np.arange(n) < count
    # apply the mask not only to col_ind, but also to the corresponding row_ind
    col_ind = col_ind[mask]
    row_ind = np.broadcast_to(np.arange(m).reshape(-1, 1), (m, n))[mask]
    # set the corresponding elements to 1
    result[row_ind, col_ind] = 1
    return result
The selection is made so that each run of equal values in row_ind is between q and p elements long. The corresponding elements of col_ind are unique and uniformly distributed within each row.
An alternative is @Prune's solution (shown below). It uses np.argsort to shuffle each row independently, since np.random.shuffle would shuffle whole rows rather than the elements within them:
def randbin(m, n, p, q):
    # make the unique row patterns: between q and p leading ones
    options = np.arange(n) < np.arange(q, p + 1).reshape(-1, 1)
    # select a random pattern to go into each output row
    selection = np.random.choice(options.shape[0], size=m, replace=True)
    # perform the selection
    result = options[selection]
    # create indices to shuffle each row independently
    col_ind = np.argsort(np.random.random(result.shape), axis=1)
    row_ind = np.arange(m).reshape(-1, 1)
    # perform the shuffle
    result = result[row_ind, col_ind]
    return result
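Either version of randbin above can be sanity-checked quickly (using the question's convention that p is the maximum and q the minimum number of ones per row):
result = randbin(200, 7, p=5, q=2)
row_sums = result.sum(axis=1)
print(row_sums.min() >= 2, row_sums.max() <= 5)  # True True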
Okay, then: a uniform distribution is easy enough. Let's take the case with [2, 5] ones required. Use a list of the allowable combinations:
[ [1, 1, 0, 0, 0, 0],
[1, 1, 1, 0, 0, 0],
[1, 1, 1, 1, 0, 0],
[1, 1, 1, 1, 1, 0] ]
For each of your rows, choose a random element from these four, and then shuffle it. There is your row.
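A minimal sketch of that recipe for the [2, 5] example (the names n, q, p, options and rng are my own):
import numpy as np

rng = np.random.default_rng()
n, q, p = 6, 2, 5  # columns, minimum ones, maximum ones
# the allowable row patterns: q to p leading ones
options = [np.array([1] * k + [0] * (n - k)) for k in range(q, p + 1)]

rows = []
for _ in range(200):
    row = options[rng.integers(len(options))].copy()  # pick a random pattern
    rng.shuffle(row)                                   # shuffle it in place
    rows.append(row)
matrix = np.array(rows)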
I'm trying to vectorize a very simple operation but can't seem to figure out how.
Given a very large numerical vector (over 1M positions) and another array of size n with a given set of positions, I would like to get back a vector of size n whose elements are the averages of the values of the first vector at the positions specified by the second. For example:
a = np.array([1,2,3,4,5,6,7])
b = np.array([[0,1],[2],[3,5],[4,6]], dtype=object)  # dtype=object because the index lists have different lengths
c = [1.5,3,5,6]
I need to repeat this operation many times so performance is an issue.
Vanilla python solution:
import numpy as np
import time
a = np.array([1,2,3,4,5,6,7])
b = np.array([[0,1],[2],[3,5],[4,6]], dtype=object)  # ragged index lists need dtype=object
begin = time.time()
for i in range(100000):
    c = []
    for d in b:
        c.append(np.mean(a[d]))
print(time.time() - begin, c)
# 3.7529971599578857 [1.5, 3.0, 5.0, 6.0]
I'm not sure if this is necessarily faster but you may as well try:
import numpy as np
a = np.array([1, 2, 3, 4, 5, 6, 7])
b = np.array([[0, 1], [2], [3, 5], [4, 6]], dtype=object)  # ragged index lists need dtype=object
# Get the length of each subset of indices
lens = np.fromiter((len(bi) for bi in b), count=len(b), dtype=np.int32)
# Compute reduction indices
reduce_idx = np.roll(np.cumsum(lens), 1)
reduce_idx[0] = 0
# Make flattened array of index lists
idx = np.fromiter((i for bi in b for i in bi), count=lens.sum(), dtype=np.int32)
# Reorder according to indices
a2 = a[idx]
# Sum reordered array at reduction indices and divide by number of indices
c = np.add.reduceat(a2, reduce_idx) / lens
print(c)
# [1.5 3. 5. 6. ]
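np.add.reduceat plus the division is essentially a vectorized grouped mean; a quick check against the plain-Python loop above:
expected = [np.mean(a[d]) for d in b]
print(np.allclose(c, expected))  # True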
I have a large array of thousands of values in numpy. I want to decrease its size by averaging adjacent values.
For example:
a = [2,3,4,8,9,10]
#average down to 2 values here
a = [3,9]
#it averaged 2,3,4 and 8,9,10 together
So, basically, I have n elements in the array, and I want to tell it to average down to X values, averaging like above.
Is there some way to do that with numpy (already using it for other things, so I'd like to stick with it).
Using reshape and mean, you can average every m adjacent values of a 1D array of size N*m, with N being any positive integer. For example:
import numpy as np
m = 3
a = np.array([2, 3, 4, 8, 9, 10])
b = a.reshape(-1, m).mean(axis=1)
#array([3., 9.])
1) a.reshape(-1, m) will create a 2D view of the array without copying data:
array([[ 2, 3, 4],
[ 8, 9, 10]])
2) taking the mean along the second axis (axis=1) will then calculate the mean value of each row, resulting in:
array([3., 9.])
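Equivalently, if you think in terms of the desired number of output values X rather than the group size m (assuming len(a) is divisible by X):
X = 2                                 # desired number of output values
group = len(a) // X                   # elements averaged per output value
c = a.reshape(X, group).mean(axis=1)  # array([3., 9.])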
Try this:
n_averaged_elements = 3
averaged_array = []
a = np.array([ 2, 3, 4, 8, 9, 10])
for i in range(0, len(a), n_averaged_elements):
    slice_from_index = i
    slice_to_index = slice_from_index + n_averaged_elements
    averaged_array.append(np.mean(a[slice_from_index:slice_to_index]))
>>>> averaged_array
>>>> [3.0, 9.0]
Looks like a simple non-overlapping moving window average to me, how about:
In [3]:
import numpy as np
a = np.array([2,3,4,8,9,10])
window_sz = 3
a[:len(a)//window_sz*window_sz].reshape(-1,window_sz).mean(1)
#you want to be sure your array can be reshaped properly, so the [:len(a)//window_sz*window_sz] part
Out[3]:
array([ 3., 9.])
In this example, I presume that a is the 1D numpy array that needs to be averaged. In the method that I give below, we first find the factors of the length of this array a, and then we choose an appropriate factor as the step size to average the array with.
Here is the code.
import numpy as np
from functools import reduce
''' Function to find factors of a given number 'n' '''
def factors(n):
    return sorted(set(reduce(list.__add__,
                             ([i, n//i] for i in range(1, int(n**0.5) + 1) if n % i == 0))))
a = [2,3,4,8,9,10] #Given array.
'''fac: list of factors of length of a.
In this example, len(a) = 6. So, fac = [1, 2, 3, 6] '''
fac = factors(len(a))
'''step: choose an appropriate step size from the list 'fac'.
In this example, we choose one of the middle numbers in fac
(3). '''
step = fac[int( len(fac)/3 )+1]
'''avg: initialize an empty array. '''
avg = np.array([])
for i in range(0, len(a), step):
    avg = np.append( avg, np.mean(a[i:i+step]) ) #append averaged values to `avg`
print(avg) #Prints the final result
[3. 9.]