Normalize 2D matrix using scalar multiplication in numpy - python

I have a matrix thing that looks like this:
thing.shape
(8070829, 2)
and I want to scale all elements by some scalingfactor = np.iinfo(np.int16).max/thing.max() to normalize the values. Right now I am iterating over all elements, which works but is really slow:
for j, sample in enumerate(thing):
    thing[j] = [int(sample[0] * scalingfactor), int(sample[1] * scalingfactor)]
I thought I could do the following, but the results are not the same:
np.multiply(thing, scalingfactor)
Is there a more efficient way to normalize a matrix?

Use vectorized elementwise multiplication and then change the dtype; .astype(int) truncates toward zero, which for non-negative values is the same as flooring -
(thing*scalingfactor).astype(int) # for thing as array type
Or use np.floor on the scaled version -
np.floor(thing*scalingfactor)
Using the posted code from the question: np.multiply(thing, scalingfactor) would work too, it just needs the additional flooring step, as suggested earlier.
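For concreteness, here is a minimal sketch of the vectorized version on made-up data (the array and the int16-based scaling factor are stand-ins for the real ones in the question):

import numpy as np

thing = np.random.rand(10000, 2) * 1000              # stand-in for the real data
scalingfactor = np.iinfo(np.int16).max / thing.max()

scaled = (thing * scalingfactor).astype(int)          # one vectorized pass, no Python loop
print(scaled.shape, scaled.dtype, scaled.max())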

Related

Finding nearest pixel in defined color space - quick implementation using numpy

I have been working on a task where I implemented median cut for image quantization – representing the whole image by only a limited set of pixels. I implemented the algorithm and now I am trying to implement the part where I assign each pixel to a representative from the set found by median cut. So, I have a variable 'color_space', which is a 2D ndarray of shape (n,3), where n is the number of representatives. Then I have a variable 'img', which is the original image of shape (rows, columns, 3).
Now I want to find the nearest pixel (bin) for each pixel of the image based on euclidean distance. I was able to come up with this solution:
for row in range(img.shape[0]):
    for column in range(img.shape[1]):
        img[row][column] = color_space[np.linalg.norm(color_space - img[row][column], axis=1).argmin()]
What it does is, for each pixel in the image, compute the vector of distances to each of the bins and then take the closest one.
The problem is that this solution is quite slow and I would like to vectorize it: instead of getting a vector for each pixel, I would like to get a matrix where, for example, the first row is the first vector of distances computed in my code, and so on.
This could be seen as a matrix-multiplication-like problem where, instead of the dot product of two vectors, I compute their euclidean distance. Is there some good approach to such problems - a general way in numpy to do 'matrix multiplication' where the function Rn x Rn -> R is not the dot product but, for example, the euclidean distance? Of course, for the multiplication the original image would have to be reshaped to (rows*columns, 3), but that is a detail.
I have been studying the documentation and searching the internet, but didn't find any good approach.
Please note that I don't want others to solve my assignment; the solution I came up with is totally ok, I am just curious whether I could speed it up as I try to learn numpy properly.
Thanks for any advice!
Below is an MWE for vectorizing your problem; see the comments for explanation.
import numpy

# these are just random array declarations to work with.
image = numpy.random.rand(32, 32, 3)
color_space = numpy.random.rand(10, 3)

# your code. I modified it to pick indexes
result = numpy.zeros((32, 32))
for row in range(image.shape[0]):
    for column in range(image.shape[1]):
        result[row][column] = numpy.linalg.norm(color_space - image[row][column], axis=1).argmin()
result = result.astype(int)

# here we reshape for broadcasting correctly.
image = image.reshape(1, 32, 32, 3)
color_space = color_space.reshape(10, 1, 1, 3)

# compute the norm on the last axis, which holds the RGB values
result_norm = numpy.linalg.norm(image - color_space, axis=3)

# now compute the vectorized argmin
result_vectorized = result_norm.argmin(axis=0)

print(numpy.allclose(result, result_vectorized))
Finally, you can get the quantized image by doing color_space[result]. You may have to remove the extra dimensions that were added to color_space to get correct shapes in this final operation.
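For example, continuing from the MWE above, the last step could look roughly like this:

# drop the broadcasting axes from color_space, then index with the argmin result
quantized = color_space.reshape(10, 3)[result_vectorized]   # shape (32, 32, 3)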
I think this approach might be a bit more numpy-ish/pythonic:
import numpy as np
from typing import Callable
from numpy import linalg as LA

# assume color_space is defined as a constant somewhere above and is of shape (n,3)
nearest_pixel_idxs: Callable[[np.ndarray], int] = lambda rgb: LA.norm(color_space - rgb, axis=1).argmin()
img: np.ndarray = color_space[np.apply_along_axis(nearest_pixel_idxs, 1, img.reshape((-1, 3)))]
Why this solution might be more efficient:
It relies on apply_along_axis applying nearest_pixel_idxs() rather than on the nested for-loops. This is made possible by reshaping img, thereby removing the need for double indexing.
It avoids repeated writes into color_space by only indexing into it once at the very end.
Let me know if you would like me to go into greater depth on any of this - happy to help.
You could first broadcast to get all the combinations and then calculate each norm. You could then pick the smallest from there.
a = np.array([[1, 2, 3],
              [2, 3, 4],
              [3, 4, 5]])
b = np.array([[1, 2, 3],
              [3, 4, 5]])
a = np.repeat(a.reshape(a.shape[0], 1, 3), b.shape[0], axis=1)
b = np.repeat(b.reshape(1, b.shape[0], 3), a.shape[0], axis=0)
np.linalg.norm(a - b, axis=2)
Each row of the result gives the distances from the corresponding row of a to each of the representatives in b:
array([[0.        , 3.46410162],
       [1.73205081, 1.73205081],
       [3.46410162, 0.        ]])
You can then use argmin to get the final results.
IMO it is better to use numpy's automatic broadcasting (as @Umang Gupta proposed) than to use repeat.
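For reference, a minimal sketch of the same distance computation using automatic broadcasting instead of repeat, assuming a and b still hold their original (3,3) and (2,3) values from above:

# a[:, None, :] has shape (3, 1, 3) and b[None, :, :] has shape (1, 2, 3);
# broadcasting expands both to (3, 2, 3) without materializing the repeated copies.
dists = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)
nearest = dists.argmin(axis=1)   # index of the closest row of b for each row of a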

Calculating a vectors with its transposed vector

I'm working on a calculation for a within-matrix scatter where I have a 50x20 array, and something that occurred to me is that multiplying the transposed array by the original array gives me a dimensional error, saying the following:
operands could not be broadcast together with shapes (50,20) (20,50)
What I tried was array = my_array * my_array_transposed, which produced the aforementioned error.
The alternative was to do, then:
new_array = np.dot(my_array, np.transpose(my_array))
In Octave, for instance, this would've been a lot easier, but due to the size of the array it's hard for me to confirm against ground truth whether this is the way to do the following calculation:
Because, as far as I know, it matters whether the multiplication is element-wise or not.
My question is, am I applying that formula the right way? If not, what's the right way to multiply a transposed vector by the non-transposed vector?
Yes, the np.dot formula is the correct one. If you write array = my_array * my_array_transposed, you are asking Python to perform component-wise multiplication, whereas you need row-by-column multiplication, which is achieved in numpy with np.dot.
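A minimal sketch illustrating the difference, with a random stand-in array of the same shape as in the question:

import numpy as np

my_array = np.random.rand(50, 20)

scatter = np.dot(my_array, my_array.T)   # (50, 20) @ (20, 50) -> (50, 50)
print(scatter.shape)

# my_array * my_array.T would attempt element-wise multiplication and raise:
# "operands could not be broadcast together with shapes (50,20) (20,50)"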

Fastest way generate and sum arrays

I am generating a series of Gaussian arrays given an x vector of length 1400 and arrays for the sigma, center, and amplitude (amp), all of length 100. I thought the best way to speed this up would be to use numpy and a list comprehension:
g = np.sum([(amp[i]*np.exp(-0.5*(x - (center[i]))**2/(sigma[i])**2)) for i in range(len(center))],axis=0)
Each row is a Gaussian along the vector x, and then I sum the columns into a single array of the same length as x.
But this doesn't seem to speed things up at all. I think there is a faster way to do this while avoiding the for loop but I can't quite figure out how.
You should use vectorized computation instead of a comprehension so the loops are all performed at C speed.
In order to do so you have to reshape x to be a column vector. For example you could do x = x.reshape((1400,1)).
Then you can operate directly on the arrays, like this:
v = amp*np.exp(-0.5*(x - center)**2/sigma**2)
Then you obtain an array of shape (1400,100), which you can sum down to a vector of length 1400 with np.sum(v, axis=1).
You should try to vectorize all the operations. IMHO the most efficient approach is to first convert your input data to numpy arrays (if they were plain Python lists) and then let numpy do the computations:
np_amp = np.array(amp).reshape(-1, 1)       # reshape to column vectors so they
np_center = np.array(center).reshape(-1, 1) # broadcast against x (length 1400)
np_sigma = np.array(sigma).reshape(-1, 1)
g = np.sum(np_amp*np.exp(-0.5*(x - np_center)**2/np_sigma**2), axis=0)
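As a quick sanity check, here is a sketch with random stand-in data showing that the broadcast version matches the original list comprehension:

import numpy as np

x = np.linspace(-10, 10, 1400)
amp, center, sigma = np.random.rand(3, 100) + 0.1   # stand-in parameters; offset keeps sigma away from zero

g_loop = np.sum([amp[i]*np.exp(-0.5*(x - center[i])**2/sigma[i]**2) for i in range(len(center))], axis=0)
g_vec = np.sum(amp[:, None]*np.exp(-0.5*(x - center[:, None])**2/sigma[:, None]**2), axis=0)

print(np.allclose(g_loop, g_vec))   # True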

Calculate a trace of large matrix in python

I have a matrix X and I need to write a function which calculates the trace of the product X·Xᵀ.
I wrote the following script:
import numpy as np

def test(matrix):
    return (np.dot(matrix, matrix.T)).trace()

np.random.seed(42)
matrix = np.random.uniform(size=(1000, 1))
print(test(matrix))
It works fine on a small matrix, but when I try it on a large matrix (for example one with shape (50000, 1)), it gives me a memory error.
I tried to find a solution to the problem in other questions on the site, but nothing helped. I would be grateful for any advice!
The number you're trying to compute is just the sum of the squares of all entries of X. Sum the squares instead of computing a giant matrix product full of entries you don't want:
return (X**2).sum()
Or ravel the matrix and use dot, which is probably faster for contiguous X:
raveled = X.ravel()
return raveled.dot(raveled)
Actually, ravel is probably faster for non-contiguous X, too - even when ravel needs to copy, it's not doing more allocation than (X**2).sum().
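A quick sketch using the shapes from the question, showing that the three approaches agree while only the first builds the large intermediate product:

import numpy as np

np.random.seed(42)
X = np.random.uniform(size=(1000, 1))

print(np.dot(X, X.T).trace())   # materializes a 1000x1000 intermediate
print((X**2).sum())             # sum of squared entries, no large intermediate
r = X.ravel()
print(r.dot(r))                 # same value via a 1D dot product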

Numpy linalg on multidimensional arrays

Is there a way to use numpy.linalg.det or numpy.linalg.inv on an nx3x3 array (a line in a multiband image), for example? Right now I am doing something like:
det = numpy.array([numpy.linalg.det(i) for i in X])
but surely there is a more efficient way. Of course, I could use map:
det = numpy.array(map(numpy.linalg.det, X))
Any other more direct way?
I'm pretty sure there is no substantially more efficient way than what you have. You can save some memory by first creating an empty array for the results and writing all results directly to that array:
res = numpy.empty_like(X)
for i, A in enumerate(X):
    res[i] = numpy.linalg.inv(A)
This won't be any faster, though -- it will only use less memory.
a "normal" determinant is only defined for a matrix (dimension=2), so if that's what you want i don't see another way.
if you really want to compute the determinant of a cube then you could try to implement one of the ways described here:
http://en.wikipedia.org/wiki/Hyperdeterminant
notice that it is not necessarily the same value as the one you're currently computing.
New answer to an old question: Since version 1.8.0, numpy supports evaluating a batch of 2D matrices. For a batch of MxM matrices, the input and output now look like:
linalg.det(a)
    Compute the determinant of an array.

    Parameters:  a : (…, M, M) array_like
                 Input array to compute determinants for.
    Returns:     det : (…) array_like
                 Determinant of a.
Note the ellipsis: there can be multiple "batch dimensions", so you can, for example, evaluate determinants on a meshgrid.
https://numpy.org/doc/stable/reference/generated/numpy.linalg.det.html
https://numpy.org/doc/stable/reference/generated/numpy.linalg.inv.html
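A minimal sketch of the batched behaviour, with random stand-in matrices:

import numpy as np

X = np.random.rand(5, 3, 3)      # a stack of five 3x3 matrices

dets = np.linalg.det(X)          # shape (5,): one determinant per matrix
invs = np.linalg.inv(X)          # shape (5, 3, 3): one inverse per matrix

# matches the per-matrix loop from the question
print(np.allclose(dets, [np.linalg.det(m) for m in X]))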
