I have two NumPy integer matrices a and b. If I create diff = a - b, then something weird happens: I get huge values that weren't present in either of the two matrices.
In the picture you can see that the max value of a and b is 52 and there are no NaN values, but the value diff[0][8] (and many others, but not all) is 4294967295.
Screen shots of the results
Any guess?
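One possible explanation, assuming the arrays have an unsigned dtype such as uint32 (the question does not state the dtype): 4294967295 is 2**32 - 1, which is what unsigned 32-bit arithmetic produces when a subtraction would otherwise give -1. A minimal sketch with made-up values:

```python
import numpy as np

# Hypothetical data; the uint32 dtype is an assumption, not stated in the question.
a = np.array([[3, 52]], dtype=np.uint32)
b = np.array([[4, 10]], dtype=np.uint32)

# Unsigned subtraction cannot represent -1, so 3 - 4 wraps around to 2**32 - 1.
wrapped = a - b

# Casting to a signed type first lets negative differences survive.
signed = a.astype(np.int64) - b.astype(np.int64)

print(wrapped.dtype, wrapped)
print(signed.dtype, signed)
```

If this is the cause, checking a.dtype and b.dtype would confirm it, and casting to a signed type before subtracting avoids the wraparound.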
Suppose I have a 6x6 matrix I want to add into a 9x9 matrix, but I also want to add it at a specified location and not necessarily in a 6x6 block.
The below code summarizes what I want to accomplish, the only difference is that I want to use variables instead of the rows 0:6 and 3:9.
import numpy as np
a = np.zeros((9,9))
b = np.ones((6,6))
a[0:6,3:9] += b #Inserts the 6x6 ones matrix into the top right corner of the 9x9 zeros
Now using variables:
rows = np.array([0,1,2,3,4,5])
cols = np.array([3,4,5,6,7,8])
a[rows,3:9] += b #This works fine
a[0:6,cols] += b #This also works fine
a[rows,cols] += b #But this gives me the following error: ValueError: shape mismatch: value array of shape (6,6) could not be broadcast to indexing result of shape (6,)
I have spent hours reading through forums and trying different solutions, but nothing has worked. I need to use variables because these are input by the user and could be any combination of rows and columns. This notation worked perfectly in MATLAB, where I could add b into a with any combination of rows and columns.
Explanation:
rows = np.array([0,1,2,3,4,5])
cols = np.array([3,4,5,6,7,8])
a[rows,cols] += b
You could translate the last line to the following code:
for x, y, z in zip(rows, cols, b):
    a[x, y] += z
That means: rows contains the x-coordinate, cols the y-coordinate of the field you want to manipulate. Both arrays contain 6 values, so you effectively manipulate 6 values, and b must thus also contain exactly 6 values. But your b contains 6x6 values. Therefore this is a "shape mismatch". This site should contain all you need about indexing of np.arrays.
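To get the MATLAB-style behaviour, where every row index is combined with every column index, one option is np.ix_, which builds an open mesh from the two index arrays. A minimal sketch using the arrays from the question (rows2, cols2, and a2 are hypothetical names added to show a non-contiguous selection):

```python
import numpy as np

a = np.zeros((9, 9))
b = np.ones((6, 6))

rows = np.array([0, 1, 2, 3, 4, 5])
cols = np.array([3, 4, 5, 6, 7, 8])

# np.ix_ turns the two 1-D index arrays into a (6, 1) and a (1, 6) array,
# so the indexing result has shape (6, 6) and matches b.
a[np.ix_(rows, cols)] += b

# The same pattern works for arbitrary, non-contiguous rows and columns.
a2 = np.zeros((9, 9))
rows2 = np.array([0, 2, 4])
cols2 = np.array([1, 8])
a2[np.ix_(rows2, cols2)] += np.ones((3, 2))
```

An equivalent spelling is a[rows[:, None], cols] += b, which relies on the same broadcasting of the index arrays.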
Sorry for the poor wording of the title. What I wanted to do is like this:
Matrix 1 is the original matrix, and matrix 2 is matrix 1 with every even column and row taken out. Matrix 3 is matrix 1 with only the 1 (mod 3) columns and rows kept. Matrix 4 is the same, with the 1 (mod 4) columns and rows. Matrix 5 has the 1 (mod 2) columns and all rows.
Is there a PyTorch function that manipulates tensors in this way that is fast and can utilize the GPU? This is sort of like MaxPool2d, however I just need the first value and not the max. If there aren't any functions like that, is there a way to do it manually but still fast?
Matrix 5 is the easiest to show, because you only need to slice along one dimension. But you can slice along both to get the other results.
matrix5 = matrix1[:, ::2]
This notation takes every second column, starting at the zeroth.
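The other matrices can be sketched the same way by slicing both dimensions. This assumes the question counts rows and columns from 1, so "1 (mod 3)" means indices 0, 3, 6, ... in zero-based terms; the 6x6 input is made up for illustration:

```python
import torch

# Hypothetical 6x6 input; any 2-D tensor (on CPU or GPU) is sliced the same way.
matrix1 = torch.arange(36).reshape(6, 6)

matrix2 = matrix1[::2, ::2]  # every second row and column, starting at index 0
matrix3 = matrix1[::3, ::3]  # every third row and column
matrix4 = matrix1[::4, ::4]  # every fourth row and column
matrix5 = matrix1[:, ::2]    # all rows, every second column

print(matrix2)
print(matrix3)
```

Basic slicing like this returns a view of the original tensor rather than a copy, so it is cheap, and it works on whatever device the tensor already lives on.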
I have a question about the difference between calling max() and np.max() on a list or array.
Is the only difference here the time it takes for Python to return the result?
They may differ in edge cases, such as a list containing NaNs.
import numpy as np
a = max([2, 4, np.nan]) # 4
b = np.max([2, 4, np.nan]) # nan
NumPy propagates NaN in such cases, while the result of Python's max depends on where the NaN sits in the list, because every comparison with NaN is False.
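To illustrate how Python's max behaves around NaN: it keeps the current candidate unless a later element compares greater, and any comparison involving NaN is False. A small sketch (the variable names are only for illustration):

```python
import math
import numpy as np

nan = float("nan")

nan_first = max([nan, 2, 4])  # NaN is the first candidate and nothing "beats" it
nan_last = max([2, 4, nan])   # 4 is the candidate and NaN never "beats" it

print(math.isnan(nan_first), nan_last)

# np.nanmax ignores NaNs explicitly, regardless of position.
ignoring_nan = np.nanmax([nan, 2, 4])
print(ignoring_nan)
```

When NaNs should be skipped on purpose, np.nanmax makes that intent explicit instead of relying on element order.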
There are also subtle issues regarding data types:
a = max([10**n for n in range(20)]) # a is an integer
b = np.max([10**n for n in range(20)]) # b is a float
And of course there are running-time differences, documented in "numpy.max or max? Which one is faster?"
Generally, one should use max for Python lists and np.max for NumPy arrays to minimize the number of surprises. For instance, my second example is not really about np.max but about the data type conversion: to use np.max the list is first converted to a NumPy array, but elements like 10**19 are too large for NumPy's default signed 64-bit integer type, so the array ends up holding floats.
I've written the following Python/Pandas code to multiply each column of an M row x N col dataframe (A) by an M x 1 dataframe (b) to yield the M x N dataframe C:
import pandas as pd

def multiply_columns(A, b):
    # A.values has shape (M, N) and b.values has shape (M, 1)
    C = pd.DataFrame(A.values * b.values, columns=A.columns, index=b.index)
    return C
In other words, it multiplies each column of a matrix by a column vector of equal length.
The code works fine, but I can't recall the formal name for this operation. Thoughts?
It is called "broadcasting". Please see the numpy documentation on the subject: Broadcasting.
Also, it is important to note that A.values and b.values are not matrices, they are arrays. This may seem like a minor detail, but it is very important. Many mathematical operations on matrices produce completely different results than their corresponding operations on arrays. So, for example, M1*M2 is a matrix product for matrices, while it is an element-by-element multiplication for arrays. See more details in This answer.
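A minimal sketch of both points, using plain NumPy arrays with made-up values: an (M, N) array times an (M, 1) array is broadcast column by column, and * on arrays is elementwise while @ is the matrix product.

```python
import numpy as np

A = np.arange(6, dtype=float).reshape(3, 2)  # shape (3, 2): [[0, 1], [2, 3], [4, 5]]
b = np.array([[10.0], [100.0], [1000.0]])    # shape (3, 1)

# b is stretched across the columns of A, so each column is scaled row by row.
C = A * b

# On arrays, * is elementwise; @ performs the matrix product.
S = np.array([[1, 2], [3, 4]])
elementwise = S * S
matrix_product = S @ S

print(C)
print(elementwise)
print(matrix_product)
```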
I believed this was a simple question and looked for related topics, but I didn't find the right thing. Here is the problem:
I have two NumPy arrays on which I need to run a statistical analysis by calculating some criteria, for example the correlation coefficient and the Nash criterion (for those who are familiar with Nash). Since the first array holds observation data (the second holds simulation results), it contains some NaNs. I would like my program to calculate the criteria while ignoring the value pairs where the value in the first array is NaN.
I tried the mask method. It worked well when I only needed to deal with the first array (to calculate its average, for example), but it didn't work for comparing the two arrays value by value.
Could anyone give some help? Thanks!
I just answered a similar question, "Numpy only on finite entries". You can locate the NaN values in your array with NumPy's isnan function and replace them, which is a common way to deal with NaN values.
import numpy as np
replace_NaN = np.isnan(array_name)
array_name[replace_NaN] = 0
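Since zero-filled entries would still enter the statistics, an alternative sketch is to drop the pairs where the observation is NaN by applying the same boolean mask to both arrays. The function name paired_stats and the sample data are hypothetical, and the "Nash criterion" is assumed here to mean the Nash-Sutcliffe efficiency:

```python
import numpy as np

def paired_stats(obs, sim):
    """Correlation and Nash-Sutcliffe efficiency over pairs where obs is not NaN."""
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)

    valid = ~np.isnan(obs)           # True where the observation exists
    o, s = obs[valid], sim[valid]    # the same mask is applied to both arrays

    r = np.corrcoef(o, s)[0, 1]
    nse = 1.0 - np.sum((o - s) ** 2) / np.sum((o - o.mean()) ** 2)
    return r, nse

# Made-up data: the third observation is missing, so that pair is ignored.
obs = np.array([1.0, 2.0, np.nan, 4.0])
sim = np.array([1.0, 2.0, 9.0, 4.0])

r, nse = paired_stats(obs, sim)
print(r, nse)
```

Because the mask comes from the observations and indexes both arrays, the remaining values stay aligned pair by pair.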