How do I stop Python from assuming that I want the scalar result of a dot product?
I have columns taken from two matrices, V=[v1,v2,v3,...] and D=[d1,d2,...] of lengths M and N respectively.
I need the following matrix, which can be generated by matrix multiplication of one column with one row.
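$$
V D^{T} =
\begin{bmatrix}
v_1 d_1 & v_1 d_2 & v_1 d_3 & \cdots \\
v_2 d_1 & v_2 d_2 & v_2 d_3 & \cdots \\
v_3 d_1 & v_3 d_2 & v_3 d_3 & \cdots \\
\vdots & \vdots & \vdots & \ddots
\end{bmatrix}
$$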
This calculation will be done at least hundreds of thousands of times so I don't want to use a for-loop.
When I try to do this with NumPy, it assumes I want the more common dot product (1xM, Nx1), which gives a scalar (if M=N) or an error, rather than the (Mx1, 1xN) product that gives the MxN matrix I want. I've tried np.dot and np.matmul, and in each case it seems to ignore np.transpose.
In the following, I've tried to make these objects two-dimensional, and I get the same error with or without the transpose.
import numpy as np
v = np.arange(4)
d = np.arange(3)
np.reshape(v,(1,4))
np.reshape(d,(3,1))
e = np.matmul(np.transpose(d),v)
print(e)
Traceback (most recent call last):
  File "/home/voidbender/research/NNs/test2.py", line 8, in <module>
    e = np.matmul(np.transpose(v),d)
ValueError: matmul: Input operand 1 has a mismatch in its core dimension 0, with gufunc signature (n?,k),(k,m?)->(n?,m?) (size 3 is different from 4)
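You forgot to reassign v and d to their reshaped versions: np.reshape returns a new array and leaves its argument unchanged. This does the job:
import numpy as np
v = np.arange(4)
d = np.arange(3)
v = np.reshape(v, (1, 4))  # row vector, shape (1, 4)
d = np.reshape(d, (3, 1))  # column vector, shape (3, 1)
e = np.matmul(d, v)        # (3, 1) @ (1, 4) -> (3, 4)
print(e)
Result:
[[0 0 0 0]
 [0 1 2 3]
 [0 2 4 6]]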
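It may also help to see why np.transpose appeared to be ignored: a 1-D array has only one axis, so transposing it returns an array of the same shape. A minimal check (np.newaxis is simply an alias for None):

```python
import numpy as np

v = np.arange(4)
print(v.shape)                 # (4,)
print(np.transpose(v).shape)   # (4,): transposing a 1-D array changes nothing
print(v[:, np.newaxis].shape)  # (4, 1): an explicit column vector
print(v[np.newaxis, :].shape)  # (1, 4): an explicit row vector
```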
To make it simpler and avoid reshape, you can create the column and row vectors directly:
row_vector = np.array([np.arange(4)])    # shape (1, 4)
col_vector = np.array([np.arange(3)]).T  # shape (3, 1)
e = np.matmul(col_vector, row_vector)    # shape (3, 4)
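For the original goal (an M x N matrix whose (i, j) entry is v_i * d_j), NumPy also provides np.outer, and the same result comes from broadcasting a column against a row. Note the order: np.outer(v, d) is 4 x 3, the transpose of the d @ v result above. Since the calculation is repeated hundreds of thousands of times, broadcasting can also compute many outer products in one call; the batch below (K hypothetical pairs stacked as the rows of V_batch and D_batch) is an assumption for illustration, not part of the question.

```python
import numpy as np

v = np.arange(4)  # length M = 4
d = np.arange(3)  # length N = 3

# Outer product: e_outer[i, j] = v[i] * d[j], shape (M, N)
e_outer = np.outer(v, d)

# Same thing via broadcasting: (M, 1) * (1, N) -> (M, N)
e_bcast = v[:, None] * d[None, :]

# Batched sketch: K hypothetical pairs of vectors, one pair per row
K = 5
rng = np.random.default_rng(0)
V_batch = rng.random((K, 4))  # shape (K, M)
D_batch = rng.random((K, 3))  # shape (K, N)

# (K, M, 1) * (K, 1, N) -> (K, M, N): one outer product per pair
E_batch = V_batch[:, :, None] * D_batch[:, None, :]

# np.einsum expresses the same batched outer product
E_einsum = np.einsum('km,kn->kmn', V_batch, D_batch)
```

Broadcasting avoids the Python loop entirely; whether it beats a loop over np.outer in practice depends on M, N and K.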
I was told to use a Python program to compute ê for every row of the array and store the results in a separate NumPy array.
Which of examples 1 and 2 (image) below correctly displays the results as a separate NumPy array?
Suppose you have a 3x2 image. "For every row" probably means applying operation e across the columns, i.e. along axis=1. Here is an example with np.sum():
>>> img=np.array([[1,1],[2,1],[4,1]])
>>> e=np.sum(img,axis=1)
>>> e
array([2, 3, 5])
>>> e.shape
(3,)
>>> img.shape
(3, 2)
>>> img
array([[1, 1],
       [2, 1],
       [4, 1]])
>>>
However, it really depends on what your ê is, which you haven't posted.
It could be that ê is calculated for each element ('element-wise'), in which case you would do as Rinshan described in the second part of his answer.
Refer to these diagrams to work out which axis you need to compute ê along. Sorry, this is as much as I can help.
EDIT: If ê is exp, you could sum across the columns and then apply np.exp().
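Since the diagrams are not reproduced here, a quick check on the same img array shows what each axis does. It assumes ê is exp, as in the EDIT above:

```python
import numpy as np

img = np.array([[1, 1], [2, 1], [4, 1]])  # shape (3, 2)

col_sums = np.sum(img, axis=0)  # collapse the rows: one value per column
row_sums = np.sum(img, axis=1)  # collapse the columns: one value per row
row_exp = np.exp(row_sums)      # exp of each row sum, one value per row

print(col_sums)  # [7 3]
print(row_sums)  # [2 3 5]
```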
Compute the exponential row-wise
import numpy as np
ar = np.array(([4.0,0.2,1.16,0.5],[6.0,0.1,0.06,-0.75]))
e = np.exp(np.sum(ar,axis=1)) # O/P array([350.72414402, 223.63158768])
# Alternatively: take the exponential first, then sum along axis 1
e = np.sum(np.exp(ar),axis=1)
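Note that the two lines above compute different things. Because exp(x + y) = exp(x) * exp(y), the exponential of each row sum equals the product (not the sum) of the element-wise exponentials along that row. A quick check on the same array:

```python
import numpy as np

ar = np.array(([4.0, 0.2, 1.16, 0.5], [6.0, 0.1, 0.06, -0.75]))

exp_of_sum = np.exp(np.sum(ar, axis=1))    # exp(row sum)
prod_of_exp = np.prod(np.exp(ar), axis=1)  # product of exps along each row
sum_of_exp = np.sum(np.exp(ar), axis=1)    # sum of exps along each row

print(np.allclose(exp_of_sum, prod_of_exp))  # True
print(np.allclose(exp_of_sum, sum_of_exp))   # False
```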
Compute the exponential element-wise
e = np.exp(ar)
# To convert to single row
e =e.reshape(-1) # or single line e = np.exp(ar).reshape(-1)
print(e)
array([ 54.59815003,   1.22140276,   3.18993328,   1.64872127,
       403.42879349,   1.10517092,   1.06183655,   0.47236655])
Compute with a variable base e
import numpy as np
ar = np.array(([4.0,0.2,1.16,0.5],[6.0,0.1,0.06,-0.75]))
s = np.sum(ar, axis=1)  # row sums
e = np.e                # assign whatever value of e you need here
e_calculated = e ** s
# Equivalent: e_calculated = np.power(e, s)
I have two NumPy ndarrays named A and B, each of shape 2 by 3. For each grid point, I have to find which of the two arrays has the element closest to zero and assign a flag accordingly: 1 for array A and 2 for array B. That is, if the (0,0) element (row 0, column 0) of array A is closer to zero than the (0,0) element of array B, the output has the value 1 at row 0, column 0. The output array therefore has the same shape, 2 by 3.
Here is an example:
A= np.array([[0.1,2,0.3],[0.4,3,2]])
B= np.array([[1,0.2,0.5],[4,0.03,0.02]])
The output should be
[[1,2,1],[1,2,2]]
Is there an efficient way of doing this without writing a for loop? Many thanks.
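Here's what I would do:
import numpy as np
a = np.array([[0.1,2,0.3],[0.4,3,2]])
b = np.array([[1,0.2,0.5],[4,0.03,0.02]])
# Stack into shape (2, 2, 3), take absolute values, find which array
# (index 0 or 1) is closest to zero at each position, then shift to flags 1/2
c = np.abs(np.stack([a, b])).argmin(0)+1
Output:
array([[1, 2, 1],
       [1, 2, 2]])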
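For exactly two arrays, the same comparison can also be written with np.where on the absolute values. This is a sketch of an equivalent formulation that makes tie-breaking explicit: when both elements are equally close to zero it assigns 1, which matches argmin, since argmin returns the first minimum.

```python
import numpy as np

a = np.array([[0.1, 2, 0.3], [0.4, 3, 2]])
b = np.array([[1, 0.2, 0.5], [4, 0.03, 0.02]])

# 1 where a is at least as close to zero as b, otherwise 2
flags = np.where(np.abs(a) <= np.abs(b), 1, 2)
print(flags)
```

The stack/argmin version generalises more easily to three or more arrays; np.where is limited to a pairwise choice.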
I have a 10,000 by 10,000 matrix filled with 1s and 0s. I want to go through each column and find the rows that contain the value 1.
Then I want to store the results in a new matrix with 2 columns: column 1 = column index, and column 2 = an array of the row indices that contain 1. Some columns do not have any 1s at all, in which case it would be an empty array.
I tried a for loop, but it is computationally inefficient.
I tried it with a smaller matrix:
# sample matrix
from random import randint
import numpy as np
n = 4
mat = [[randint(0,1) for _ in range(n)] for _ in range(n)]
arr = np.random.randint(0, size=(4, 2))
for col in range(n):
    arr[n][1] = n
    arr[n][2] = np.where(col == 1)
But this runs quite slowly for a 10,000 by 10,000 matrix. Is this approach right, and is there a better way?
Getting indices where a[i][j] == 1
You can get the data you're looking for (the locations of 1s within a matrix of 0s and 1s) efficiently using numpy.argwhere() or numpy.nonzero(); however, you will not be able to get it in the format specified in your original question using NumPy ndarrays alone.
You could produce your specified format using a combination of ndarrays and standard Python lists, but since efficiency is paramount given the size of your data, I think it is best to focus on getting the data rather than on fitting it into an ndarray of irregular Python lists.
You can always reformat the results (the indices of 1s within your matrix) after the computation if that format is a hard requirement. This way your code benefits from NumPy's optimisations during the heavy computation, which reduces the overall execution time of your procedure.
Example using np.argwhere()
import numpy as np
a = np.random.randint(0, 2, size=(4,4))
b = np.argwhere(a == 1)
print(f'a\n{a}')
print(f'b\n{b}')
Output
a
[[1 1 1 1]
 [0 0 0 0]
 [1 0 1 0]
 [1 1 1 1]]
b
[[0 0]
 [0 1]
 [0 2]
 [0 3]
 [2 0]
 [2 2]
 [3 0]
 [3 1]
 [3 2]
 [3 3]]
As you can see, np.argwhere(a == 1) returns a 2-D array with one row per match, where each row holds the (row, column) index of a location in a whose value x meets the condition x == 1.
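np.nonzero(), mentioned above, returns the same indices in a different layout: a tuple with one index array per dimension rather than one row per match. Stacking those arrays as columns reproduces the np.argwhere() result, as this small check shows:

```python
import numpy as np

a = np.random.randint(0, 2, size=(4, 4))

rows, cols = np.nonzero(a == 1)  # one index array per dimension
b = np.argwhere(a == 1)          # one (row, col) pair per match

# Stacking the nonzero() arrays as columns reproduces argwhere()
same = np.array_equal(np.column_stack((rows, cols)), b)
print(same)  # True
```

The tuple form is convenient for fancy indexing (a[rows, cols]), while the argwhere form is convenient when you want one pair per match.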
I tried the above method with a = np.random.randint(0, 2, size=(10000, 10000)) a few times on my laptop (nothing fancy), and it finished in around 3-5 seconds each time.
Getting row indices where all values != 1
If you want the row indices of a that contain no values == 1, the most straightforward way (assuming you are using my example code above) is probably numpy.setdiff1d(), which returns the set difference between an array containing all row indices of a and the 1-D array b[:, 0], i.e. the row indices of all values in a that are == 1.
Assuming the same a and b as in the example above:
c = np.setdiff1d(np.arange(a.shape[0]), b[:, 0])
print(c)
Output
[1]
In the above example c is [1], because row 1 is the only row of a that doesn't contain any values == 1.
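Since the question asks about columns with no 1s, a possibly more direct route (a different technique from setdiff1d, not part of the answer above) is to reduce the boolean matrix with any() along the axis you care about. This sketch uses the example matrix printed above:

```python
import numpy as np

a = np.array([[1, 1, 1, 1],
              [0, 0, 0, 0],
              [1, 0, 1, 0],
              [1, 1, 1, 1]])  # the example matrix from above

has_one = (a == 1)
empty_rows = np.flatnonzero(~has_one.any(axis=1))  # rows with no 1s
empty_cols = np.flatnonzero(~has_one.any(axis=0))  # columns with no 1s
print(empty_rows)  # [1]
print(empty_cols)  # []
```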
It is worth noting that if a is defined as np.random.randint(0, 2, size=(10000, 10000)), the probability of c being anything but a zero-length (i.e. empty) array is vanishingly small: for a row to contain no 1s, np.random would have to return 0 10,000 times in a row.
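To put a number on "vanishingly small": each entry is 0 with probability 1/2, independently, so for a 10,000 x 10,000 matrix (the second bound is the union bound over the 10,000 rows):

$$
P(\text{a given row is all } 0) = 2^{-10000} \approx 5 \times 10^{-3011},
\qquad
P(\text{some row is all } 0) \le 10^{4} \cdot 2^{-10000} \approx 5 \times 10^{-3007}.
$$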
Why use multiple NumPy arrays?
I know that it may seem strange to use b and c to store results pertaining to locations where a == 1 and a != 1 respectively. Why not just use an irregular list as outlined in your original question?
The short answer is efficiency. NumPy arrays let you vectorise computations on your data and largely avoid costly Python loops, and with data of this size the savings in execution time are considerable.
You can always store your data in a more human-friendly format and map it back to NumPy as required; however, the examples above will likely be substantially faster at execution time than the example in your original question.
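If the [column index, row indices] layout from the question is a hard requirement, one way to build it from a single vectorised call is sketched below. The helper name ones_by_column is hypothetical. It relies on np.nonzero returning indices in row-major (C-style) order, so running it on the transpose groups the matches by column; np.bincount then counts the 1s per column, and np.split cuts the row indices at the column boundaries, giving empty arrays for columns without 1s.

```python
import numpy as np

def ones_by_column(mat):
    """Hypothetical helper: (column index, row indices holding 1) per column."""
    n_cols = mat.shape[1]
    # nonzero() on the transpose yields (col, row) pairs sorted by column
    cols, rows = np.nonzero(mat.T == 1)
    # Number of 1s in each column, including columns with none
    counts = np.bincount(cols, minlength=n_cols)
    # Split the row indices at the column boundaries
    groups = np.split(rows, np.cumsum(counts)[:-1])
    return list(zip(range(n_cols), groups))

mat = np.array([[1, 0, 0],
                [0, 0, 1],
                [1, 0, 1]])
for col, row_idx in ones_by_column(mat):
    print(col, row_idx)
# 0 [0 2]
# 1 []
# 2 [1 2]
```

Only the final list of (column, array) pairs is plain Python; the heavy lifting stays inside NumPy.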
Suppose I have a NumPy array with 2 rows and 10 columns. I want to select the columns whose value in the first row is even. The outcome I want can be obtained as follows:
import numpy as np
a = list(range(10))
b = list(reversed(range(10)))
c = np.concatenate([a, b]).reshape(2, 10).T
c[c[:, 0] % 2 == 0].T
However, this method transposes twice, and I don't think it's very Pythonic. Is there a cleaner way to do the same job?
NumPy allows you to select along each dimension separately: you pass in a tuple of indices whose length is the number of dimensions.
Say your array is
import numpy as np
a = np.random.randint(10, size=(2, 10))
The even elements in the first row are given by the mask
m = (a[0, :] % 2 == 0)
You can use a[0] to get the first row instead of a[0, :] because missing indices are synonymous with the slice : (take everything).
Now you can apply the mask to just the second dimension:
result = a[:, m]
You can also convert the mask to indices first. There are subtle differences between the two approaches, which you won't see in this simple case. The biggest difference is usually that linear indices are a little faster, especially if applied more than once:
i = np.flatnonzero(m)
result = a[:, i]
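A quick check that the boolean mask, the linear indices and the question's double-transpose approach all select the same columns (the array is random, so which columns are selected varies from run to run):

```python
import numpy as np

a = np.random.randint(10, size=(2, 10))

m = (a[0] % 2 == 0)    # boolean mask over the columns
i = np.flatnonzero(m)  # the same selection as integer positions

by_mask = a[:, m]
by_index = a[:, i]
double_t = a.T[a.T[:, 0] % 2 == 0].T  # the approach from the question

print(np.array_equal(by_mask, by_index))  # True
print(np.array_equal(by_mask, double_t))  # True
```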
I have two arrays with n rows (to simplify, I use n = 2 in this example):
import numpy as np
A = np.array([[1,2,3],[1,2,3]])
B has n rows, each holding one random integer: 1, 2 or 3.
Let's pretend:
B = np.array([[1],[3]])
What is the most Pythonic way to "subtract" B from A, i.e. remove each row's value of B from the same row of A, to obtain C = np.array([[2,3],[1,2]])?
I tried np.subtract, but because of the broadcasting rules I do not obtain C. I want to select by the elements' values, not with a mask or indices. I also tried np.delete and np.where without success.
Thank you.
This might work and should be quite Pythonic:
# Python 3: range replaces Python 2's xrange
dd = [[val for val in A[i] if val not in B[i]] for i in range(len(A))]
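If each row of B holds exactly one value that occurs exactly once in the matching row of A (as in the example), a vectorised alternative is to broadcast B against A. Strictly speaking, this builds a boolean mask from the element values, so it may not be what the asker meant by avoiding masks; the assumption that every row loses exactly one element is what makes the final reshape valid.

```python
import numpy as np

A = np.array([[1, 2, 3], [1, 2, 3]])
B = np.array([[1], [3]])

# B has shape (n, 1), so A != B compares each row of A with its own B value
keep = A != B
# Assumes every row loses exactly one element, so survivors reshape to (n, 2)
C = A[keep].reshape(A.shape[0], -1)
print(C)
# [[2 3]
#  [1 2]]
```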