I want to create an array with 17 elements, starting with 1, where each subsequent element is twice the value immediately before it.
What I have so far is:
import numpy as np
array = np.zeros(shape=17)
array[0]=1
x = 1
for i in array:
    print(x)
    x *= 2
print(array)
what I got is:
1
2
4
8
16
32
64
128
256
512
1024
2048
4096
8192
16384
32768
65536
[1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
And what I want is:
[1. 2. 4. 8. 16. 32. 64. 128. 256. 512. 1024. 2048. 4096. 8192. 16384. 32768. 65536.]
There is a function for that
np.logspace(0,16,17,base=2,dtype=int)
# array([ 1, 2, 4, 8, 16, 32, 64, 128, 256,
# 512, 1024, 2048, 4096, 8192, 16384, 32768, 65536])
Alternatives:
1<<np.arange(17)
2**np.arange(17)
np.left_shift.accumulate(np.ones(17,int))
np.repeat((1,2),(1,16)).cumprod()
np.vander([2],17,True)[0]
np.ldexp(1,np.arange(17),dtype=float)
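As a quick sanity check, the integer-valued alternatives above can be compared against plain Python powers of two. This is only a minimal sketch; the names `expected` and `candidates` are illustrative and not part of the answer.

```python
import numpy as np

# Reference values computed with plain Python integers.
expected = np.array([2 ** k for k in range(17)])

# A few of the one-liners from the list above.
candidates = {
    "shift": 1 << np.arange(17),
    "power": 2 ** np.arange(17),
    "accumulate": np.left_shift.accumulate(np.ones(17, int)),
    "cumprod": np.repeat((1, 2), (1, 16)).cumprod(),
    "vander": np.vander([2], 17, True)[0],
}

for name, values in candidates.items():
    print(name, np.array_equal(values, expected))
```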
Silly alternatives:
from scipy.sparse import linalg,diags
linalg.spsolve(diags([(1,),(-2,)],(0,-1),(17,17)),np.r_[:17]==0)
np.packbits(np.identity(32,'?')[:17],1,'little').view('<i4').T[0]
np.ravel_multi_index(np.identity(17,int)[::-1],np.full(17,2))
np.where(np.sum(np.ix_(*17*((0,1),))).reshape(-1)==1)[0]
You need to assign the value back:
import numpy as np
array = np.zeros(shape=17, dtype="int")
x = 1
for i in range(len(array)):
    array[i] = x
    print(x)
    x *= 2
>>> print(array)
[ 1 2 4 8 16 32 64 128 256 512 1024 2048
4096 8192 16384 32768 65536]
It will be more efficient to use NumPy vectorization instead of a Python loop, as below.
import numpy as np
n=17
triangle = (np.tri(n,n,-1, dtype=np.int64)+1)
triangle.cumprod(axis=1)[:,-1]
Explanation
np.tri(n,n, dtype=np.int64) creates a triangular matrix with 1s on and below the diagonal and 0s elsewhere.
np.tri(n,n, -1, dtype=np.int64) shifts the triangle down by one row, so that the first row is all zeros.
np.tri(n,n, -1, dtype=np.int64)+1 changes the 0s to 1s and the 1s to 2s.
The last step applies cumprod along each row and takes the last column, which is our answer: row i contains i 2s with the remaining entries 1, so its product is 2**i for i = 0, 1, ..., n-1.
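The steps in the explanation can be traced with a small n. This is a sketch for illustration only; n = 4 is an arbitrary choice so that the intermediate matrix is easy to read.

```python
import numpy as np

n = 4  # small n so the intermediate matrices are easy to read

lower = np.tri(n, n, -1, dtype=np.int64)  # 1s strictly below the diagonal
triangle = lower + 1                      # 0 -> 1, 1 -> 2
powers = triangle.cumprod(axis=1)[:, -1]  # product of each row

print(triangle)
print(powers)
```

Note that this builds an n x n matrix to produce n values, so for large n an expression such as 2**np.arange(n) uses far less memory.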
Related
I have a couple of for loops that I want to vectorize in order to improve performance. They operate on 1 x N matrices.
for y in range(1, len(array[0]) + 1):
    array[0, y - 1] = np.floor(np.nanmean(otherArray[0, ((y-1)*3):((y-1)*3+3)]))
for i in range(len(array[0])):
    array[0, int((i-1)*L+1)] = otherArray[0, i]
The operations are reliant on the index of the array which is given by the for loop. Is there any way to access the index while using numpy.vectorize so that I can rewrite these as vectorized functions?
First loop:
import numpy as np
array = np.zeros((1, 10))
otherArray = np.arange(30).reshape(1, -1)
print(f'array = \n{array}')
print(f'otherArray = \n{otherArray}')
for y in range(1, len(array[0]) + 1):
    array[0, y - 1] = np.floor(np.nanmean(otherArray[0, ((y-1)*3):((y-1)*3+3)]))
print(f'array = \n{array}')
array = np.floor(np.nanmean(otherArray.reshape(-1, 3), axis = 1)).reshape(1, -1)
print(f'array = \n{array}')
output:
array =
[[0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]]
otherArray =
[[ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
24 25 26 27 28 29]]
array =
[[ 1. 4. 7. 10. 13. 16. 19. 22. 25. 28.]]
array =
[[ 1. 4. 7. 10. 13. 16. 19. 22. 25. 28.]]
Second loop:
array = np.zeros((1, 10))
otherArray = np.arange(10, dtype = float).reshape(1, -1)
L = 1
print(f'array = \n{array}')
print(f'otherArray = \n{otherArray}')
for i in range(len(otherArray[0])):
    array[0, int((i-1)*L+1)] = otherArray[0, i]
print(f'array = \n{array}')
array = otherArray
print(f'array = \n{array}')
output:
array =
[[0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]]
otherArray =
[[0. 1. 2. 3. 4. 5. 6. 7. 8. 9.]]
array =
[[0. 1. 2. 3. 4. 5. 6. 7. 8. 9.]]
array =
[[0. 1. 2. 3. 4. 5. 6. 7. 8. 9.]]
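It looks like in the first loop you are trying to compute an average over consecutive, non-overlapping windows of 3 elements (a block average rather than a sliding moving average). This is best done like this: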
import numpy as np
window_width = 3
arr = np.arange(12)
out = np.floor(np.nanmean(arr.reshape(-1,window_width) ,axis=-1))
print(out)
Regarding your second loop, I have no clue what it does. You are trying to copy values from otherArray to array with some offset? I’d recommend you look at numpy’s slicing functionality.
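If the second loop is meant to copy each element of otherArray to the position given by the index formula, one possible vectorized form uses an integer index array. This is a sketch under assumptions: the helper name scatter_by_index and its out_len parameter are hypothetical, and it presumes that the computed indices are distinct. As in the loop, a negative index (for example at i = 0 when L > 1) wraps around to the end of the array.

```python
import numpy as np

def scatter_by_index(other, L, out_len):
    """Vectorized form of: out[0, int((i-1)*L+1)] = other[0, i]."""
    out = np.zeros((1, out_len))
    i = np.arange(other.shape[1])
    # Same index formula as the loop; astype(int) truncates toward zero like int().
    idx = ((i - 1) * L + 1).astype(int)
    out[0, idx] = other[0]
    return out

otherArray = np.arange(10, dtype=float).reshape(1, -1)
result = scatter_by_index(otherArray, L=1, out_len=10)
print(result)
```

With L = 1 the index equals i, so the result is simply a copy of otherArray.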
I have a python numpy 3x4 array A:
A=np.array([[0,1,2,3],[4,5,6,7],[1,1,1,1]])
and a 3x3 array B:
B=np.array([[1,1, 1],[2, 2, 2],[3,3,3]])
I am trying to use a numpy operation to produce array C where each element in C is based on an equation using corresponding elements in A and the entire row in B. A simplified example:
C[row,col] = A[row,col] * ( A[row,col] / B[row,0] + B[row,1] + B[row,2] )
My first thought was to keep it simple and just multiply all of A by a column of B. That raised an error.
C = A * B[:,0]
Then I thought to try this but it didn't work.
C = A[:,:] * B[:,0]
I am not sure how to use the " : " operator and get access to the specific row, col at the same time. I can do this in regular loops but I wanted something more numpy.
import numpy as np
A=np.array([[0,1,2,3],[4,5,6,7],[1,1,1,1]])
B=np.array([[1,1, 1],[2, 2, 2],[3,3,3]])
C=np.zeros([3,4])
row,col = A.shape
print(A.shape)
print(A)
print(B.shape)
print(B)
print(C.shape)
print(C)
print(range(row-1))
for row in range(row):
    for col in range(col):
        C[row,col] = A[row,col] * (( A[row,col] / B[row,0]) + B[row,1] + B[row,2])
print(C)
Which prints:
(3, 4)
[[0 1 2 3]
[4 5 6 7]
[1 1 1 1]]
(3, 3)
[[1 1 1]
[2 2 2]
[3 3 3]]
(3, 4)
[[0. 0. 0. 0.]
[0. 0. 0. 0.]
[0. 0. 0. 0.]]
range(0, 2)
[[ 0. 3. 8. 15. ]
[24. 32.5 42. 0. ]
[ 6.33333333 6.33333333 0. 0. ]]
Suggestions on a better way?
Edited:
Now that I understand broadcasting a bit more, and got that code running, let me expand in a generic way what I am trying to solve. I am trying to map values of a category such as "Air" which can be a range (such as 0-5) that have to be mapped to a shade of a given RGB value. The values are recorded over a time period.
For example, at time 1, the value of Water is 4. The standard RGB color for Water is Blue (0,0,255). There are 5 possible values for Water. In the case of Blue, 255 / 5 = 51. To get the effect of the 4 value on the Blue palette, multiply 51 x 4 = 204. Since we want higher values to be darker, we subtract 255 (white) - 204, yielding 51. The Red and Green components end up being 0. So the value read at time N is a multiplier on the weighted R, G and B values. We invert 0 values to be subtracted from 255 so they appear white. Stronger values are darker.
So to calculate the R' G' and B' for time 1 I used:
answer = data[:,1:4] - (data[:,1:4] / data[:,[0]] * data[:,[4]])
I can extract an [R, G, B] from answer and put it into an Image at some x,y. That works well. But I can't figure out how to use Range, R, G and B to calculate new R', G', B' for all of Time 1, 2, ... N. I am trying to extend the numpy approach if possible. I did it with standard loops as:
for row in range(rows):
    for col in range(cols):
        r = int(data[row,1] - (data[row,1] / data[row,0] * data[row,col_offset+col] ))
        g = int(data[row,2] - (data[row,2] / data[row,0] * data[row,col_offset+col] ))
        b = int(data[row,3] - (data[row,3] / data[row,0] * data[row,col_offset+col] ))
        almostImage[row,col] = [r,g,b]
I can display the image in matplotlib and save it to .png, etc. So I think next step is to try list comprehension over the time points 2D array, and then refer back to the range and RGB values. Will give it a try.
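The double loop above could probably be replaced by a single broadcast expression. The sketch below assumes a column layout of [range, R, G, B, value at time 1, value at time 2, ...] with col_offset = 4; the sample rows are made up for illustration and are not real data.

```python
import numpy as np

# Hypothetical layout: [range, R, G, B, value at time 1, value at time 2, ...]
data = np.array([
    [5.0, 0.0, 0.0, 255.0, 4.0, 0.0, 5.0],   # e.g. Water: blue, range 5
    [5.0, 255.0, 0.0, 0.0, 1.0, 2.0, 3.0],   # made-up second category: red
])
col_offset = 4

value_range = data[:, [0]]      # shape (rows, 1)
rgb = data[:, 1:4]              # shape (rows, 3)
values = data[:, col_offset:]   # shape (rows, times)

# Broadcast to (rows, times, 3): one shaded RGB triple per row and time point.
shaded = rgb[:, None, :] - rgb[:, None, :] / value_range[:, :, None] * values[:, :, None]
almostImage = shaded.astype(int)  # truncates toward zero, like int() in the loop

print(almostImage[0])
```

The extra axes inserted with None line the three inputs up as (rows, 1, 3), (rows, 1, 1) and (rows, times, 1), so NumPy expands them to a common (rows, times, 3) shape without an explicit loop.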
Try this:
A*(A / B[:,[0]] + B[:,1:].sum(1, keepdims=True))
Output:
array([[ 0. , 3. , 8. , 15. ],
[24. , 32.5 , 42. , 52.5 ],
[ 6.33333333, 6.33333333, 6.33333333, 6.33333333]])
Explanation:
The first operation A/B[:,[0]] utilizes numpy broadcasting.
Then B[:,1:].sum(1, keepdims=True) is just B[:,1] + B[:,2], and keepdims=True allows the dimension to stay the same. Print it to see details.
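To see why B[:,[0]] works where B[:,0] did not, it helps to print the shapes. The sketch below also compares the broadcast result with a reference loop. The names n_rows, n_cols, i, j and C_loop are introduced here for illustration; the loop in the question reuses row and col both as bounds and as loop variables, which shrinks the inner range on later rows and explains the zeros in its output.

```python
import numpy as np

A = np.array([[0, 1, 2, 3], [4, 5, 6, 7], [1, 1, 1, 1]])
B = np.array([[1, 1, 1], [2, 2, 2], [3, 3, 3]])

print(B[:, 0].shape)     # (3,)   -> cannot broadcast against (3, 4)
print(B[:, [0]].shape)   # (3, 1) -> broadcasts across the 4 columns of A

C = A * (A / B[:, [0]] + B[:, 1:].sum(1, keepdims=True))

# Reference loop with separate names for the bounds and the loop variables.
n_rows, n_cols = A.shape
C_loop = np.zeros((n_rows, n_cols))
for i in range(n_rows):
    for j in range(n_cols):
        C_loop[i, j] = A[i, j] * (A[i, j] / B[i, 0] + B[i, 1] + B[i, 2])

print(np.allclose(C, C_loop))
```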
I am fairly new to programming and I never used numpy before.
So, I have a matrix with 19001 x 19001 dimensions. It contains a lot of zeros, so it is relatively sparse. I wrote some code to compute the pairwise cosine similarity of the columns if the item in the row is non-zero. I add all the pairwise similarity values of one row and do some mathematical operations on them to obtain one value for each row of the matrix in the end (see code below). It does what it is supposed to; however, because it deals with a great number of dimensions, it is really slow. Is there any way to modify my code to make it more efficient?
import numpy as np
from scipy.spatial.distance import cosine
row_number = 0
out_file = open('outfile.txt', 'w')
for row in my_matrix:
    non_zeros = np.nonzero(my_matrix[row_number])[0]
    non_zeros = list(non_zeros)
    cosine_sim = []
    for item in non_zeros:
        if len(non_zeros) <= 1:
            break
        x = non_zeros[0]
        y = non_zeros[1]
        similarity = 1 - cosine(my_matrix[:, x], my_matrix[:, y])
        cosine_sim.append(similarity)
        non_zeros.pop(0)
    summing = np.sum(cosine_sim)
    mean = summing / len(cosine_sim)
    log = np.log(mean)
    out_file_value = log * -1
    out_file.write(str(row_number) + " " + str(out_file_value) + "\n")
    if row_number <= 19000:
        row_number += 1
    else:
        break
I know that there are functions that compute the cosine similarity directly, even between columns (from sklearn.metrics.pairwise import cosine_similarity), so I tried that. The output contains the same values, but at the same time it is really confusing to me, even though I read the documentation and the posts on this page referring to the issue.
For instance:
my_matrix =[[0. 0. 7. 0. 5.]
[0. 0. 11. 0. 0.]
[0. 2. 0. 0. 0.]
[0. 0. 2. 11. 5.]
[0. 0. 5. 0. 0.]]
transposed = np.transpose(my_matrix)
sim_matrix = cosine_similarity(transposed)
# resulting similarity matrix
sim_matrix =[[0. 0. 0. 0. 0.]
[0. 1. 0. 0. 0.]
[0. 0. 1. 0.14177624 0.45112924]
[0. 0. 0.14177624 1. 0.70710678]
[0. 0. 0.45112924 0.70710678 1.]]
If I compute the cosine similarity with my code above, it returns 0.45112924 for the 1st row ([0]) and 0.14177624 and 0.70710678 for row 4 ([3]).
out_file.txt
0 0.796001425306
1 nan
2 nan
3 0.856981065776
4 nan
I greatly appreciate any help or suggestions to my question!
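You can consider using scipy's cdist instead. However, it doesn't take sparse matrix input; you have to provide a dense NumPy array. Note that cdist(XA, XB) compares the rows of XA with the rows of XB, so cdist(X, X.T) compares each row of X with each column of X; for column-to-column distances, pass X.T as both arguments.
import numpy as np
from scipy.spatial.distance import cdist
X = np.random.randn(10000, 10000)
D = cdist(X, X.T, metric='cosine') # D[i, j]: cosine distance between row i and column j of X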
Here is the speed that I got for 10000 x 10000 random array.
%timeit cdist(X, X.T, metric='cosine')
16.4 s ± 325 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Try on small array
X = np.array([[1,0,1], [0, 3, 2], [1,0,1]])
D = cdist(X, X.T, metric='cosine')
This will give
[[ 1.11022302e-16 1.00000000e+00 4.22649731e-01]
[ 6.07767730e-01 1.67949706e-01 9.41783727e-02]
[ 1.11022302e-16 1.00000000e+00 4.22649731e-01]]
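For example, D[0, 2] is the cosine distance between row 0 and column 2 of X. In this small example row 0 and column 0 are both [1, 0, 1], so it can be checked using columns 0 and 2: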
from numpy.linalg import norm
1 - np.dot(X[:, 0], X[:,2])/(norm(X[:, 0]) * norm(X[:,2])) # give 0.422649
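To get one value per row as in the question, one option is to compute the column-to-column similarity matrix once and then look up the needed pairs per row. The following is a sketch under assumptions, not a drop-in replacement: the function name row_scores is hypothetical, it uses plain NumPy instead of cdist, and it assumes the intent is to average over all consecutive pairs of non-zero columns. The loop in the question pops from the list it is iterating over, so for rows with four or more non-zero entries it stops early and its values can differ from this sketch.

```python
import numpy as np

def row_scores(matrix):
    """-log(mean cosine similarity of consecutive non-zero columns) per row."""
    matrix = np.asarray(matrix, dtype=float)
    # Column-to-column cosine similarity, computed once for the whole matrix.
    norms = np.linalg.norm(matrix, axis=0)
    unit = matrix / np.where(norms == 0, 1.0, norms)  # all-zero columns stay zero
    sim = unit.T @ unit

    scores = np.full(matrix.shape[0], np.nan)
    for r, row in enumerate(matrix):
        nz = np.nonzero(row)[0]
        if len(nz) > 1:
            # Pairs (nz[0], nz[1]), (nz[1], nz[2]), ...
            scores[r] = -np.log(sim[nz[:-1], nz[1:]].mean())
    return scores

my_matrix = [[0, 0, 7, 0, 5],
             [0, 0, 11, 0, 0],
             [0, 2, 0, 0, 0],
             [0, 0, 2, 11, 5],
             [0, 0, 5, 0, 0]]
print(row_scores(my_matrix))
```

For a dense 19001 x 19001 float64 matrix, the similarity matrix alone takes roughly 2.9 GB, so this trades memory for speed.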
I have a NumPy ndarray that looks like:
[[ 0 0 0 1 0]
[ 0 0 0 0 1]]
but I would like to process it to the following form:
[[ 0. 0. 0. 1. 0.]
[ 0. 0. 0. 0. 1.]]
How would I achieve this?
It looks to me like you have an array of some integer type. You probably want to convert to an array of float:
array_float = array_int.astype(float)
e.g.:
>>> ones_i = np.ones(10, dtype=int)
>>> print(ones_i)
[1 1 1 1 1 1 1 1 1 1]
>>> ones_f = ones_i.astype(float)
>>> print(ones_f)
[ 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
With that said, I think that it is worth asking why you want to process the string representation of your array. There very well might be a better way to accomplish your goal.
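Applied to the array from the question, the conversion might look like the sketch below. The names array_int and array_float follow the answer; the only point is that astype returns a new array with a different dtype, while the values stay the same.

```python
import numpy as np

array_int = np.array([[0, 0, 0, 1, 0],
                      [0, 0, 0, 0, 1]])
array_float = array_int.astype(float)  # returns a new array; array_int is unchanged

print(array_int.dtype, array_float.dtype)
print(array_float)
```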
Let's say I have an array r of dimension (n, m). I would like to shuffle the columns of that array.
If I use numpy.random.shuffle(r), it shuffles the rows. How can I shuffle only the columns, so that the first column becomes the second one, the third becomes the first, and so on, at random?
Example:
input:
array([[ 1, 20, 100],
[ 2, 31, 401],
[ 8, 11, 108]])
output:
array([[ 20, 1, 100],
[ 31, 2, 401],
[ 11, 8, 108]])
One approach is to shuffle the transposed array:
np.random.shuffle(np.transpose(r))
Another approach (see YXD's answer https://stackoverflow.com/a/20546567/1787973) is to generate a list of permutations to retrieve the columns in that order:
r = r[:, np.random.permutation(r.shape[1])]
Performance-wise, the second approach is faster.
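One practical difference between the two approaches is that the first modifies r in place while the second builds a new array. A minimal sketch using the input from the question (columns_before and shuffled are illustrative names):

```python
import numpy as np

r = np.array([[1, 20, 100],
              [2, 31, 401],
              [8, 11, 108]])
columns_before = {tuple(col) for col in r.T}

# Approach 1: shuffles the rows of the transposed view, i.e. the columns of r, in place.
np.random.shuffle(r.T)

# Approach 2: builds a new array from a random column order; r itself is not modified.
shuffled = r[:, np.random.permutation(r.shape[1])]

print(r)
print(shuffled)
```

In both cases each column stays intact; only the order of the columns changes.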
For a general axis you could follow the pattern:
>>> import numpy as np
>>>
>>> a = np.array([[ 1, 20, 100, 4],
... [ 2, 31, 401, 5],
... [ 8, 11, 108, 6]])
>>>
>>> print(a[:, np.random.permutation(a.shape[1])])
[[ 4 1 20 100]
[ 5 2 31 401]
[ 6 8 11 108]]
>>>
>>> print(a[np.random.permutation(a.shape[0]), :])
[[ 1 20 100 4]
[ 2 31 401 5]
[ 8 11 108 6]]
>>>
So, one step further from your answer:
Edit: I very easily could be mistaken how this is working, so I'm inserting my understanding of the state of the matrix at each step.
r == 1 2 3
     4 5 6
     6 7 8
r = np.transpose(r)
r == 1 4 6
     2 5 7
     3 6 8 # Columns are now rows
np.random.shuffle(r)
r == 2 5 7
     3 6 8
     1 4 6 # Columns-as-rows are shuffled
r = np.transpose(r)
r == 2 3 1
     5 6 4
     7 8 6 # Columns are columns again, shuffled.
which would then be back in the proper shape, with the columns rearranged.
The transpose of the transpose of a matrix == that matrix, or, [A^T]^T == A. So, you'd need to do a second transpose after the shuffle (because a transpose is not a shuffle) in order for it to be in its proper shape again.
Edit: The OP's answer skips storing the transpositions and instead lets the shuffle operate on r as if it were.
In general if you want to shuffle a numpy array along axis i:
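import numpy as np
def shuffle(x, axis = 0):
    n_axis = len(x.shape)
    t = np.arange(n_axis)
    t[0] = axis
    t[axis] = 0
    xt = np.transpose(x.copy(), t)    # swap the target axis to the front (on a copy)
    np.random.shuffle(xt)             # shuffle along the new first axis
    shuffled_x = np.transpose(xt, t)  # swap the axes back
    return shuffled_x
shuffle(array, axis=i)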
>>> print(s0)
[[0. 1. 0. 1.]
 [0. 1. 0. 0.]
 [0. 1. 0. 1.]
 [0. 0. 0. 1.]]
>>> print(np.random.permutation(s0.T).T)
[[1. 0. 1. 0.]
 [0. 0. 1. 0.]
 [1. 0. 1. 0.]
 [1. 0. 0. 0.]]
np.random.permutation() permutes along the first axis (the rows), so transposing before and after the call permutes the columns.
There is another way, which does not use transposition and is apparently faster:
np.take(r, np.random.permutation(r.shape[1]), axis=1, out=r)
CPU times: user 1.14 ms, sys: 1.03 ms, total: 2.17 ms
Wall time: 3.89 ms
The approach in other answers: np.random.shuffle(r.T)
CPU times: user 2.24 ms, sys: 0 ns, total: 2.24 ms
Wall time: 5.08 ms
I used r = np.arange(64*1000).reshape(64, 1000) as an input.
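For completeness, NumPy's newer Generator interface accepts an axis argument directly, which avoids both the transpose and the explicit index array. This is a sketch assuming a reasonably recent NumPy version; it was not part of the timings above, and the seed is chosen only to make the example repeatable.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # seed chosen only to make the sketch repeatable
r = np.arange(12).reshape(3, 4)

shuffled_copy = rng.permutation(r, axis=1)  # new array with the columns reordered
rng.shuffle(r, axis=1)                      # reorders the columns of r in place

print(shuffled_copy)
print(r)
```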