cm = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]
While learning about normalization for image recognition, I have seen many people use this line. I know it normalizes the confusion matrix so that it contains only numbers between 0 and 1, which lets you read the fraction of correctly classified samples directly from the matrix. I'm not very good at math, but I'd like to know exactly how this line works.
If anyone can help me, I'd appreciate it!
It finds a sum along an axis (axis 1) and then does broadcasted division along that axis by the corresponding value of the sum.
So suppose you had:
>>> arr = np.arange(4*5).reshape(4, 5)
>>> arr
array([[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14],
[15, 16, 17, 18, 19]])
So first, it sums along the axis:
>>> arr.sum(1)
array([10, 35, 60, 85])
Note, you can't broadcast these two arrays with the current shape:
>>> arr / arr.sum(1)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: operands could not be broadcast together with shapes (4,5) (4,)
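Broadcasting compares shapes starting from the trailing axis, so the shape (4,) is matched against the 5 columns and fails. To divide each row by its own sum, the sums need a trailing axis of length 1, so you add a new axis, with resulting shape (4, 1):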
>>> arr.sum(1)[:, np.newaxis]
array([[10],
[35],
[60],
[85]])
>>> arr.sum(1)[:, np.newaxis].shape
(4, 1)
So now, the broadcasting division works:
>>> arr
array([[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14],
[15, 16, 17, 18, 19]])
>>> arr.sum(1)[:, np.newaxis]
array([[10],
[35],
[60],
[85]])
>>> arr / arr.sum(1)[:, np.newaxis]
array([[0. , 0.1 , 0.2 , 0.3 , 0.4 ],
[0.14285714, 0.17142857, 0.2 , 0.22857143, 0.25714286],
[0.16666667, 0.18333333, 0.2 , 0.21666667, 0.23333333],
[0.17647059, 0.18823529, 0.2 , 0.21176471, 0.22352941]])
Read more about broadcasting in the numpy docs
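As a minimal sketch of the same normalization applied to a confusion matrix (the 3x3 counts below are made up for illustration), `keepdims=True` keeps the summed axis with length 1, which has the same effect as the `[:, np.newaxis]` step above:

```python
import numpy as np

# Hypothetical 3-class confusion matrix: rows are true labels, columns are predictions.
cm = np.array([[8, 1, 1],
               [2, 6, 2],
               [0, 3, 7]])

# Row sums with shape (3, 1), so each row is divided by its own total.
row_totals = cm.sum(axis=1, keepdims=True)
cm_normalized = cm.astype(float) / row_totals

# Each row now sums to 1; the diagonal holds the fraction classified correctly per class.
per_class_recall = np.diag(cm_normalized)
print(cm_normalized)
print(per_class_recall)
```

Note that a class with no samples gives a row total of 0, and the division then produces NaN for that row, so such rows may need separate handling.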
If I have generated the 3 arrays as shown below, how can I sum all the numbers from all 3 arrays without counting the ones that appear in more than one array twice?
(I would like to sum up 10 only once, but I can't simply add X_1 and X_2 because they both contain 10 and 20. I only want to count those numbers once.)
Maybe this can be done by creating a new array out of X_1, X_2 and X_3 that leaves out the duplicates?
def get_divisible_by_n(arr, n):
    return arr[arr%n == 0]
x = np.arange(1,21)
X_1=get_divisible_by_n(x, 2)
#we get array([ 2, 4, 6, 8, 10, 12, 14, 16, 18, 20])
X_2=get_divisible_by_n(x, 5)
#we get array([ 5, 10, 15, 20])
X_3=get_divisible_by_n(x, 3)
#we get array([3, 6, 9, 12, 15, 18])
It is me again!
Here is my solution using numpy, since I had more time this time:
import numpy as np
arr = np.arange(1,21)
divisable_by = lambda x: arr[np.where(arr % x == 0)]
n_2 = divisable_by(2)
n_3 = divisable_by(3)
n_5 = divisable_by(5)
what_u_want = np.unique( np.concatenate((n_2, n_3, n_5)) )
# [ 2, 3, 4, 5, 6, 8, 9, 10, 12, 14, 15, 16, 18, 20]
Not really efficient and not using numpy but here is one solution:
def get_divisible_by_n(arr, n):
    return [i for i in arr if i % n == 0]
x = list(range(1, 21))
X_1 = get_divisible_by_n(x, 2)
X_2 = get_divisible_by_n(x, 5)
X_3 = get_divisible_by_n(x, 3)
X_all = X_1+X_2+X_3
y = set(X_all)
print(sum(y)) # 142
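A compact NumPy variant of the same idea, as a sketch: instead of building three arrays and removing duplicates afterwards, build one boolean mask that marks every number divisible by at least one of the divisors. The names `divisors`, `remainders` and `mask` are illustrative, not from the posts above.

```python
import numpy as np

x = np.arange(1, 21)
divisors = (2, 3, 5)

# Remainder table: one row per divisor, one column per number in x.
remainders = x % np.array(divisors)[:, np.newaxis]

# Keep a number if at least one divisor divides it, so each number counts once.
mask = (remainders == 0).any(axis=0)

total = x[mask].sum()
print(x[mask])
print(total)  # 142
```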
This is a sample of what I am trying to accomplish. I am very new to python and have searched for hours to find out what I am doing wrong. I haven't been able to find what my issue is. I am still new enough that I may be searching for the wrong phrases. If so, could you please point me in the right direction?
I want to combine n arrays into one array. The first row from x should be the first row in combined, the first row from y the second row, the first row from z the third row, the second row from x the fourth row, and so on.
So it would look something like this:
x = [x1 x2 x3]
[x4 x5 x6]
[x7 x8 x9]
y = [y1 y2 y3]
[y4 y5 y6]
[y7 y8 y9]
z = [z1 z2 z3]
[z4 z5 z6]
[z7 z8 z9]
combined = [x1 x2 x3]
[y1 y2 y3]
[z1 z2 z3]
[x4 x5 x6]
[...]
[z7 z8 z9]
The best I can come up with is this:
import numpy as np
x = np.random.rand(6,3)
y = np.random.rand(6,3)
z = np.random.rand(6,3)
combined = np.zeros((9,3))
for rows in range(len(x)):
    combined[0::3] = x[rows,:]
    combined[1::3] = y[rows,:]
    combined[2::3] = z[rows,:]
print(combined)
All this does is write the last value of the input array to every third row in the output array instead of what I wanted. I am not sure if this is even the best way to do this. Any advice would help out.
I just figured out that this works, but if someone knows a higher-performance method, please let me know.
import numpy as np
x = np.random.rand(6,3)
y = np.random.rand(6,3)
z = np.random.rand(6,3)
combined = np.zeros((18,3))
for rows in range(6):
    combined[rows*3,:] = x[rows,:]
    combined[rows*3+1,:] = y[rows,:]
    combined[rows*3+2,:] = z[rows,:]
print(combined)
You can do this using a list comprehension and zip:
combined = np.array([row for row_group in zip(x, y, z) for row in row_group])
Using vectorised operations only:
A = np.vstack((x, y, z))
idx = np.arange(A.shape[0]).reshape(-1, x.shape[0]).T.flatten()
A = A[idx]
Here's a demo:
import numpy as np
x, y, z = np.random.rand(3,3), np.random.rand(3,3), np.random.rand(3,3)
print(x, y, z)
[[ 0.88259564 0.17609363 0.01067734]
[ 0.50299357 0.35075811 0.47230915]
[ 0.751129 0.81839586 0.80554345]]
[[ 0.09469396 0.33848691 0.51550685]
[ 0.38233976 0.05280427 0.37778962]
[ 0.7169351 0.17752571 0.49581777]]
[[ 0.06056544 0.70273453 0.60681583]
[ 0.57830566 0.71375038 0.14446909]
[ 0.23799775 0.03571076 0.26917939]]
A = np.vstack((x, y, z))
idx = np.arange(A.shape[0]).reshape(-1, x.shape[0]).T.flatten()
print(idx) # [0 3 6 1 4 7 2 5 8]
A = A[idx]
print(A)
[[ 0.88259564 0.17609363 0.01067734]
[ 0.09469396 0.33848691 0.51550685]
[ 0.06056544 0.70273453 0.60681583]
[ 0.50299357 0.35075811 0.47230915]
[ 0.38233976 0.05280427 0.37778962]
[ 0.57830566 0.71375038 0.14446909]
[ 0.751129 0.81839586 0.80554345]
[ 0.7169351 0.17752571 0.49581777]
[ 0.23799775 0.03571076 0.26917939]]
I have changed your code a little bit to get the desired output
import numpy as np
x = np.random.rand(6,3)
y = np.random.rand(6,3)
z = np.random.rand(6,3)
combined = np.zeros((18,3))
combined[0::3] = x
combined[1::3] = y
combined[2::3] = z
print(combined)
You had the shape of the combined matrix wrong and there is no real need for the for loop.
This might not be the most pythonic way to do it, but you could loop over the blocks of three rows:
for block in range(len(combined)//3):
    combined[block*3+0] = x[block,:]
    combined[block*3+1] = y[block,:]
    combined[block*3+2] = z[block,:]
A simple numpy solution is to stack the arrays on a new middle axis, and reshape the result to 2d:
In [5]: x = np.arange(9).reshape(3,3)
In [6]: y = np.arange(9).reshape(3,3)+10
In [7]: z = np.arange(9).reshape(3,3)+100
In [8]: np.stack((x,y,z),axis=1).reshape(-1,3)
Out[8]:
array([[ 0, 1, 2],
[ 10, 11, 12],
[100, 101, 102],
[ 3, 4, 5],
[ 13, 14, 15],
[103, 104, 105],
[ 6, 7, 8],
[ 16, 17, 18],
[106, 107, 108]])
It may be easier to see what's happening if we give each dimension a different size, e.g. two 3x4 arrays:
In [9]: x = np.arange(12).reshape(3,4)
In [10]: y = np.arange(12).reshape(3,4)+10
np.array combines them on a new 1st axis, making a 2x3x4 array. To get the interleaving you want, we can transpose the first 2 dimensions, producing a 3x2x4. Then reshape to a 6x4.
In [13]: np.array((x,y))
Out[13]:
array([[[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]],
[[10, 11, 12, 13],
[14, 15, 16, 17],
[18, 19, 20, 21]]])
In [14]: np.array((x,y)).transpose(1,0,2)
Out[14]:
array([[[ 0, 1, 2, 3],
[10, 11, 12, 13]],
[[ 4, 5, 6, 7],
[14, 15, 16, 17]],
[[ 8, 9, 10, 11],
[18, 19, 20, 21]]])
In [15]: np.array((x,y)).transpose(1,0,2).reshape(-1,4)
Out[15]:
array([[ 0, 1, 2, 3],
[10, 11, 12, 13],
[ 4, 5, 6, 7],
[14, 15, 16, 17],
[ 8, 9, 10, 11],
[18, 19, 20, 21]])
np.vstack produces a 6x4, but with the wrong order. We can't transpose that directly.
np.stack with default axis behaves just like np.array. But with axis=1, it creates a 3x2x4, which we can reshape:
In [16]: np.stack((x,y), 1)
Out[16]:
array([[[ 0, 1, 2, 3],
[10, 11, 12, 13]],
[[ 4, 5, 6, 7],
[14, 15, 16, 17]],
[[ 8, 9, 10, 11],
[18, 19, 20, 21]]])
The list zip in the accepted answer is a list version of transpose, creating a list of 3 2-element tuples.
In [17]: list(zip(x,y))
Out[17]:
[(array([0, 1, 2, 3]), array([10, 11, 12, 13])),
(array([4, 5, 6, 7]), array([14, 15, 16, 17])),
(array([ 8, 9, 10, 11]), array([18, 19, 20, 21]))]
np.array(list(zip(x,y))) produces the same thing as the stack, a 3x2x4 array.
As for speed, I suspect the allocate and assign (as in Ash's answer) is fastest:
In [27]: z = np.zeros((6,4),int)
    ...: for i, arr in enumerate((x,y)):
    ...:     z[i::2,:] = arr
    ...:
In [28]: z
Out[28]:
array([[ 0, 1, 2, 3],
[10, 11, 12, 13],
[ 4, 5, 6, 7],
[14, 15, 16, 17],
[ 8, 9, 10, 11],
[18, 19, 20, 21]])
For serious timings, use much larger examples than this.
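A rough sketch of how such a comparison could be set up on larger inputs. The array size, the repeat count and the helper names `interleave_stack` and `interleave_assign` are arbitrary choices for illustration, and the timings will vary by machine, so no expected numbers are shown:

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)
# Three hypothetical inputs, much larger than the toy arrays above.
arrays = [rng.random((100_000, 3)) for _ in range(3)]

def interleave_stack(arrs):
    # Stack on a new middle axis, then flatten the first two axes.
    return np.stack(arrs, axis=1).reshape(-1, arrs[0].shape[1])

def interleave_assign(arrs):
    # Preallocate the output and fill every k-th row from each input.
    k = len(arrs)
    out = np.empty((k * arrs[0].shape[0], arrs[0].shape[1]), dtype=arrs[0].dtype)
    for i, a in enumerate(arrs):
        out[i::k] = a
    return out

# Both approaches must agree before their timings are worth comparing.
same = np.array_equal(interleave_stack(arrays), interleave_assign(arrays))
print(same)

for fn in (interleave_stack, interleave_assign):
    seconds = timeit.timeit(lambda: fn(arrays), number=20)
    print(fn.__name__, seconds)
```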
For computer vision training purposes, random cropping is often used as a data augmentation technique. At each iteration, a batch of random crops is generated and fed to the network being trained. This needs to be efficient, as it is done at each training iteration.
If the data has too many dimensions, random dimension selection might also be needed. Random frames can be selected in a video for example. The data can even have 4 dimensions (3 in space + time), or more.
How can one write an efficient generator of random views of lower dimension?
A very naïve version for getting 2D views from 3D data, and only one by one, could be:
import numpy as np
import numpy.random as nr
def views():
    # suppose `data` comes from elsewhere
    # data.shape is (n1, n2, n3)
    shape = data.shape
    while True:
        drop_dim = nr.randint(0, 3)
        drop_dim_keep = nr.randint(0, shape[drop_dim])
        selector = np.zeros(shape, dtype=bool)
        if drop_dim == 0:
            selector[drop_dim_keep, :, :] = 1
        elif drop_dim == 1:
            selector[:, drop_dim_keep, :] = 1
        else:
            selector[:, :, drop_dim_keep] = 1
        yield np.squeeze(data[selector])
A more elegant solution probably exists, where at least:
there is no ugly if/else on the randomly chosen dimension
views can take a batch_size integer argument and generate several views at once without a loop
the dimension of input/output data is not specified (e.g. can do 3D -> 2D as well as 4D -> 2D)
I tweaked your function to clarify what it's doing:
def views():
    # suppose `data` comes from elsewhere
    # data.shape is (n1, n2, n3)
    while True:
        drop_dim = nr.randint(0, 3)
        dropshape = list(shape[:])
        dropshape[drop_dim] -= 1
        drop_dim_keep = nr.randint(0, shape[drop_dim])
        print(drop_dim, drop_dim_keep)
        selector = np.ones(shape, dtype=bool)
        if drop_dim == 0:
            selector[drop_dim_keep, :, :] = 0
        elif drop_dim == 1:
            selector[:, drop_dim_keep, :] = 0
        else:
            selector[:, :, drop_dim_keep] = 0
        yield data[selector].reshape(dropshape)
A small sample run:
In [534]: data = np.arange(24).reshape(shape)
In [535]: data
Out[535]:
array([[[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]],
[[12, 13, 14, 15],
[16, 17, 18, 19],
[20, 21, 22, 23]]])
In [536]: v = views()
In [537]: next(v)
2 1
Out[537]:
array([[[ 0, 2, 3],
[ 4, 6, 7],
[ 8, 10, 11]],
[[12, 14, 15],
[16, 18, 19],
[20, 22, 23]]])
In [538]: next(v)
0 0
Out[538]:
array([[[12, 13, 14, 15],
[16, 17, 18, 19],
[20, 21, 22, 23]]])
So it's picking one of the dimensions, and for that dimension dropping one 'column'.
The main efficiency issue is whether it's returning a view or a copy. In this case it has to return a copy.
You are using a boolean mask to select the return, exactly the same as what np.delete does in this case.
In [544]: np.delete(data,1,2).shape
Out[544]: (2, 3, 3)
In [545]: np.delete(data,0,0).shape
Out[545]: (1, 3, 4)
So you could replace much of your internals with delete, letting it take care of generalizing the dimensions. Look at its code to see how it handles those details (it isn't short and sweet!).
def rand_delete():
    # suppose `data` comes from elsewhere
    # data.shape is (n1, n2, n3)
    while True:
        drop_dim = nr.randint(0, 3)
        drop_dim_keep = nr.randint(0, shape[drop_dim])
        print(drop_dim, drop_dim_keep)
        yield np.delete(data, drop_dim_keep, drop_dim)
In [547]: v1=rand_delete()
In [548]: next(v1)
0 1
Out[548]:
array([[[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]]])
In [549]: next(v1)
2 0
Out[549]:
array([[[ 1, 2, 3],
[ 5, 6, 7],
[ 9, 10, 11]],
[[13, 14, 15],
[17, 18, 19],
[21, 22, 23]]])
Replace the delete with take:
def rand_take():
    while True:
        take_dim = nr.randint(0, 3)
        take_keep = nr.randint(0, shape[take_dim])
        print(take_dim, take_keep)
        yield np.take(data, take_keep, axis=take_dim)
In [580]: t = rand_take()
In [581]: next(t)
0 0
Out[581]:
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
In [582]: next(t)
2 3
Out[582]:
array([[ 3, 7, 11],
[15, 19, 23]])
np.take returns a copy, but the equivalent slicing does not (the slice shares the data buffer address of the original):
In [601]: data.__array_interface__['data']
Out[601]: (182632568, False)
In [602]: np.take(data,0,1).__array_interface__['data']
Out[602]: (180099120, False)
In [603]: data[:,0,:].__array_interface__['data']
Out[603]: (182632568, False)
A slicing tuple can be generated with expressions like
In [604]: idx = [slice(None)]*data.ndim
In [605]: idx[1] = 0
In [606]: data[tuple(idx)]
Out[606]:
array([[ 0, 1, 2, 3],
[12, 13, 14, 15]])
Various numpy functions that take an axis parameter construct an indexing tuple like this (for example, one or more of the apply... functions).
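Putting the slicing tuple to work, here is a minimal sketch of a generator that drops one randomly chosen axis without an if/else and without fixing the number of dimensions. The name `random_views` and the optional `rng` argument are assumptions for illustration, not part of the code above:

```python
import numpy as np

def random_views(data, rng=None):
    """Yield random (ndim - 1)-dimensional views of `data` (hypothetical helper)."""
    rng = np.random.default_rng() if rng is None else rng
    while True:
        # Pick the axis to drop and the index to keep along it.
        axis = int(rng.integers(data.ndim))
        index = int(rng.integers(data.shape[axis]))
        # Full slices everywhere except the dropped axis.
        idx = [slice(None)] * data.ndim
        idx[axis] = index
        # Basic indexing returns a view, so no data is copied here.
        yield data[tuple(idx)]

data = np.arange(24).reshape(2, 3, 4)
gen = random_views(data, np.random.default_rng(0))
view = next(gen)
print(view.shape, np.shares_memory(view, data))
```

This yields one view at a time. Collecting several of them into a single batch array would require views of the same shape and would copy the data.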
If I have an array, let's say np.array([4,8,-2,9,6,0,3,-6]), and I would like to add each element to the running total of the elements before it, how do I do that?
And every time the number 0 shows up, the running total restarts.
For example, if the above array is passed in as transactions, the function should produce the following output:
stock = np.array([4,12,10,19,25,0,3,-3])
def cumulativeStock(transactions):
    # insert your code here
    return stock
I can't think of a method for solving this problem. Any help would be much appreciated.
I believe you mean something like this?
z = np.array([4,8,-2,9,6,0,3,-6])
n = z == 0
[False False False False False True False False]
res = np.split(z, np.where(n)[0])
[array([ 4, 8, -2, 9, 6]), array([ 0, 3, -6])]
res_total = [np.cumsum(x) for x in res]
[array([ 4, 12, 10, 19, 25]), array([ 0, 3, -3])]
np.concatenate(res_total)
[ 4 12 10 19 25 0 3 -3]
another vectorized solution:
import numpy as np
stock = np.array([4, 8, -2, 9, 6, 0, 3, -6])
breaks = stock == 0
tmp = np.cumsum(stock)
brval = np.diff(np.concatenate(([0], -tmp[breaks])))
stock[breaks] = brval
np.cumsum(stock)
# array([ 4, 12, 10, 19, 25, 0, 3, -3])
import numpy as np
stock = np.array([4, 12, 10, 19, 25, 0, 3, -3, 4, 12, 10, 0, 19, 25, 0, 3, -3])
def cumsum_stock(stock):
    ## Detect all Zero's first
    zero_p = np.where(stock==0)[0]
    ## Create empty array to append final result
    final_stock = np.empty(shape=[0, len(zero_p)])
    for i in range(len(zero_p)):
        ## First Zero detection
        if(i==0):
            stock_first_part = np.cumsum(stock[:zero_p[0]])
            stock_after_zero_part = np.cumsum(stock[zero_p[0]:zero_p[i+1]])
            final_stock = np.append(final_stock, stock_first_part)
            final_stock = np.append(final_stock, stock_after_zero_part)
        ## Last Zero detection
        elif(i==(len(zero_p)-1)):
            stock_last_part = np.cumsum(stock[zero_p[i]:])
            final_stock = np.append(final_stock, stock_last_part, axis=0)
        ## Intermediate Zero detection
        else:
            intermediate_stock = np.cumsum(stock[zero_p[i]:zero_p[i+1]])
            final_stock = np.append(final_stock, intermediate_stock, axis=0)
    return(final_stock)
final_stock = cumsum_stock(stock).astype(int)
#Output
final_stock
Out[]: array([ 4, 16, 26, ..., 0, 3, 0])
final_stock.tolist()
Out[]: [4, 16, 26, 45, 70, 0, 3, 0, 4, 16, 26, 0, 19, 44, 0, 3, 0]
def cumulativeStock(transactions):
    def accum(x):
        acc=0
        for i in x:
            if i==0:
                acc=0
            acc+=i
            yield acc
    stock = np.array(list(accum(transactions)))
    return stock
For your input np.array([4,8,-2,9,6,0,3,-6])
it returns
array([ 4, 12, 10, 19, 25,  0,  3, -3])
I assume you mean you want to separate the list at every zero?
from itertools import groupby
import numpy
def cumulativeStock(transactions):
    # split the list on each 0 (the zeros themselves are dropped)
    all_lists = [list(group) for k, group in groupby(transactions, lambda x: x == 0) if not k]
    # cumulative sum of each sub-list
    stock = []
    for sep_list in all_lists:
        stock.extend(numpy.cumsum(sep_list).tolist())
    return stock
print(cumulativeStock([4,8,-2,9,6,0,3,-6]))
Which will return (note that the zero itself is missing from the result):
[4, 12, 10, 19, 25, 3, -3]
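For comparison, a loop-free sketch that keeps the zeros in place and does not modify its input. It takes the plain running total and subtracts the total that had been reached at the most recent zero. The function name `cumulative_stock` and the intermediate variable names are illustrative assumptions:

```python
import numpy as np

def cumulative_stock(transactions):
    """Running total that restarts at every 0 (hypothetical helper name)."""
    t = np.asarray(transactions)
    totals = np.cumsum(t)
    # Index of the most recent zero at or before each position, -1 if none yet.
    positions = np.where(t == 0, np.arange(t.size), -1)
    last_zero = np.maximum.accumulate(positions)
    # Subtract the running total reached at that zero; nothing before the first zero.
    offsets = np.where(last_zero >= 0, totals[last_zero], 0)
    return totals - offsets

print(cumulative_stock([4, 8, -2, 9, 6, 0, 3, -6]))
```

For the example input this gives 4, 12, 10, 19, 25, 0, 3, -3, matching the expected output in the question.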