I am trying to replace each row in a 2D numpy array.
array2d = np.arange(20).reshape(4,5)
for i in range(0, 4, 1):
    array2d[i] = array2d[i] / np.sum(array2d[i])
but I'm getting all 0s:
[[0 0 0 0 0]
[0 0 0 0 0]
[0 0 0 0 0]
[0 0 0 0 0]]
The expected result is:
[[0 0.1 0.2 0.3 0.4]
[0.14285714 0.17142857 0.2 0.22857143 0.25714286]
[0.16666667 0.18333333 0.2 0.21666667 0.23333333]
[0.17647059 0.18823529 0.2 0.21176471 0.22352941]]
The reason you are getting 0s is that the array's dtype is int. The division produces floats between 0 and 1, but because you assign the results back into the integer array in place, they are cast back to integers (i.e. to 0 in your example). To fix it, create the array with a float dtype: array2d = np.arange(20, dtype=float).reshape(4,5).
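A quick demonstration of that silent cast, as a minimal sketch:
import numpy as np

a = np.arange(5)               # dtype is int
a[:] = a / a.sum()             # floats 0.0 .. 0.4 are cast back to int on assignment
print(a)                       # [0 0 0 0 0]

b = np.arange(5, dtype=float)  # a float dtype keeps the division results
b[:] = b / b.sum()
print(b)                       # [0.  0.1 0.2 0.3 0.4]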
But there is no need for the for-loop:
array2d = np.arange(20).reshape(4,5)
array2d = array2d / np.sum(array2d, axis=1, keepdims=True)
Note that here I didn't specify a float dtype for the array, yet the resulting array's dtype is float, because the second line creates a new array instead of modifying the first array in-place.
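For reference, a quick check of the shapes involved (keepdims=True keeps the row sums as a (4, 1) column, so broadcasting divides each row by its own sum):
import numpy as np

array2d = np.arange(20).reshape(4, 5)
row_sums = np.sum(array2d, axis=1, keepdims=True)
print(row_sums.shape)    # (4, 1) -- broadcasts against the (4, 5) array
normalized = array2d / row_sums
print(normalized.dtype)  # float64: true division produces a new float array
print(normalized[0])     # [0.  0.1 0.2 0.3 0.4]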
https://numpy.org/doc/stable/user/basics.indexing.html#assigning-values-to-indexed-arrays
Given a list of arrays, I would like to extract the most frequent element in every cell.
For example, for 3 arrays
arr 1
0,0,0
0,4,1
0,1,4
arr 2
0,0,0
0,7,1
0,1,1
arr 3
5,0,0
0,4,1
0,1,1
The most frequent element for each cell would be
0 0 0
0 4 1
0 1 1
May I know how to achieve this with NumPy? In the actual case, the list can contain up to 10k arrays.
The arrays are defined as below:
import numpy as np
arr=np.array([[0,0,0],[0,4,1],[0,1,4]])
arr2=np.array([[0,0,0],[0,7,1],[0,1,1]])
arr3=np.array([[5,0,0],[0,4,1],[0,1,1]])
arr = np.stack([arr,arr2,arr3], axis=0)
You can stack the arrays into a large matrix and then use scipy.stats.mode along the axis of interest:
import numpy as np
import scipy.stats
arr1 = [[0, 0, 0],
        [0, 4, 1],
        [0, 1, 4]]
arr2 = [[0, 0, 0],
        [0, 7, 1],
        [0, 1, 1]]
arr3 = [[5, 0, 0],
        [0, 4, 1],
        [0, 1, 1]]
arr = np.stack((arr1, arr2, arr3), axis=0)
output = scipy.stats.mode(arr, axis=0).mode[0]
print(output)
# [[0 0 0]
# [0 4 1]
# [0 1 1]]
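Note that the return value of scipy.stats.mode changed in newer SciPy releases: a keepdims argument was added in 1.9, and since 1.11 it defaults to False, so the reduced axis is dropped and the trailing [0] is no longer needed there. A sketch for SciPy >= 1.9 that pins the old behavior explicitly:
# keepdims=True keeps the reduced axis, so .mode has shape (1, 3, 3)
# and the [0] indexing works as in older SciPy versions.
result = scipy.stats.mode(arr, axis=0, keepdims=True)
output = result.mode[0]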
I used a piece of code to create a 2D binary-valued array covering all possible scenarios of an event. For the first round, I tested it with 2 members.
Here is my code:
number_of_members = 2
n = number_of_members
values = np.arange(2**n, dtype=np.uint8).reshape(-1, 1)
print('$$$ ===> ', values)
bin_array = np.unpackbits(values, axis=1)[:, -n:]
print('*** ===> ', bin_array)
And the result is this:
$$$ ===> [[0]
[1]
[2]
[3]]
*** ===> [[0 0]
[0 1]
[1 0]
[1 1]]
As you can see, it correctly provided my 2D binary array.
The problem begins when I try to use number_of_members = 20. If I assign 20 to number_of_members, Python shows this result:
$$$ ===> [[ 0]
[ 1]
[ 2]
...
[253]
[254]
[255]]
*** ===> [[0 0 0 ... 0 0 0]
[0 0 0 ... 0 0 1]
[0 0 0 ... 0 1 0]
...
[1 1 1 ... 1 0 1]
[1 1 1 ... 1 1 0]
[1 1 1 ... 1 1 1]]
The result has 8 columns, but I expected an array of 32 columns. How can I unpack a uint32 array?
As you noted correctly, np.unpackbits operates only on uint8 arrays. The nice thing is that you can view any dtype as uint8. First create values with a dtype wide enough to hold all 2**20 numbers, e.g. values = np.arange(2**n, dtype=np.uint32).reshape(-1, 1) (your uint8 array silently wraps around modulo 256, which is why you saw only the values 0-255 and 8 columns). Then create a uint8 view into the data like this:
view = values.view(np.uint8)
On my machine this is little-endian, which makes trimming easier. To handle big-endian systems as well, you can flip the bytes conditionally (this needs import sys):
import sys

if values.dtype.byteorder == '>' or (values.dtype.byteorder == '=' and sys.byteorder == 'big'):
    view = view[:, ::-1]
Now you can unpack the bits. In fact, unpackbits has a nice feature that I personally added, the count parameter. It allows you to make your output exactly 20 bits long instead of the full 32, without slicing. Since the default big-endian bit order would clash with the little-endian byte order, unpack the bits in little-endian order too, and flip the entire result for display:
bin_array = np.unpackbits(view, axis=1, count=20, bitorder='little')[:, ::-1]
The result is a (1<<20, 20) array with the exact values you want.
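Putting the pieces together, a minimal end-to-end sketch for n = 20:
import sys
import numpy as np

n = 20
values = np.arange(1 << n, dtype=np.uint32).reshape(-1, 1)  # uint32 holds all 2**20 values
view = values.view(np.uint8)                                # shape (1 << 20, 4): the raw bytes
if values.dtype.byteorder == '>' or (values.dtype.byteorder == '=' and sys.byteorder == 'big'):
    view = view[:, ::-1]                                    # normalize the byte order to little-endian
bin_array = np.unpackbits(view, axis=1, count=n, bitorder='little')[:, ::-1]
print(bin_array.shape)  # (1048576, 20)
print(bin_array[:4])    # rows for 0, 1, 2, 3 in 20-bit big-endian display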
Let's say I have a single 3x4 array (3 rows, 4 columns), for example
import numpy as np
data = [[0,5,0,1], [0,5,0,1], [0,5,0,1]]
data = np.array(data)
print(data)
[[0 5 0 1]
[0 5 0 1]
[0 5 0 1]]
and I want to subtract column 4 from column 2 and have the values in their own named 3x1 array, like this
print(subtraction)
[[4]
[4]
[4]]
How would I go about this in NumPy?
result = (data[:, 1] - data[:, 3]).reshape((3, 1))
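As an alternative sketch, slicing with a range (1:2 instead of 1) keeps the column dimension, so no reshape is needed and the same line works for any number of rows:
import numpy as np

data = np.array([[0, 5, 0, 1], [0, 5, 0, 1], [0, 5, 0, 1]])
# Columns 2 and 4 are indices 1 and 3; range slices keep the (3, 1) shape.
subtraction = data[:, 1:2] - data[:, 3:4]
print(subtraction)
# [[4]
#  [4]
#  [4]]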
I have a nested array with some values. I have another array of equal length. I'd like to get an output that is a nested array of 1s and 0s, such that it is 1 wherever the value in the second array was equal to the value in that nested array.
I've taken a look on existing stack overflow questions but have been unable to construct an answer.
masks_list = []
for i in range(len(y_pred)):
    mask = (y_pred[i] == y_test.values[i]) * 1
    masks_list.append(mask)
masks = np.array(masks_list)
Essentially, that's the code I currently have and it works, but I think it's probably not the most efficient way of doing it.
YPRED:
[[4 0 1 2 3 5 6]
[0 1 2 3 5 6 4]]
YTEST:
8 1
5 4
Masks:
[[0 0 1 0 0 0 0]
[0 0 0 0 0 0 1]]
Another good solution, with fewer lines of code.
a = set(y_pred).intersection(y_test)
f = [1 if v in a else 0 for v in y_pred]
After that you can check the performance, as in this answer, as follows:
from time import perf_counter as pc

t0 = pc()
a = set(y_pred).intersection(y_test)
f = [1 if v in a else 0 for v in y_pred]
t1 = pc() - t0

t0 = pc()
masks_list = []
for i in range(len(y_pred)):
    mask = (y_pred[i] == y_test[i]) * 1
    masks_list.append(mask)
t2 = pc() - t0
val = t1 - t2
Generally, if val is positive then the first solution is slower.
If you have an np.array instead of a list, you can convert it as described in this answer:
type(y_pred)
>> numpy.ndarray
y_pred = y_pred.tolist()
type(y_pred)
>> list
Idea (fewest loops): compare the array and the nested array directly via broadcasting:
masks = np.equal(y_pred, y_test.values[:, None]).astype(int)
You can look at these too:
np.array_equal(A,B) # test if same shape, same elements values
np.array_equiv(A,B) # test if broadcastable shape, same elements values
np.allclose(A,B,...) # test if same shape, elements have close enough values
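A minimal sketch of the broadcasting approach using the YPRED/YTEST data from the question (y_test_values is a hypothetical stand-in for the values of the y_test Series):
import numpy as np

y_pred = np.array([[4, 0, 1, 2, 3, 5, 6],
                   [0, 1, 2, 3, 5, 6, 4]])
y_test_values = np.array([1, 4])  # hypothetical stand-in for y_test.values

# Broadcasting (2, 7) against (2, 1) compares each row to its own test value.
masks = (y_pred == y_test_values[:, None]).astype(int)
print(masks)
# [[0 0 1 0 0 0 0]
#  [0 0 0 0 0 0 1]]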
There is a 2D numpy array of about 500000 rows, with 512 values in each row:
[
[1,0,1,...,0,0,1], # 512 1's or 0's
[0,1,0,...,0,1,1],
...
[0,0,1,...,1,0,1], # row number 500000
]
How to sort the rows ascending as if each row is a long 512-bit integer?
[
[0,0,1,...,1,0,1],
[0,1,0,...,0,1,1],
[1,0,1,...,0,0,1],
...
]
Instead of converting to strings, you can also use a void view of the data (as from @Jaime here) and argsort by that.
def sort_bin(b):
    b_view = np.ascontiguousarray(b).view(np.dtype((np.void, b.dtype.itemsize * b.shape[1])))
    return b[np.argsort(b_view.ravel())]  # as per Divakar's suggestion
Testing
np.random.seed(0)
b = np.random.randint(0, 2, (10,5))
print(b)
print(sort_bin(b))
[[0 1 1 0 1]
[1 1 1 1 1]
[1 0 0 1 0]
...,
[1 0 1 1 0]
[0 1 0 1 1]
[1 1 1 0 1]]
[[0 0 0 0 1]
[0 1 0 1 1]
[0 1 1 0 0]
...,
[1 1 1 0 1]
[1 1 1 1 0]
[1 1 1 1 1]]
This should be much faster and less memory-intensive, since b_view is just a view into b:
t = np.random.randint(0,2,(2000,512))
%timeit sort_bin(t)
100 loops, best of 3: 3.09 ms per loop
%timeit np.array([[int(i) for i in r] for r in np.sort(np.apply_along_axis(lambda r: ''.join([str(c) for c in r]), 1, t))])
1 loop, best of 3: 3.29 s per loop
About 1000x faster, actually.
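To see why this works: the void view reinterprets each row as a single opaque byte string, so argsort compares rows lexicographically by their raw bytes, which for 0/1 values matches the order of the corresponding binary numbers. A small sketch of what the view looks like:
import numpy as np

b = np.array([[1, 0, 1], [0, 1, 1]], dtype=np.uint8)
v = np.ascontiguousarray(b).view(np.dtype((np.void, b.dtype.itemsize * b.shape[1])))
print(v.shape)       # (2, 1): one opaque element per row
print(v.ravel()[0])  # the raw bytes of row 0, e.g. b'\x01\x00\x01'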
You could sort them in a stable way 512 times, starting with the right-most bit first.
Sort by last bit
Sort by second-last bit, stable (to not mess up results of previous sort)
...
...
Sort by first bit, stable
A smaller example: assume you want to sort these three 2-bit numbers by bits:
11
01
00
In the first step, you sort by the right bit, resulting in:
00
11
01
Now you sort by the first bit. In this case we have two 0s in that column; if your sorting algorithm is not stable, it would be allowed to put these equal items in any order, which could cause 01 to appear before 00, and we do not want that. So we use a stable sort, which keeps the relative order of equal items, for the first column, resulting in the desired:
00
01
11
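This column-by-column stable sort is exactly what np.lexsort implements (one stable sort per key, with the last key given highest priority), so a minimal sketch of the same idea:
import numpy as np

np.random.seed(0)
b = np.random.randint(0, 2, (10, 5))
# b.T[::-1] passes the columns right-to-left; np.lexsort treats the last
# key (column 0) as most significant, matching the binary-number order.
sorted_b = b[np.lexsort(b.T[::-1])]
print(sorted_b)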
Creating a string of each row and then applying np.sort()
So if we have an array to test on:
a = np.array([[1,0,0,0],[0,0,0,0],[1,1,1,1],[0,0,1,1]])
We can create strings of each row by using np.apply_along_axis:
a = np.apply_along_axis(lambda r: ''.join([str(c) for c in r]), 1, a)
which would make a now:
array(['1000', '0000', '1111', '0011'], dtype='<U4')
and so now we can sort the strings with np.sort():
a = np.sort(a)
making a:
array(['0000', '0011', '1000', '1111'], dtype='<U4')
we can then convert back to the original format with:
a = np.array([[int(i) for i in r] for r in a])
which makes a:
array([[0, 0, 0, 0],
       [0, 0, 1, 1],
       [1, 0, 0, 0],
       [1, 1, 1, 1]])
And if you wanted to cram this all into one line:
a = np.array([[int(i) for i in r] for r in np.sort(np.apply_along_axis(lambda r: ''.join([str(c) for c in r]), 1, a))])
This is slow but does the job.
def sort_col(arr, col_num=0):
    # if we have sorted over all columns, return the array
    if col_num >= arr.shape[1]:
        return arr
    # sort the array over the given column
    arr_sorted = arr[arr[:, col_num].argsort()]
    # if the number of 1s in the given column is neither equal to the total
    # number of rows nor equal to 0, split on 1 and 0, sort, and then merge
    if len(arr) > np.sum(arr_sorted[:, col_num]) > 0:
        arr_sorted0s = sort_col(arr_sorted[arr_sorted[:, col_num] == 0], col_num + 1)
        arr_sorted1s = sort_col(arr_sorted[arr_sorted[:, col_num] == 1], col_num + 1)
        # swap the order of stacking if you want descending order
        return np.vstack((arr_sorted0s, arr_sorted1s))
    # if the number of 1s in the given column equals the total number
    # of rows or 0, just move on to the next column
    return sort_col(arr_sorted, col_num + 1)
np.random.seed(0)
a = np.random.randint(0, 2, (5, 4))
print(a)
print(sort_col(a))
# prints
[[0 1 1 0]
[1 1 1 1]
[1 1 1 0]
[0 1 0 0]
[0 0 0 1]]
[[0 0 0 1]
[0 1 0 0]
[0 1 1 0]
[1 1 1 0]
[1 1 1 1]]
Edit: or better yet, use Daniel's solution. I didn't check for new answers before I posted my code.