I have 3D arrays filled with ones and zeros (created through pyellipsoid). The arrays are uint8. I wanted to count the number of 1s, so I used sum(sum(sum(array))). This worked fine for small arrays (up to approximately 5000 entries).
I compared sum(sum(sum(array))) to numpy.count_nonzero(array) for a known number of nonzero entries. For bigger arrays the answers from "sum" are always wrong and lower than they should be.
If I use float64 arrays it works fine with big arrays. If I change the data type to uint8 it does not work.
Why is that? I am sure there is a very simple reason, but I can't find an answer.
Small array example:
test = numpy.zeros((2,2,2))
test[0,0,0] = 1
test[1,0,0] = 1
In: test
Out:
array([[[1., 0.],
[0., 0.]],
[[1., 0.],
[0., 0.]]])
In: sum(sum(sum(test)))
Out: 2.0
Big example (8000 entries, only one zero, 7999 ones):
test_big=np.ones((20,20,20))
test_big[0,0,0] = 0
test_big
Out[77]:
array([[[0., 1., 1., ..., 1., 1., 1.],
[1., 1., 1., ..., 1., 1., 1.],
[1., 1., 1., ..., 1., 1., 1.],
...,
[1., 1., 1., ..., 1., 1., 1.],
[1., 1., 1., ..., 1., 1., 1.],
[1., 1., 1., ..., 1., 1., 1.]],
[[1., 1., 1., ..., 1., 1., 1.],
[1., 1., 1., ..., 1., 1., 1.],
[1., 1., 1., ..., 1., 1., 1.],
...,
[1., 1., 1., ..., 1., 1., 1.],
[1., 1., 1., ..., 1., 1., 1.],
[1., 1., 1., ..., 1., 1., 1.]],
[[1., 1., 1., ..., 1., 1., 1.],
[1., 1., 1., ..., 1., 1., 1.],
[1., 1., 1., ..., 1., 1., 1.],
...,
[1., 1., 1., ..., 1., 1., 1.],
[1., 1., 1., ..., 1., 1., 1.],
[1., 1., 1., ..., 1., 1., 1.]],
...,
[[1., 1., 1., ..., 1., 1., 1.],
[1., 1., 1., ..., 1., 1., 1.],
[1., 1., 1., ..., 1., 1., 1.],
...,
[1., 1., 1., ..., 1., 1., 1.],
[1., 1., 1., ..., 1., 1., 1.],
[1., 1., 1., ..., 1., 1., 1.]],
[[1., 1., 1., ..., 1., 1., 1.],
[1., 1., 1., ..., 1., 1., 1.],
[1., 1., 1., ..., 1., 1., 1.],
...,
[1., 1., 1., ..., 1., 1., 1.],
[1., 1., 1., ..., 1., 1., 1.],
[1., 1., 1., ..., 1., 1., 1.]],
[[1., 1., 1., ..., 1., 1., 1.],
[1., 1., 1., ..., 1., 1., 1.],
[1., 1., 1., ..., 1., 1., 1.],
...,
[1., 1., 1., ..., 1., 1., 1.],
[1., 1., 1., ..., 1., 1., 1.],
[1., 1., 1., ..., 1., 1., 1.]]])
In: sum(sum(sum(test_big)))
Out: 7999.0
So far so good. Here, the data type of the sum output is float64. But if I now change the data type of the array to the type that is used with pyellipsoid (uint8)...
In: test_big = test_big.astype('uint8')
In: sum(sum(sum(test_big)))
Out: 2879
So obviously 2879 is not 7999. Here, the data type of the sum output is int32 (-2147483648 to 2147483647) so this should be big enough for 7999, right? I guess it has something to do with the data type, but how? Why?
(I am using Spyder in Anaconda on Windows).
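The issue is as you guessed: an integer overflow. Python's built-in sum() adds the sub-arrays element by element, and those partial sums keep the array's uint8 dtype, which can only hold values from 0 to 255.
If you take a look at sum(sum(test_big)), you will notice that the values are already wrong there: each entry should be 400 but has wrapped around to 144 (400 - 256). Only the outermost sum() adds up scalars, which is why the final result can have a wider integer type (int32 in your case) and still be wrong.
I would suggest summing this array with np.sum() instead, because it accumulates small integer types in a wider integer type and therefore returns the correct total here.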
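To make the wrap-around visible, here is a minimal sketch that rebuilds the array from the question and inspects the intermediate results of the nested built-in sum() calls. The variable names level1, level2 and wrapped_total are only illustrative.

```python
import numpy as np

# Same setup as in the question: 8000 entries, one zero, 7999 ones.
test_big = np.ones((20, 20, 20), dtype=np.uint8)
test_big[0, 0, 0] = 0

# First built-in sum(): 20 ones per position, which still fits in uint8.
level1 = sum(test_big)
# Second built-in sum(): 20 * 20 = 400 per position, which wraps to 400 - 256 = 144.
level2 = sum(level1)
print(level2.dtype, level2[:3])

# Adding the wrapped partial sums in a wide type reproduces the wrong total.
wrapped_total = int(level2.astype(np.int64).sum())

# np.sum() and np.count_nonzero() both return the true count.
true_total = int(np.sum(test_big))
nonzero = int(np.count_nonzero(test_big))
print(wrapped_total, true_total, nonzero)
```

The wrapped partial sums add up to 143 + 19 * 144 = 2879, which is exactly the wrong value reported in the question, while np.sum() and np.count_nonzero() both return 7999.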
I am currently coding Conway's game of life and to add randomization to my world I have implemented a function to create a random matrix with 1 and 0 with n rows and n columns.
The problem is that, for my code to work, I need a random matrix of 1 and 0 but they have to be floats, so 0.0 and 1.0
So I cannot use:
rand_matrix = numpy.random.randint(0, 2, size=n)
Instead I have tried:
n = 10
one_zero = [0.0,1.0]
rand_matrix = np.array([n*[random.choice(one_zero),random.choice(one_zero)],n*[random.choice(one_zero),random.choice(one_zero)]])
Getting:
[[1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
[0. 1. 0. 1. 0. 1. 0. 1. 0. 1. 0. 1. 0. 1. 0. 1. 0. 1. 0. 1.]]
But this returns a matrix with the wrong dimensions. What I want is a random n by n matrix in which each row is a randomly chosen combination of 0.0 and 1.0.
>>> np.random.randint(0, 2, (10, 10)).astype(float)
array([[0., 0., 0., 1., 1., 0., 1., 1., 1., 1.],
[1., 1., 1., 1., 0., 1., 1., 0., 0., 1.],
[1., 0., 1., 0., 1., 1., 1., 1., 0., 0.],
[1., 1., 0., 1., 1., 1., 1., 1., 1., 1.],
[0., 0., 0., 1., 1., 1., 0., 1., 0., 1.],
[0., 1., 1., 1., 0., 0., 1., 0., 0., 0.],
[0., 1., 1., 0., 1., 0., 0., 1., 1., 0.],
[0., 0., 0., 1., 0., 0., 1., 0., 1., 0.],
[0., 1., 1., 1., 1., 1., 1., 0., 1., 1.],
[0., 1., 0., 1., 1., 1., 0., 0., 0., 0.]])
You can just cast the randint() matrix to float:
>>> import numpy as np
>>> n = 10
>>> rand_matrix = np.random.randint(0, 2, size=(n, n)).astype(float)
>>> rand_matrix
array([[0., 0., 0., 0., 0., 0., 0., 1., 0., 0.],
[0., 0., 0., 1., 1., 1., 0., 0., 1., 1.],
[1., 1., 1., 1., 0., 0., 1., 1., 0., 0.],
[1., 0., 0., 1., 0., 1., 0., 1., 0., 1.],
[0., 0., 0., 0., 1., 1., 0., 1., 1., 1.],
[0., 1., 1., 1., 1., 0., 1., 1., 1., 0.],
[0., 0., 1., 0., 0., 0., 0., 1., 0., 1.],
[1., 0., 0., 1., 0., 0., 1., 1., 1., 0.],
[1., 0., 0., 0., 0., 0., 1., 1., 1., 1.],
[1., 0., 0., 1., 1., 1., 1., 1., 1., 0.]])
>>>
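The same idea can also be written with NumPy's newer Generator interface. This is only a sketch: the seed and the 0.3 fill density are arbitrary values chosen for illustration and do not come from the question.

```python
import numpy as np

n = 10
rng = np.random.default_rng(seed=42)  # arbitrary seed, only for reproducibility

# Draw integers from {0, 1} (the upper bound is exclusive) and cast to float.
rand_matrix = rng.integers(0, 2, size=(n, n)).astype(float)

# Variant with a biased fill: each cell is 1.0 with probability 0.3
# (0.3 is a hypothetical density, not something the question asks for).
sparse_matrix = (rng.random((n, n)) < 0.3).astype(float)

print(rand_matrix.shape, rand_matrix.dtype)
```

The second variant may be handy for Game of Life worlds where you want fewer live cells than a 50/50 split gives you.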
Is there any efficient way (numpy style) to generate all n choose k binary vectors (with k ones)?
For example, if n=3 and k=2, then I want to generate (1,1,0), (1,0,1), (0,1,1).
Thanks
I do not know how efficient this is, but here is a way:
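from itertools import combinations
import numpy as np

n, k = 5, 3
np.array(
    [
        # one row per combination: 1 at the chosen positions, 0 elsewhere
        [1 if i in comb else 0 for i in range(n)]
        for comb in combinations(range(n), k)
    ],
    dtype=float,  # matches the float output shown below
)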
>>>
array([[1., 1., 1., 0., 0.],
[1., 1., 0., 1., 0.],
[1., 1., 0., 0., 1.],
[1., 0., 1., 1., 0.],
[1., 0., 1., 0., 1.],
[1., 0., 0., 1., 1.],
[0., 1., 1., 1., 0.],
[0., 1., 1., 0., 1.],
[0., 1., 0., 1., 1.],
[0., 0., 1., 1., 1.]])
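If the inner Python loop over range(n) turns out to be a bottleneck, one possible variation is to let itertools produce only the index tuples and set the ones with a single fancy-indexing assignment. This is a sketch, not a benchmarked solution, and binary_vectors is a name made up for this example.

```python
from itertools import combinations

import numpy as np


def binary_vectors(n, k):
    """Return all length-n 0/1 vectors with exactly k ones, one per row."""
    # Positions of the ones for every combination, shape (C, k).
    idx = np.array(list(combinations(range(n), k)), dtype=np.intp)
    out = np.zeros((idx.shape[0], n), dtype=np.uint8)
    # Row r gets ones at the columns listed in idx[r].
    out[np.arange(idx.shape[0])[:, None], idx] = 1
    return out


print(binary_vectors(3, 2))
```

For n=3 and k=2 this yields the rows (1,1,0), (1,0,1), (0,1,1), in that order. The combinations are still enumerated in Python, so only the filling step is vectorized.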
Given a pytorch tensor in dtype=int8:
tensor([[[-3, -6, -1],
[-6, -10, -1],
[9, 9, 6]],
[[-4, -7, -3],
[-4, -6, -1],
[14, 16, 8]],
[[-4, -6, -2],
[-6, -9, -2],
[9, 10, 5]]], device='cuda:0', dtype=torch.int8)
How do I convert the above tensor into its binary representation?
I tried converting it to NumPy to use the np.unpackbits function, but that only accepts unsigned 8-bit integers (uint8) as input.
Change the dtype from torch.int8 to torch.uint8.
This is the required output that I want:
tensor([[[[1., 1., 1., 1., 1., 1., 0., 1.],
[1., 1., 1., 1., 1., 0., 1., 0.],
[1., 1., 1., 1., 1., 1., 1., 1.]],
[[1., 1., 1., 1., 1., 0., 1., 0.],
[1., 1., 1., 1., 0., 1., 1., 0.],
[1., 1., 1., 1., 1., 1., 1., 1.]],
[[0., 0., 0., 0., 1., 0., 0., 1.],
[0., 0., 0., 0., 1., 0., 0., 1.],
[0., 0., 0., 0., 0., 1., 1., 0.]]],
[[[1., 1., 1., 1., 1., 1., 0., 0.],
[1., 1., 1., 1., 1., 0., 0., 1.],
[1., 1., 1., 1., 1., 1., 0., 1.]],
[[1., 1., 1., 1., 1., 1., 0., 0.],
[1., 1., 1., 1., 1., 0., 1., 0.],
[1., 1., 1., 1., 1., 1., 1., 1.]],
[[0., 0., 0., 0., 1., 1., 1., 0.],
[0., 0., 0., 1., 0., 0., 0., 0.],
[0., 0., 0., 0., 1., 0., 0., 0.]]],
[[[1., 1., 1., 1., 1., 1., 0., 0.],
[1., 1., 1., 1., 1., 0., 1., 0.],
[1., 1., 1., 1., 1., 1., 1., 0.]],
[[1., 1., 1., 1., 1., 0., 1., 0.],
[1., 1., 1., 1., 0., 1., 1., 1.],
[1., 1., 1., 1., 1., 1., 1., 0.]],
[[0., 0., 0., 0., 1., 0., 0., 1.],
[0., 0., 0., 0., 1., 0., 1., 0.],
[0., 0., 0., 0., 0., 1., 0., 1.]]]], device='cuda:0',
grad_fn=<RemainderBackward0>)
I found a helpful repo that has a function which converts int8 into binary like above
https://github.com/KarenUllrich/pytorch-binary-converter.git
Adapted from my answer for Convert integer to pytorch tensor of binary bits, here's something more concise than the repo from your answer:
import torch

a = torch.tensor([[[-3, -6, -1],
[-6, -10, -1],
[ 9, 9, 6]],
[[-4, -7, -3],
[-4, -6, -1],
[14, 16, 8]],
[[-4, -6, -2],
[-6, -9, -2],
[ 9, 10, 5]]], dtype=torch.int8)
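def int_to_bits(x, bits=None, dtype=torch.uint8):
    # Only integer tensors have a meaningful bit pattern here.
    assert not (x.is_floating_point() or x.is_complex()), "x isn't an integer type"
    if bits is None:
        bits = x.element_size() * 8  # bit width of the dtype, e.g. 8 for int8
    # Powers of two from the most significant bit down to 1, cast to x's dtype.
    mask = 2 ** torch.arange(bits - 1, -1, -1).to(x.device, x.dtype)
    # AND with each mask entry; a nonzero result means that bit is set.
    return x.unsqueeze(-1).bitwise_and(mask).ne(0).to(dtype=dtype)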
int_to_bits(a, dtype=torch.float32)
This returns:
tensor([[[[1., 1., 1., 1., 1., 1., 0., 1.],
[1., 1., 1., 1., 1., 0., 1., 0.],
[1., 1., 1., 1., 1., 1., 1., 1.]],
[[1., 1., 1., 1., 1., 0., 1., 0.],
[1., 1., 1., 1., 0., 1., 1., 0.],
[1., 1., 1., 1., 1., 1., 1., 1.]],
[[0., 0., 0., 0., 1., 0., 0., 1.],
[0., 0., 0., 0., 1., 0., 0., 1.],
[0., 0., 0., 0., 0., 1., 1., 0.]]],
[[[1., 1., 1., 1., 1., 1., 0., 0.],
[1., 1., 1., 1., 1., 0., 0., 1.],
[1., 1., 1., 1., 1., 1., 0., 1.]],
[[1., 1., 1., 1., 1., 1., 0., 0.],
[1., 1., 1., 1., 1., 0., 1., 0.],
[1., 1., 1., 1., 1., 1., 1., 1.]],
[[0., 0., 0., 0., 1., 1., 1., 0.],
[0., 0., 0., 1., 0., 0., 0., 0.],
[0., 0., 0., 0., 1., 0., 0., 0.]]],
[[[1., 1., 1., 1., 1., 1., 0., 0.],
[1., 1., 1., 1., 1., 0., 1., 0.],
[1., 1., 1., 1., 1., 1., 1., 0.]],
[[1., 1., 1., 1., 1., 0., 1., 0.],
[1., 1., 1., 1., 0., 1., 1., 1.],
[1., 1., 1., 1., 1., 1., 1., 0.]],
[[0., 0., 0., 0., 1., 0., 0., 1.],
[0., 0., 0., 0., 1., 0., 1., 0.],
[0., 0., 0., 0., 0., 1., 0., 1.]]]])
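For the NumPy route mentioned in the question, a possible sketch is to reinterpret the int8 bytes as uint8 before calling np.unpackbits. It uses a small NumPy array with a few of the values from the question. With a CUDA tensor you would presumably move the data to the host first (for example with tensor.cpu().numpy()), which is not shown here.

```python
import numpy as np

# A few of the int8 values from the question.
a = np.array([[-3, -6, -1],
              [9, 9, 6]], dtype=np.int8)

# view() reinterprets the same bytes as uint8 without changing any bits.
as_unsigned = a.view(np.uint8)

# Add a trailing axis of length 1, then expand each byte into 8 bits
# along that axis (most significant bit first by default).
bits = np.unpackbits(as_unsigned[..., np.newaxis], axis=-1)

print(bits.shape)
print(bits[0, 0])
```

Because the two's-complement bit pattern is unchanged by the view, -3 becomes 1 1 1 1 1 1 0 1, which matches the first row of the output above.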
I have a numpy array. I want to modify one column of it, but only in the rows selected by a condition on another array. For example:
import numpy as np
t1 = np.ones((10,3))
t2 = np.arange(10)
t1[np.where(t2>5)][:,2] = 10
print(t1)
What I want t1 to be is:
array([[1., 1., 1.],
[1., 1., 1.],
[1., 1., 1.],
[1., 1., 1.],
[1., 1., 1.],
[1., 1., 1.],
[1., 1., 10.],
[1., 1., 10.],
[1., 1., 10.],
[1., 1., 10.]])
But the output of t1 is:
array([[1., 1., 1.],
[1., 1., 1.],
[1., 1., 1.],
[1., 1., 1.],
[1., 1., 1.],
[1., 1., 1.],
[1., 1., 1.],
[1., 1., 1.],
[1., 1., 1.],
[1., 1., 1.]])
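What is the problem here?
The indexing is backwards. t1[np.where(t2>5)] indexes with an array, so it returns a copy, and the assignment changes that copy instead of t1. Take the column first (a slice, which is a view) and then index it: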
t1[:,2][np.where(t2>5)] = 10
output:
array([[ 1., 1., 1.],
[ 1., 1., 1.],
[ 1., 1., 1.],
[ 1., 1., 1.],
[ 1., 1., 1.],
[ 1., 1., 1.],
[ 1., 1., 10.],
[ 1., 1., 10.],
[ 1., 1., 10.],
[ 1., 1., 10.]])
The most pythonic way to do this is probably
t1[:, 2] = np.where(t2 > 5, # where t2 > 5
10, # put 10
t1[:, 2]) # into the third column of t1
Besides added clarity, this avoids building the intermediate index array that np.where(t2 > 5) returns. The three-argument form computes the new column in compiled code, although it still allocates a temporary array for the result before it is assigned to t1[:, 2].
You can do:
t1[np.where(t2>5), 2] = 10
Syntax: array[<row>, <col>]
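A short sketch of why the original chained form has no effect, plus a variant that uses the boolean mask directly instead of np.where. The names rows_copy and col_view are only illustrative.

```python
import numpy as np

t1 = np.ones((10, 3))
t2 = np.arange(10)

# Indexing with an index array copies the selected rows;
# a column slice is a view onto the same memory.
rows_copy = t1[np.where(t2 > 5)]
col_view = t1[:, 2]
print(np.shares_memory(rows_copy, t1), np.shares_memory(col_view, t1))

# A boolean mask picks the rows and 2 picks the column in one indexing step,
# so the assignment writes into t1 itself.
t1[t2 > 5, 2] = 10
print(t1[5:8])
```

Since the row selection and the column selection happen in a single indexing operation, there is no intermediate copy for the assignment to get lost in.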