Find the indices of array elements - python

I have this array as a result of subtracting two images after getting their RGB integer values as arrays:
arr = img1 - img2
[[[0 0 0]
[0 0 0]
[0 0 0]
...,
[0 0 0]
[0 0 0]
[0 0 0]]
...,
[[0 0 0]
[0 0 0]
[0 0 0]
...,
[0 0 0]
[0 0 0]
[0 0 0]]]
I used these lines of code to reshape the array and append the indices of each pixel to its difference values:
x, y, z = arr.shape
# reshape(x*y, z) flattens in row-major order, so unravel with shape (x, y), not (y, x)
indices = np.vstack(np.unravel_index(np.arange(x*y), (x, y))).T
result = np.hstack((arr.reshape(x*y, z), indices))
and here is what the result looks like:
[[ 0 0 0 0 0]
[ 0 0 0 0 1]
[ 0 0 0 0 2]
...,
[ 0 0 0 511 509]
[ 0 0 0 511 510]
[ 0 0 0 511 511]]
The first three values in each row are the RGB difference and the last two values are the X and Y indices.
My question: is there an efficient way to find the indices of the non-zero values?

If I understand you correctly, you want them per list in your list of lists.
Say this is your list:
l=[[0,0,0,0,0],[0,0,0,0,1],[0,0,0,0,2],[0,0,0,511,509],[0,0,0,511,510],[0,0,0,511,511]]
try running:
import numpy as np
ans=[np.nonzero(l[i])[0] for i in range(1,len(l))]
print(ans)
returns:
[array([4]), array([4]), array([3, 4]), array([3, 4]), array([3, 4])]
So it's a list of arrays, each holding the indices of the non-zero elements in one row. Note that the comprehension starts at 1, so the all-zero first row is skipped and ans[0] belongs to l[1]. Access is as simple as indexing: ans[row][position among the non-zero indices], like so:
ans[2][1]
4
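If the goal is the pixel positions where the two images differ, it may be simpler to work on the 3-D difference array directly instead of the stacked table. A minimal sketch, assuming arr has shape (rows, cols, 3); the small arr built here is a made-up stand-in for img1 - img2:

```python
import numpy as np

# Hypothetical stand-in for img1 - img2: a 4x5 RGB difference with two changed pixels.
arr = np.zeros((4, 5, 3), dtype=int)
arr[1, 2] = [10, 0, 0]
arr[3, 4] = [0, -7, 2]

# True wherever at least one of the three channels differs.
changed = np.any(arr != 0, axis=-1)

# One (row, col) pair per changed pixel, in row-major order.
coords = np.argwhere(changed)

# The RGB differences at those pixels, in the same order as coords.
values = arr[changed]

print(coords)
print(values)
```

This avoids the Python-level loop over rows, since np.any and np.argwhere both run over the whole array at once.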

Related

one hot encode with pandas get_dummies missing values

I have a dataset in the form of a DataFrame and each row has a label ranging from 1-5. I am doing a one-hot encode using pd.get_dummies(). If my dataset has all 5 labels there is no problem. However, not all sets contain all 5 numbers, so the encode just skips the missing value, which creates a problem for new datasets coming in. Can I set a range so that the one-hot encode knows there should be 5 labels? Or would I have to append 1,2,3,4,5 to the end of the array before I perform the encode and then delete the last 5 entries?
Correct encode: values 1-5 are encoded
arr = np.array([1,2,5,3,1,5,1,4])
df = pd.DataFrame(arr, columns = ['test'])
hotarr = np.array(pd.get_dummies(df['test']))
>>>[[1 0 0 0 0]
[0 1 0 0 0]
[0 0 0 0 1]
[0 0 1 0 0]
[1 0 0 0 0]
[0 0 0 0 1]
[1 0 0 0 0]
[0 0 0 1 0]]
Missing value encode: this dataset is missing label 4.
arr = np.array([1,2,5,3,1,5,1,])
df = pd.DataFrame(arr, columns = ['test'])
hotarr = np.array(pd.get_dummies(df['test']))
>>>[[1 0 0 0]
[0 1 0 0]
[0 0 0 1]
[0 0 1 0]
[1 0 0 0]
[0 0 0 1]
[1 0 0 0]]
Set up the CategoricalDtype before encoding to ensure all categories are represented when getting dummies:
import numpy as np
import pandas as pd
arr = np.array([1, 2, 5, 3, 1, 5, 1])
df = pd.DataFrame(arr, columns=['test'])
# Setup Categorical Dtype
df['test'] = df['test'].astype(pd.CategoricalDtype(categories=[1, 2, 3, 4, 5]))
hotarr = np.array(pd.get_dummies(df['test']))
print(hotarr)
Alternatively, you can reindex after get_dummies with fill_value=0 to add the missing columns:
hotarr = np.array(pd.get_dummies(df['test'])
.reindex(columns=[1, 2, 3, 4, 5], fill_value=0))
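Both produce hotarr with 5 columns even though the input does not contain 4 (with pandas 2.0 or later the entries print as True/False unless dtype=int is passed to get_dummies):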
[[1 0 0 0 0]
[0 1 0 0 0]
[0 0 0 0 1]
[0 0 1 0 0]
[1 0 0 0 0]
[0 0 0 0 1]
[1 0 0 0 0]]
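As a quick check that the two approaches agree, here is a minimal sketch. The name labels is just an illustrative variable for the full label set, and dtype=int is passed so the result is 0/1 rather than booleans:

```python
import numpy as np
import pandas as pd

labels = [1, 2, 3, 4, 5]               # assumed full set of labels
arr = np.array([1, 2, 5, 3, 1, 5, 1])  # label 4 never occurs

# Approach 1: declare the categories before encoding.
cat = pd.Series(arr).astype(pd.CategoricalDtype(categories=labels))
hot_cat = pd.get_dummies(cat, dtype=int).to_numpy()

# Approach 2: encode first, then add the missing columns filled with 0.
hot_reindexed = (
    pd.get_dummies(pd.Series(arr), dtype=int)
    .reindex(columns=labels, fill_value=0)
    .to_numpy()
)

print(hot_cat.shape)
print(np.array_equal(hot_cat, hot_reindexed))
```

The categorical version has the advantage that the label set travels with the column, so any later get_dummies call on it keeps all 5 columns.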

applying conditions to arrays with different channels

I have the following simple program
import numpy as np
thepixels = np.array([[0, 5 ], [5, 0 ]])
print(thepixels.shape)
cnd= thepixels[:]>3
print(cnd)
print(thepixels[cnd])
layer4= np.zeros((2,2,4),dtype=np.uint8)
print("the array")
print(layer4)
print("the info")
print(layer4.dtype)
print(layer4.shape)
Which gives the output
(2, 2)
[[False True]
[ True False]]
[5 5]
the array
[[[0 0 0 0]
[0 0 0 0]]
[[0 0 0 0]
[0 0 0 0]]]
the info
uint8
(2, 2, 4)
You can see that there is an array of shape (2,2) that gives me a condition I want to apply to my zero array of shape (2,2,4).
What I am scratching my head over (with numpy) is:
Given:
a channel number: nchannel
a value: value
apply the condition so that the value is written into the array on channel nchannel.
For example:
nchannel= 1
value=10
What I want to get is
[[[0 0 0 0]
[0 10 0 0]]
[[0 10 0 0]
[0 0 0 0]]]
or if value is 50 and nchannel is 3 (the last channel, counting from 0) then
[[[0 0 0 0]
[0 0 0 50]]
[[0 0 0 50]
[0 0 0 0]]]
How can I apply the condition to get these arrays?
P.S. I know that by doing layer4[:,:,nchannel]=value I can apply the value unconditionally to the channel, but how do I apply it depending on the condition?
Your question is not crystal clear to me, but this at least gives your expected results:
...
layer4[cnd, nchannel] = value
...
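For reference, a self-contained sketch of that assignment using the arrays from the question. The 2-D boolean mask cnd selects pixels along the first two axes, and the integer nchannel picks the channel on the last axis:

```python
import numpy as np

thepixels = np.array([[0, 5], [5, 0]])
cnd = thepixels > 3                       # boolean mask of shape (2, 2)
layer4 = np.zeros((2, 2, 4), dtype=np.uint8)

nchannel = 1
value = 10

# Write value into channel nchannel only where cnd is True.
layer4[cnd, nchannel] = value
print(layer4)
```

Channels are counted from 0, so for the second example in the question (value 50 in the last of 4 channels) nchannel would be 3.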

Python plot 2D array with black and white cells

I want to plot a 2D array of 1's and 0's in Python with black and white cells using pyplot.imshow(). Where there is a '1' the cell should be black, and where there is a '0' the cell should be white.
I tried this:
grid = np.zeros((4, 4, 4), int)
choices = np.random.choice(grid.size, 6, replace=False)
grid.ravel()[choices] = 1
plt.imshow(grid, cmap='gray')
plt.show()
This is what the output looks like with this code (image not included).
If you meant to create a 3-dimensional grid, then you are probably interested in plotting all slices:
import matplotlib.pyplot as plt
import numpy as np
np.random.seed(2020)
grid = np.zeros((4, 4, 4), int)
choices = np.random.choice(grid.size, 6, replace=False)
grid.ravel()[choices] = 1
print(grid)
fig, ax = plt.subplots(2, 2, figsize=(6, 6))
for i, a in enumerate(ax.flatten()):
    a.imshow(grid[i, :, :], cmap='gray_r')
    a.set_title(f"slice {i}")
plt.show()
yields:
[[[0 0 0 0]
[0 0 0 0]
[0 0 0 0]
[1 1 0 0]]
[[0 0 0 0]
[0 0 0 0]
[0 0 0 0]
[0 0 0 0]]
[[0 0 0 0]
[0 1 0 0]
[0 0 1 0]
[0 0 0 0]]
[[0 1 0 0]
[1 0 0 0]
[0 0 0 0]
[0 0 0 0]]]
and this image (not included here).
If, however, you wanted to plot in 2d, then use:
import matplotlib.pyplot as plt
import numpy as np
np.random.seed(2020)
grid = np.zeros((4, 4), int)
choices = np.random.choice(grid.size, 6, replace=False)
grid.ravel()[choices] = 1
print(grid)
plt.imshow(grid,cmap='gray_r')
plt.show()
yields:
[[0 1 1 0]
[1 0 0 0]
[0 1 1 0]
[1 0 0 0]]
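The reason 'gray_r' is used rather than 'gray' is that 'gray' maps the lowest value to black and the highest to white, which is the opposite of what was asked. A small sketch that checks this on a fixed grid; the Agg backend and the vmin/vmax arguments are additions of mine (they pin 0 to white and 1 to black even when a grid happens to contain only one of the two values):

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, so no window is needed
import matplotlib.pyplot as plt
import numpy as np

grid = np.array([[0, 1, 1, 0],
                 [1, 0, 0, 0],
                 [0, 1, 1, 0],
                 [1, 0, 0, 0]])

fig, ax = plt.subplots()
# Fix the color scale so that 0 is always white and 1 is always black.
im = ax.imshow(grid, cmap="gray_r", vmin=0, vmax=1)

# Look up the RGBA colors at both ends of the reversed gray colormap.
cmap = plt.get_cmap("gray_r")
color_for_zero = cmap(0.0)
color_for_one = cmap(1.0)
print(color_for_zero, color_for_one)
```

With an interactive backend, plt.show() displays the same figure as in the answer above.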

Take non-zero elements in a macro-list

I have a problem with the function np.nonzero() in Python. I want to get the indices of all rows of a given array that are non-zero at certain positions. So, consider that I have the following code:
import numpy as np
from scipy.special import binom
M=4
N=3
def generate(N,nb):
    states = np.zeros((int(binom(nb+N-1, nb)), N), dtype=int)
    states[0, 0]=nb
    ni = 0 # init
    for i in range(1, states.shape[0]):
        states[i,:N-1] = states[i-1, :N-1]
        states[i,ni] -= 1
        states[i,ni+1] += 1+states[i-1, N-1]
        if ni >= N-2:
            if np.any(states[i, :N-1]):
                ni = np.nonzero(states[i, :N-1])[0][-1]
        else:
            ni += 1
    return states
base = generate(M,N)
The result of base is given by:
base = [[3 0 0 0]
[2 1 0 0]
[2 0 1 0]
[2 0 0 1]
[1 2 0 0]
[1 1 1 0]
[1 1 0 1]
[1 0 2 0]
[1 0 1 1]
[1 0 0 2]
[0 3 0 0]
[0 2 1 0]
[0 2 0 1]
[0 1 2 0]
[0 1 1 1]
[0 1 0 2]
[0 0 3 0]
[0 0 2 1]
[0 0 1 2]
[0 0 0 3]]
The point is that for given indices j,k I want to take all the rows of base that have non-zero components at both sites j and k. For example:
Taking j=0,k=1 I should obtain:
result = [1 4 5 6]
which corresponds to the rows 1,4,5,6 of base that satisfy this condition. I have tried the command:
np.nonzero((base[:, j]) & (base[:, k]))[0]
but it doesn't work correctly, any idea why?
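First of all, base[:, j] is valid indexing as long as base is a NumPy array; if it is a plain list of lists, convert it with np.array first.
Also:
np.nonzero((base[:, j]) & (base[:, k]))[0]
won't give what you want, because & is a bitwise AND on integers: 2 & 1 is 0, so a row like [2 1 0 0] is dropped even though both entries are non-zero.
You could multiply the two columns instead:
b = np.array(base)
j = 0
k = 1
np.nonzero(b.T[j] * b.T[k])[0]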
which will give:
array([1, 4, 5, 6])
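To see why the original expression misses rows: on integer arrays, & works bit by bit, so 2 & 1 evaluates to 0. Comparing each column against 0 first turns & into a logical AND on booleans. A minimal sketch on the first eight rows of base (truncated here only to keep the example short):

```python
import numpy as np

# First eight rows of the base array from the question.
base = np.array([[3, 0, 0, 0],
                 [2, 1, 0, 0],
                 [2, 0, 1, 0],
                 [2, 0, 0, 1],
                 [1, 2, 0, 0],
                 [1, 1, 1, 0],
                 [1, 1, 0, 1],
                 [1, 0, 2, 0]])
j, k = 0, 1

# Bitwise AND on the integers: rows 1 and 4 are lost because 2 & 1 == 0.
bitwise = np.nonzero(base[:, j] & base[:, k])[0]

# Logical AND on boolean masks: keeps every row where both entries are non-zero.
logical = np.nonzero((base[:, j] != 0) & (base[:, k] != 0))[0]

print(bitwise)
print(logical)
```

The boolean version also stays correct if the entries can be negative, where a product or bitwise test is harder to reason about.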

quickly calculate randomized 3D numpy array from 2D numpy array

I have a 2-dimensional array of integers, we'll call it "A".
I want to create a 3-dimensional array "B" of all 1s and 0s such that:
for any fixed (i,j), sum(B[i,j,:])==A[i,j], that is, B[i,j,:] contains A[i,j] 1s
the 1s are randomly placed in the 3rd dimension.
I know how I would do this using standard python indexing but this turns out to be very slow.
I am looking for a way to do this that takes advantage of the features that can make Numpy fast.
Here is how I would do it using standard indexing:
B = np.zeros((X, Y, Z))
indexoptions = range(Z)
for i in range(X):
    for j in range(Y):
        # pick A[i,j] distinct positions along the third axis
        replacedindices = np.random.choice(indexoptions, size=A[i,j], replace=False)
        B[i, j, replacedindices] = 1
Can someone please explain how I can do this in a faster way?
Edit: Here is an example "A":
A=np.array([[0,1,2,3,4],[0,1,2,3,4],[0,1,2,3,4],[0,1,2,3,4],[0,1,2,3,4]])
in this case X=Y=5 and Z>=5
Essentially the same idea as @JohnZwinck and @DSM, but with a shuffle function for shuffling a given axis:
import numpy as np

def shuffle(a, axis=-1):
    """
    Shuffle `a` in-place along the given axis.
    Apply numpy.random.shuffle to the given axis of `a`.
    Each one-dimensional slice is shuffled independently.
    """
    b = a.swapaxes(axis, -1)
    # Shuffle `b` in-place along the last axis. `b` is a view of `a`,
    # so `a` is shuffled in place, too.
    shp = b.shape[:-1]
    for ndx in np.ndindex(shp):
        np.random.shuffle(b[ndx])
    return

def random_bits(a, n):
    # Each slice starts as a[i, j] ones followed by zeros, then gets shuffled.
    b = (a[..., np.newaxis] > np.arange(n)).astype(int)
    shuffle(b)
    return b

if __name__ == "__main__":
    np.random.seed(12345)
    A = np.random.randint(0, 5, size=(3, 4))
    Z = 6
    B = random_bits(A, Z)
    print("A:")
    print(A)
    print("B:")
    print(B)
Output:
A:
[[2 1 4 1]
[2 1 1 3]
[1 3 0 2]]
B:
[[[1 0 0 0 0 1]
[0 1 0 0 0 0]
[0 1 1 1 1 0]
[0 0 0 1 0 0]]
[[0 1 0 1 0 0]
[0 0 0 1 0 0]
[0 0 1 0 0 0]
[1 0 1 0 1 0]]
[[0 0 0 0 0 1]
[0 0 1 1 1 0]
[0 0 0 0 0 0]
[0 0 1 0 1 0]]]
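The shuffle above still loops in Python over every (i, j) slice. If that becomes the bottleneck, one loop-free variant is to draw a random permutation of 0..n-1 for each slice by argsorting uniform random numbers, then mark the positions whose permutation value is below A[i, j]. This is a sketch of that alternative, not the method from the answer; random_bits_vectorized and its rng parameter are names made up for this example, and it assumes every entry of the input is at most n:

```python
import numpy as np

def random_bits_vectorized(a, n, rng=None):
    """Return an array of shape a.shape + (n,) with a[i, j] ones per slice."""
    rng = np.random.default_rng() if rng is None else rng
    # argsort of uniform noise gives an independent random permutation
    # of 0..n-1 along the last axis of every slice.
    perm = rng.random(a.shape + (n,)).argsort(axis=-1)
    # Exactly a[i, j] entries of each permutation are smaller than a[i, j].
    return (perm < a[..., np.newaxis]).astype(int)

A = np.array([[0, 1, 2, 3, 4]] * 5)
B = random_bits_vectorized(A, 6, rng=np.random.default_rng(0))
print(B.sum(axis=-1))
```

The trade-off is a sort along the last axis in exchange for removing the Python loop.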
