Changing items in numpy array - python

I want to change items in array B to 0, based on the values in array A (checking along axis=1), according to the following criteria (toy code):
import numpy as np
A = np.array([[1, 3], [2, 5], [6, 2]])
B = np.array([[1, 1, 0, 0, 0],
              [1, 0, 0, 2, 0],
              [0, 0, 2, 2, 2],
              [0, 0, 0, 2, 0],
              [6, 6, 0, 0, 0]])
for i in A:
    if i[1] <= 2:
        B[B == i[0]] = 0
# result
>>> B
array([[1, 1, 0, 0, 0],
       [1, 0, 0, 2, 0],
       [0, 0, 2, 2, 2],
       [0, 0, 0, 2, 0],
       [0, 0, 0, 0, 0]])
But I want it in a numpy way, that is, with NO 'for' loops :) Thanks!

You can use a conditional list comprehension to create a list of the first values of the pairs in A whose second value is less than or equal to two (in the example A, only the last row qualifies, giving the value 6).
Then use boolean indexing with np.isin to find the elements of B that are contained in that list of values, and set those elements to zero.
target_val = 2
B[np.isin(B, [a[0] for a in A if a[1] <= target_val])] = 0
>>> B
array([[1, 1, 0, 0, 0],
       [1, 0, 0, 2, 0],
       [0, 0, 2, 2, 2],
       [0, 0, 0, 2, 0],
       [0, 0, 0, 0, 0]])
Alternatively, you could use np.where instead of boolean indexing.
np.where(np.isin(B, [a[0] for a in A if a[1] <= target_val]), 0, B)
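Note that np.where builds a new array instead of modifying B in place, so assign the result back (B = np.where(...)) if you want to keep it.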

In one line: B[np.isin(B, A[A[:, 1] <= 2][:, 0])] = 0
Explanation:
c = A[:, 1] <= 2   # vectorize the original `if i[1]<=2:` check over all rows of A at once
                   # i.e., a mask over A's rows where the second value of the pair is <= 2
d = A[c][:, 0]     # index A with the mask and select the old `i[0]` values, here just `6`
e = np.isin(B, d)  # mask B wherever its values are among those selected values
B[e] = 0           # and zero out those positions, i.e. where the old B value is 6
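Putting the two answers together, here is a self-contained version (my own combination of the above, not a new method) that reproduces the expected result:
import numpy as np

A = np.array([[1, 3], [2, 5], [6, 2]])
B = np.array([[1, 1, 0, 0, 0],
              [1, 0, 0, 2, 0],
              [0, 0, 2, 2, 2],
              [0, 0, 0, 2, 0],
              [6, 6, 0, 0, 0]])

# keys: first-column values of A whose second-column value is <= 2 (here just 6)
keys = A[A[:, 1] <= 2, 0]
# zero every cell of B whose value is one of those keys
B[np.isin(B, keys)] = 0
print(B)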

Related

Python NumPy: assign values to a 3-D array according to a 2-D array's values

Let's say we have an array A whose shape is (2, 3) and whose values are in {0, 1, 2, 3},
and another array B whose shape is (2, 3, 4).
Goal: according to each position and value in A, place a 1 in B, without using a loop. Maybe numpy.where? Is it possible?
Example:
A = [[0, 1, 3], [2, 1, 0]]
B = np.zeros((2, 3, 4))
# something like what I'm looking for:
B = [[[1, 0, 0, 0],
      [0, 1, 0, 0],
      [0, 0, 0, 1]],
     [[0, 0, 1, 0],
      [0, 1, 0, 0],
      [1, 0, 0, 0]]]
Furthermore, what happens if a value in A is NaN? Can we just do nothing in that case?
Check out this code:
Method-1
B[0, [0, 1, 2], A[0]] = 1
B[1, [0, 1, 2], A[1]] = 1
Method-2
import numpy as np
A = [[0, 1, 3], [2, 1, 0]]
B = np.zeros((2, 3, 4))
for i, j in zip(range(len(A)), A):
    for k, l in zip(range(len(j)), j):
        B[i][k][l] = 1
print(B)
I've got an idea: one-hot encoding.
numpy.eye(4)[A]
This one-hot version of A has the same shape as B, so you can simply add it to B:
numpy.eye(4)[A] + B
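A fully vectorized alternative (a sketch of my own, not from the answers above) builds row and column index grids with np.indices and uses A itself as the index along the last axis; the NaN handling is an assumption, since the question leaves it open:
import numpy as np

A = np.array([[0, 1, 3], [2, 1, 0]])
B = np.zeros((2, 3, 4))

rows, cols = np.indices(A.shape)   # index grids with the same shape as A
B[rows, cols, A] = 1               # place a 1 at depth A[i, j] for every (i, j)

# If A may contain NaN (it must then be a float array), mask those positions
# out so the corresponding cells of B are left untouched:
A_nan = np.array([[0., 1., np.nan], [2., 1., 0.]])
B2 = np.zeros((2, 3, 4))
valid = ~np.isnan(A_nan)
B2[rows[valid], cols[valid], A_nan[valid].astype(int)] = 1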

How to get start and end of subsection of 2d array where a condition holds

Given a 2d array of 1s and 0s I need to get the coordinates of the subsection where the 1s are. For example, given the array defined below:
a = [[0, 0, 0, 0],
     [0, 1, 0, 0],
     [0, 0, 1, 0]]
My function should return the start, where the subsection of 1s begins as (1,1) and the end, where the subsection of 1s ends as (2,2)
Is there a more efficient way to achieve this than a for loop by using perhaps numpy?
Inspired by this post, here's one way -
def start_end(a):
    m, n = a.shape
    mask = a.astype(bool)
    rowm = mask.any(1)
    colm = mask.any(0)
    r0 = rowm.argmax()
    c0 = colm.argmax()
    r1 = m - rowm[::-1].argmax() - 1
    c1 = n - colm[::-1].argmax() - 1
    return (r0, c0), (r1, c1)
Sample run -
In [114]: a   # bit more generic case
Out[114]:
array([[0, 1, 0, 0],
       [0, 1, 0, 0],
       [0, 0, 1, 1]])
In [115]: start_end(a)
Out[115]: ((0, 1), (2, 3))
You can use the numpy nonzero() method to get the row and column indices where the array is nonzero, then take the min and max of those indices.
ys, xs = np.array(a).nonzero()
min_point = (ys.min(), xs.min())
max_point = (ys.max(), xs.max())
I added another row for validation:
a = [[0, 0, 0, 0],
     [0, 1, 0, 0],
     [0, 0, 1, 0],
     [0, 0, 0, 1]]
You could do:
results_list = list()
for i, v in enumerate(a):
    if 1 in v:
        results_list.append([i, v.index(1)])
Then, for the first and last values:
results_list[0]    # [1, 1]
results_list[-1]   # [3, 3]
Or, in one swoop:
results_list = list()
while True:
    for i, v in enumerate(a):
        if 1 in v:
            results_list.append([i, v.index(1)])
    # Set first and last locations to variables for use somewhere else.
    first_pair, last_pair = results_list[0], results_list[-1]
    break
Output:
In [0]: print(f"First pair: {first_pair}\nLast pair: {last_pair}")
Out [0]:
First pair: [1, 1]
Last pair: [3, 3]
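For completeness, here is the nonzero() approach run on the same validation array (a quick check of my own, not shown in the answers):
import numpy as np

a = [[0, 0, 0, 0],
     [0, 1, 0, 0],
     [0, 0, 1, 0],
     [0, 0, 0, 1]]
ys, xs = np.array(a).nonzero()
start = (int(ys.min()), int(xs.min()))
end = (int(ys.max()), int(xs.max()))
print(start, end)   # (1, 1) (3, 3)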

Updating numpy 2-dimensional array according to conditions across different 2-D arrays

In the code that I am writing, I have three 2D numpy arrays with the same dimensions (m x n). Each 2D array contains info about a specific trait, and each corresponding cell (with a specific row/col value) across all three 2D arrays corresponds to a specific person. The three 2D arrays are trait1, trait2, and trait3. As an example, person (0, 0) will have traits 1 and 2, but not trait 3, if trait1 and trait2 have a value of 1 at location (0, 0) but trait3 does not.
What would be an efficient method of updating a 2D array at a specific location based on the values of other corresponding 2D arrays of the same dimension at the same location? That is, how can I efficiently update a 2D array at a specific location such that the other 2D arrays at this same location fulfill specific conditions?
I am currently trying to update the values of trait1 and trait2 according to their current values (where the corresponding trait1 value == 1 and trait2 value == 0); I am also trying to update trait3 under the same conditions. However, I am having trouble doing this without nested for loops, which greatly slow down my program.
Below is my current approach, which works, but is much too slow for my purposes:
for i in range(0, m):
    for j in range(0, n):
        if trait1[i][j] == 1:
            if trait2[i][j] == 0:
                trait1[i][j] = 0
                trait2[i][j] = 1
                new_color(i, j, 1)  # updates the color of the specific person on a grid
                trait3[i][j] = 0
        elif trait1[i][j] == 0:
            if trait2[i][j] <= 0:
                trait1[i][j] = 1
                trait2[i][j] = 0
                new_color(i, j, 0)
NumPy arrays are really slow if you loop over them in Python. If you can use matrix operations / numpy functions for everything, it will go much faster.
In your case, you could first extract the indices you're interested in, and then update your matrices like this:
import numpy as np
np.random.seed(1)
# Generate some sample data
trait1, trait2, trait3 = ( np.random.randint(0,2, [4,4]) for _ in range(3) )
In [4]: trait1
Out[4]:
array([[1, 1, 0, 0],
       [1, 1, 1, 1],
       [1, 0, 0, 1],
       [0, 1, 1, 0]])
In [5]: trait2
Out[5]:
array([[0, 1, 0, 0],
       [0, 1, 0, 0],
       [1, 0, 0, 0],
       [1, 0, 0, 0]])
In [6]: trait3
Out[6]:
array([[1, 1, 1, 1],
       [1, 0, 0, 0],
       [1, 1, 1, 1],
       [1, 1, 0, 1]])
And then:
cond1_idx = np.where((trait1 == 1) & (trait2==0))
cond2_idx = np.where((trait1 == 0) & (trait2<=0))
trait1[cond1_idx] = 0
trait2[cond1_idx] = 1
trait3[cond1_idx] = 0
[ new_color(i, j, 1) for i,j in zip(*cond1_idx) ]
trait1[cond2_idx] = 1
trait2[cond2_idx] = 0
[ new_color(i, j, 0) for i,j in zip(*cond2_idx) ]
Result:
In [2]: trait1
Out[2]:
array([[0, 1, 1, 1],
       [0, 1, 0, 0],
       [1, 1, 1, 0],
       [0, 0, 0, 1]])
In [3]: trait2
Out[3]:
array([[1, 1, 0, 0],
       [1, 1, 1, 1],
       [1, 0, 0, 1],
       [1, 1, 1, 0]])
In [4]: trait3
Out[4]:
array([[0, 1, 1, 1],
       [0, 0, 0, 0],
       [1, 1, 1, 0],
       [1, 0, 0, 1]])
I can't really test new_color, though, since I don't have that function.
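If you prefer boolean masks over index tuples, the same update can be written without np.where (a sketch of my own, following the answer above; new_color is the asker's function and is stubbed out here):
import numpy as np

np.random.seed(1)
trait1, trait2, trait3 = (np.random.randint(0, 2, [4, 4]) for _ in range(3))

def new_color(i, j, c):
    pass   # stand-in for the asker's grid-coloring function

# Compute both masks before modifying anything, so the second condition
# still sees the original trait1/trait2 values.
cond1 = (trait1 == 1) & (trait2 == 0)
cond2 = (trait1 == 0) & (trait2 <= 0)

trait1[cond1] = 0
trait2[cond1] = 1
trait3[cond1] = 0
trait1[cond2] = 1
trait2[cond2] = 0

# new_color still needs explicit coordinates:
for i, j in zip(*np.nonzero(cond1)):
    new_color(i, j, 1)
for i, j in zip(*np.nonzero(cond2)):
    new_color(i, j, 0)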

2-D Matrix: Finding and deleting columns that are subsets of other columns

I have a problem where I want to identify and remove columns in a logical (0/1) matrix that are subsets of other columns, i.e. [1, 0, 1] is a subset of [1, 1, 1], but neither of [1, 1, 0] and [0, 1, 1] is a subset of the other. I wrote out a quick piece of code that identifies the columns that are subsets, which does (n^2-n)/2 checks using a couple of nested for loops.
import numpy as np
A = np.array([[1, 0, 0, 0, 0, 1],
              [0, 1, 1, 1, 1, 0],
              [1, 0, 1, 0, 1, 1],
              [1, 1, 0, 1, 0, 1],
              [1, 1, 0, 1, 0, 0],
              [1, 0, 0, 0, 0, 0],
              [0, 0, 1, 1, 1, 0],
              [0, 0, 1, 0, 1, 0]])
rows, cols = A.shape
columns = [True] * cols
for i in range(cols):
    for j in range(i + 1, cols):
        diff = A[:, i] - A[:, j]
        if all(diff >= 0):
            print "%d is a subset of %d" % (j, i)
            columns[j] = False
        elif all(diff <= 0):
            print "%d is a subset of %d" % (i, j)
            columns[i] = False
B = A[:, columns]
The solution should be
>>> print B
[[1 0 0]
 [0 1 1]
 [1 1 0]
 [1 0 1]
 [1 0 1]
 [1 0 0]
 [0 1 1]
 [0 1 0]]
For massive matrices though, I'm sure there's a way that I could do this faster. One thought is to eliminate subset columns as I go so I'm not checking columns already known to be a subset. Another thought is to vectorize this so don't have O(n^2) operations. Thank you.
Since the A matrices I'm actually dealing with are 5000x5000 and sparse with about 4% density, I decided to try a sparse matrix approach combined with Python's "set" objects. Overall it's much faster than my original solution, but I feel like my process of going from matrix A to the list of sets D is not as fast as it could be. Any ideas on how to do this better are appreciated.
Solution
import numpy as np
A = np.array([[1, 0, 0, 0, 0, 1],
              [0, 1, 1, 1, 1, 0],
              [1, 0, 1, 0, 1, 1],
              [1, 1, 0, 1, 0, 1],
              [1, 1, 0, 1, 0, 0],
              [1, 0, 0, 0, 0, 0],
              [0, 0, 1, 1, 1, 0],
              [0, 0, 1, 0, 1, 0]])
rows, cols = A.shape
drops = np.zeros(cols).astype(bool)
# sparse nonzero elements
C = np.nonzero(A)
# create a list of sets containing the indices of non-zero elements of each column
D = [set() for j in range(cols)]
for i in range(len(C[0])):
    D[C[1][i]].add(C[0][i])
# find subsets, ignoring columns that are known to already be subsets
for i in range(cols):
    if drops[i] == True:
        continue
    col1 = D[i]
    for j in range(i + 1, cols):
        col2 = D[j]
        if col2.issubset(col1):
            # I tried `if drops[j]==True: continue` here, but that was slower
            print "%d is a subset of %d" % (j, i)
            drops[j] = True
        elif col1.issubset(col2):
            print "%d is a subset of %d" % (i, j)
            drops[i] = True
            break
B = A[:, ~drops]
print B
Here's another approach using NumPy broadcasting -
A[:,~((np.triu(((A[:,:,None] - A[:,None,:])>=0).all(0),1)).any(0))]
A detailed commented explanation is listed below -
# Perform elementwise subtractions keeping the alignment along the columns
sub = A[:,:,None] - A[:,None,:]
# Look for >=0 subtractions as they indicate non-subset criteria
mask3D = sub>=0
# Check if all elements along each column satisfy that criteria giving us a 2D
# mask which represent the relationship between all columns against each other
# for the non subset criteria
mask2D = mask3D.all(0)
# Finally get the valid column mask: a column is dropped if it has at least one
# True entry above the diagonal (sans the diagonal elements); negate that and
# index into the input array with it for the final output.
colmask = ~(np.triu(mask2D,1).any(0))
out = A[:,colmask]
Define subset via the dot product: col1.dot(col1) == col1.dot(col2) if and only if col1 is a subset of col2.
Define col1 and col2 to be the same if and only if col1 is a subset of col2 and vice versa.
I split the work into two steps. First get rid of all but one of each group of equivalent columns. Then remove strict subsets.
Solution
import numpy as np

def drop_duplicates(A):
    N = A.T.dot(A)
    D = np.diag(N)[:, None]
    drops = np.tril((N == D) & (N == D.T), -1).any(axis=1)
    return A[:, ~drops], drops

def drop_subsets(A):
    N = A.T.dot(A)
    drops = ((N == np.diag(N)).sum(axis=0) > 1)
    return A[:, ~drops], drops

def drop_strict(A):
    A1, d1 = drop_duplicates(A)
    A2, d2 = drop_subsets(A1)
    d1[~d1] = d2
    return A2, d1

A = np.array([[1, 0, 0, 0, 0, 1],
              [0, 1, 1, 1, 1, 0],
              [1, 0, 1, 0, 1, 1],
              [1, 1, 0, 1, 0, 1],
              [1, 1, 0, 1, 0, 0],
              [1, 0, 0, 0, 0, 0],
              [0, 0, 1, 1, 1, 0],
              [0, 0, 1, 0, 1, 0]])

B, drops = drop_strict(A)
Demonstration
print B
print
print drops

[[1 0 0]
 [0 1 1]
 [1 1 0]
 [1 0 1]
 [1 0 1]
 [1 0 0]
 [0 1 1]
 [0 1 0]]

[False  True False False  True  True]
Explanation
N = A.T.dot(A) is the matrix of dot products between every pair of columns. Per the definition of subset at the top, this will come in handy.
def drop_duplicates(A):
    N = A.T.dot(A)
    D = np.diag(N)[:, None]
    # (N == D)[i, j] being True identifies A[:, i] as a subset
    # of A[:, j] if i < j. The relationship is reversed if j < i.
    # If A[:, j] is a subset of A[:, i] and vice versa, then we have
    # equivalent columns. Taking the lower triangle ensures we
    # leave one.
    drops = np.tril((N == D) & (N == D.T), -1).any(axis=1)
    return A[:, ~drops], drops

def drop_subsets(A):
    N = A.T.dot(A)
    # Without concern for removing equivalent columns, this
    # removes any column that has an off-diagonal entry equal to the diagonal.
    drops = ((N == np.diag(N)).sum(axis=0) > 1)
    return A[:, ~drops], drops
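As a quick sanity check of the dot-product criterion (a sketch of my own, not part of the answer):
import numpy as np

col1 = np.array([1, 0, 1])   # ones at rows 0 and 2
col2 = np.array([1, 1, 1])   # ones everywhere, so col1's ones are a subset of col2's

# For 0/1 vectors, col1.dot(col2) counts the rows where both are 1; that count
# equals col1.dot(col1) (the number of 1s in col1) exactly when every 1 in col1
# is also a 1 in col2.
print(col1.dot(col1) == col1.dot(col2))   # True:  col1 is a subset of col2
print(col2.dot(col2) == col2.dot(col1))   # False: col2 is not a subset of col1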

Nested List Indices [duplicate]

This question already has answers here:
List of lists changes reflected across sublists unexpectedly
(17 answers)
Closed 9 years ago.
I have experienced a problem using a nested list in Python, in the code shown below.
Basically, I have a 2D list containing all 0 values, and I want to update the list values in a loop.
However, Python does not produce the result I want. Is there something I misunderstand about range() or Python list indices?
some_list = 4 * [(4 * [0])]
for i in range(3):
    for j in range(3):
        some_list[i + 1][j + 1] = 1
for i in range(4):
    print(some_list[i])
The results I expected are:
[0, 0, 0, 0]
[0, 1, 1, 1]
[0, 1, 1, 1]
[0, 1, 1, 1]
But the actual results from Python are:
[0, 1, 1, 1]
[0, 1, 1, 1]
[0, 1, 1, 1]
[0, 1, 1, 1]
What's going on here?
The problem is caused by the fact that Python handles lists through references rather than copies.
With simple immutable values such as integers, rebinding one name has no effect on another, so the variables appear to operate independently:
>>> a = 1
>>> b = a
>>> a = 2
>>> print b
1
But since lists might get pretty large, rather than shifting the whole list around memory, Python chooses to just use a reference ('pointer' in C terms). If you assign one to another variable, you assign just the reference to it. This means that you can have two variables pointing to the same list in memory:
>>> a = [1]
>>> b = a
>>> a[0] = 2
>>> print b
[2]
So, in your first line of code you have 4 * [0]. Now [0] is a list containing a single reference to the value 0 in memory, and when you multiply it, you get a list of four references to that same place. BUT when you assign a new value to one of the elements, only that element's reference is changed to point to the new value:
>>> a = 4 * [0]
>>> a
[0, 0, 0, 0]
>>> [id(v) for v in a]
[33302480, 33302480, 33302480, 33302480]
>>> a[0] = 1
>>> a
[1, 0, 0, 0]
The problem comes when you multiply a list itself - you get four copies of the list reference. Now when you change one of the values in one list, all four change together:
>>> a = 4 * [4 * [0]]
>>> a[0][0] = 1
>>> a
[[1, 0, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0]]
The solution is to avoid the second multiplication. A loop does the job:
>>> some_list = [(4 * [0]) for _ in range(4)]
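Plugging that into the original loop (a quick check of my own) now gives the expected result:
some_list = [4 * [0] for _ in range(4)]   # each row is a distinct list object
for i in range(3):
    for j in range(3):
        some_list[i + 1][j + 1] = 1
for row in some_list:
    print(row)
# [0, 0, 0, 0]
# [0, 1, 1, 1]
# [0, 1, 1, 1]
# [0, 1, 1, 1]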
Actually all the objects in your list are the same, so changing one changes the others too:
In [151]: some_list = 4 * [(4 * [0])]
In [152]: [id(x) for x in some_list]
Out[152]: [148641452, 148641452, 148641452, 148641452]
In [160]: some_list[0][1]=5 #you think you changed the list at index 0 here
In [161]: some_list
Out[161]: [[0, 5, 0, 0], [0, 5, 0, 0], [0, 5, 0, 0], [0, 5, 0, 0]] #but all lists are changed
Create your list this way:
In [156]: some_list=[[0]*4 for _ in range(4)]
In [157]: some_list
Out[157]: [[0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
In [158]: [id(x) for x in some_list]
Out[158]: [148255436, 148695180, 148258380, 148255852]
In [163]: some_list[0][1]=5
In [164]: some_list
Out[164]: [[0, 5, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]] #works fine in this case
