Iterate over array and make calculation on elements - python

I have an array and want to sum specific elements of it while iterating through its rows, but I am struggling to find a way to do this with a loop.
The array shape is (25,25)
array
[ 92843, 86851, 91950, 98232, 83329, 94591, 88962, 97020,
107113, 98452, 103242, 106442, 123032, 119063, 112971, 114715,
108654, 114856, 109872, 124583, 120518, 112815, 120780, 127831,
147174],
[132633, 124073, 131357, 140331, 119041, 135131, 127089, 138601,
153019, 140647, 147489, 152061, 175761, 170090, 161388, 163879,
155221, 164080, 156960, 177976, 172169, 161165, 172544, 182617,
210249],
[159159, 148887, 157629, 168397, 142849, 162157, 152507, 166321,
183623, 168776, 176986, 182473, 210913, 204108, 193665, 196655,
186265, 196896, 188352, 213571, 206602, 193398, 207052, 219140,
252298]
I want to print out results like below for each iteration
print(array[23][0]+array[23][1]) # 159159 + 148887 = 308046
print(array[22][0]+array[22][1]+array[22][2]) #132633 + 124073 + 131357 = 388063
print(array[21][0]+array[21][1]+array[21][2]+array[21][3]) # 92843 + 86851 + 91950 + 98232 = 369876
Writing each element as array[i][j]: in each iteration i decreases by one and the number of j terms summed increases by one.
Is there any way I can use a loop to do this task? Thanks!

Try this:
for i, sub in enumerate(reversed(array)):
    print(sum(sub[:i]))
For example, if
array = [[ 1, 2, 3, 4, 5],
[ 6, 7, 8, 9, 10],
[11, 12, 13, 14, 15],
[16, 17, 18, 19, 20],
[21, 22, 23, 24, 25]]
the output would be
0 # last row, no elements summed
16 # 16 = 16
23 # 11 + 12 = 23
21 # 6 + 7 + 8 = 21
10 # 1 + 2 + 3 + 4 = 10
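Note that the question's own examples sum two elements for the second-to-last row, three for the row above it, and so on, so the slice there is one element longer than in the loop above. A minimal sketch of that variant, assuming array is a square list of lists (the helper name staircase_sums is made up for illustration):

```python
def staircase_sums(array):
    """Sum the first k elements of the k-th row counted from the bottom."""
    # start=1 so the last row contributes 1 element, the one above it 2, ...
    return [sum(row[:k]) for k, row in enumerate(reversed(array), start=1)]


array = [[ 1,  2,  3,  4,  5],
         [ 6,  7,  8,  9, 10],
         [11, 12, 13, 14, 15],
         [16, 17, 18, 19, 20],
         [21, 22, 23, 24, 25]]

for total in staircase_sums(array):
    print(total)
```

Returning a list instead of printing inside the loop makes the result easier to reuse; whether the last row should contribute one element or none depends on which reading of the question is intended.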

You may simply want np.tril, followed by np.sum(_, axis=1). This gives the sum of each row of the lower triangle of the matrix (axis=0 would give the column sums instead). It is easily altered to give the upper triangle, if that's what you need.
print(np.sum(np.tril(array), axis=1))

In [661]: arr = np.arange(1,17).reshape(4,4)
In [662]: arr
Out[662]:
array([[ 1, 2, 3, 4],
[ 5, 6, 7, 8],
[ 9, 10, 11, 12],
[13, 14, 15, 16]])
In [666]: for i in range(3,-1,-1):
     ...:     c = arr[i,:4-i]
     ...:     print(c.sum(), c)
     ...:
13 [13]
19 [ 9 10]
18 [5 6 7]
10 [1 2 3 4]
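The same sums can probably be computed without a Python loop by masking the array. This is only a sketch, assuming a square NumPy array; the function name staircase_sums_vectorized is hypothetical:

```python
import numpy as np


def staircase_sums_vectorized(arr):
    """Row i (0-based) of an n x n array contributes its first n - i elements."""
    arr = np.asarray(arr)
    n = arr.shape[0]
    # Flipping a lower-triangular mask upside down keeps columns 0 .. n-1-i in row i
    mask = np.flipud(np.tri(n, dtype=bool))
    # Zero out everything outside the mask, sum each row, report the bottom row first
    return np.where(mask, arr, 0).sum(axis=1)[::-1]


arr = np.arange(1, 17).reshape(4, 4)
print(staircase_sums_vectorized(arr))
```

For the 4x4 example this reproduces the sums printed by the loop (13, 19, 18, 10).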

Related

Numpy filter matrix based on column

I have a matrix with several different values for each row:
arr1 = np.array([[1,2,3,4,5,6,7,8,9],[10,11,12,13,14,15,16,17,18],[19,20,21,22,23,24,25,26,27]])
arr2 = np.array([["A"],["B"],["C"]])
This produces the following matrices:
array([[ 1, 2, 3, 4, 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14, 15, 16, 17, 18],
[19, 20, 21, 22, 23, 24, 25, 26, 27]])
array([['A'],
['B'],
['C']])
A represents the first 3 columns, B represents the next 3 columns, and C represents the last 3 columns. So the result I'd like here is:
array([[1,2,3],
[13,14,15],
[25,26,27]])
I was thinking about converting arr2 to a mask array, but I'm not even sure how to do this. If it was a 1darray I could do something like this:
arr[0,1,2]
but for a 2darray I'm not even sure how to mask like this. I tried this and got errors:
arr[[0,1,2],[3,4,5],[6,7,8]]
What's the best way to do this?
Thanks.
You could use string.ascii_uppercase to look up each letter's position in the alphabet, and reshape arr1 into chunks of 3 columns:
from string import ascii_uppercase
reshaped = np.reshape(arr1, (len(arr1), -1, 3))
reshaped[np.arange(len(arr1)), np.vectorize(ascii_uppercase.index)(arr2).ravel()]
Or just directly map A to 0 and so on...
reshaped = np.reshape(arr1, (len(arr1), -1, 3))
reshaped[np.arange(len(arr1)), np.vectorize(['A', 'B', 'C'].index)(arr2).ravel()]
Both Output:
array([[ 1, 2, 3],
[13, 14, 15],
[25, 26, 27]])
If the shape of arr1 is fixed at (3, 9) as shown above, it can be done with a single line of code:
arr2 = np.array([arr1[0][0:3],arr1[1][3:6],arr1[2][6:9]])
The output will be as follows:
[[ 1 2 3]
[13 14 15]
[25 26 27]]
You can use 'advanced indexing', which indexes the target array by coordinate arrays.
rows = np.array([[0,0,0],[1,1,1],[2,2,2]])
cols = np.array([[0,1,2],[3,4,5],[6,7,8]])
arr1[rows, cols]
>>> array([[ 1, 2, 3],
[13, 14, 15],
[25, 26, 27]])
and you can make some functions like
def diagonal(arr, step):
    rows = np.array([[x]*step for x in range(step)])
    cols = np.array([[y for y in range(x, x+step)] for x in range(0, step**2, step)])
    return arr[rows, cols]
diagonal(arr1, 3)
>>> array([[ 1, 2, 3],
[13, 14, 15],
[25, 26, 27]])
reference: https://numpy.org/devdocs/user/basics.indexing.html
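The advanced-indexing idea can also be driven by the letters in arr2 directly, so that the selection is not tied to the diagonal. A rough sketch, assuming every label is a single uppercase ASCII letter and every block is width columns wide (select_blocks is a made-up helper name):

```python
import numpy as np


def select_blocks(arr, labels, width=3):
    """For each row, keep the `width` columns named by that row's letter."""
    labels = np.asarray(labels).ravel()
    # 'A' -> 0, 'B' -> 1, ... (assumption: single uppercase ASCII letters)
    block = np.array([ord(label) - ord("A") for label in labels])
    # Column indices for every row, shape (n_rows, width)
    cols = block[:, None] * width + np.arange(width)
    # Row indices broadcast against cols, shape (n_rows, 1)
    rows = np.arange(arr.shape[0])[:, None]
    return arr[rows, cols]


arr1 = np.arange(1, 28).reshape(3, 9)
arr2 = np.array([["A"], ["B"], ["C"]])
print(select_blocks(arr1, arr2))
```

Because the column indices are computed per row, the letters may appear in any order, for example C, A, B.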

Numpy where() using a condition that changes with the items position in the array

I'm trying to build a grid world using numpy.
The grid is 4*4 and laid out in a square.
The first and last squares (i.e. 1 and 16) are terminal squares.
At each time step you can move one step in any direction either: up, down , left or right.
Once you enter one of the terminal squares no further moves are possible and the game terminates.
The first and last columns are the left and right edges of the square whilst the first and last rows represent the top and bottom edges.
If you are on an edge, for example the left one and attempt to move left, instead of moving left you stay in the square you started in. Similarly you remain in the same square if you try and cross any of the other edges.
Although the grid is a square I've implemented it as an array.
states_r calculates the position of the states after a move right. 1 and 16 stay where they are because they are terminal states (note the code uses zero-based counting, so 1 and 16 are 0 and 15 respectively in the code).
The rest of the squares are increased by one. The code for states_r works, however the squares on the right edge (4, 8, 12) should also stay where they are, and the states_r code doesn't do that.
states_l is my attempt to include the edge condition for the left edge of the square. The logic is the same: the terminal states (1, 16) should not move, nor should the squares on the left edge (5, 9, 13). I think the general logic is correct, but it's producing an error.
states = np.arange(16)
states_r = states[np.where((states + 1 <= 15) & (states != 0), states + 1, states)]
states_l = states[np.where((max(1, (states // 4) * 4) <= states - 1) & (states != 15), states - 1, states)]
The first example states_r works, it handles the terminal state but does not handle the edge condition.
The second example is my attempt to include the edge condition, however it is giving me the following error:
"The truth value of an array with more than one element is ambiguous."
Can someone please explain how to fix my code?
Or alternatively suggest another solution? Ideally I want the code to be fast (so I can scale it up), so I want to avoid for loops if possible.
If I understood correctly, you want arrays which indicate, for each state, where the next state is, depending on the move (right, left, up, down).
If so, I guess your implementation of states_r is not quite right. I would suggest switching to a 2D representation of your grid, because a lot of the things you describe are easier and more intuitive to handle if you have x and y directly (at least for me).
import numpy as np
n = 4
states = np.arange(n*n).reshape(n, n)
states_r, states_l, states_u, states_d = (states.copy(), states.copy(),
states.copy(), states.copy())
states_r[:, :n-1] = states[:, 1:]
states_l[:, 1:] = states[:, :n-1]
states_u[1:, :] = states[:n-1, :]
states_d[:n-1, :] = states[1:, :]
#       up             [[ 0,  1,  2,  3],
# left state right      [ 0,  1,  2,  3],
#      down             [ 4,  5,  6,  7],
#                       [ 8,  9, 10, 11]]
#
# [[ 0,  0,  1,  2],   [[ 0,  1,  2,  3],   [[ 1,  2,  3,  3],
#  [ 4,  4,  5,  6],    [ 4,  5,  6,  7],    [ 5,  6,  7,  7],
#  [ 8,  8,  9, 10],    [ 8,  9, 10, 11],    [ 9, 10, 11, 11],
#  [12, 12, 13, 14]]    [12, 13, 14, 15]]    [13, 14, 15, 15]]
#
#                      [[ 4,  5,  6,  7],
#                       [ 8,  9, 10, 11],
#                       [12, 13, 14, 15],
#                       [12, 13, 14, 15]]
If you want to exclude the terminal states, you can do something like this:
terminal_states = np.zeros((n, n), dtype=bool)
terminal_states[0, 0] = True
terminal_states[-1, -1] = True
states_r[terminal_states] = states[terminal_states]
states_l[terminal_states] = states[terminal_states]
states_u[terminal_states] = states[terminal_states]
states_d[terminal_states] = states[terminal_states]
If you prefer the 1D approach:
import numpy as np
n = 4
states = np.arange(n*n)
valid_s = np.ones(n*n, dtype=bool)
valid_s[0] = False
valid_s[-1] = False
states_r = np.where(np.logical_and(valid_s, states % n < n-1), states+1, states)
states_l = np.where(np.logical_and(valid_s, states % n > 0), states-1, states)
states_u = np.where(np.logical_and(valid_s, states > n-1), states-n, states)
states_d = np.where(np.logical_and(valid_s, states < n**2-n), states+n, states)
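As for the error in the question: Python's built-in max(1, array) has to decide whether a comparison between the array and 1 is true, and NumPy refuses to reduce a multi-element array to a single truth value. The element-wise counterpart is np.maximum. The sketch below shows that call and then a left move built on the row start; it assumes that stepping from index 1 into the terminal square 0 is allowed, so the lower bound is the row start itself and not max(1, row start). The names row_start and is_terminal are illustrative only:

```python
import numpy as np

n = 4
states = np.arange(n * n)

# Element-wise maximum works where the built-in max() raises the ambiguity error
print(np.maximum(1, states // n * n))

# First index of each state's row; a left move may not go below it
row_start = states // n * n
is_terminal = (states == 0) | (states == n * n - 1)

# Move left unless the state is terminal or already on the left edge
states_l = np.where((states - 1 >= row_start) & ~is_terminal, states - 1, states)
print(states_l)
```

Under these assumptions states_l agrees with the np.logical_and version above.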
Another way of doing it without preallocating arrays:
states = np.arange(16).reshape(4,4)
states_l = np.hstack((states[:,0][:,None],states[:,:-1],))
states_r = np.hstack((states[:,1:],states[:,-1][:,None]))
states_d = np.vstack((states[1:,:],states[-1,:]))
states_u = np.vstack((states[0,:],states[:-1,:]))
To get them all in 1-D, you can always flatten()/ravel()/reshape(-1) the 2-D arrays.
                 [[ 0  1  2  3]
                  [ 0  1  2  3]
                  [ 4  5  6  7]
                  [ 8  9 10 11]]
[[ 0  0  1  2]   [[ 0  1  2  3]   [[ 1  2  3  3]
 [ 4  4  5  6]    [ 4  5  6  7]    [ 5  6  7  7]
 [ 8  8  9 10]    [ 8  9 10 11]    [ 9 10 11 11]
 [12 12 13 14]]   [12 13 14 15]]   [13 14 15 15]]
                 [[ 4  5  6  7]
                  [ 8  9 10 11]
                  [12 13 14 15]
                  [12 13 14 15]]
And for the terminal corners (states 0 and 15) you can do:
states_r[0,0] = 0
states_d[0,0] = 0
states_u[-1,-1] = 15
states_l[-1,-1] = 15

Find nearest index in one dataframe to another

I am new to python and its libraries. Searched all the forums but could not find a proper solution. This is the first time posting a question here. Sorry if I did something wrong.
So, I have two DataFrames like below containing X Y Z coordinates (UTM) and other features.
In [2]: a = {
...: 'X': [1, 2, 5, 7, 10, 5, 2, 3, 24, 21],
...: 'Y': [3, 4, 8, 15, 20, 12, 23, 22, 14, 7],
...: 'Z': [12, 4, 9, 16, 13, 1, 8, 17, 11, 19],
...: }
...:
In [3]: b = {
...: 'X': [1, 8, 20, 7, 32],
...: 'Y': [6, 4, 17, 45, 32],
...: 'Z': [52, 12, 6, 8, 31],
...: }
In [4]: df1 = pd.DataFrame(data=a)
In [5]: df2 = pd.DataFrame(data=b)
In [6]: print(df1)
X Y Z
0 1 3 12
1 2 4 4
2 5 8 9
3 7 15 16
4 10 20 13
5 5 12 1
6 2 23 8
7 3 22 17
8 24 14 11
9 21 7 19
In [7]: print(df2)
X Y Z
0 1 6 52
1 8 4 12
2 20 17 6
3 7 45 8
4 32 32 31
I need to find the closest point (by distance) in df1 to each point of df2 and create a new DataFrame from them.
So I wrote the code below, which finds the closest point (by distance) to df2.iloc[0].
In [8]: x = (
...: np.sqrt(
...: ((df1['X'].sub(df2["X"].iloc[0]))**2)
...: .add(((df1['Y'].sub(df2["Y"].iloc[0]))**2))
...: .add(((df1['Z'].sub(df2["Z"].iloc[0]))**2))
...: )
...: ).idxmin()
In [9]: x1 = df1.iloc[[x]]
In [10]: print(x1)
X Y Z
3 7 15 16
So, I guess I need a loop to iterate through df2 and apply the above code to each row. As a result I need a new DataFrame built from df1 containing the closest point to each point of df2, but I couldn't make it work. Please advise.
This is actually a great example of a case where numpy's broadcasting rules have distinct advantages over pandas.
Manually aligning df1's coordinates as column vectors (by referencing df1[[col]].to_numpy()) and df2's coordinates as row vectors (df2[col].to_numpy()), we can get the distance from every element in each dataframe to each element in the other very quickly with automatic broadcasting:
In [26]: dists = np.sqrt(
...: (df1[['X']].to_numpy() - df2['X'].to_numpy()) ** 2
...: + (df1[['Y']].to_numpy() - df2['Y'].to_numpy()) ** 2
...: + (df1[['Z']].to_numpy() - df2['Z'].to_numpy()) ** 2
...: )
In [27]: dists
Out[27]:
array([[40.11234224, 7.07106781, 24.35159132, 42.61455151, 46.50806382],
[48.05205511, 10. , 22.29349681, 41.49698784, 49.12229636],
[43.23193264, 5.83095189, 17.74823935, 37.06750599, 42.29657197],
[37.58989226, 11.74734012, 16.52271164, 31.04834939, 33.74907406],
[42.40283009, 16.15549442, 12.56980509, 25.67099531, 30.85449724],
[51.50728104, 13.92838828, 16.58312395, 33.7934905 , 45.04442252],
[47.18050445, 20.32240143, 19.07878403, 22.56102835, 38.85871846],
[38.53569774, 19.33907961, 20.85665361, 25.01999201, 33.7194306 ],
[47.68647607, 18.89444363, 7.07106781, 35.48239 , 28.0713377 ],
[38.60051813, 15.06651917, 16.43167673, 41.96427052, 29.83286778]])
Argmin will now give you the correct vector of positional indices:
In [28]: dists.argmin(axis=0)
Out[28]: array([3, 2, 8, 6, 8])
Or, to select the appropriate values from df1:
In [29]: df1.iloc[dists.argmin(axis=0)]
Out[29]:
X Y Z
3 7 15 16
2 5 8 9
8 24 14 11
6 2 23 8
8 24 14 11
Edit
An answer popped up just after mine, then was deleted, which made reference to scipy.spatial.distance_matrix, computing dists with:
from scipy.spatial import distance_matrix
distance_matrix(df1[list('XYZ')].to_numpy(), df2[list('XYZ')].to_numpy())
Not sure why that answer was deleted, but this seems like a really nice, clean approach to getting the array I produced manually above!
Performance Note
Note that if you are just trying to get the closest value, there's no need to take the square root, as this is a costly operation compared to addition, subtraction, and powers, and sorting on dist**2 is still valid.
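To illustrate the performance note, here is a small sketch that skips the square root and works on plain coordinate arrays. The arrays p1 and p2 simply restate the X, Y, Z columns of df1 and df2 from the question; the variable names are my own:

```python
import numpy as np

# Rows are points, columns are X, Y, Z (same values as df1 and df2 above)
p1 = np.array([[1, 3, 12], [2, 4, 4], [5, 8, 9], [7, 15, 16], [10, 20, 13],
               [5, 12, 1], [2, 23, 8], [3, 22, 17], [24, 14, 11], [21, 7, 19]])
p2 = np.array([[1, 6, 52], [8, 4, 12], [20, 17, 6], [7, 45, 8], [32, 32, 31]])

# Broadcast to shape (len(p1), len(p2), 3), then sum squared differences over X, Y, Z
diff = p1[:, None, :] - p2[None, :, :]
sq_dists = (diff ** 2).sum(axis=2)

# argmin over the p1 axis: the ordering is the same with or without the square root
nearest = sq_dists.argmin(axis=0)
print(nearest)
print(p1[nearest])
```

The positional indices match the dists.argmin(axis=0) result above, since the square root does not change which distance is smallest.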
First, you define a function that returns the closest point using numpy.where. Then you use the apply function to run through df2.
import pandas as pd
import numpy as np
a = {
'X': [1, 2, 5, 7, 10, 5, 2, 3, 24, 21],
'Y': [3, 4, 8, 15, 20, 12, 23, 22, 14, 7],
'Z': [12, 4, 9, 16, 13, 1, 8, 17, 11, 19]
}
b = {
'X': [1, 8, 20, 7, 32],
'Y': [6, 4, 17, 45, 32],
'Z': [52, 12, 6, 8, 31]
}
df1 = pd.DataFrame(a)
df2 = pd.DataFrame(b)
dist = lambda dx,dy,dz: np.sqrt(dx**2+dy**2+dz**2)
def closest(row):
    darr = dist(df1['X']-row['X'], df1['Y']-row['Y'], df1['Z']-row['Z'])
    idx = np.where(darr == np.amin(darr))[0][0]
    return df1['X'][idx], df1['Y'][idx], df1['Z'][idx]
df2['closest'] = df2.apply(closest, axis=1)
print(df2)
Output:
X Y Z closest
0 1 6 52 (7, 15, 16)
1 8 4 12 (5, 8, 9)
2 20 17 6 (24, 14, 11)
3 7 45 8 (2, 23, 8)
4 32 32 31 (24, 14, 11)

How to downsample a 2D array by randomly selecting elements in 2x2 sub-arrays?

I have a 2n x 2m numpy array. I would like to form a n x m array by selecting randomly one element in 2 x 2 non-overlapping sub-arrays that partition my initial array. What would be the best way to do so? Is there a way to avoid two for loops (one along each dimension)?
For example, if my array is
1 2 3 4
5 6 7 8
9 0 1 2
8 5 7 0
then, there are four 2 x 2 sub-arrays that partition it:
1 2   3 4
5 6   7 8

9 0   1 2
8 5   7 0
and I would like to randomly pick one element in each of them to form new arrays, such as
5 3     6 8     2 3
9 2  ,  9 1  ,  0 0  .
Thank you for your time.
This can be done by sampling. Instead of sampling each 2x2 square separately, we slice the entire ndarray into 4 separate sub-arrays (one per position within a 2x2 square), so that the same index in each sub-array points into the same 2x2 square. Then we randomly choose among those 4 sub-arrays at each index:
# create test dataset
test = np.arange(36).reshape(6,6)
array([[ 0, 1, 2, 3, 4, 5],
[ 6, 7, 8, 9, 10, 11],
[12, 13, 14, 15, 16, 17],
[18, 19, 20, 21, 22, 23],
[24, 25, 26, 27, 28, 29],
[30, 31, 32, 33, 34, 35]])
# Create subsamples from ndarray
samples = np.array([test[::2, ::2], test[1::2, 1::2], test[::2, 1::2], test[1::2, ::2]])
>>> samples
array([[[ 0, 2, 4],
[12, 14, 16],
[24, 26, 28]],
[[ 7, 9, 11],
[19, 21, 23],
[31, 33, 35]],
[[ 1, 3, 5],
[13, 15, 17],
[25, 27, 29]],
[[ 6, 8, 10],
[18, 20, 22],
[30, 32, 34]]])
Now the same index in each of these 4 subsamples points to the same 2x2 square of the original ndarray. We just need to select randomly among them at each index:
# Random choice sampling between these 4 subsamples.
select = np.random.randint(4,size=(3,3))
>>> select
array([[2, 2, 1],
[3, 1, 1],
[3, 0, 0]])
result = select.choose(samples)
>>> result
array([[ 1, 3, 11],
[18, 21, 23],
[30, 26, 28]])
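A related way to get the same effect is to reshape the array so that each 2x2 block gets its own pair of axes and then index it with random offsets. This is a sketch rather than a drop-in replacement; it assumes both dimensions are even, and the function name random_downsample and its rng parameter are made up for illustration:

```python
import numpy as np


def random_downsample(arr, rng=None):
    """Pick one random element from every non-overlapping 2x2 block."""
    rng = np.random.default_rng() if rng is None else rng
    n, m = arr.shape[0] // 2, arr.shape[1] // 2
    # blocks[i, a, j, b] == arr[2*i + a, 2*j + b]
    blocks = arr.reshape(n, 2, m, 2)
    # Random row and column offset (0 or 1) for every block
    dr = rng.integers(0, 2, size=(n, m))
    dc = rng.integers(0, 2, size=(n, m))
    rows = np.arange(n)[:, None]
    cols = np.arange(m)[None, :]
    # All four index arrays broadcast to shape (n, m)
    return blocks[rows, dr, cols, dc]


test = np.arange(36).reshape(6, 6)
result = random_downsample(test, np.random.default_rng(0))
print(result)
```

Passing a seeded generator makes the selection reproducible, which is handy when checking that every picked value really comes from its own block.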
I got the blockshaped function from another answer. This answer assumes that the dimensions of your original array are divisible by the block size.
import numpy as np
def blockshaped(arr, nrows, ncols):
    """
    Return an array of shape (n, nrows, ncols) where
    n * nrows * ncols = arr.size
    If arr is a 2D array, the returned array should look like n subblocks with
    each subblock preserving the "physical" layout of arr.
    """
    h, w = arr.shape
    return (arr.reshape(h//nrows, nrows, -1, ncols)
               .swapaxes(1,2)
               .reshape(-1, nrows, ncols))
arr = np.array([[1,2,3,4],[5,6,7,8],[9,0,1,2],[8,5,7,0]])
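# arr is a 2D array with dimensions m x n
m = arr.shape[0]
n = arr.shape[1]
# define blocksize
block_size = 2
# divide into 2x2 sub-arrays
# blocks is an (N, 2, 2) array
blocks = blockshaped(arr, block_size, block_size)
# select one random element from each block to form the new array
# (the number of blocks is N, not block_size**2; the two only coincide for a 4x4 input)
num_blocks = blocks.shape[0]
rows_in_block = np.random.randint(low=0, high=block_size, size=num_blocks)
cols_in_block = np.random.randint(low=0, high=block_size, size=num_blocks)
new_arr = blocks[np.arange(num_blocks), rows_in_block, cols_in_block]
# blocks are ordered row by row, so reshape back to the downsampled 2D grid
new_arr = new_arr.reshape(m // block_size, n // block_size)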
print("original array:")
print(arr)
print("random pooled array:")
print(new_arr)

How can I get the complete predecessors list based on a direct predecessor list in python?

My problem is related to the extraction of ore blocks in an open pit mine, where the blocks have a precedence relation, as explained below.
In this representation we have 9 blocks, where we can only "extract" block 6 if blocks 1, 2 and 3 have already been extracted, block 7 if 2, 3 and 4 have been extracted, block 8 if 3, 4 and 5 have been extracted, and block 9 if 6, 7 and 8 have been extracted.
import sys
blocks = [1,2,3,4,5,6,7,8,9]#list of blocks
p = [[] for i in blocks]#list of direct precedents
p[0] = []
p[1] = []
p[2] = []
p[3] = []
p[4] = []
p[5] = [1,2,3]
p[6] = [2,3,4]
p[7] = [3,4,5]
p[8] = [6,7,8]
From this direct list of precedents, I'd like a method that would help get the following "complete list of precedents" for instances larger than 9 blocks (something between 1060 and 100000 blocks).
import sys
blocks = [1,2,3,4,5,6,7,8,9]#list of blocks
p = [[] for i in blocks]#full list of precedents
p[0] = []
p[1] = []
p[2] = []
p[3] = []
p[4] = []
p[5] = [1,2,3]
p[6] = [2,3,4]
p[7] = [3,4,5]
p[8] = [1,2,3,4,5,6,7,8]
You can do it iterating the blocks in topological order. Here is one possible way to do it:
def precedents_transitive(blocks, precedents):
    num_blocks = len(precedents)
    # Mapping between block ids and indices
    mapping = {b: i for i, b in enumerate(blocks)}
    # Direct precedents as sets
    ps = list(map(set, precedents))
    # Transitive precedents as sets, starts with direct precedents
    ps_transitive = list(map(set, precedents))
    # Remaining blocks to visit
    remaining = set(blocks)
    # Visited blocks
    visited = set()
    while remaining:
        # Find a non-visited block such that all its precedents have been visited
        for block in remaining:
            i_block = mapping[block]
            if ps[i_block] <= visited:
                break
        else:
            # If we get here the input was not valid
            raise ValueError('Invalid precedents.')
        # Add transitive precedents of direct predecessors
        ps_transitive[i_block].update(*(ps_transitive[mapping[pred]] for pred in ps[i_block]))
        remaining.remove(block)
        visited.add(block)
    return list(map(sorted, ps_transitive))
Here is a test with your data:
# List of blocks
blocks = [1, 2, 3, 4, 5, 6, 7, 8, 9]
# List of direct precedents
p = [[] for i in blocks]
p[0] = []
p[1] = []
p[2] = []
p[3] = []
p[4] = []
p[5] = [1, 2, 3]
p[6] = [2, 3, 4]
p[7] = [3, 4, 5]
p[8] = [6, 7, 8]
p_transitive = precedents_transitive(blocks, p)
print(p_transitive)
Output:
[[], [], [], [], [], [1, 2, 3], [2, 3, 4], [3, 4, 5], [1, 2, 3, 4, 5, 6, 7, 8]]
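For instances in the range the question mentions, rescanning the remaining blocks on every pass may become slow. One possible alternative is to let the standard library's graphlib module (Python 3.9+) produce the topological order. The sketch below assumes the direct precedents are stored in a dictionary keyed by block id, which differs from the list layout used above; transitive_precedents is a hypothetical name:

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def transitive_precedents(direct):
    """direct maps each block to an iterable of its direct precedents."""
    full = {}
    # static_order() yields every block after all of its precedents
    for block in TopologicalSorter(direct).static_order():
        preds = set(direct.get(block, ()))
        for p in direct.get(block, ()):
            # full[p] already exists because p came earlier in the order
            preds |= full[p]
        full[block] = preds
    return full


direct = {6: [1, 2, 3], 7: [2, 3, 4], 8: [3, 4, 5], 9: [6, 7, 8]}
full = transitive_precedents(direct)
print(sorted(full[9]))
```

TopologicalSorter raises graphlib.CycleError if the precedence relation contains a cycle, which plays the role of the ValueError above. Storing the full set for every block can still take a lot of memory on large pits.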
First let's build a more interesting example:
pit = """
 1  2  3  4  5  6  7  8  9 10 11
   12 13 14 15 16 17 18 19 20
      21 22 23 24 25 26 27
         28 29 30 31 32
            33 34 35
               36
"""
direct_predecessors = {}
row_below = []
for line in pit.strip().split( '\n' )[ ::-1 ]:
    row_current = [ int( x ) for x in line.split() ]
    for i, item in enumerate( row_below ):
        direct_predecessors[ item ] = row_current[ i : i + 3 ]
    row_below = row_current
(NB: unlike your question, where the predecessor-lookup keys are zero-based but the values are one-based, I've chosen to use a dictionary where the value entries can be used directly as keys. In fact, if I hadn't converted them explicitly to int(), the block labels could be arbitrary strings.) If we print the content:
for key, value in sorted( direct_predecessors.items() ):
    print( '%r : %r' % ( key, value ) )
then we get the following output:
12 : [1, 2, 3]
13 : [2, 3, 4]
14 : [3, 4, 5]
15 : [4, 5, 6]
16 : [5, 6, 7]
17 : [6, 7, 8]
18 : [7, 8, 9]
19 : [8, 9, 10]
20 : [9, 10, 11]
21 : [12, 13, 14]
22 : [13, 14, 15]
23 : [14, 15, 16]
24 : [15, 16, 17]
25 : [16, 17, 18]
26 : [17, 18, 19]
27 : [18, 19, 20]
28 : [21, 22, 23]
29 : [22, 23, 24]
30 : [23, 24, 25]
31 : [24, 25, 26]
32 : [25, 26, 27]
33 : [28, 29, 30]
34 : [29, 30, 31]
35 : [30, 31, 32]
36 : [33, 34, 35]
OK, now to answer your question: given the direct predecessors, get all the (direct and indirect) predecessors. A recursive approach is one way to go:
def Predecessors( n ):
    result = set( direct_predecessors.get( n, [] ) )
    result |= { indirect for direct in result for indirect in Predecessors( direct ) }
    return result
If we try an example:
print( Predecessors( 31 ) )
the output is as follows (shown in Python 2's set([...]) notation; Python 3 prints the same elements in braces):
set([4, 5, 6, 7, 8, 9, 10, 15, 16, 17, 18, 19, 24, 25, 26])
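The plain recursion recomputes the predecessors of shared blocks many times. A memoized variant is sketched below, assuming the same dictionary layout as direct_predecessors; the function name all_predecessors and the small example dictionary are illustrative only:

```python
def all_predecessors(direct_predecessors):
    """Return {block: set of all direct and indirect predecessors}."""
    memo = {}

    def visit(block):
        # Each block is expanded once; later calls reuse the stored set
        if block not in memo:
            result = set(direct_predecessors.get(block, ()))
            for direct in direct_predecessors.get(block, ()):
                result |= visit(direct)
            memo[block] = result
        return memo[block]

    return {block: visit(block) for block in direct_predecessors}


example = {6: [1, 2, 3], 7: [2, 3, 4], 8: [3, 4, 5], 9: [6, 7, 8]}
print(sorted(all_predecessors(example)[9]))
```

The recursion depth equals the number of levels in the pit, so a very deep pit might need an iterative version or a higher recursion limit.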
