I'm working on a Monte Carlo radiative transfer code, which simulates firing photons through a medium and statistically modelling their random walk. It runs slowly firing one photon at a time, so I'd like to vectorize it and run perhaps 1000 photons at once.
I have divided the slab through which the photons pass into nlayers slices between optical depth 0 and depth (the total optical depth of the slab). Effectively, that means that I have nlayers + 2 regions (nlayers, plus the region above the slab and the region below the slab). At each step, I have to keep track of which layers each photon passes through.
Let's suppose that I already know that two photons start in layer 0. One takes a step and ends up in layer 2, and the other takes a step and ends up in layer 6. This is represented by an array pastpresent that looks like this:
[[ 0 2]
[ 0 6]]
I want to generate an array traveled_through with (nlayers + 2) columns and 2 rows, describing whether photon i passed through layer j (endpoint-inclusive). It would look something like this (with nlayers = 10):
[[ 1 1 1 0 0 0 0 0 0 0 0 0]
[ 1 1 1 1 1 1 1 0 0 0 0 0]]
I could do this by iterating over the photons and generating each row of traveled_through individually, but that's rather slow, and sort of defeats the point of running many photons at once, so I'd rather not do that.
I tried to define the array as follows:
traveled_through = np.zeros((2, nlayers)).astype(int)
traveled_through[ : , np.min(pastpresent, axis = 1) : np.max(pastpresent, axis = 1) + 1 ] = 1
The idea was that in a given photon's row, the indices from the starting layer through and including the ending layer would be set to 1, with all others remaining 0. However, I get the following error:
traveled_through[ : , np.min(pastpresent, axis = 1) : np.max(pastpresent, axis = 1) + 1 ] = 1
IndexError: invalid slice
My best guess is that numpy does not allow different rows of an array to be indexed differently using this method. Does anyone have suggestions for how to generate traveled_through for an arbitrary number of photons and an arbitrary number of layers?
If the two photons always start at 0, you could perhaps construct your array as follows.
First setting the variables...
>>> pastpresent = np.array([[0, 2], [0, 6]])
>>> nlayers = 10
...and then constructing the array:
>>> (pastpresent[:,1][:,np.newaxis] + 1 > np.arange(nlayers+2)).astype(int)
array([[1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0]])
Or if the photons have an arbitrary starting layer:
>>> pastpresent2 = np.array([[1, 7], [3, 9]])
>>> ((pastpresent2[:,0][:,np.newaxis] < np.arange(nlayers+2)) &
...  (pastpresent2[:,1][:,np.newaxis] + 1 > np.arange(nlayers+2))).astype(int)
array([[0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0],
[0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0]])
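Note that, as written, the arbitrary-start version excludes the starting layer itself (strict <). If the starting layer should count as travelled through, as in the original example, swapping in <= does it (a sketch under the same setup as above):

>>> ((pastpresent2[:,0][:,np.newaxis] <= np.arange(nlayers+2)) &
...  (pastpresent2[:,1][:,np.newaxis] + 1 > np.arange(nlayers+2))).astype(int)
array([[0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0],
       [0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0]])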
A little trick I kind of like for this kind of thing involves the accumulate method of the logical_xor ufunc:
>>> a = np.zeros(10, dtype=int)
>>> b = [3, 7]
>>> a[b] = 1
>>> a
array([0, 0, 0, 1, 0, 0, 0, 1, 0, 0])
>>> np.logical_xor.accumulate(a, out=a)
array([0, 0, 0, 1, 1, 1, 1, 0, 0, 0])
Note that this sets to 1 the entries between the positions in b, first index inclusive, last index exclusive, so you have to handle off by 1 errors depending on what exactly you are after.
With several rows, you could make it work as:
>>> a = np.zeros((3, 10), dtype=int)
>>> b = np.array([[1, 7], [0, 4], [3, 8]])
>>> b[:, 1] += 1 # handle the off by 1 error
>>> a[np.arange(len(b))[:, None], b] = 1
>>> a
array([[0, 1, 0, 0, 0, 0, 0, 0, 1, 0],
[1, 0, 0, 0, 0, 1, 0, 0, 0, 0],
[0, 0, 0, 1, 0, 0, 0, 0, 0, 1]])
>>> np.logical_xor.accumulate(a, axis=1, out=a)
array([[0, 1, 1, 1, 1, 1, 1, 1, 0, 0],
[1, 1, 1, 1, 1, 0, 0, 0, 0, 0],
[0, 0, 0, 1, 1, 1, 1, 1, 1, 0]])
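Applied to the pastpresent example from the question, the same trick would look roughly like this (a sketch; it sorts each row so photons moving upward also work, and it assumes no photon ends in the bottom region nlayers + 1, since the exclusive stop marker would then fall out of bounds):

import numpy as np

nlayers = 10
pastpresent = np.array([[0, 2], [0, 6]])

a = np.zeros((len(pastpresent), nlayers + 2), dtype=int)
b = np.sort(pastpresent, axis=1)   # photons may also travel upward
b[:, 1] += 1                       # make the stop layer inclusive
a[np.arange(len(b))[:, None], b] = 1
np.logical_xor.accumulate(a, axis=1, out=a)
# a -> array([[1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
#             [1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0]])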
I have a binary mask like this:
X = [[0, 0, 0, 0, 0, 1],
[0, 0, 0, 0, 1, 1],
[0, 0, 0, 1, 1, 1],
[0, 0, 1, 1, 1, 1],
[0, 0, 1, 1, 1, 1],
[0, 0, 0, 1, 1, 1]]
I have a certain index in this array and want to compute the distance from that index to the closest 1 in the mask. If there's already a 1 at that index, the distance should be zero.
Examples (assuming Manhattan distance):
distance(X, idx=(0, 5)) == 0 # already is a 1 -> distance is zero
distance(X, idx=(1, 2)) == 2 # second row, third column
distance(X, idx=(0, 0)) == 5 # upper left corner
Is there already existing functionality like this in Python/NumPy/SciPy? Both Euclidean and Manhattan distance would be fine.
I'd prefer to avoid computing distances for the entire matrix (as that is pretty big in my case), and only get the distance for my one index.
Here's one for the Manhattan distance metric for one entry -
import numpy as np

def bwdist_manhattan_single_entry(X, idx):
    # indices of all 1s in the mask
    nz = np.argwhere(X==1)
    # Manhattan distance: sum of absolute coordinate differences
    return np.abs(idx-nz).sum(1).min()
Sample run -
In [143]: bwdist_manhattan_single_entry(X, idx=(0,5))
Out[143]: 0
In [144]: bwdist_manhattan_single_entry(X, idx=(1,2))
Out[144]: 2
In [145]: bwdist_manhattan_single_entry(X, idx=(0,0))
Out[145]: 5
Optimize further on performance by extracting only the boundary elements of the blobs of 1s -
from scipy.ndimage.morphology import binary_erosion
def bwdist_manhattan_single_entry_v2(X, idx):
    # keep only the 1s that sit on a blob boundary
    k = np.ones((3,3),dtype=int)
    nz = np.argwhere((X==1) & (~binary_erosion(X,k,border_value=1)))
    return np.abs(idx-nz).sum(1).min()
The number of elements in nz is smaller with this method than with the earlier one, hence the speedup.
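A quick sanity check that the two versions agree on the sample (a sketch; X is the mask from the question, converted to an array):

Xa = np.array(X)
for idx in [(0, 5), (1, 2), (0, 0)]:
    assert bwdist_manhattan_single_entry(Xa, idx) == \
           bwdist_manhattan_single_entry_v2(Xa, idx)

Keep in mind that the boundary trick assumes the query point lies outside the blobs (or on a boundary); for a point strictly inside a blob, the first version is the safe one.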
You can use scipy.ndimage.morphology.distance_transform_cdt to compute the "taxicab" (Manhattan) distance transform:
import numpy as np
import scipy.ndimage.morphology
x = np.array([[0, 0, 0, 0, 0, 1],
[0, 0, 0, 0, 1, 1],
[0, 0, 0, 1, 1, 1],
[0, 0, 1, 1, 1, 1],
[0, 0, 1, 1, 1, 1],
[0, 0, 0, 1, 1, 1]])
d = scipy.ndimage.morphology.distance_transform_cdt(1 - x, 'taxicab')
print(d[0, 5])
# 0
print(d[1, 2])
# 2
print(d[0, 0])
# 5
You can do it like this:
def Manhattan_distance(X, idx):
    dist = min(abs(i - idx[0]) + abs(j - idx[1])
               for i, row in enumerate(X)
               for j, val in enumerate(row)
               if val == 1)
    return dist
Given the below matrix ixs of indices, I am looking for the vectors in ixs that are equivalent to ix (also a row of ixs), except for dimension 1 (which may assume any value) and dimension 3, which needs to be set to 1.
ixs = np.asarray([
[0, 0, 3, 0, 1], # 0. current value of `ix`
[0, 0, 3, 1, 1], # 1.
[0, 1, 3, 0, 0], # 2.
[0, 1, 3, 0, 1], # 3.
[0, 1, 3, 1, 1], # 4.
[0, 2, 3, 0, 1], # 5.
[0, 2, 3, 1, 1] # 6.
])
ix = np.asarray([0, 0, 3, 0, 1])
So with an ix of [0, 0, 3, 0, 1], I'd be looking at all rows below that one (rows 1..6) for the pattern [0, *, 3, 1, 1], i.e. 1. [0, 0, 3, 1, 1], 4. [0, 1, 3, 1, 1], 6. [0, 2, 3, 1, 1].
What's the best (concise) way to get those vectors?
Here is an easy-to-understand approach using cdist:
We use a weighted Hamming distance between ix and every row of ixs. This distance is 0 if the rows are identical (we use that to double-check that ix is in ixs) and adds a penalty for every difference. We choose the weights such that a difference in position 0, 2 or 4 adds 3/11, and a difference in position 1 or 3 adds 1/11. Later, we keep only vectors with distance < 1/4; this lets through vectors that deviate from ix at position 1 or 3 (or both) and blocks all others. We then check separately for a 1 in position 3.
from scipy.spatial.distance import cdist
# compute distances; note that the weights are automatically normalized to sum to 1
d = cdist([ix],ixs,"hamming",w=[3,1,3,1,3])[0]
# find ix
ixloc = d.argmin()
# make sure it's exactly ix
assert d[ixloc] == 0
# filter out all rows that are different in col 0,2 or 4
hits, = ((d < 1/4) & (ixs[:,3] == 1)).nonzero()
# only keep hits below the row of ix:
hits = hits[hits.searchsorted(ixloc):]
hits
# array([1, 4, 6])
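For comparison, the same query with plain boolean masks (a sketch; like above, it assumes ix occurs as a row of ixs):

ixloc = np.flatnonzero((ixs == ix).all(axis=1))[0]  # row index of ix
# columns 0, 2 and 4 must match ix exactly, column 3 must be 1
mask = (ixs[:, [0, 2, 4]] == ix[[0, 2, 4]]).all(axis=1) & (ixs[:, 3] == 1)
hits, = mask.nonzero()
hits = hits[hits > ixloc]  # only keep hits below the row of ix
hits
# array([1, 4, 6])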
This solution uses only numpy (very fast) with a few logical operations, and at the end it selects the matching rows.
ixs = np.matrix([
[0, 0, 3, 0, 1], # 0. current value of `ix`
[0, 0, 3, 1, 1], # 1.
[0, 1, 3, 0, 0], # 2.
[0, 1, 3, 0, 1], # 3.
[0, 1, 3, 1, 1], # 4.
[0, 2, 3, 0, 1], # 5.
[0, 2, 3, 1, 1] # 6.
])
newixs = ixs.copy()
#since the second column does not matter, we just assign it 0 in a copy
#(the copy keeps the original ixs intact for the final selection)
newixs[:,1] = 0
#compare each row against row 0; multiplying the True/False values
#by 1 turns them into 0/1, and a row average of 1 then means that
#all values in the row match
mask = ((newixs == newixs[0])*1).mean(axis=1) == 1
#it then converts the matrix to array for masking
mask = np.squeeze(np.asarray(mask))
#using the mask value, we select the matched columns
ixs[mask,:]
matrix([[0, 0, 3, 0, 1],
[0, 1, 3, 0, 1],
[0, 2, 3, 0, 1]])
Consider the following toy array a:
a = np.array([[1074279, 937077, 1445858, 1679465],
[1074280, 1023600, 1679465, 937077],
[1074281, 908450, 1932761, 1100360],
[1074282, 1445858, 893656, 908183],
[1074283, 1958030, 1932761, 1445858]])
The first column is an identifier.
How can I transform the array in a way that shows when an identifier is related
to another? A relation exists if two identifiers have in common at least one
value in columns 2-4 of a.
The end result should be the array b below:
b = np.array([[1, 1, 0, 1, 1],
[1, 1, 0, 0, 0],
[0, 0, 1, 0, 1],
[1, 0, 0, 1, 1],
[1, 0, 1, 1, 1]])
This can perhaps better be understood as follows:
            1074279  1074280  1074281  1074282  1074283
   1074279     1        1        0        1        1
   1074280     1        1        0        0        0
   1074281     0        0        1        0        1
   1074282     1        0        0        1        1
   1074283     1        0        1        1        1
I have tried (double) looping over the elements to find all the combinations and then reducing that to the desired array, but I cannot get it right.
Outer-equality does the job for a vectorized solution -
In [90]: np.equal.outer(a[:,1:],a[:,1:]).any(axis=(1,3)).view('i1')
Out[90]:
array([[1, 1, 0, 1, 1],
[1, 1, 0, 0, 0],
[0, 0, 1, 0, 1],
[1, 0, 0, 1, 1],
[1, 0, 1, 1, 1]], dtype=int8)
Explanation
Basically, we are performing pairwise equality comparison for all rows, and within each pair of rows an equality comparison between all values, with np.equal.outer(..). The result is a 4D array: for the slice a[:,1:] of shape (m,n), the equality-comparison array has shape (m,n,m,n). We then ANY-reduce it along axes 1 and 3 to get a 2D boolean array of shape (m,m), and that's our final output after conversion to an int array.
An alternative with explicit dimension-expansion would be -
In [92]: (a[:,1:,None,None]==a[:,1:]).any(axis=(1,3)).view('i1')
Out[92]:
array([[1, 1, 0, 1, 1],
[1, 1, 0, 0, 0],
[0, 0, 1, 0, 1],
[1, 0, 0, 1, 1],
[1, 0, 1, 1, 1]], dtype=int8)
So, the only change is that we add new axes to one copy of the slice with None/np.newaxis to create a 4D version of it. This is then compared against the original 2D version, which broadcasts to the same 4D equality-comparison boolean array.
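To make the shapes concrete, here is a quick sanity check on the sample a (m = 5, n = 3):

eq4d = a[:, 1:, None, None] == a[:, 1:]
print(eq4d.shape)                    # (5, 3, 5, 3)
print(eq4d.any(axis=(1, 3)).shape)   # (5, 5)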
A simpler classic solution that is easily understandable:
def has_in_common(a1, a2):
    """
    #param a1, a2: two input arrays
    #returns True if a1 and a2 have at least one value in common, otherwise False
    """
    for v1 in a1[1:]:
        for v2 in a2[1:]:
            if v1 == v2:
                return True
    return False
def relation_matrix(a):
    """
    #param a: an input array
    #returns m, a matrix specifying the relationship between the rows of a
    ex: a = [[1074279, 937077, 1445858, 1679465],
             [1074280, 1023600, 1679465, 937077],
             [1074281, 908450, 1932761, 1100360],
             [1074282, 1445858, 893656, 908183],
             [1074283, 1958030, 1932761, 1445858]]
    m = [[1, 1, 0, 1, 1],
         [1, 1, 0, 0, 0],
         [0, 0, 1, 0, 1],
         [1, 0, 0, 1, 1],
         [1, 0, 1, 1, 1]]
    more precisely
    m =         1074279 1074280 1074281 1074282 1074283
        1074279    1       1       0       1       1
        1074280    1       1       0       0       0
        1074281    0       0       1       0       1
        1074282    1       0       0       1       1
        1074283    1       0       1       1       1
    """
    m = np.zeros((a.shape[0], a.shape[0]))
    for i in range(len(a)):
        for j in range(len(a)):
            if has_in_common(a[i], a[j]):
                m[i, j] = 1
    return m.astype('int')
Demo:
In [1]:relation_matrix(a)
Out[1]:
array([[1, 1, 0, 1, 1],
[1, 1, 0, 0, 0],
[0, 0, 1, 0, 1],
[1, 0, 0, 1, 1],
[1, 0, 1, 1, 1]])
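If you want to keep the explicit outer loops but let NumPy handle the inner comparison, has_in_common can also be written as a one-liner (a sketch):

def has_in_common(a1, a2):
    # True if the trait columns of a1 and a2 share at least one value
    return np.isin(a1[1:], a2[1:]).any()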
In the code that I am writing, I have three 2D numpy arrays with the same dimensions (m x n). Each 2D array contains info about a specific trait, and each corresponding cell (with a specific row/col value) across all three 2D arrays corresponds to a specific person. The three 2D arrays are trait1, trait2, and trait3. As an example, person (0, 0) will have traits 1 and 2, but not 3, if trait1 and trait2 have a value of 1 at location (0, 0) but trait3 does not.
What would be an efficient method of updating a 2D array at specific locations based on the values of the other corresponding 2D arrays at the same locations? That is, how can I efficiently update a 2D array at the locations where the other 2D arrays fulfill specific conditions?
I am currently trying to update the values of trait1 and trait2 according to their current values (where the corresponding trait1 value == 1 and the corresponding trait2 value == 0); I am also trying to update the values of trait3 according to the current values of trait1 and trait2 (under the same conditions). However, I am having trouble doing this without nested for loops, which greatly slow down my program.
Below is my current approach, which works, but is much too slow for my purposes:
for i in range(0, m):
    for j in range(0, n):
        if trait1[i][j] == 1:
            if trait2[i][j] == 0:
                trait1[i][j] = 0
                trait2[i][j] = 1
                new_color(i, j, 1)  # updates the color of the specific person on a grid
                trait3[i][j] = 0
        elif trait1[i][j] == 0:
            if trait2[i][j] <= 0:
                trait1[i][j] = 1
                trait2[i][j] = 0
                new_color(i, j, 0)
NumPy arrays are really slow if you loop over them in Python. If you can use whole-array operations / numpy functions for everything, it will go much faster.
In your case, you could first extract the indices you're interested in, and then update your matrices like this:
import numpy as np
np.random.seed(1)
# Generate some sample data
trait1, trait2, trait3 = ( np.random.randint(0,2, [4,4]) for _ in range(3) )
In [4]: trait1
Out[4]:
array([[1, 1, 0, 0],
[1, 1, 1, 1],
[1, 0, 0, 1],
[0, 1, 1, 0]])
In [5]: trait2
Out[5]:
array([[0, 1, 0, 0],
[0, 1, 0, 0],
[1, 0, 0, 0],
[1, 0, 0, 0]])
In [6]: trait3
Out[6]:
array([[1, 1, 1, 1],
[1, 0, 0, 0],
[1, 1, 1, 1],
[1, 1, 0, 1]])
And then:
cond1_idx = np.where((trait1 == 1) & (trait2==0))
cond2_idx = np.where((trait1 == 0) & (trait2<=0))
trait1[cond1_idx] = 0
trait2[cond1_idx] = 1
trait3[cond1_idx] = 0
[ new_color(i, j, 1) for i,j in zip(*cond1_idx) ]
trait1[cond2_idx] = 1
trait2[cond2_idx] = 0
[ new_color(i, j, 0) for i,j in zip(*cond2_idx) ]
Result:
In [2]: trait1
Out[2]:
array([[0, 1, 1, 1],
[0, 1, 0, 0],
[1, 1, 1, 0],
[0, 0, 0, 1]])
In [3]: trait2
Out[3]:
array([[1, 1, 0, 0],
[1, 1, 1, 1],
[1, 0, 0, 1],
[1, 1, 1, 0]])
In [4]: trait3
Out[4]:
array([[0, 1, 1, 1],
[0, 0, 0, 0],
[1, 1, 1, 0],
[1, 0, 0, 1]])
I can't really test the new_color calls though, since I don't have that function.
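As a side note, the np.where calls are not strictly necessary; boolean masks can index directly (a sketch under the same setup, with both masks computed before either update so each condition sees the original values, exactly like the cell-by-cell loop):

cond1 = (trait1 == 1) & (trait2 == 0)
cond2 = (trait1 == 0) & (trait2 <= 0)
trait1[cond1], trait2[cond1], trait3[cond1] = 0, 1, 0
trait1[cond2], trait2[cond2] = 1, 0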
How can I find the number of consecutive 1s (or any other value) in each row of the following numpy array? I need a pure numpy solution.
array([[0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0],
[0, 0, 1, 0, 0, 1, 2, 0, 0, 1, 1, 1],
[0, 0, 0, 4, 1, 0, 0, 0, 0, 1, 1, 0]])
There are two parts to my question, first: what is the maximum number of consecutive 1s in each row? It should be
array([2,3,2])
in the example case.
And second, what is the index of the start of the first set of multiple consecutive 1s in each row? For the example case this would be
array([3,9,9])
In this example I look for runs of 2 consecutive 1s. But it should be possible to change that to, say, 5 consecutive 1s; this is important.
A similar question was answered using np.unique, but that only works for one row and not for an array with multiple rows, as the results would have different lengths.
Here's a vectorized approach based on differentiation -
import numpy as np
import pandas as pd
# Append zero columns on either side of counts
append1 = np.zeros((counts.shape[0],1),dtype=int)
counts_ext = np.column_stack((append1,counts,append1))
# Get start and stop indices with 1s as triggers
diffs = np.diff((counts_ext==1).astype(int),axis=1)
starts = np.argwhere(diffs == 1)
stops = np.argwhere(diffs == -1)
# Get intervals using differences between start and stop indices
start_stop = np.column_stack((starts[:,0], stops[:,1] - starts[:,1]))
# Get indices corresponding to max interval lengths and thus the lengths themselves
SS_df = pd.DataFrame(start_stop)
out = start_stop[SS_df.groupby([0],sort=False)[1].idxmax(),1]
Sample input, output -
Original sample case :
In [574]: counts
Out[574]:
array([[0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0],
[0, 0, 1, 0, 0, 1, 2, 0, 0, 1, 1, 1],
[0, 0, 0, 4, 1, 0, 0, 0, 0, 1, 1, 0]])
In [575]: out
Out[575]: array([2, 3, 2], dtype=int64)
Modified case :
In [577]: counts
Out[577]:
array([[0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0],
[0, 0, 1, 0, 0, 1, 2, 0, 1, 1, 1, 1],
[0, 0, 0, 4, 1, 1, 1, 1, 1, 0, 1, 0]])
In [578]: out
Out[578]: array([2, 4, 5], dtype=int64)
Here's a pure NumPy version that is identical to the previous one up to the point where we have the start and stop indices. Here's the full implementation -
# Append zero columns on either side of counts
append1 = np.zeros((counts.shape[0],1),dtype=int)
counts_ext = np.column_stack((append1,counts,append1))
# Get start and stop indices with 1s as triggers
diffs = np.diff((counts_ext==1).astype(int),axis=1)
starts = np.argwhere(diffs == 1)
stops = np.argwhere(diffs == -1)
# Get intervals using differences between start and stop indices
intvs = stops[:,1] - starts[:,1]
# Store intervals in a 2D array for further vectorized ops
c = np.bincount(starts[:,0])
mask = np.arange(c.max()) < c[:,None]
intvs2D = mask.astype(float)
intvs2D[mask] = intvs
# Get max along each row as final output
out = intvs2D.max(1)
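The second part of the question, the start index of the first run of at least 2 consecutive 1s in each row, can be read off the same starts, stops and intvs arrays (a sketch; it assumes every row contains at least one qualifying run):

thresh = 2                      # change to 5 for runs of five 1s
valid = intvs >= thresh
rows, cols = starts[valid, 0], starts[valid, 1]
# argwhere returns indices in row-major order, so rows is sorted and
# searchsorted picks the first qualifying start within each row
out2 = cols[np.searchsorted(rows, np.arange(counts.shape[0]))]
# out2 -> array([3, 9, 9]) for the original sample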
I think one problem that is very similar is to check whether, within each sorted row, the element-wise difference is a certain amount. Here, 5 consecutive values (each 1 apart) show up as a difference of exactly 4 between entries four positions apart; the same idea with a difference of 0 between adjacent entries detects two equal cards:
cardAmount = cards[0,:].size
# difference between entries four positions apart in each sorted row
has4 = cards[:, 4:cardAmount] - cards[:, 0:cardAmount-4]
isStraight = np.any(has4 == 4, axis=1)
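For instance, with each row sorted in ascending order (an assumed toy example):

import numpy as np

cards = np.array([[2, 3, 4, 5, 6, 9, 12],    # 2-6 forms a straight
                  [2, 2, 5, 7, 9, 11, 13]])  # no straight
cardAmount = cards[0, :].size
has4 = cards[:, 4:cardAmount] - cards[:, 0:cardAmount - 4]
isStraight = np.any(has4 == 4, axis=1)
# isStraight -> array([ True, False])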