For a given array (1- or 2-dimensional), I would like to know how many "patches" of nonzero elements there are. For example, the array [0, 0, 1, 1, 0, 1, 0, 0] has two patches.
I came up with a function for the 1-dimensional case: I first assume the maximal number of patches (one per nonzero element) and then decrease that number whenever the right-hand neighbor of a nonzero element is nonzero, too.
import numpy as np

def count_patches_1D(array):
    patches = np.count_nonzero(array)
    for i in np.nonzero(array)[0][:-1]:
        if (array[i+1] != 0):
            patches -= 1
    return patches
I'm not sure if that method works for two dimensions as well. I haven't come up with a function for that case and I need some help for that.
Edit for clarification:
I would like to count connected patches in the 2-dimensional case, including diagonals. So an array [[1, 0], [1, 1]] would have one patch as well as [[1, 0], [0, 1]].
Also, I am wondering if there is a built-in Python function for this.
The following should work:
import numpy as np
import copy
# create an array
A = np.array(
    [
        [0, 1, 1, 1, 0, 1],
        [0, 0, 1, 0, 0, 0],
        [1, 0, 0, 1, 0, 1],
        [1, 0, 0, 0, 0, 1],
        [0, 0, 1, 0, 0, 1]
    ]
)
def isadjacent(pos, newpos):
    """
    Check whether two coordinates are adjacent
    """
    # check for adjacent columns and rows (diagonals count as adjacent)
    return np.all(np.abs(np.array(newpos) - np.array(pos)) < 2)
def count_patches(A):
    """
    Count the number of non-zero patches in an array.
    """
    # get non-zero coordinates
    coords = np.nonzero(A)
    # add them to a list
    inipatches = list(zip(*coords))
    # list to contain all patches
    allpatches = []
    while len(inipatches) > 0:
        patch = [inipatches.pop(0)]
        i = 0
        # check for all points adjacent to the points within the current patch
        while True:
            curpatch = patch[i]
            remaining = copy.deepcopy(inipatches)
            for j in range(len(remaining)):
                if isadjacent(curpatch, remaining[j]):
                    patch.append(remaining[j])
                    inipatches.remove(remaining[j])
                    if len(inipatches) == 0:
                        break
            if len(inipatches) == 0 or i + 1 == len(patch):
                # no points remaining, or every point in the patch has been checked
                # (stopping as soon as one point adds nothing could skip points
                # that were queued earlier but not yet examined)
                break
            i += 1
        allpatches.append(patch)
    return len(allpatches)
print(f"Number of patches is {count_patches(A)}")
Number of patches is 5
This should work for arrays with any number of dimensions.
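For the 2-dimensional case specifically, the same count can be obtained with a breadth-first flood fill, which visits each cell once instead of comparing every pair of points. The following is only a sketch under the assumption that the input is a 2-D list of lists; the name count_patches_bfs is made up for this illustration. If SciPy is available, scipy.ndimage.label with a 3x3 structure of ones is the usual library route for the same task.

```python
from collections import deque


def count_patches_bfs(grid):
    """Count 8-connected groups of non-zero cells in a 2-D list of lists."""
    rows = len(grid)
    cols = len(grid[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    patches = 0
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] == 0 or seen[r][c]:
                continue
            # An unvisited non-zero cell starts a new patch.
            patches += 1
            seen[r][c] = True
            queue = deque([(r, c)])
            while queue:
                cr, cc = queue.popleft()
                # Visit all eight neighbours (diagonals included).
                for dr in (-1, 0, 1):
                    for dc in (-1, 0, 1):
                        nr, nc = cr + dr, cc + dc
                        if (0 <= nr < rows and 0 <= nc < cols
                                and grid[nr][nc] != 0 and not seen[nr][nc]):
                            seen[nr][nc] = True
                            queue.append((nr, nc))
    return patches


A = [
    [0, 1, 1, 1, 0, 1],
    [0, 0, 1, 0, 0, 0],
    [1, 0, 0, 1, 0, 1],
    [1, 0, 0, 0, 0, 1],
    [0, 0, 1, 0, 0, 1],
]
print(count_patches_bfs(A))  # prints 5
```

Unlike the coordinate-list approach, this sketch is limited to two dimensions; extending it would mean generating neighbour offsets for every axis.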
def create_matrix(xy):
    matrix = []
    matrix_y = []
    x = xy[0]
    y = xy[1]
    for z in range(y):
        matrix_y.append(0)
    for n in range(x):
        matrix.append(matrix_y)
    return matrix

def set_matrix(matrix,xy,set):
    x = xy[0]
    y = xy[1]
    matrix[x][y] = set
    return matrix
index = [4,5]
index_2 = [3,4]
z = create_matrix(index)
z = set_matrix(z,index_2, 12)
print(z)
output:
[[0, 0, 0, 0, 12], [0, 0, 0, 0, 12], [0, 0, 0, 0, 12], [0, 0, 0, 0, 12]]
This code should change only the last row.
In your for n in range(x): loop you are appending the same y matrix multiple times. Python under the hood does not copy that array, but uses a pointer. So you have a row of pointers to the same one column.
Move the matrix_y = [] stuff inside the n loop and you get unique y arrays.
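The aliasing described above can be checked directly with the is operator. This is a minimal sketch; the names create_matrix_aliased and create_matrix_fresh are made up for the illustration and are not part of the question's code.

```python
def create_matrix_aliased(rows, cols):
    row = [0] * cols
    # Every entry of the outer list refers to the same inner list object.
    return [row for _ in range(rows)]


def create_matrix_fresh(rows, cols):
    # A new inner list is built on every iteration.
    return [[0] * cols for _ in range(rows)]


aliased = create_matrix_aliased(4, 5)
fresh = create_matrix_fresh(4, 5)

aliased[3][4] = 12
fresh[3][4] = 12

print(aliased[0] is aliased[3])    # True: one shared row
print(fresh[0] is fresh[3])        # False: independent rows
print(aliased[0][4], fresh[0][4])  # 12 0
```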
Comment: Python does not actually expose a pointer concept, but it does use pointers internally. It hides from you whether an assignment copies the data or only copies a pointer to that data. That's kind of bad language design, and it tripped you up here. So now you know that pointers exist, and that most of the time when you "assign arrays" you will actually only set a pointer.
Another comment: if you are going to be doing anything serious with matrices, you should really look into numpy. That will be many factors faster if you do numerical computations.
You don't need the first loop in create_matrix; comment it out:
#for z in range(y):
#    matrix_y.append(0)
Change the second one like this; [0] * y creates a new list of zeros of length y on every iteration:
for n in range(x):
    matrix.append([0] * y)
result (only last cell was changed in matrix):
z = set_matrix(z,index_2, 12)
print(z)
# [[0, 0, 0, 0, 0], [0, 0, 0, 0, 0], [0, 0, 0, 0, 0], [0, 0, 0, 0, 12]]
I've been breaking my head over trying to come up with a recursive way to build the following matrix in Python. It is quite a challenge without pointers. Could anyone maybe help me out?
The recursion is the following:
T0 = 1,
Tn+1 = [[Tn, Tn],
[ 0, Tn]]
I have tried many iterations of some recursive function, but I cannot wrap my head around it.
def T(n, arr):
    n=int(n)
    if n == 0:
        return 1
    else:
        c = 2**(n-1)
        Tn = np.zeros((c,c))
        Tn[np.triu_indices(n=c)] = self.T(n=n-1, arr=arr)
        return Tn

arr = np.zeros((8,8))
T(arr=arr, n=3)
It's not hard to do this, but you need to be careful about the meaning of the zero in the recursion. This isn't really precise for larger values of n:
Tn+1 = [[Tn, Tn],
[ 0, Tn]]
That zero can represent a whole block of zeros. For example, on the second iteration you have this:
[1, 1, 1, 1],
[0, 1, 0, 1],
[0, 0, 1, 1],
[0, 0, 0, 1]
Those four zeros in the bottom-left are all represented by the one zero in the formula. The block of zeros needs to be the same shape as the blocks around it.
After that it's a matter of making NumPy put things in the right order and shape for you. numpy.block is really handy for this and makes it pretty simple:
import numpy as np
def makegasket(n):
    if n == 0:
        return np.array([1], dtype=int)
    else:
        node = makegasket(n-1)
        return np.block([[node, node], [np.zeros(node.shape, dtype=int), node]])

makegasket(3)
Result:
array([[1, 1, 1, 1, 1, 1, 1, 1],
[0, 1, 0, 1, 0, 1, 0, 1],
[0, 0, 1, 1, 0, 0, 1, 1],
[0, 0, 0, 1, 0, 0, 0, 1],
[0, 0, 0, 0, 1, 1, 1, 1],
[0, 0, 0, 0, 0, 1, 0, 1],
[0, 0, 0, 0, 0, 0, 1, 1],
[0, 0, 0, 0, 0, 0, 0, 1]])
If you use larger n you might enjoy matplotlib.pyplot.imshow for display:
from matplotlib.pyplot import imshow
# ....
imshow(makegasket(7))
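The block layout [[Tn, Tn], [0, Tn]] is also what a Kronecker product with the 2x2 matrix [[1, 1], [0, 1]] produces, so the recursion can be sketched with numpy.kron as well. This is an alternative illustration, not the method used above; the name makegasket_kron is made up here.

```python
import numpy as np


def makegasket_kron(n):
    """Build T_n by repeated Kronecker products with [[1, 1], [0, 1]]."""
    base = np.array([[1, 1], [0, 1]])
    result = np.array([[1]])  # T_0
    for _ in range(n):
        # kron(base, result) lays out [[result, result], [0 * result, result]],
        # so the zero block automatically gets the right shape.
        result = np.kron(base, result)
    return result


print(makegasket_kron(2))
```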
You don't really need a recursive function to implement this recursion. The idea is to start with the UR corner and build outward. You can even start with the UL corner to avoid some of the book-keeping and flip the matrix along either axis, but this won't be as efficient in the long run.
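import numpy as np

def build_matrix(n):
    size = 2**n
    # Depending on the application, even dtype=bool might work
    # (the np.int and np.bool aliases were removed in recent NumPy releases)
    matrix = np.zeros((size, size), dtype=int)
    # This is t[0]
    matrix[0, -1] = 1
    for i in range(n):
        k = 2**i
        # copy the current block to the left of and below itself
        matrix[:k, -2 * k:-k] = matrix[k:2 * k, -k:] = matrix[:k, -k:]
    return matrix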
Just for fun, I plotted timing results for this implementation against @Mark Meyer's answer. The plot shows a slight timing advantage (and a memory advantage) for the looping approach in this case.
Both algorithms run out of memory around n=15 on my machine, which is not too surprising.
I am using Python and I need to find the most efficient way to perform the following task.
Task: Given any 1-dimensional array v of zeros and ones, denote by k>=0 the number of maximal runs of consecutive ones in v.
I need to obtain from v a 2-dimensional array w such that:
1) shape(w)=(k,len(v)),
2) for every i=1,..,k, the i-th row of w is an array of all zeros except for the i-th run of ones of v.
For example, suppose v is the array
v=[0,1,1,0,0,1,0,1,1,1]
Then k=3 and w should be the array
w=[[0,1,1,0,0,0,0,0,0,0],[0,0,0,0,0,1,0,0,0,0],[0,0,0,0,0,0,0,1,1,1]]
It is possible to write the code to perform this task in many ways, for example:
import numpy as np
v=np.array([0,1,1,0,0,1,0,1,1,1])
start=[]
end=[]
for ii in range(len(v)-1):
    if (v[ii:ii+2]==[0,1]).all():
        start.append(ii)
    if (v[ii:ii+2]==[1,0]).all():
        end.append(ii)
if len(start)>len(end):
    end.append(len(v)-1)
w=np.zeros((len(start),len(v)))
for jj in range(len(start)):
    w[jj,start[jj]+1:end[jj]+1]=np.ones(end[jj]-start[jj])
But I need to perform this task on a very large array v, and the task is part of a function that is then minimized, so I need it to be as efficient and fast as possible.
So in conclusion my question is: what is the most computationally efficient way to perform it in Python?
Here's one vectorized way -
import numpy as np

def expand_islands2D(v):
    # Get start, stop of 1s islands
    v1 = np.r_[0,v,0]
    idx = np.flatnonzero(v1[:-1] != v1[1:])
    s0,s1 = idx[::2],idx[1::2]
    # Initialize 1D id array of size same as expected o/p and has
    # starts and stops assigned as 1s and -1s, so that a final cumsum
    # gives us the desired o/p
    N,M = len(s0),len(v)
    # No islands at all (k = 0): return an empty (0, M) result
    if N == 0:
        return 0, np.zeros((0,M),dtype=int)
    out = np.zeros(N*M,dtype=int)
    # Setup starts with 1s
    r = np.arange(N)*M
    out[s0+r] = 1
    # Setup stops with -1s
    # (a stop at M would fall outside the flat array, so skip it)
    if s1[-1] == M:
        out[s1[:-1]+r[:-1]] = -1
    else:
        out[s1+r] = -1
    # Final cumsum on ID array
    out2D = out.cumsum().reshape(N,-1)
    return N, out2D
Sample run -
In [105]: v
Out[105]: array([0, 1, 1, 0, 0, 1, 0, 1, 1, 1])
In [106]: k,out2D = expand_islands2D(v)
In [107]: k # number of islands
Out[107]: 3
In [108]: out2D # 2d output with 1s islands on different rows
Out[108]:
array([[0, 1, 1, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 1, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 1, 1, 1]])
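The same output can also be reached by first labelling each position with the number of its island and then comparing those labels against the row numbers by broadcasting. This is an alternative sketch, not the approach above, and it has not been benchmarked against it; the name expand_islands_labels is made up for the illustration.

```python
import numpy as np


def expand_islands_labels(v):
    """Return (k, w) where row i of w holds only the i-th run of ones of v."""
    v = np.asarray(v).astype(bool)
    # A run starts wherever a 1 is not preceded by a 1.
    starts = v & ~np.r_[False, v[:-1]]
    # The running count of starts numbers the runs 1..k;
    # multiplying by v resets positions holding 0 to label 0.
    labels = np.cumsum(starts) * v
    k = int(starts.sum())
    # Row i keeps exactly the positions labelled i + 1.
    w = (labels[None, :] == np.arange(1, k + 1)[:, None]).astype(int)
    return k, w


k, w = expand_islands_labels([0, 1, 1, 0, 0, 1, 0, 1, 1, 1])
print(k)
print(w)
```

The comparison builds a temporary boolean array of the same (k, len(v)) shape as the result, so its memory use grows with the number of islands.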
Is this the best or most efficient way to generate random numbers from a geometric distribution with an array of parameters that may contain 0?
allids["c"]=[2,0,1,1,3,0,0,2,0]
[ 0 if x == 0 else numpy.random.geometric(1./x) for x in allids["c"]]
Note I am somewhat concerned about optimization.
EDIT:
A bit of context: I have a sequence of characters (e.g. ATCGGGA) and I would like to expand/contract runs of a single character (e.g. if the original sequence had a run of 2 'A's, I want to simulate a sequence whose run has an expected length of 2 'A's but varies according to a geometric distribution). Characters that form runs of length 1 should NOT vary in length.
So if
seq = 'AATCGGGAA'
allids["c"]=[2,0,1,1,3,0,0,2,0]
rep=[ 0 if x == 0 else numpy.random.geometric(1./x) for x in allids["c"]]
"".join([s*r for r, s in zip(rep, seq)])
will output (when rep is [1, 0, 1, 1, 3, 0, 0, 1, 0])
"ATCGGGA"
You can use a masked array to avoid the division by zero.
import numpy as np
a = np.ma.masked_equal([2, 0, 1, 1, 3, 0, 0, 2, 0], 0)
rep = np.random.geometric(1. / a)
rep[a.mask] = 0
This generates a random sample for each element of a, and then deletes some of them later. If you're concerned about this waste of random numbers, you could generate just enough, like so:
import numpy as np
a = np.ma.masked_equal([2, 0, 1, 1, 3, 0, 0, 2, 0], 0)
rep = np.zeros(a.shape, dtype=int)
rep[~a.mask] = np.random.geometric(1. / a[~a.mask])
What about this:
import numpy

counts = numpy.array([2, 0, 1, 1, 3, 0, 0, 2, 0], dtype=float)
counts_ma = numpy.ma.array(counts, mask=(counts == 0))
# overwrite only the unmasked (non-zero) entries with geometric samples
valid = numpy.logical_not(counts_ma.mask)
counts[valid] = \
    numpy.array([numpy.random.geometric(v) for v in 1.0 / counts[valid]])
You could potentially precompute the distribution of homopolymer runs and limit the number of calls to geometric, as fetching large numbers of values from an RNG in one call is more efficient than making individual calls.
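A single vectorised draw over just the nonzero entries avoids both the per-element calls and the masked-array machinery. The following is a sketch under the assumption that a plain boolean mask is acceptable; sample_run_lengths and its rng parameter are names introduced for this illustration, and numpy.random.default_rng is used in place of the legacy numpy.random.geometric function.

```python
import numpy as np


def sample_run_lengths(counts, rng=None):
    """Draw geometric run lengths with mean counts[i]; zeros stay zero."""
    rng = np.random.default_rng() if rng is None else rng
    counts = np.asarray(counts, dtype=float)
    rep = np.zeros(counts.shape, dtype=int)
    nonzero = counts > 0
    # One call for all nonzero entries; geometric(p) has mean 1 / p,
    # so p = 1 / count gives the requested expected run length.
    rep[nonzero] = rng.geometric(1.0 / counts[nonzero])
    return rep


rep = sample_run_lengths([2, 0, 1, 1, 3, 0, 0, 2, 0])
print("".join(s * r for r, s in zip(rep, "AATCGGGAA")))
```

Passing a seeded generator, for example np.random.default_rng(0), makes the draws reproducible.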
I'm trying to build a method in my class Circle which gets a matrix (represented by a list of lists, each sublist represents a row) as an input.
The matrix has zero in every cell, and I'm supposed to place my circle in the center of the matrix and check if the (i,j) cell which represents the (i,j) point is contained in the circle, but for some reason I get a different output.
Here is an example:
mat = [[0 for j in range(5)] for i in range(7)]
Circle(40, 10, 1).draw(mat)
The output I expect is:
[[0, 0, 0, 0, 0],
[0, 0, 0, 0, 0],
[0, 0, 1, 0, 0],
[0, 1, 1, 1, 0],
[0, 0, 1, 0, 0],
[0, 0, 0, 0, 0],
[0, 0, 0, 0, 0]]
But the output I get is:
[[0, 0, 0, 0, 0],
[0, 0, 0, 0, 0],
[0, 0, 0, 0, 1],
[0, 0, 0, 1, 1],
[0, 0, 0, 0, 1],
[0, 0, 0, 0, 0],
[0, 0, 0, 0, 0]]
Here's my code:
import math

class Point():
    """ Holds data on a point (x,y) in the plane """
    def __init__(self, x=0, y=0):
        assert isinstance(x,(int, float)) and isinstance(y,(int, float))
        self.x = x
        self.y = y

class Circle():
    """ Holds data on a circle in the plane """
    def __init__(self,*args):
        if len(args)==2:
            if isinstance(args[0],Point) and isinstance(args[1],(float,int)):
                assert args[1]>0
                self.center= args[0]
                self.radius= args[1]
        if len(args)==3:
            assert args[2]>0
            self.a=args[0]
            self.b=args[1]
            self.center= Point(self.a,self.b)
            self.radius= args[2]
    def contains(self,check):
        if isinstance(check,(Point)):
            if math.sqrt((self.center.x-check.x)**2 + (self.center.y-check.y)**2) <= self.radius:
                return True
        if isinstance(check,Circle):
            test= math.sqrt((self.center.x-check.center.x)**2 + (self.center.x-check.center.x)**2)
            if test < (abs((self.radius)-(check.radius))):
                return True
        else:
            return False
    def draw(self,mat):
        n=len(mat)
        m=len(mat[0])
        newcircle=Circle((int(m/2)+1),(int(n/2)+1),self.radius)
        for i,lst in enumerate(mat):
            for j,val in enumerate(lst):
                if newcircle.contains(Point(i,j)):
                    mat[i][j]=1
You're not placing your circle in the middle of the matrix.
newcircle=Circle((int(m/2)+1),(int(n/2)+1),self.radius)
should be
newcircle=Circle((int(n/2)),(int(m/2)),self.radius)
or possibly, since there is no need to restrict the center to integers:
newcircle=Circle((n-1)/2.0,(m-1)/2.0,self.radius)
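To see that this centering reproduces the expected output from the question, here is a small standalone sketch. It does not use the Point and Circle classes; the function name draw_circle is made up for the illustration.

```python
import math


def draw_circle(mat, radius):
    """Set to 1 every cell whose (row, col) index lies within radius of the center."""
    n, m = len(mat), len(mat[0])
    # Indices run from 0 to n - 1 (rows) and 0 to m - 1 (columns),
    # so the midpoint of each range is (size - 1) / 2.
    center_row, center_col = (n - 1) / 2.0, (m - 1) / 2.0
    for i in range(n):
        for j in range(m):
            if math.hypot(i - center_row, j - center_col) <= radius:
                mat[i][j] = 1
    return mat


mat = [[0 for j in range(5)] for i in range(7)]
for row in draw_circle(mat, 1):
    print(row)
```

For the 7x5 matrix the center lands on cell (3, 2), so rows 2 to 4 receive the cross-shaped pattern shown in the expected output.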
To draw a circle, you create a new circle in the center of the matrix, you check the cells that are inside and then you turn the value to 1 for those inside.
1) First, there is a problem with the function contains :
def contains(...):
    if (cond1):
        if (cond11):
            return True
    if (cond2):
        if (cond21):
            return True
    else:
        return False
If cond2 is true and cond21 is false, you will get a None.
To be more Pythonic, try:
def contains(...):
    if (cond1) and (cond11):
        return True
    elif (cond2) and (cond21):
        return True
    else:
        return False
You are sure in this case to have a True or a False.
2) Copy / Paste mistake
There is a copy / paste mistake in the function contains when the argument is a Circle.
You have forgotten to turn x into y in the second squared term, so the distance only uses the x coordinates.
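Points 1) and 2) together could look like the sketch below. It is a simplified stand-in for the question's classes (the constructor takes a center Point and a radius only), and the circle-in-circle test keeps the question's comparison against the absolute difference of the radii; treat both as assumptions of this illustration.

```python
import math


class Point:
    def __init__(self, x=0, y=0):
        self.x = x
        self.y = y


class Circle:
    def __init__(self, center, radius):
        self.center = center
        self.radius = radius

    def contains(self, other):
        """Always return True or False, never None."""
        if isinstance(other, Point):
            dist = math.hypot(self.center.x - other.x, self.center.y - other.y)
            return dist <= self.radius
        if isinstance(other, Circle):
            # The second term uses y, not x a second time.
            dist = math.hypot(self.center.x - other.center.x,
                              self.center.y - other.center.y)
            return dist < abs(self.radius - other.radius)
        return False


big = Circle(Point(0, 0), 2)
print(big.contains(Point(1, 1)))
print(big.contains(Circle(Point(0, 3), 1)))
```

With the copy / paste mistake left in, the second call would compare a distance of 0 and wrongly report containment.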
3) Function draw
All you need for this function is the radius.
Be careful with the use of integers and floats; under Python 2's integer division we have:
(int(5 / 2)) == (5 / 2) != (5 / 2.)
To be sure to have a float, write the divisor as a float, 2. instead of 2.
Your circle is created on a matrix whose row and column indices start at 1 if you define the center using int(len(mat) / 2.) + 1.
Don't forget that enumerate indices start at 0, not 1.
So (len(mat) - 1) / 2., the midpoint of the index range 0 to len(mat) - 1, would be more accurate.