How to Find the Max Plot of Land From a 2D Array?

Sorry if the title is a little confusing; I didn't know what to call it. I'm still new to programming, and I'm stuck on a coding problem that I have no idea how to start.
Here is the summarized version of the problem:
I have a randomized plot of land; let's call its dimensions x and y. This plot is a 2D array of numbers that can be negative or positive. There is also a second, smaller pair of randomized dimensions; let's call them width and height. Given these, I need to find the sub-plot of the x-by-y array that is width by height in size and has the greatest sum.
All numbers will be valid integers.
x ≥ width > 0
y ≥ height > 0
The output is the largest sum over all width-by-height sub-plots of the x-by-y plot.
Here is an example:
3 - randomly picked y value
4 - randomly picked x value
2 - randomly picked height
1 - randomly picked width
1 2 3 4
-1 0 -1 9
-4 1 -2 7
Now, you can see from the example that the output will be 16, because the 1x2 sub-plot with the largest sum in the 4x3 plot is the 9 and 7 in the last column, which add up to 16. I was wondering if anyone could point me in the right direction and give me tips on where to start. I have tried researching this, but it has led nowhere because I have no idea what to look up.

A summed-area table seems to be an interesting way to tackle this problem. If I'm not mistaken such an algorithm would be linear in the number of cells (x*y).
The basic idea of a summed-area table is that the sum of any rectangular subparcel can be read off from four table entries: add the values at two diagonally opposite corners (bottom-right and top-left) and subtract the values at the other two corners, as explained in the Wikipedia article.
Numpy's cumsum helps to quickly create the summed-area table. Maybe there is also a numpy way to calculate the areas?
Here's my sample code (note that numpy first indexes the vertical direction, and then the horizontal). The tests inside the loop could be skipped if we added an extra row and extra column of zeros (but would make the code slightly more difficult to understand).
import numpy as np
def find_highest_area_sum(parcel, x, y, width, height):
    sums = np.cumsum(np.cumsum(parcel, axis=0), axis=1)
    areas = np.zeros((y - height + 1, x - width + 1), dtype=sums.dtype)
    print("Given parcel:")
    print(parcel)
    print("Cumulative area sums:")
    print(sums)
    for i in range(x - width + 1):
        for j in range(y - height + 1):
            areas[j, i] = sums[j + height - 1, i + width - 1]
            if i > 0:
                areas[j, i] -= sums[j + height - 1, i - 1]
            if j > 0:
                areas[j, i] -= sums[j - 1, i + width - 1]
            if i > 0 and j > 0:
                areas[j, i] += sums[j - 1, i - 1]
    print("Areas of each subparcel:")
    print(areas)
    ind_highest = np.unravel_index(np.argmax(areas), areas.shape)
    print(f'The highest area sum is {areas[ind_highest]} at pos ({ind_highest[1]}, {ind_highest[0]}) to pos ({ind_highest[1] + width - 1}, {ind_highest[0] + height - 1}) ')
x, y = 4, 3
width, height = 1, 2
parcel = np.array([[1, 2, 3, 4],
                   [-1, 0, -1, 9],
                   [-4, 1, -2, 7]])
find_highest_area_sum(parcel, x, y, width=1, height=2)
x = 12
y = 20
parcel = np.random.randint(-10, 20, (y, x))
find_highest_area_sum(parcel, x, y, width=10, height=12)
Output of the first part:
Given parcel:
[[ 1  2  3  4]
 [-1  0 -1  9]
 [-4  1 -2  7]]
Cumulative area sums:
[[ 1  3  6 10]
 [ 0  2  4 17]
 [-4 -1 -1 19]]
Areas of each subparcel:
[[ 0  2  2 13]
 [-5  1 -3 16]]
The highest area sum is 16 at pos (3, 1) to pos (3, 2)
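On the question of a pure numpy way to calculate the areas: the following is a minimal sketch, not part of the answer above, that pads the summed-area table with a leading row and column of zeros so that all window sums come out of four shifted slices with no loop. The function name max_window_sum and the int64 table dtype are assumptions made for this illustration.

```python
import numpy as np

def max_window_sum(parcel, width, height):
    """Return (best_sum, (x, y)) for the width-by-height window with the largest sum.

    (x, y) is the top-left corner of that window, x horizontal and y vertical.
    """
    parcel = np.asarray(parcel)
    rows, cols = parcel.shape
    # Summed-area table with an extra leading row and column of zeros,
    # so no special cases are needed at the top and left borders.
    sat = np.zeros((rows + 1, cols + 1), dtype=np.int64)
    sat[1:, 1:] = parcel.cumsum(axis=0).cumsum(axis=1)
    # Four-corner formula applied to every window position at once.
    areas = (sat[height:, width:] - sat[:-height, width:]
             - sat[height:, :-width] + sat[:-height, :-width])
    j, i = np.unravel_index(np.argmax(areas), areas.shape)
    return int(areas[j, i]), (int(i), int(j))

example = [[1, 2, 3, 4],
           [-1, 0, -1, 9],
           [-4, 1, -2, 7]]
print(max_window_sum(example, width=1, height=2))
```

For the example parcel this reports the same sum and top-left position as the loop version. The padding costs one extra row and column of memory in exchange for removing the three border tests.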

Related

When plotting, wanting to 'hold' a y value over an x interval. [Not a 'bar plot' question]

For some reason I am blank on this one:
x = np.arange(5)
[0 1 2 3 4]
y = np.array((3, 6, 1, 9))
[3 6 1 9]
plt.plot(???)
What should I do with the arrays so that the plot holds each value constant over its 'bin'? That is, the value 3 over the interval 0 to 1, then the value 6 over the interval 1 to 2, and so on.
This is a concept example; my real problem is that I am making a spectrogram from scratch and am using plt.pcolormesh(X, Y, Z). My arrays are laid out like the ones in this example and their sizes do not match. To make them match I have to drop either the number 0 or the number 4, but then everything is off by one data point in the corresponding direction.

You would need to append the last y point, such that both arrays have the same number of elements, then use drawstyle="steps-post" to get a step-like appearance.
import numpy as np
import matplotlib.pyplot as plt
x = np.arange(5)
#[0 1 2 3 4]
y = np.array((3, 6, 1, 9))
#[3 6 1 9]
plt.plot(x, np.concatenate((y, [y[-1]])), drawstyle="steps-post")
plt.show()
Note that this is unrelated to plt.pcolormesh(X, Y, Z), where Z can (or should even) have one element less than the grid in each direction.
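If you would rather build the step coordinates yourself than rely on drawstyle, here is a minimal numpy-only sketch. The helper name step_outline is made up for this illustration; its result can be passed to an ordinary plt.plot(xs, ys) call.

```python
import numpy as np

def step_outline(edges, values):
    """Turn n+1 bin edges and n values into the corner points of a 'hold' line."""
    edges = np.asarray(edges)
    values = np.asarray(values)
    if len(edges) != len(values) + 1:
        raise ValueError("need exactly one more edge than values")
    # Every interior edge appears twice (end of one bin, start of the next).
    xs = np.repeat(edges, 2)[1:-1]
    # Every value appears twice (once at each end of its bin).
    ys = np.repeat(values, 2)
    return xs, ys

xs, ys = step_outline(np.arange(5), [3, 6, 1, 9])
print(xs.tolist())
print(ys.tolist())
```

This keeps the original y array untouched and makes the bin-edge convention explicit, which is the same convention pcolormesh uses for its grid.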

How to sort & extract values with multiple conditions in R?

I have a basic conditional data extraction issue. I have already written a code in Python. I am learning R; and I would like to replicate the same code in R.
I tried to put conditional arguments using which, but that doesn't seem to work. I am not yet fully versed with R syntax.
I have a dataframe with 2 columns: x and y
The idea is to extract a list of at most 5 x-values, each multiplied by 2, corresponding to the largest y-values, on the condition that we select only y-values greater than 0.45 times the peak y-value.
So, the algorithm has the following steps:
1. Find the peak value of y: max_y
2. Define the threshold = 0.45 * max_y
3. Apply a filter to get the list of all y-values that are greater than the threshold value: y_filt
4. Get the list of x-values corresponding to the y-values in step 3: x_filt
5. If the number of values in x_filt is less than or equal to 5, the result is the values in x_filt multiplied by 2
6. If x_filt has more than 5 values, select only the 5 values corresponding to the 5 largest y-values in the list, then multiply by 2 to get the result
Python Code
max_y = max(y)
max_x = x[y.argmax()]
print(max_x, max_y)
threshold = 0.45 * max_y
y_filt = y[y > threshold]
x_filt = x[y > threshold]
if len(y_filt) > 4:
    n_highest = 5
else:
    n_highest = len(y_filt)
y_filt_highest = y_filt.argsort()[-n_highest:][::-1]
result = [x_filt[i]*2 for i in range(len(x_filt)) if i in y_filt_highest]
For Example Data-set
x y
1 20
2 7
3 5
4 11
5 0
6 8
7 3
8 10
9 2
10 6
11 15
12 18
13 0
14 1
15 12
The above code will give the following results
max_y = 20
max_x = 1
threshold = 9
y_filt = [20, 11, 10, 15, 18, 12]
x_filt = [1, 4, 8, 11, 12, 15]
n_highest = 5
y_filt_highest = [20, 11, 15, 18, 12]
result = [2, 8, 22, 24, 30]
I wish to do the same in R.

One of the reasons that R is so powerful/easy to use for statistical work is that the built in data.frame is foundational. Using one here simplifies things:
# Create a data frame with the example data from the question
df <- data.frame(x = 1:15, y = c(20, 7, 5, 11, 0, 8, 3, 10, 2, 6, 15, 18, 0, 1, 12))
# Refer to columns with the $ notation
max_y <- max(df$y)
max_x <- df$x[which(df$y == max_y)]
# If you want to print both values, you need to combine them into a vector with c()
print(c(max_x, max_y))
# But you could also just call the values directly, as in python
max_x
max_y
# Calculate a threshold and then create a filtered data.frame
threshold <- 0.45 * max_y
df_filt <- df[which(df$y > threshold), ]
df_filt <- df_filt[order(-df_filt$y), ]
if (nrow(df_filt) > 5) {
  df_filt <- df_filt[1:5, ]
}
# Calculate the result
result <- df_filt$x * 2
# Alternatively, you may want the result to be part of your data.frame
df_filt$result <- df_filt$x*2
# Should show identical results
max_y
max_x
threshold
df_filt # Probably don't want to print a df if it is large
result
Of course if you really need separate vectors for y_filt and x_filt, you could create them easily after the fact:
y_filt <- df_filt$y
x_filt <- df_filt$x
Note that, unlike numpy.argmax (which returns only the first position of the maximum), which(df$y == max(df$y)) will return multiple values if your maximum is not unique.
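For comparison on the Python side, a small sketch of how ties behave in numpy. The array below is made-up data for this illustration, not the data from the question.

```python
import numpy as np

# Hypothetical data with a tied maximum at positions 0 and 2
y = np.array([20, 7, 20, 11])

# np.argmax reports only the first position of the maximum
first_peak = int(np.argmax(y))

# Comparing against the maximum and collecting the hits reports every position,
# which is the closer analogue of which(df$y == max(df$y)) in R
all_peaks = np.flatnonzero(y == y.max()).tolist()

print(first_peak)
print(all_peaks)
```

Keep in mind that R positions are 1-based while numpy positions are 0-based.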

Pandas: vectorization with function on two dataframes

I'm having trouble with implementing vectorization in pandas. Let me preface this by saying I am a total newbie to vectorization so it's extremely likely that I'm getting some syntax wrong.
Let's say I've got two pandas dataframes.
Dataframe one describes the x,y coordinates of some circles with radius R, with unique IDs.
>>> data1 = {'ID': [1, 2], 'x': [1, 10], 'y': [1, 10], 'R': [4, 5]}
>>> df_1=pd.DataFrame(data=data1)
>>>
>>> df_1
ID x y R
1 1 1 4
2 10 10 5
Dataframe two describes the x,y coordinates of some points, also with unique IDs.
>>> data2 = {'ID': [3, 4, 5], 'x': [1, 3, 9], 'y': [2, 5, 9]}
>>> df_2=pd.DataFrame(data=data2)
>>>
>>> df_2
ID x y
3 1 2
4 3 5
5 9 9
Now, imagine plotting the circles and the points on a 2D plane. Some of the points will reside inside the circles. See the image below.
All I want to do is create a new column in df_2 called "host_circle" that indicates the ID of the circle that each point resides in. If the particle does not reside in a circle, the value should be "None".
My desired output would be
>>> df_2
ID x y host_circle
3 1 2 1
4 3 5 None
5 9 9 2
First, define a function that checks if a given particle (x2,y2) resides inside a given circle (x1,y1,R1,ID_1). If it does, return the ID of the circle; else, return None.
>>> def func(x1,y1,R1,ID_1,x2,y2):
...     dist = np.sqrt( (x1-x2)**2 + (y1-y2)**2 )
...     if dist < R1:
...         return ID_1
...     else:
...         return None
Next, the actual vectorization. I'm sorta lost here. I think it should be something like
df_2['host']=func(df_1['x'],df_1['y'],df_1['R'],df_1['ID'],df_2['x'],df_2['y'])
but that just throws errors. Can someone help me?
One final note: My actual data I'm working with is VERY large; tens of millions of rows. Speed is crucial, hence why I'm trying to make vectorization work.

Numba v1
You might have to install numba with
pip install numba
Then use numba's JIT compiler via the njit function decorator
import numpy as np
from numba import njit
@njit
def distances(point, points):
    return ((points - point) ** 2).sum(1) ** .5
@njit
def find_my_circle(point, circles):
    points = circles[:, :2]
    radii = circles[:, 2]
    dist = distances(point, points)
    mask = dist < radii
    i = mask.argmax()
    return i if mask[i] else -1
@njit
def find_my_circles(points, circles):
    n = len(points)
    out = np.zeros(n, np.int64)
    for i in range(n):
        out[i] = find_my_circle(points[i], circles)
    return out
# points is the (n, 2) array of point coordinates, as in the vectorized version
points = df_2[['x', 'y']].values
# Extra NaN entry at the end so that index -1 (no host found) maps to NaN
ids = np.append(df_1.ID.values, np.nan)
i = find_my_circles(points, df_1[['x', 'y', 'R']].values)
df_2['host_circle'] = ids[i]
df_2
ID x y host_circle
0 3 1 2 1.0
1 4 3 5 NaN
2 5 9 9 2.0
This iterates point by point: for each point in turn, it tries to find the host circle. The search over the circles for a single point is still vectorized, and the compiled loop should be very fast. The massive benefit is that you never hold the full points-by-circles distance matrix, so you don't occupy tons of memory.
Numba v2
This one is more loopy but short circuits when it finds a host
import numpy as np
from numba import njit
@njit
def distance(a, b):
    return ((a - b) ** 2).sum() ** .5
@njit
def find_my_circles(points, circles):
    n = len(points)
    m = len(circles)
    out = -np.ones(n, np.int64)
    centers = circles[:, :2]
    radii = circles[:, 2]
    for i in range(n):
        for j in range(m):
            if distance(points[i], centers[j]) < radii[j]:
                out[i] = j
                break
    return out
# points is the (n, 2) array of point coordinates, as in the vectorized version
points = df_2[['x', 'y']].values
# Extra NaN entry at the end so that index -1 (no host found) maps to NaN
ids = np.append(df_1.ID.values, np.nan)
i = find_my_circles(points, df_1[['x', 'y', 'R']].values)
df_2['host_circle'] = ids[i]
df_2
Vectorized
But still problematic: it builds the full points-by-circles distance matrix in memory.
c = ['x', 'y']
centers = df_1[c].values
points = df_2[c].values
radii = df_1['R'].values
i, j = np.where(((points[:, None] - centers) ** 2).sum(2) ** .5 < radii)
df_2.loc[df_2.index[i], 'host_circle'] = df_1['ID'].iloc[j].values
df_2
ID x y host_circle
0 3 1 2 1.0
1 4 3 5 NaN
2 5 9 9 2.0
Explanation
The distance of a point (x1, y1) from the center (x0, y0) of a circle is
((x1 - x0) ** 2 + (y1 - y0) ** 2) ** .5
I can use broadcasting if I extend one of my arrays into a third dimension
points[:, None] - centers
array([[[ 0,  1],
        [-9, -8]],
       [[ 2,  4],
        [-7, -5]],
       [[ 8,  8],
        [-1, -1]]])
That is all six combinations of vector differences. Now to calculate the distances.
((points[:, None] - centers) ** 2).sum(2) ** .5
array([[ 1.        , 12.04159458],
       [ 4.47213595,  8.60232527],
       [11.3137085 ,  1.41421356]])
That's all 6 combinations of distances, and I can compare them against the radii to see which are within the circles
((points[:, None] - centers) ** 2).sum(2) ** .5 < radii
array([[ True, False],
       [False, False],
       [False,  True]])
Ok, I want to find where the True values are. That is a perfect use case for np.where. It will give me two arrays, the first will be the row positions, the second the column positions of where these True values are. Turns out, the row positions are the points and column positions are the circles.
i, j = np.where(((points[:, None] - centers) ** 2).sum(2) ** .5 < radii)
Now I just have to slice df_2 with i somehow and assign to it values I get from df_1 using j somehow... But I showed that above.
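Because the question mentions tens of millions of rows, one possible middle ground is to keep the broadcasting but apply it to one block of points at a time. The following is a numpy-only sketch under assumptions that are not in the answer above: the function name host_circle_ids, the chunk_size parameter, the use of -1 for "no host", and the rule that the first matching circle wins are all choices made for this illustration.

```python
import numpy as np

def host_circle_ids(points, centers, radii, ids, chunk_size=100_000):
    """For each point, return the id of the first circle containing it, or -1."""
    points = np.asarray(points, dtype=float)
    centers = np.asarray(centers, dtype=float)
    # Compare squared distances with squared radii to avoid the square root
    radii_sq = np.asarray(radii, dtype=float) ** 2
    ids = np.asarray(ids)
    out = np.full(len(points), -1, dtype=np.int64)
    for start in range(0, len(points), chunk_size):
        block = points[start:start + chunk_size]
        # (block, circles) matrix of squared distances, for this block only
        d2 = ((block[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        inside = d2 < radii_sq
        first = inside.argmax(axis=1)                   # first True per row, or 0
        hit = inside[np.arange(len(block)), first]      # False if the row had no True
        out[start:start + chunk_size][hit] = ids[first[hit]]
    return out

hosts = host_circle_ids(points=[[1, 2], [3, 5], [9, 9]],
                        centers=[[1, 1], [10, 10]],
                        radii=[4, 5],
                        ids=[1, 2],
                        chunk_size=2)
print(hosts.tolist())
```

Peak memory is then governed by chunk_size times the number of circles rather than by the full product of both table sizes. The result could be written back with something like df_2['host_circle'] = hosts, replacing -1 by NaN or None if that is the preferred marker.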

Try this. I have modified your function a bit so that it returns a list, on the assumption that more than one circle may contain a given point. You can modify it if that's not the case. The list is empty if the particle does not reside in any of the circles.
def func(df, x2,y2):
    val = df.apply(lambda row: np.sqrt((row['x']-x2)**2 + (row['y']-y2)**2) < row['R'], axis=1)
    return list(val.index[val==True])
df_2['host'] = df_2.apply(lambda row: func(df_1, row['x'],row['y']), axis=1)

Finding intersections of a skeletonised image in python opencv

I have a skeletonised image (shown below).
I would like to get the intersections of the lines. I have tried the method below; skeleton is an OpenCV image and the algorithm returns a list of coordinates:
def getSkeletonIntersection(skeleton):
    image = skeleton.copy()
    image = image/255
    intersections = list()
    for y in range(1,len(image)-1):
        for x in range(1,len(image[y])-1):
            if image[y][x] == 1:
                neighbourCount = 0
                neighbours = neighbourCoords(x,y)
                for n in neighbours:
                    if (image[n[1]][n[0]] == 1):
                        neighbourCount += 1
                if(neighbourCount > 2):
                    print(neighbourCount,x,y)
                    intersections.append((x,y))
    return intersections
It finds the coordinates of white pixels where there are more than two adjacent pixels. I thought that this would only return corners but it does not - it returns many more points.
This is the output with the points it detects marked on the image. This is because it detects some of the examples shown below that are not intersections.
0 0 0    1 1 0    0 1 1
1 1 1    0 1 0    1 1 0
0 0 1    0 0 1    0 0 0
And many more examples. Is there another method I should look at to detect intersections? All input and ideas appreciated, thanks.

I am not sure about OpenCV features, but you should maybe try using Hit and Miss morphology which is described here.
Read up on Line Junctions and see the 12 templates you need to test for:

I received an email recently asking for my eventual solution to the problem. It is posted below so that it can inform others. I make no claim that this code is particularly fast or stable, only that it's what worked for me! The function also filters out duplicates and intersections detected too close together, on the assumption that those are not real intersections but noise introduced by the skeletonisation process.
def neighbours(x,y,image):
    """Return 8-neighbours of image point P1(x,y), in a clockwise order"""
    img = image
    x_1, y_1, x1, y1 = x-1, y-1, x+1, y+1
    return [ img[x_1][y], img[x_1][y1], img[x][y1], img[x1][y1], img[x1][y], img[x1][y_1], img[x][y_1], img[x_1][y_1] ]
def getSkeletonIntersection(skeleton):
    """ Given a skeletonised image, it will give the coordinates of the intersections of the skeleton.
    Keyword arguments:
    skeleton -- the skeletonised image to detect the intersections of
    Returns:
    List of 2-tuples (x,y) containing the intersection coordinates
    """
    # A biiiiiig list of valid intersections             2 3 4
    # These are in the format shown to the right         1 C 5
    #                                                    8 7 6
    validIntersection = [[0,1,0,1,0,0,1,0],[0,0,1,0,1,0,0,1],[1,0,0,1,0,1,0,0],
                         [0,1,0,0,1,0,1,0],[0,0,1,0,0,1,0,1],[1,0,0,1,0,0,1,0],
                         [0,1,0,0,1,0,0,1],[1,0,1,0,0,1,0,0],[0,1,0,0,0,1,0,1],
                         [0,1,0,1,0,0,0,1],[0,1,0,1,0,1,0,0],[0,0,0,1,0,1,0,1],
                         [1,0,1,0,0,0,1,0],[1,0,1,0,1,0,0,0],[0,0,1,0,1,0,1,0],
                         [1,0,0,0,1,0,1,0],[1,0,0,1,1,1,0,0],[0,0,1,0,0,1,1,1],
                         [1,1,0,0,1,0,0,1],[0,1,1,1,0,0,1,0],[1,0,1,1,0,0,1,0],
                         [1,0,1,0,0,1,1,0],[1,0,1,1,0,1,1,0],[0,1,1,0,1,0,1,1],
                         [1,1,0,1,1,0,1,0],[1,1,0,0,1,0,1,0],[0,1,1,0,1,0,1,0],
                         [0,0,1,0,1,0,1,1],[1,0,0,1,1,0,1,0],[1,0,1,0,1,1,0,1],
                         [1,0,1,0,1,1,0,0],[1,0,1,0,1,0,0,1],[0,1,0,0,1,0,1,1],
                         [0,1,1,0,1,0,0,1],[1,1,0,1,0,0,1,0],[0,1,0,1,1,0,1,0],
                         [0,0,1,0,1,1,0,1],[1,0,1,0,0,1,0,1],[1,0,0,1,0,1,1,0],
                         [1,0,1,1,0,1,0,0]]
    image = skeleton.copy()
    image = image/255
    intersections = list()
    for x in range(1,len(image)-1):
        for y in range(1,len(image[x])-1):
            # If we have a white pixel
            if image[x][y] == 1:
                # Local name differs from the neighbours() function so that it does not shadow it
                neighbourhood = neighbours(x,y,image)
                if neighbourhood in validIntersection:
                    intersections.append((y,x))
    # Filter intersections to make sure we don't count them twice or ones that are very close together
    for point1 in intersections:
        for point2 in intersections:
            if (((point1[0] - point2[0])**2 + (point1[1] - point2[1])**2) < 10**2) and (point1 != point2):
                intersections.remove(point2)
    # Remove duplicates
    intersections = list(set(intersections))
    return intersections
This is also available on github here.

It might help if when for a given pixel, instead of counting the number of total 8-neighbors (= neighbors with a connectivity 8), you count the number of 8-neighbors which are not 4-neighbors with each other
So in your example of false positives
0 0 0    1 1 0    0 1 1
1 1 1    0 1 0    1 1 0
0 0 1    0 0 1    0 0 0
For every case, you have 3 neighbors, but each time, 2 of them are 4-connected. (pixels marked "2" in next snippet)
0 0 0    2 2 0    0 2 2
1 1 2    0 1 0    1 1 0
0 0 2    0 0 1    0 0 0
If you consider only one of these for your counts (instead of both of them in your code right now), you indeed have only 2 total newly-defined "neighbors" and the considered points are not considered intersections.
Other "real intersections" would still be kept, like the following
0 1 0    0 1 0    0 1 0
1 1 1    0 1 0    1 1 0
0 0 0    1 0 1    0 0 1
which still have 3 newly-defined neighbors.
I haven't checked on your image if it works perfectly, but I had implemented something like this for this problem a while back...
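One way to sketch this idea in code, as an interpretation rather than the implementation mentioned above: walk the ring of 8 neighbours in clockwise order and count the runs of consecutive set pixels. Neighbours that are next to each other on the ring are 4-connected, so each run counts as a single branch. The names RING, count_branches and find_intersections are made up for this illustration, and a pixel is treated as set when its value is greater than 0.

```python
import numpy as np

# The 8 neighbours in clockwise order, as (row, col) offsets from the centre
RING = [(-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1), (-1, -1)]

def count_branches(skel, row, col):
    """Count runs of set pixels around (row, col); adjacent ring pixels form one run."""
    ring = [int(skel[row + dr, col + dc] > 0) for dr, dc in RING]
    # A run starts wherever an unset pixel is followed by a set one (circularly)
    return sum(1 for k in range(8) if ring[k - 1] == 0 and ring[k] == 1)

def find_intersections(skel):
    """Return (x, y) coordinates of set pixels with at least 3 separate branches."""
    skel = np.asarray(skel)
    hits = []
    for row in range(1, skel.shape[0] - 1):
        for col in range(1, skel.shape[1] - 1):
            if skel[row, col] > 0 and count_branches(skel, row, col) >= 3:
                hits.append((col, row))
    return hits

false_positive = [[0, 0, 0], [1, 1, 1], [0, 0, 1]]
real_junction = [[0, 1, 0], [1, 1, 1], [0, 0, 0]]
print(find_intersections(false_positive))
print(find_intersections(real_junction))
```

On the six 3x3 patterns shown above, this rejects the three false positives and keeps the three real intersections. It has not been checked against the full skeleton image from the question.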

Here is my solution:
import itertools
import cv2
import numpy as np
import matplotlib.pyplot as plt
# Functions to generate kernels of curve intersection
def generate_nonadjacent_combination(input_list,take_n):
    """
    It generates combinations of m taken n at a time where there is no adjacent n.
    INPUT:
        input_list = (iterable) List of elements you want to extract the combination
        take_n = (integer) Number of elements that you are going to take at a time in
                 each combination
    OUTPUT:
        all_comb = (np.array) with all the combinations
    """
    all_comb = []
    for comb in itertools.combinations(input_list, take_n):
        comb = np.array(comb)
        d = np.diff(comb)
        fd = np.diff(np.flip(comb))
        # Reject combinations with neighbouring positions, including the 7 -> 0 wrap-around
        if len(d[d==1]) == 0 and comb[-1] - comb[0] != 7:
            all_comb.append(comb)
            print(comb)
    return all_comb
def populate_intersection_kernel(combinations):
    """
    Maps the numbers from 0-7 into the 8 pixels surrounding the center pixel in
    a 3 x 3 matrix in clockwise order, i.e. up_pixel = 0, right_pixel = 2, etc. And
    generates a kernel that represents a line intersection, where the center
    pixel is occupied and 3 or 4 pixels of the border are occupied too.
    INPUT:
        combinations = (np.array) matrix where every row is a vector of combinations
    OUTPUT:
        kernels = (List) list of 3 x 3 kernels/masks. each element is a mask.
    """
    n = len(combinations[0])
    template = np.array((
        [-1, -1, -1],
        [-1,  1, -1],
        [-1, -1, -1]), dtype="int")
    match = [(0,1),(0,2),(1,2),(2,2),(2,1),(2,0),(1,0),(0,0)]
    kernels = []
    for n in combinations:
        tmp = np.copy(template)
        for m in n:
            tmp[match[m][0],match[m][1]] = 1
        kernels.append(tmp)
    return kernels
def give_intersection_kernels():
    """
    Generates all the intersection kernels as 3 x 3 matrices.
    INPUT:
        None
    OUTPUT:
        kernels = (List) list of 3 x 3 kernels/masks. each element is a mask.
    """
    input_list = np.arange(8)
    taken_n = [4,3]
    kernels = []
    for taken in taken_n:
        comb = generate_nonadjacent_combination(input_list,taken)
        tmp_ker = populate_intersection_kernel(comb)
        kernels.extend(tmp_ker)
    return kernels
# Find the curve intersections
def find_line_intersection(input_image, show=0):
    """
    Applies morphologyEx with parameter HitsMiss to look for all the curve
    intersection kernels generated with give_intersection_kernels() function.
    INPUT:
        input_image = (np.array dtype=np.uint8) binarized m x n image matrix
    OUTPUT:
        output_image = (np.array dtype=np.uint8) image where the nonzero pixels
                       are the line intersection.
    """
    kernel = np.array(give_intersection_kernels())
    output_image = np.zeros(input_image.shape)
    for i in np.arange(len(kernel)):
        out = cv2.morphologyEx(input_image, cv2.MORPH_HITMISS, kernel[i,:,:])
        output_image = output_image + out
    if show == 1:
        show_image = np.reshape(np.repeat(input_image, 3, axis=1),(input_image.shape[0],input_image.shape[1],3))*255
        show_image[:,:,1] = show_image[:,:,1] - output_image *255
        show_image[:,:,2] = show_image[:,:,2] - output_image *255
        plt.imshow(show_image)
    return output_image
# finding corners
def find_endoflines(input_image, show=0):
    """
    Applies morphologyEx with MORPH_HITMISS using eight end-of-line kernels
    (center pixel plus exactly one occupied neighbour) and sums the matches.
    """
    kernel_0 = np.array((
        [-1, -1, -1],
        [-1,  1, -1],
        [-1,  1, -1]), dtype="int")
    kernel_1 = np.array((
        [-1, -1, -1],
        [-1,  1, -1],
        [ 1, -1, -1]), dtype="int")
    kernel_2 = np.array((
        [-1, -1, -1],
        [ 1,  1, -1],
        [-1, -1, -1]), dtype="int")
    kernel_3 = np.array((
        [ 1, -1, -1],
        [-1,  1, -1],
        [-1, -1, -1]), dtype="int")
    kernel_4 = np.array((
        [-1,  1, -1],
        [-1,  1, -1],
        [-1, -1, -1]), dtype="int")
    kernel_5 = np.array((
        [-1, -1,  1],
        [-1,  1, -1],
        [-1, -1, -1]), dtype="int")
    kernel_6 = np.array((
        [-1, -1, -1],
        [-1,  1,  1],
        [-1, -1, -1]), dtype="int")
    kernel_7 = np.array((
        [-1, -1, -1],
        [-1,  1, -1],
        [-1, -1,  1]), dtype="int")
    kernel = np.array((kernel_0,kernel_1,kernel_2,kernel_3,kernel_4,kernel_5,kernel_6, kernel_7))
    output_image = np.zeros(input_image.shape)
    for i in np.arange(8):
        out = cv2.morphologyEx(input_image, cv2.MORPH_HITMISS, kernel[i,:,:])
        output_image = output_image + out
    if show == 1:
        show_image = np.reshape(np.repeat(input_image, 3, axis=1),(input_image.shape[0],input_image.shape[1],3))*255
        show_image[:,:,1] = show_image[:,:,1] - output_image *255
        show_image[:,:,2] = show_image[:,:,2] - output_image *255
        plt.imshow(show_image)
    return output_image#, np.where(output_image == 1)
# 0- Find end of lines
input_image = img.astype(np.uint8) # must be a black and white thin network image
eol_img = find_endoflines(input_image, 0)
# 1- Find curve Intersections
lint_img = find_line_intersection(input_image, 0)
# 2- Put together all the nodes
nodes = eol_img + lint_img
plt.imshow(nodes)

"Slice" a number into three random numbers

I need to generate a file of 10 lines, each holding three "random" values, and the three values on each line must sum to 15.
The structure is: "INDEX A B C".
Example:
1 15 0 0
2 0 15 0
3 0 0 15
4 1 14 0
5 2 13 0
6 3 12 0
7 4 11 0
8 5 10 0
9 6 9 0
10 7 8 0

If you want to avoid creating (or iterating through) the full space of satisfying permutations (which matters for large N), then you can solve this problem with sequential sampling.
My first approach was to just draw a value uniformly from [0, N], call it x. Then draw a value uniformly from [0, N-x] and call it y, then set z = N - x - y. If you then shuffle these three, you'll get back a reasonable draw from the space of solutions, but it won't be exactly uniform.
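As an example, consider N=3. Before shuffling, the draw (3, 0, 0) comes up with probability 1/4, even though it is only one out of 10 possible triplets; after shuffling, the three permutations of (3, 0, 0) together get probability 3/8 rather than 3/10. So this privileges triplets that contain a high max.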
You can perfectly counterbalance this effect by sampling the first value x proportionally to how many values will be possible for y conditioned on x. So for example, if x happened to be N, then there is only 1 compatible value for y, but if x is 0, then there are 4 compatible values, namely 0 through 3.
In other words, let Pr(X=x) be (N-x+1)/sum_i(N-i+1) for i from 0 to N. Then let Pr(Y=y | X=x) be uniform on [0, N-x].
This works out to P(X,Y) = P(Y|X=x) * P(X) = 1/(N-x+1) * [N - x + 1]/sum_i(N-i+1), which is seen to be uniform, 1/sum_i(N-i+1), for each candidate triplet.
Note that sum(N-i+1 for i in range(0, N+1)) gives the number of different ways to sum 3 non-negative integers to get N. I don't know a good proof of this, and would be happy if someone adds one to the comments!
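One elementary way to see this count, offered as a sketch rather than a formal proof: once x is fixed, y can be any of the N - x + 1 values from 0 through N - x, and z is then forced. Summing over x gives the expression above, which also agrees with the standard stars-and-bars count for three non-negative parts:

```latex
\[
\#\{(x,y,z)\in\mathbb{Z}_{\ge 0}^{3} : x+y+z=N\}
  \;=\; \sum_{x=0}^{N} (N-x+1)
  \;=\; \sum_{k=1}^{N+1} k
  \;=\; \frac{(N+1)(N+2)}{2}
  \;=\; \binom{N+2}{2}.
\]
```

For N = 3 this gives 10.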
Here's a solution that will sample this way:
import random
from collections import Counter
def discrete_sample(weights):
    u = random.uniform(0, 1)
    w_t = 0
    for i, w in enumerate(weights):
        w_t += w
        if u <= w_t:
            return i
    return len(weights)-1
def get_weights(N):
    vals = [(N-i+1.0) for i in range(0, N+1)]
    totl = sum(vals)
    return [v/totl for v in vals]
def draw_summing_triplet(N):
    weights = get_weights(N)
    x = discrete_sample(weights)
    y = random.randint(0, N-x)
    triplet = [x, y, N - x - y]
    random.shuffle(triplet)
    return tuple(triplet)
Much credit goes to @DSM in the comments for questioning my original answer and providing good feedback.
In this case, we can test out the sampler like this:
foo = Counter(draw_summing_triplet(3) for i in range(10**6))
print(foo)
Counter({(1, 2, 0): 100381,
(0, 2, 1): 100250,
(1, 1, 1): 100027,
(2, 1, 0): 100011,
(0, 3, 0): 100002,
(3, 0, 0): 99977,
(2, 0, 1): 99972,
(1, 0, 2): 99854,
(0, 0, 3): 99782,
(0, 1, 2): 99744})
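The empirical counts can also be cross-checked exactly. The following is a minimal sketch, not part of the answer above, that enumerates every (x, y) pair and computes the probability of each triplet before shuffling with exact fractions, for both the naive scheme and the weighted one. The function names naive_distribution and weighted_distribution are made up for this illustration.

```python
from collections import defaultdict
from fractions import Fraction

def naive_distribution(N):
    """Exact law of (x, y, z): x uniform on [0, N], y uniform on [0, N-x], z = N-x-y."""
    dist = defaultdict(Fraction)
    for x in range(N + 1):
        for y in range(N - x + 1):
            dist[(x, y, N - x - y)] += Fraction(1, N + 1) * Fraction(1, N - x + 1)
    return dict(dist)

def weighted_distribution(N):
    """Same, but with Pr(X=x) proportional to the number of compatible y values."""
    total = sum(N - i + 1 for i in range(N + 1))
    dist = {}
    for x in range(N + 1):
        for y in range(N - x + 1):
            dist[(x, y, N - x - y)] = Fraction(N - x + 1, total) * Fraction(1, N - x + 1)
    return dist

naive = naive_distribution(3)
weighted = weighted_distribution(3)
# Number of distinct triplets and the set of probabilities under the weighted scheme
print(len(weighted), set(weighted.values()))
# Total probability of any ordering of (3, 0, 0) under the naive scheme
print(sum(p for t, p in naive.items() if sorted(t) == [0, 0, 3]))
```

Shuffling afterwards does not disturb a distribution that is already uniform over ordered triplets, which is consistent with the roughly equal counts in the Counter output above.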

If the numbers can be anything, just use combinations:
from itertools import combinations
with open("rand.txt","w") as f:
    combs = [x for x in combinations(range(16),3) if sum(x ) == 15 ][:10]
    for a,b,c in combs:
        f.write("{} {} {}\n".format(a,b,c))

This seems straightforward to me, and it utilizes the random module.
import random
def foo(x):
    a = random.randint(0,x)
    b = random.randint(0,x-a)
    c = x - (a +b)
    return (a,b,c)
for i in range(100):
    print(foo(15))
