I'm trying to return a list of lists of the vertical, horizontal and diagonal nearest neighbors of every item in a 2D numpy array.
import numpy as np
import copy
tilemap = np.arange(99).reshape(11, 9)
print(tilemap)
def get_neighbor(pos, array):
    x = copy.deepcopy(pos[0])
    y = copy.deepcopy(pos[1])
    grid = copy.deepcopy(array)
    split = []
    split.append([grid[y-1][x-1]])
    split.append([grid[y-1][x]])
    split.append([grid[y-1][x+1]])
    split.append([grid[y][x-1]])
    split.append([grid[y][x+1]])
    split.append([grid[y+1][x-1]])
    split.append([grid[y+1][x]])
    split.append([grid[y+1][x+1]])
    print("\n Neighbors of ITEM[{}]\n {}".format(grid[y][x], split))
cordinates = [5, 6]
get_neighbor(pos=cordinates, array=tilemap)
I would want a list like this, starting with the first item = 0:
[[1], [12], [13],
[1, 2], [12, 24], [13, 26],
[1, 2, 3], [12, 24, 36], [13, 26, 39], ...
until it reaches the boundaries completely, then it proceeds to the second item = 1
and keeps adding to the list. If there is a neighbor above, it should be added too.
MY RESULT
[[ 0 1 2 3 4 5 6 7 8]
[ 9 10 11 12 13 14 15 16 17]
[18 19 20 21 22 23 24 25 26]
[27 28 29 30 31 32 33 34 35]
[36 37 38 39 40 41 42 43 44]
[45 46 47 48 49 50 51 52 53]
[54 55 56 57 58 59 60 61 62]
[63 64 65 66 67 68 69 70 71]
[72 73 74 75 76 77 78 79 80]
[81 82 83 84 85 86 87 88 89]
[90 91 92 93 94 95 96 97 98]]
Neighbors of ITEM[59]
[[49], [50], [51], [58], [60], [67], [68], [69]]
Alright, what about using a function like this? It takes the array, your target index, and the "radius" of the elements to be included.
def get_idx_adj(arr, idx, radius):
    num_rows, num_cols = arr.shape
    idx_row, idx_col = idx
    slice_1 = np.s_[max(0, idx_row - radius):min(num_rows, idx_row + radius + 1)]
    slice_2 = np.s_[max(0, idx_col - radius):min(num_cols, idx_col + radius + 1)]
    return arr[slice_1, slice_2]
I'm currently trying to find the best way to transform the index of the element, so that the function can be used on its own output successively to get all the subarrays of various sizes.
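One hedged sketch of a simpler route, rather than transforming the index for successive calls: keep calling get_idx_adj on the original array with a growing radius until the slice stops growing. The growing_neighborhoods name and the stopping condition below are my own illustration, not part of the answer above.

import numpy as np

tilemap = np.arange(99).reshape(11, 9)   # the array from the question

def growing_neighborhoods(arr, idx):
    """Yield the clipped sub-array around idx for radius 1, 2, ... until it spans the whole array."""
    radius = 1
    while True:
        sub = get_idx_adj(arr, idx, radius)   # the function defined above
        yield sub
        if sub.shape == arr.shape:            # boundaries reached in every direction
            return
        radius += 1

for sub in growing_neighborhoods(tilemap, (6, 5)):   # row 6, column 5 -> item 59
    print(sub)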
I want to get some pixel values from an image which are not equal to a specified value. But I want the result back in RGB format, not as a long vector. How can I do it?
import cv2
import numpy as np
image = cv2.imread('image.jpg')
sought = [36,255,12]
result = image[image!=sought]
import sys
np.set_printoptions(threshold=sys.maxsize)
print(result)
And I've got:
[20 103 75 21 98 70 16 100 72 18 101 73 19 97 69 15 95 66
15 95 67 13 101 73 19 104 77 21 96 69 13 94 65 8 99 69
14 98 68 13 94 63 10 88 66 24 92 69 24 92 67 23 93 67
13 93 67 13 93 67 13 97 72 16 96 70 16 93 66 15 96 68
.....
99 69 14 96 66 11 91 67 25 88 65 20 92 68 14 96 69 18
96 70 16 91 64 13 95 67 13 92 64 10 90 63]
But I want something like this:
[[[R,G,B], [R,G,B], [R,G,B], [R,G,B]],
......
[[R,G,B], [R,G,B], [R,G,B], [R,G,B]]]
What did I miss here?
If the wanted output is a list of pixels, then after the component-wise comparison you must check which pixels differ in any of the R, G or B channels with .any(axis=2):
image[(image != sought).any(axis=2)]
output of the form:
array([[ 22, 136, 161],
[197, 141, 153],
[173, 122, 65],
[137, 189, 67],
...
[166, 205, 238],
[207, 99, 129],
[ 44, 76, 97]])
With result = image[image != sought] you lost the shape of image.
The solution is to get a mask (image != sought) and work on the image with that mask (e.g. using np.where).
Generate some data:
import numpy as np
H, W, C = 8, 8, 3
sought = [255, 0, 0]
colors = np.array(
[sought, [0, 0, 255], [0, 255, 0], [0, 255, 255], [255, 0, 255], [255, 255, 0]]
)
colors_idxs = np.random.choice(np.arange(len(colors)), size=(H, W))
image = colors[colors_idxs]
Compute the mask (note the keepdims=True, which makes np.where easier to use):
mask = np.any(image != sought, axis=-1, keepdims=True)
# Get color back into the mask
mask_inpaint_pos = np.where(mask, image, 0)
mask_inpaint_neg = np.where(mask, 0, image)
Plot:
import matplotlib.pyplot as plt
fig, (ax_im, ax_mask, ax_mask_pos, ax_mask_neg) = plt.subplots(ncols=4, sharey=True)
ax_im.set_title("Original")
ax_im.imshow(image)
ax_mask.set_title("Mask binary")
ax_mask.imshow(mask.astype(int), cmap="gray")
ax_mask_pos.set_title("Mask True RGB")
ax_mask_pos.imshow(mask_inpaint_pos)
ax_mask_neg.set_title("Mask False RGB")
ax_mask_neg.imshow(mask_inpaint_neg)
plt.show()
I'm very new to Python and I volunteered to help a colleague of mine combine fatty acids within a certain threshold. There are only 2 acids that exceed this threshold when you combine all of the oils together, so they are my limiting factors. I found a sum(combination) approach that works how I would like for each acid on its own, but now I need to compare those results against each other to make sure that if one threshold is met the other isn't exceeded. For example, if Canola, Coconut and Sesame have a combined oleic acid content of less than 131, but the combination of these oils gives a sum of linoleic acid over 131, I cannot use that combination. This is the code I have so far:
import itertools
Coconut_oleic = 6
Canola_oleic = 62
OliveOil_oleic = 71
PeanutOil_oleic = 48
SesameOil_oleic = 39
SunflowerOil_oleic = 30
ButterFat_oleic = 24
Coconut_linoleic = 2
Canola_linoleic = 22
Olive_linoleic = 10
Peanut_linoleic = 32
SesameOil_linoleic = 41
SunflowerOil_linoleic = 59
ButterFat_linoleic = 3
myList = [Coconut_oleic, Canola_oleic, OliveOil_oleic, PeanutOil_oleic, SunflowerOil_oleic, SesameOil_oleic, ButterFat_oleic]
myList2 = [Coconut_linoleic, Canola_linoleic, Olive_linoleic, Peanut_linoleic, SesameOil_linoleic, SunflowerOil_linoleic, ButterFat_linoleic]
for i in range(len(myList)):
    for combinations in itertools.combinations(myList, i):
        if 0 < sum(combinations) < 131:
            print('Oleic', combinations, sum(combinations))

for i in range(len(myList2)):
    for combinations in itertools.combinations(myList2, i):
        if 0 < sum(combinations) < 131:
            print('Linoleic', combinations, sum(combinations))
The programming problem that you describe can be solved with a few improvements to your code:
- Instead of defining all the values as single variables and then combining them in two lists, keep all the values in a single dictionary, so they are easier to use together and you don't end up defining every product twice; it's all a single definition.
- This also means you don't have to loop over them twice, or use one index to access both lists, with the risk of there being differences between the two definitions.
- You're only interested in outcomes that have both totals below 131, so you need to represent that (with a logical and, for example).
For example:
import itertools
# oleic and linoleic content of oils
oils = {
    'Coconut oil': (6, 2),
    'Canola oil': (62, 22),
    'Olive oil': (71, 10),
    'Peanut oil': (48, 32),
    'Sesame oil': (39, 41),
    'Sunflower oil': (30, 59),
    'Butter fat': (24, 3)
}

for i in range(len(oils)):
    for c in itertools.combinations(oils.keys(), i):
        if (0 < (o := sum(oils[part][0] for part in c)) < 131 and
                0 < (l := sum(oils[part][1] for part in c)) < 131):
            print(f'The combination {c} has oleic content {o} and linoleic content {l}')
Note the use of the walrus operator := to remember the computed value in a variable. It's needed twice: once to see if the value is acceptable and then again to be printed, and computing it twice would be wasteful.
Output (partial):
...
The combination ('Sesame oil', 'Butter fat') has oleic content 63 and linoleic content 44
The combination ('Sunflower oil', 'Butter fat') has oleic content 54 and linoleic content 62
The combination ('Coconut oil', 'Canola oil', 'Peanut oil') has oleic content 116 and linoleic content 56
The combination ('Coconut oil', 'Canola oil', 'Sesame oil') has oleic content 107 and linoleic content 65
...
However, it seems that the problem is a bit odd: adding two oils together and summing their individual oleic or linoleic acid content doesn't really make sense. The acid content of the combination would just be directly proportional to the amount of each oil you add to the mixture.
Which means every combination is fine, since all the individual values are below 131. If instead you were looking for mixtures with a combined value below, say, 40, that would be a more sensible problem (and trickier, because there could be many different mix ratios of the same three oils that would work).
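To illustrate that more sensible variant, here is a rough sketch (the 40 threshold and the 10%-step mixing ratios are assumptions of mine, not from the question) that checks weighted mixtures of three oils:

import itertools

# (oleic, linoleic) content per oil, as in the dictionary above
oils = {
    'Coconut oil': (6, 2),
    'Canola oil': (62, 22),
    'Olive oil': (71, 10),
    'Peanut oil': (48, 32),
    'Sesame oil': (39, 41),
    'Sunflower oil': (30, 59),
    'Butter fat': (24, 3),
}

THRESHOLD = 40   # hypothetical, much stricter than 131
STEPS = 10       # mixing ratios in 10% increments

for trio in itertools.combinations(oils, 3):
    # every way to split 100% over the three oils in 10% steps
    for a in range(STEPS + 1):
        for b in range(STEPS + 1 - a):
            c = STEPS - a - b
            weights = (a / STEPS, b / STEPS, c / STEPS)
            oleic = sum(w * oils[name][0] for w, name in zip(weights, trio))
            linoleic = sum(w * oils[name][1] for w, name in zip(weights, trio))
            if oleic < THRESHOLD and linoleic < THRESHOLD:
                print(trio, weights, round(oleic, 1), round(linoleic, 1))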
This is fairly similar to what you had, just using some formats that are a little easier to work with. The main step is merging the two results together where their combinations are the same; since we filtered each set earlier, we're left with just the combinations where both totals are < 131.
import itertools
import pandas as pd
oliec = [['Coconut', 6],
         ['Canola', 62],
         ['OliveOil', 71],
         ['PeanutOil', 48],
         ['SesameOil', 39],
         ['SunflowerOil', 30],
         ['ButterFat', 24]]
oliec = pd.DataFrame(oliec, columns=['oils', 'threshold'])

linoliec = [['Coconut', 2],
            ['Canola', 22],
            ['Olive', 10],
            ['Peanut', 32],
            ['SesameOil', 41],
            ['SunflowerOil', 59],
            ['ButterFat', 3]]
linoliec = pd.DataFrame(linoliec, columns=['oils', 'threshold'])

oliec_results = []
for i in range(len(oliec.oils)):
    for combinations in itertools.combinations(oliec.oils, i):
        combined_thresh = oliec[oliec.oils.isin(combinations)].threshold.sum()
        if combined_thresh < 131:
            oliec_results.append([combinations, combined_thresh])
oliec_results = pd.DataFrame(oliec_results, columns=['combinations', 'threshold'])

linoliec_results = []
for i in range(len(linoliec.oils)):
    for combinations in itertools.combinations(linoliec.oils, i):
        combined_thresh = linoliec[linoliec.oils.isin(combinations)].threshold.sum()
        if combined_thresh < 131:
            linoliec_results.append([combinations, combined_thresh])
linoliec_results = pd.DataFrame(linoliec_results, columns=['combinations', 'threshold'])
joined = linoliec_results.merge(oliec_results, on='combinations', suffixes=['_linoliec', '_oliec'])
print(joined)
Output:
combinations threshold_linoliec threshold_oliec
0 () 0 0
1 (Coconut,) 2 6
2 (Canola,) 22 62
3 (SesameOil,) 41 39
4 (SunflowerOil,) 59 30
5 (ButterFat,) 3 24
6 (Coconut, Canola) 24 68
7 (Coconut, SesameOil) 43 45
8 (Coconut, SunflowerOil) 61 36
9 (Coconut, ButterFat) 5 30
10 (Canola, SesameOil) 63 101
11 (Canola, SunflowerOil) 81 92
12 (Canola, ButterFat) 25 86
13 (SesameOil, SunflowerOil) 100 69
14 (SesameOil, ButterFat) 44 63
15 (SunflowerOil, ButterFat) 62 54
16 (Coconut, Canola, SesameOil) 65 107
17 (Coconut, Canola, SunflowerOil) 83 98
18 (Coconut, Canola, ButterFat) 27 92
19 (Coconut, SesameOil, SunflowerOil) 102 75
20 (Coconut, SesameOil, ButterFat) 46 69
21 (Coconut, SunflowerOil, ButterFat) 64 60
22 (Canola, SesameOil, ButterFat) 66 125
23 (Canola, SunflowerOil, ButterFat) 84 116
24 (SesameOil, SunflowerOil, ButterFat) 103 93
25 (Coconut, Canola, SunflowerOil, ButterFat) 86 122
26 (Coconut, SesameOil, SunflowerOil, ButterFat) 105 99
What I tried was this:
import numpy as np
def test_random(nr_selections, n, prob):
    selected = np.random.choice(n, size=nr_selections, replace=False, p=prob)
    print(str(nr_selections) + ': ' + str(selected))

n = 100
prob = np.random.choice(100, n)
prob = prob / np.sum(prob)  # only for demonstration purposes

for i in np.arange(10, 100, 10):
    np.random.seed(123)
    test_random(i, n, prob)
The result was:
10: [68 32 25 54 72 45 96 67 49 40]
20: [68 32 25 54 72 45 96 67 49 40 36 74 46 7 21 20 53 65 89 77]
30: [68 32 25 54 72 45 96 67 49 40 36 74 46 7 21 20 53 62 86 60 35 37 8 48
52 47 31 92 95 56]
40: ...
Contrary to my expectation and hope, the 30 numbers selected do not contain all of the 20 numbers. I also tried using numpy.random.default_rng, but only strayed further away from my desired output. I also simplified the original problem somewhat in the above example. Any help would be greatly appreciated. Thank you!
Edit for clarification: I do not want to generate all the sequences in one loop (like in the example above) but rather use the related sequences in different runs of the same program. (Ideally, without storing them somewhere)
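For what it's worth, a minimal sketch of one workaround for getting nested selections across separate runs (this is my own illustration, not from the question: each run re-draws the largest selection with a fixed seed and slices a prefix, so the 10-element result is always the start of the 20-element one):

import numpy as np

def nested_selection(nr_selections, n, prob, seed=123, max_selections=90):
    # The same seed reproduces the same full-length draw in every run,
    # so any shorter selection is simply a prefix of the longer ones.
    rng = np.random.default_rng(seed)
    full = rng.choice(n, size=max_selections, replace=False, p=prob)
    return full[:nr_selections]

n = 100
rng = np.random.default_rng(0)
prob = rng.random(n)
prob /= prob.sum()   # demonstration weights, as in the question

print(nested_selection(10, n, prob))
print(nested_selection(20, n, prob))   # first 10 entries match the line above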
I have applied DBSCAN to perform clustering on a dataset consisting of the X, Y and Z coordinates of each point in a point cloud. I want to plot only the clusters which have fewer than 100 points. This is what I have so far:
clustering = DBSCAN(eps=0.1, min_samples=20, metric='euclidean').fit(only_xy)
plt.scatter(only_xy[:, 0], only_xy[:, 1],
            c=clustering.labels_, cmap='rainbow')

clusters = clustering.components_

# Store the labels
labels = clustering.labels_

# Then get the frequency count of the non-negative labels
counts = np.bincount(labels[labels >= 0])
print(counts)
Output:
[1278 564 208 47 36 30 191 54 24 18 40 915 26 20
24 527 56 677 63 57 61 1544 512 21 45 187 39 132
48 55 160 46 28 18 55 48 35 92 29 88 53 55
24 52 114 49 34 34 38 52 38 53 69]
So I have found the number of points in each cluster, but I'm not sure how to select only the clusters which have fewer than 100 points.
You may find the indexes of the labels whose counts are less than 100:
ls, cs = np.unique(labels, return_counts=True)
dic = dict(zip(ls, cs))
idx = [i for i, label in enumerate(labels) if dic[label] < 100 and label >= 0]
Then you may apply the resulting index to your DBSCAN results and labels, more or less like this:
plt.scatter(only_xy[idx, 0], only_xy[idx, 1],
            c=clustering.labels_[idx], cmap='rainbow')
I think if you run this code, you can get the labels and cluster components of the clusters with more than 100 points:
from collections import Counter

labels_with_morethan100 = [label for (label, count) in Counter(clustering.labels_).items() if count > 100]
clusters_biggerthan100 = clustering.components_[np.isin(clustering.labels_[clustering.labels_ >= 0], labels_with_morethan100)]
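And for the fewer-than-100 case the question actually asks about, the same pattern with the comparison flipped should work. A sketch along the same lines, assuming the clustering, only_xy and plt objects from above:

from collections import Counter
import numpy as np

# Labels of clusters that contain fewer than 100 points (noise label -1 excluded)
small_labels = [label for label, count in Counter(clustering.labels_).items()
                if label >= 0 and count < 100]

# Boolean mask over all points that belong to one of those small clusters
small_mask = np.isin(clustering.labels_, small_labels)

plt.scatter(only_xy[small_mask, 0], only_xy[small_mask, 1],
            c=clustering.labels_[small_mask], cmap='rainbow')
plt.show()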
I have a table "tempcc" of values with X, Y geographic coordinates (I don't know how to attach files here; there are 86 rows in my CSV):
X Y Temp
0 35.268 55.618 1.065389
1 35.230 55.682 1.119160
2 35.508 55.690 1.026214
3 35.482 55.652 1.007834
4 35.289 55.664 1.087598
5 35.239 55.655 1.099459
6 35.345 55.662 1.066117
7 35.402 55.649 1.035958
8 35.506 55.643 0.991939
9 35.526 55.688 1.018137
10 35.541 55.695 1.017870
11 35.471 55.682 1.033929
12 35.573 55.668 0.985559
13 35.547 55.651 0.982335
14 35.425 55.671 1.042975
15 35.505 55.675 1.016236
16 35.600 55.681 0.985532
17 35.458 55.717 1.063691
18 35.538 55.720 1.037523
19 35.230 55.726 1.146047
20 35.606 55.707 1.003364
21 35.582 55.700 1.006711
22 35.350 55.696 1.087173
23 35.309 55.677 1.088988
24 35.563 55.687 1.003785
25 35.510 55.764 1.079220
26 35.334 55.736 1.119026
27 35.429 55.745 1.093300
28 35.366 55.752 1.119061
29 35.501 55.745 1.068676
.. ... ... ...
56 35.472 55.800 1.117183
57 35.538 55.855 1.134721
58 35.507 55.834 1.129712
59 35.256 55.845 1.211969
60 35.338 55.823 1.174397
61 35.404 55.835 1.162387
62 35.460 55.826 1.138965
63 35.497 55.831 1.130774
64 35.469 55.844 1.148516
65 35.371 55.510 0.945187
66 35.378 55.545 0.969400
67 35.456 55.502 0.902285
68 35.429 55.517 0.925932
69 35.367 55.710 1.090652
70 35.431 55.490 0.903296
71 35.284 55.606 1.051335
72 35.234 55.634 1.088135
73 35.284 55.591 1.041181
74 35.354 55.587 1.010446
75 35.332 55.581 1.015004
76 35.356 55.606 1.023234
77 35.311 55.545 0.997468
78 35.307 55.575 1.020845
79 35.363 55.645 1.047831
80 35.401 55.628 1.021373
81 35.340 55.629 1.045491
82 35.440 55.643 1.017227
83 35.293 55.630 1.063910
84 35.370 55.623 1.029797
85 35.238 55.601 1.065699
I try to create isolines with:
import numpy as np
from numpy import meshgrid, linspace
from mpl_toolkits.basemap import Basemap

data = tempcc

m = Basemap(lat_0=np.mean(tempcc['Y'].values),
            lon_0=np.mean(tempcc['X'].values),
            llcrnrlon=35, llcrnrlat=55.3,
            urcrnrlon=35.9, urcrnrlat=56.0, resolution='l')

x = linspace(m.llcrnrlon, m.urcrnrlon, data.shape[1])
y = linspace(m.llcrnrlat, m.urcrnrlat, data.shape[0])
xx, yy = meshgrid(x, y)

m.contour(xx, yy, data, latlon=True)
#pt.legend()
m.scatter(tempcc['X'].values, tempcc['Y'].values, latlon=True)
#m.contour(x,y,data,latlon=True)
But I can't manage it correctly, although everything seems to be fine. As far as I understand, I have to make a 2D matrix of values, where i is latitude and j is longitude, but I can't find an example.
The result I get: the region is correct, but the interpolation is not good.
What's the matter? Which parameter have I forgotten?
You could use a Triangulation and then call tricontour() instead of contour():
import matplotlib.pyplot as plt
from matplotlib.tri import Triangulation
from mpl_toolkits.basemap import Basemap
import numpy as np

m = Basemap(lat_0=np.mean(tempcc['Y'].values),
            lon_0=np.mean(tempcc['X'].values),
            llcrnrlon=35, llcrnrlat=55.3,
            urcrnrlon=35.9, urcrnrlat=56.0, resolution='l')

triMesh = Triangulation(tempcc['X'].values, tempcc['Y'].values)

tctr = m.tricontour(triMesh, tempcc['Temp'].values,
                    levels=np.linspace(min(tempcc['Temp'].values),
                                       max(tempcc['Temp'].values), 7),
                    latlon=True)
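Alternatively, if you want to keep using m.contour, you first have to interpolate the scattered samples onto a regular lon/lat grid yourself. A minimal sketch with scipy.interpolate.griddata, assuming the tempcc DataFrame and the Basemap m from above (the 100x100 grid size is an arbitrary choice):

import numpy as np
from scipy.interpolate import griddata

# Regular lon/lat grid covering the map area
lon = np.linspace(m.llcrnrlon, m.urcrnrlon, 100)
lat = np.linspace(m.llcrnrlat, m.urcrnrlat, 100)
lon2d, lat2d = np.meshgrid(lon, lat)

# Interpolate the scattered Temp values onto that grid
temp2d = griddata(
    (tempcc['X'].values, tempcc['Y'].values),
    tempcc['Temp'].values,
    (lon2d, lat2d),
    method='linear',   # NaN outside the convex hull of the sample points
)

m.contour(lon2d, lat2d, temp2d, latlon=True)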