numpy detection of borders in 2-d array - python

I have a matrix (below) that represents classes (e.g. 0, 1, 2). I am plotting it with Plotly (Python) as a heatmap, and I can't find any function that will give me the coordinates of the borders between the classes.
array([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1, 1, 1, 1, 2, 2],
       [1, 1, 1, 2, 2, 2, 2, 2, 2, 2],
       [2, 2, 2, 2, 2, 2, 2, 2, 0, 0],
       [2, 2, 2, 2, 2, 2, 2, 0, 0, 0],
       [2, 2, 2, 2, 2, 0, 0, 0, 0, 0],
       [2, 2, 2, 2, 0, 0, 0, 0, 0, 0],
       [2, 2, 2, 0, 0, 0, 0, 0, 0, 0],
       [2, 2, 2, 0, 0, 0, 0, 0, 0, 0]])
The black lines below are the borders. Could you give some tips on how to calculate them efficiently in Python? Checking every element of the array against its neighbours works, but it is very slow.
Update: I also looked at Plotly's contour plot, but the lines it draws are interpolated, so they don't look like the example above.

You can use skimage.measure to find the connected components in the array. Since 0 is treated as background, you'll have to change that label to another one, for instance the maximum value + 1. regionprops will then give you, for each component, a tuple of slices with its coordinates.
Labeling is necessary when blocks with the same value are not necessarily attached, i.e. not in the same component. Otherwise you can use regionprops directly to find the slices.
from skimage.measure import label, regionprops

a[a == 0] = a.max() + 1   # relabel the background value
l = label(a)
for s in regionprops(l):
    print(s.slice)
(slice(0, 4, None), slice(0, 10, None))
(slice(2, 10, None), slice(0, 10, None))
(slice(4, 10, None), slice(3, 10, None))
Input data:
import numpy as np

a = np.array([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
              [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
              [1, 1, 1, 1, 1, 1, 1, 1, 2, 2],
              [1, 1, 1, 2, 2, 2, 2, 2, 2, 2],
              [2, 2, 2, 2, 2, 2, 2, 2, 0, 0],
              [2, 2, 2, 2, 2, 2, 2, 0, 0, 0],
              [2, 2, 2, 2, 2, 0, 0, 0, 0, 0],
              [2, 2, 2, 2, 0, 0, 0, 0, 0, 0],
              [2, 2, 2, 0, 0, 0, 0, 0, 0, 0],
              [2, 2, 2, 0, 0, 0, 0, 0, 0, 0]])
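If you need the border cells themselves rather than bounding slices, a vectorized neighbour comparison avoids the slow element-by-element loop. A minimal sketch (the class_borders helper is an illustration, not part of skimage):
import numpy as np

def class_borders(a):
    # True wherever the class differs from the right or lower neighbour
    border = np.zeros(a.shape, dtype=bool)
    border[:, :-1] |= a[:, :-1] != a[:, 1:]   # changes along each row
    border[:-1, :] |= a[:-1, :] != a[1:, :]   # changes along each column
    return border

rows, cols = np.nonzero(class_borders(a))     # coordinates of border cells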

Related

Why is memory not freed when reading files using `with as` in a loop?

I have several large files in a folder. Each single file fits in my RAM, but all of them together do not. I have the following loop processing each file:
for dataset_index, path in enumerate(file_paths_train):
    with np.load(path) as dataset:
        x_batch = dataset['data']
        y_batch = dataset['labels']
        for i in range(x_batch.shape[0]):
            if y_batch[i] in INDICES:
                # This selects a minimal subset of the data
                data_list.append((y_batch[i], x_batch[i]))
        # End loop
(the paths to all files are stored in the variable file_paths_train)
This answer stated that using with ... as ... would automatically delete the variable associated with the file once the program leaves that scope. Except it doesn't: memory usage increases until the computer stops working and I need to restart it.
Ideas?
Indexing a multidimensional array with a scalar creates a view. If that view is saved in a list, the original array is kept alive, regardless of what happens to the variable that referenced it.
In [95]: alist = []
    ...: for i in range(3):
    ...:     x = np.ones((10, 10), int) * i
    ...:     alist.append(x[0])
    ...:
In [96]: alist
Out[96]:
[array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0]),
 array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1]),
 array([2, 2, 2, 2, 2, 2, 2, 2, 2, 2])]
In [97]: [item.base for item in alist]
Out[97]:
[array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]]),
 array([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
        [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
        [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
        [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
        [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
        [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
        [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
        [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
        [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
        [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]]),
 array([[2, 2, 2, 2, 2, 2, 2, 2, 2, 2],
        [2, 2, 2, 2, 2, 2, 2, 2, 2, 2],
        [2, 2, 2, 2, 2, 2, 2, 2, 2, 2],
        [2, 2, 2, 2, 2, 2, 2, 2, 2, 2],
        [2, 2, 2, 2, 2, 2, 2, 2, 2, 2],
        [2, 2, 2, 2, 2, 2, 2, 2, 2, 2],
        [2, 2, 2, 2, 2, 2, 2, 2, 2, 2],
        [2, 2, 2, 2, 2, 2, 2, 2, 2, 2],
        [2, 2, 2, 2, 2, 2, 2, 2, 2, 2],
        [2, 2, 2, 2, 2, 2, 2, 2, 2, 2]])]
You have to append a copy if you want to truly throw away the original array.
In [98]: alist = []
    ...: for i in range(3):
    ...:     x = np.ones((10, 10), int) * i
    ...:     alist.append(x[0].copy())
    ...:
In [99]: alist
Out[99]:
[array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0]),
 array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1]),
 array([2, 2, 2, 2, 2, 2, 2, 2, 2, 2])]
In [100]: [item.base for item in alist]
Out[100]: [None, None, None]
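Applied to the loop in the question, the fix is to append a copy of each row so that nothing keeps a reference into x_batch (a sketch reusing the question's variable names):
for dataset_index, path in enumerate(file_paths_train):
    with np.load(path) as dataset:
        x_batch = dataset['data']
        y_batch = dataset['labels']
        for i in range(x_batch.shape[0]):
            if y_batch[i] in INDICES:
                # .copy() detaches the row from x_batch, so the full array
                # can be garbage-collected once the file is processed
                data_list.append((y_batch[i], x_batch[i].copy()))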

Adding link annotations to a PDF document

How can I add annotations (in a particular shape) to a PDF?
I want to be able to control:
the link target
the color
the shape of the link annotation
the location of the link annotation
Disclaimer: I am the author of the library being used in this answer
To showcase this behaviour, this example is going to re-create a shape using pixel art.
This array, together with these colors, defines the shape of Super Mario:
m = [
    [0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0],
    [0, 0, 0, 2, 2, 2, 3, 3, 2, 3, 0, 0, 0, 0],
    [0, 0, 2, 3, 2, 3, 3, 3, 2, 3, 3, 3, 0, 0],
    [0, 0, 2, 3, 2, 2, 3, 3, 3, 2, 3, 3, 3, 0],
    [0, 0, 2, 2, 3, 3, 3, 3, 2, 2, 2, 2, 0, 0],
    [0, 0, 0, 0, 3, 3, 3, 3, 3, 3, 3, 0, 0, 0],
    [0, 0, 0, 1, 1, 4, 1, 1, 1, 1, 1, 0, 0, 0],
    [0, 0, 1, 1, 1, 4, 1, 1, 4, 1, 1, 1, 0, 0],
    [0, 1, 1, 1, 1, 4, 4, 4, 4, 1, 1, 1, 1, 0],
    [0, 3, 3, 1, 4, 5, 4, 4, 5, 4, 1, 3, 3, 0],
    [0, 3, 3, 3, 4, 4, 4, 4, 4, 4, 3, 3, 3, 0],
    [0, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 3, 3, 0],
    [0, 0, 0, 4, 4, 4, 0, 0, 4, 4, 4, 0, 0, 0],
    [0, 0, 2, 2, 2, 0, 0, 0, 0, 2, 2, 2, 0, 0],
    [0, 2, 2, 2, 2, 0, 0, 0, 0, 2, 2, 2, 2, 0],
]
# X11Color comes from pText; the import path below matches pText's layout
# at the time of writing and may differ between versions
from ptext.pdf.canvas.color.color import X11Color

c = [
    None,
    X11Color("Red"),
    X11Color("Black"),
    X11Color("Tan"),
    X11Color("Blue"),
    X11Color("White"),
]
To manipulate the PDF, I am going to use pText.
First we are going to read an existing PDF:
# attempt to read PDF
# (the PDF import path is assumed from pText's layout and may vary by version)
from decimal import Decimal
from ptext.pdf.pdf import PDF

doc = None
with open("boring-input.pdf", "rb") as in_file_handle:
    print("\treading (1) ..")
    doc = PDF.loads(in_file_handle)
Then we are going to add the annotations, using the array indices as coordinates (and keeping in mind that the PDF coordinate system starts at the bottom left):
# add annotation
pixel_size = 2
for i in range(0, len(m)):
    for j in range(0, len(m[i])):
        if m[i][j] == 0:
            continue
        x = pixel_size * j
        y = pixel_size * (len(m) - i)
        doc.get_page(0).append_link_annotation(
            page=Decimal(0),
            color=c[m[i][j]],
            location_on_page="Fit",
            rectangle=(
                Decimal(x),
                Decimal(y),
                Decimal(x + pixel_size),
                Decimal(y + pixel_size),
            ),
        )
Then we store the output PDF:
# attempt to store PDF
with open("its-a-me.pdf", "wb") as out_file_handle:
    PDF.dumps(out_file_handle, doc)
This is a screenshot of Okular opening the PDF:

how to normalize python list with different length to a given length?

I have a pandas Series. The Series contains lists of different lengths.
0 [2, 0, 2, 0, 2, 1, 0, 0, 0, 1, 1, 2, 2, 0, 2, ...
1 [2, 2]
2 [2]
3 [1, 1, 0, 2, 2, 1, 0, 2, 2, 2, 0, 0, 0, 2, 0, ...
4 [1, 2, 0, 0, 2, 0, 2, 2, 2, 2, 2, 2, 2, 2, 2]
5 [2, 0, 1, 1]
6 [2, 2]
7 [0, 0, 2, 0, 2, 2]
8 [2, 0, 2, 0]
9 [2, 0, 2, 0, 2, 2, 2, 0, 2, 0, 2, 2, 2, 0, 2, ...
10 [1, 0]
11 [1, 2, 0, 0, 1, 2, 0, 2, 1, 1]
12 [1, 0, 2, 2, 2, 2, 2, 2, 2, 2, 0, 0, 2, 2, 1, ...
13 [0, 1, 0, 0, 2, 0, 1, 2, 2, 2, 2, 0, 2, 1, 0, ...
14 [0, 0, 0, 2, 1, 0, 0, 2, 1, 2, 2, 2, 2, 0, 2, ...
15 [1, 1, 2, 0, 0, 0, 0, 2, 2]
What I want to do is measure the volatility of these lists. As far as I can tell, I first need to normalize them (meaning all the lists will share the same length) before measuring. I think rescaling each list percentage-wise is a plausible choice. Sadly, I don't know how to manage it.
Maybe the first step is to transform the lists to a given length, and the second step is to calculate a new score in each percentile (something like max pooling or averaging? I don't know). How do I extract items from a list by percentile?
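One way to implement the resampling step sketched above is linear interpolation with np.interp, which maps every list onto a common length regardless of its original size. A minimal sketch, where the series name s, the toy rows, and the target length of 10 are all illustrative:
import numpy as np
import pandas as pd

def resample(lst, target_len=10):
    # linearly interpolate the list onto target_len evenly spaced points
    lst = np.asarray(lst, dtype=float)
    positions = np.linspace(0, len(lst) - 1, target_len)
    return np.interp(positions, np.arange(len(lst)), lst)

s = pd.Series([[2, 0, 2, 0, 2, 1], [2, 2], [2], [1, 1, 0, 2]])
normalized = s.apply(resample)          # every row now has length 10
volatility = normalized.apply(np.std)   # one possible volatility measure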

How to slice a list that contains two list for each?

I am trying to select the first list in each row and return them as a list, using slicing:
l = [[[0, 1, 1, 0, 0, 0, 2, 1, 2, 0, 1, 2], [0.0]],
     [[1, 1, 0, 2, 0, 0, 0, 1, 2, 1, 2, 2], [0.0]],
     [[0, 2, 2, 1, 2, 1, 0, 2, 2, 2, 1, 0], [0.0]],
     [[1, 1, 0, 1, 2, 2, 2, 2, 0, 0, 2, 2], [0.0]],
     [[1, 2, 2, 0, 0, 2, 1, 1, 1, 2, 1, 0], [0.0]],
     [[0, 2, 1, 2, 0, 1, 2, 1, 1, 2, 2, 2], [0.0]],
     [[0, 0, 2, 0, 0, 1, 1, 1, 1, 0, 0, 2], [0.0]],
     [[0, 2, 1, 2, 2, 0, 0, 0, 0, 0, 1, 2], [0.0]],
     [[1, 1, 1, 2, 1, 2, 2, 1, 0, 2, 0, 2], [0.0]],
     [[0, 1, 1, 0, 0, 0, 1, 2, 1, 0, 1, 0], [0.0]]]
# I want to get this
l1 = [[0, 1, 1, 0, 0, 0, 2, 1, 2, 0, 1, 2],
      [1, 1, 0, 2, 0, 0, 0, 1, 2, 1, 2, 2],
      [0, 2, 2, 1, 2, 1, 0, 2, 2, 2, 1, 0],
      [1, 1, 0, 1, 2, 2, 2, 2, 0, 0, 2, 2],
      [1, 2, 2, 0, 0, 2, 1, 1, 1, 2, 1, 0],
      [0, 2, 1, 2, 0, 1, 2, 1, 1, 2, 2, 2],
      [0, 0, 2, 0, 0, 1, 1, 1, 1, 0, 0, 2],
      [0, 2, 1, 2, 2, 0, 0, 0, 0, 0, 1, 2],
      [1, 1, 1, 2, 1, 2, 2, 1, 0, 2, 0, 2],
      [0, 1, 1, 0, 0, 0, 1, 2, 1, 0, 1, 0]]
This should work for you:
list(map(lambda x: x[0], l))
Output:
[[0, 1, 1, 0, 0, 0, 2, 1, 2, 0, 1, 2],
 [1, 1, 0, 2, 0, 0, 0, 1, 2, 1, 2, 2],
 [0, 2, 2, 1, 2, 1, 0, 2, 2, 2, 1, 0],
 [1, 1, 0, 1, 2, 2, 2, 2, 0, 0, 2, 2],
 [1, 2, 2, 0, 0, 2, 1, 1, 1, 2, 1, 0],
 [0, 2, 1, 2, 0, 1, 2, 1, 1, 2, 2, 2],
 [0, 0, 2, 0, 0, 1, 1, 1, 1, 0, 0, 2],
 [0, 2, 1, 2, 2, 0, 0, 0, 0, 0, 1, 2],
 [1, 1, 1, 2, 1, 2, 2, 1, 0, 2, 0, 2],
 [0, 1, 1, 0, 0, 0, 1, 2, 1, 0, 1, 0]]
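A list comprehension expresses the same operation and is generally considered more idiomatic than map with a lambda:
l1 = [row[0] for row in l]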

Neighbourhood of Scipy Labels

I've got an array of objects labeled with scipy.ndimage.measurements.label, called Labels. I've got another array, Data, containing stuff related to Labels. How can I make a third array, Neighbourhoods, in which each cell (x, y) holds the label of the object nearest to it?
Given Labels and Data, how can I use python/numpy/scipy to get Neighbourhoods?
Labels = array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
                [0, 1, 1, 1, 1, 0, 0, 0, 0, 0],
                [0, 1, 1, 1, 1, 0, 0, 0, 0, 0],
                [0, 1, 1, 1, 1, 0, 0, 0, 0, 0],
                [0, 1, 1, 1, 1, 0, 0, 0, 0, 0],
                [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
                [0, 0, 0, 0, 0, 0, 2, 2, 2, 0],
                [0, 0, 0, 0, 0, 0, 2, 2, 2, 0],
                [0, 0, 0, 0, 0, 0, 2, 2, 2, 0],
                [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]])
Data = array([[1, 1, 1, 1, 1, 1, 2, 3, 4, 5],
              [1, 0, 0, 0, 0, 1, 2, 3, 4, 5],
              [1, 0, 0, 0, 0, 1, 2, 3, 4, 4],
              [1, 0, 0, 0, 0, 1, 2, 3, 3, 3],
              [1, 0, 0, 0, 0, 1, 2, 2, 2, 2],
              [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
              [2, 2, 2, 2, 2, 1, 0, 0, 0, 1],
              [3, 3, 3, 3, 2, 1, 0, 0, 0, 1],
              [4, 4, 4, 3, 2, 1, 0, 0, 0, 1],
              [5, 5, 4, 3, 2, 1, 1, 1, 1, 1]])
Neighbourhoods = array([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
                        [1, 0, 0, 0, 0, 1, 1, 1, 1, 1],
                        [1, 0, 0, 0, 0, 1, 1, 1, 0, 2],
                        [1, 0, 0, 0, 0, 1, 1, 0, 2, 2],
                        [1, 0, 0, 0, 0, 1, 0, 2, 2, 2],
                        [1, 1, 1, 1, 1, 0, 2, 2, 2, 2],
                        [1, 1, 1, 1, 0, 2, 0, 0, 0, 2],
                        [1, 1, 1, 0, 2, 2, 0, 0, 0, 2],
                        [1, 1, 0, 2, 2, 2, 0, 0, 0, 2],
                        [1, 1, 2, 2, 2, 2, 2, 2, 2, 2]])
Note: I'm not sure what should happen with ties, so I used zeros in the Neighbourhoods above.
As suggested by David Zaslavsky, this is a job for a Voronoi diagram. Here is a numpy implementation: http://blancosilva.wordpress.com/2010/12/15/image-processing-with-numpy-scipy-and-matplotlibs-in-sage/
The relevant function is scipy.ndimage.distance_transform_edt. It has a return_indices option that can be exploited to do what you need, and it can also calculate the raw distances (the Data array in your example).
As an example:
import numpy as np
from scipy.ndimage import distance_transform_edt

labels = np.array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
                   [0, 1, 1, 1, 1, 0, 0, 0, 0, 0],
                   [0, 1, 1, 1, 1, 0, 0, 0, 0, 0],
                   [0, 1, 1, 1, 1, 0, 0, 0, 0, 0],
                   [0, 1, 1, 1, 1, 0, 0, 0, 0, 0],
                   [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
                   [0, 0, 0, 0, 0, 0, 2, 2, 2, 0],
                   [0, 0, 0, 0, 0, 0, 2, 2, 2, 0],
                   [0, 0, 0, 0, 0, 0, 2, 2, 2, 0],
                   [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]])

i, j = distance_transform_edt(labels == 0, return_distances=False,
                              return_indices=True)
neighborhoods = labels[i, j]
print(neighborhoods)
This yields:
array([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1, 1, 1, 1, 1, 2],
       [1, 1, 1, 1, 1, 1, 1, 1, 2, 2],
       [1, 1, 1, 1, 1, 1, 1, 2, 2, 2],
       [1, 1, 1, 1, 1, 1, 2, 2, 2, 2],
       [1, 1, 1, 1, 1, 2, 2, 2, 2, 2],
       [1, 1, 1, 1, 2, 2, 2, 2, 2, 2],
       [1, 1, 1, 2, 2, 2, 2, 2, 2, 2],
       [1, 1, 2, 2, 2, 2, 2, 2, 2, 2]])
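Since return_distances defaults to True, the same call can also hand back the raw distances (the Data array in the question). A small sketch of that variant:
# with both return values enabled, distance_transform_edt returns
# (distances, indices); indices has shape (2, H, W) for a 2-D input
distances, indices = distance_transform_edt(labels == 0, return_indices=True)
neighborhoods = labels[indices[0], indices[1]]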
