I have several large files in a folder. Each single file fits in my RAM, but all of them together do not. I have the following loop processing each file:
for dataset_index, path in enumerate(file_paths_train):
    with np.load(path) as dataset:
        x_batch = dataset['data']
        y_batch = dataset['labels']
        for i in range(x_batch.shape[0]):
            if y_batch[i] in INDICES:
                # This selects a minimal subset of the data
                data_list.append((y_batch[i], x_batch[i]))
# End loop
(the paths for all files are stored in the variable file_paths_train)
This answer stated that using with ... as ... would automatically free the data associated with the file once execution leaves the scope. Except it doesn't: memory usage keeps increasing until the computer stops working and I need to restart.
Ideas?
Indexing a multidimensional array with a scalar creates a view. If that view is saved in a list, the original array is kept alive in memory, regardless of what happens to its other variable references.
In [95]: alist = []
...: for i in range(3):
...: x = np.ones((10,10),int)*i
...: alist.append(x[0])
...:
In [96]: alist
Out[96]:
[array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0]),
array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1]),
array([2, 2, 2, 2, 2, 2, 2, 2, 2, 2])]
In [97]: [item.base for item in alist]
Out[97]:
[array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0]]),
array([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1]]),
array([[2, 2, 2, 2, 2, 2, 2, 2, 2, 2],
[2, 2, 2, 2, 2, 2, 2, 2, 2, 2],
[2, 2, 2, 2, 2, 2, 2, 2, 2, 2],
[2, 2, 2, 2, 2, 2, 2, 2, 2, 2],
[2, 2, 2, 2, 2, 2, 2, 2, 2, 2],
[2, 2, 2, 2, 2, 2, 2, 2, 2, 2],
[2, 2, 2, 2, 2, 2, 2, 2, 2, 2],
[2, 2, 2, 2, 2, 2, 2, 2, 2, 2],
[2, 2, 2, 2, 2, 2, 2, 2, 2, 2],
[2, 2, 2, 2, 2, 2, 2, 2, 2, 2]])]
You have to append a copy if you want to truly throw away the original array.
In [98]: alist = []
...: for i in range(3):
...: x = np.ones((10,10),int)*i
...: alist.append(x[0].copy())
...:
...:
In [99]: alist
Out[99]:
[array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0]),
array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1]),
array([2, 2, 2, 2, 2, 2, 2, 2, 2, 2])]
In [100]: [item.base for item in alist]
Out[100]: [None, None, None]
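Applied to the loop in the question, a minimal self-contained sketch of the fix (synthetic arrays stand in for one loaded file, since `file_paths_train` and `INDICES` are not defined here):

```python
import numpy as np

def select_rows(x_batch, y_batch, indices):
    """Keep only rows whose label is in `indices`, copying each row
    so the large source array is not kept alive by the stored views."""
    out = []
    for i in range(x_batch.shape[0]):
        if y_batch[i] in indices:
            # .copy() detaches the row from x_batch, so the full
            # array can be garbage-collected after the loop
            out.append((y_batch[i], x_batch[i].copy()))
    return out

# synthetic stand-in for one loaded file
x = np.arange(12).reshape(4, 3)
y = np.array([0, 1, 0, 2])
subset = select_rows(x, y, {0, 2})
# each stored row owns its memory: .base is None
```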
How can I add annotations (in a particular shape) to a PDF?
I want to be able to control:
the link target
the color
the shape of the link annotation
the location of the link annotation
Disclaimer: I am the author of the library being used in this answer
To showcase this behaviour, this example re-creates a shape using pixel art.
The following array, together with these colors, defines the shape of Super Mario:
m = [
[0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0],
[0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0],
[0, 0, 0, 2, 2, 2, 3, 3, 2, 3, 0, 0, 0, 0],
[0, 0, 2, 3, 2, 3, 3, 3, 2, 3, 3, 3, 0, 0],
[0, 0, 2, 3, 2, 2, 3, 3, 3, 2, 3, 3, 3, 0],
[0, 0, 2, 2, 3, 3, 3, 3, 2, 2, 2, 2, 0, 0],
[0, 0, 0, 0, 3, 3, 3, 3, 3, 3, 3, 0, 0, 0],
[0, 0, 0, 1, 1, 4, 1, 1, 1, 1, 1, 0, 0, 0],
[0, 0, 1, 1, 1, 4, 1, 1, 4, 1, 1, 1, 0, 0],
[0, 1, 1, 1, 1, 4, 4, 4, 4, 1, 1, 1, 1, 0],
[0, 3, 3, 1, 4, 5, 4, 4, 5, 4, 1, 3, 3, 0],
[0, 3, 3, 3, 4, 4, 4, 4, 4, 4, 3, 3, 3, 0],
[0, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 3, 3, 0],
[0, 0, 0, 4, 4, 4, 0, 0, 4, 4, 4, 0, 0, 0],
[0, 0, 2, 2, 2, 0, 0, 0, 0, 2, 2, 2, 0, 0],
[0, 2, 2, 2, 2, 0, 0, 0, 0, 2, 2, 2, 2, 0],
]
c = [
None,
X11Color("Red"),
X11Color("Black"),
X11Color("Tan"),
X11Color("Blue"),
X11Color("White"),
]
To manipulate the PDF, I am going to use pText.
First we are going to read an existing PDF:
# attempt to read PDF
doc = None
with open("boring-input.pdf", "rb") as in_file_handle:
    print("\treading (1) ..")
    doc = PDF.loads(in_file_handle)
Then we are going to add the annotations, using the array indices as references (and keeping in mind that the PDF coordinate system starts at the bottom left):
# add annotation
pixel_size = 2
for i in range(0, len(m)):
    for j in range(0, len(m[i])):
        if m[i][j] == 0:
            continue
        x = pixel_size * j
        y = pixel_size * (len(m) - i)
        doc.get_page(0).append_link_annotation(
            page=Decimal(0),
            color=c[m[i][j]],
            location_on_page="Fit",
            rectangle=(
                Decimal(x),
                Decimal(y),
                Decimal(x + pixel_size),
                Decimal(y + pixel_size),
            ),
        )
Then we store the output PDF:
# attempt to store PDF
with open("its-a-me.pdf", "wb") as out_file_handle:
    PDF.dumps(out_file_handle, doc)
This is a screenshot of Okular opening the PDF:
I have a python series. The series contains lists with different lengths.
0 [2, 0, 2, 0, 2, 1, 0, 0, 0, 1, 1, 2, 2, 0, 2, ...
1 [2, 2]
2 [2]
3 [1, 1, 0, 2, 2, 1, 0, 2, 2, 2, 0, 0, 0, 2, 0, ...
4 [1, 2, 0, 0, 2, 0, 2, 2, 2, 2, 2, 2, 2, 2, 2]
5 [2, 0, 1, 1]
6 [2, 2]
7 [0, 0, 2, 0, 2, 2]
8 [2, 0, 2, 0]
9 [2, 0, 2, 0, 2, 2, 2, 0, 2, 0, 2, 2, 2, 0, 2, ...
10 [1, 0]
11 [1, 2, 0, 0, 1, 2, 0, 2, 1, 1]
12 [1, 0, 2, 2, 2, 2, 2, 2, 2, 2, 0, 0, 2, 2, 1, ...
13 [0, 1, 0, 0, 2, 0, 1, 2, 2, 2, 2, 0, 2, 1, 0, ...
14 [0, 0, 0, 2, 1, 0, 0, 2, 1, 2, 2, 2, 2, 0, 2, ...
15 [1, 1, 2, 0, 0, 0, 0, 2, 2]
What I want to do is measure the volatility of these lists. As far as I can tell, I need to normalize them first (so that all the lists share the same length) before measuring. I think resampling each list percentage-wise is a plausible choice. Sadly, I don't know how to manage it.
Maybe the first step is to transform the lists to a given length. The second step is to calculate a new score for each percentile (something like max-pooling or averaging? I don't know). How do I extract items from lists by percentile?
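One possible approach to the first step (a sketch of my own, not an established method) is to resample every list onto a common percentage axis with `np.interp`, so all rows end up with the same length and a per-row spread measure becomes comparable:

```python
import numpy as np
import pandas as pd

def resample(values, n=10):
    """Linearly interpolate a list onto n evenly spaced points
    between its first and last element (percentile positions)."""
    values = np.asarray(values, dtype=float)
    if len(values) == 1:
        # a single value has no spread; repeat it
        return np.full(n, values[0])
    old_x = np.linspace(0, 1, len(values))
    new_x = np.linspace(0, 1, n)
    return np.interp(new_x, old_x, values)

s = pd.Series([[2, 2], [2], [1, 1, 0, 2, 2, 1, 0, 2]])
normalized = s.apply(resample)
# every row now has the same length, so a volatility proxy
# such as the standard deviation is comparable across rows
volatility = normalized.apply(np.std)
```

Whether standard deviation is the right volatility measure is a separate modelling question; the interpolation only solves the unequal-lengths problem.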
I am trying to select the first list in each row and return them as a list, using slicing:
l = [[[0, 1, 1, 0, 0, 0, 2, 1, 2, 0, 1, 2], [0.0]],
[[1, 1, 0, 2, 0, 0, 0, 1, 2, 1, 2, 2], [0.0]],
[[0, 2, 2, 1, 2, 1, 0, 2, 2, 2, 1, 0], [0.0]],
[[1, 1, 0, 1, 2, 2, 2, 2, 0, 0, 2, 2], [0.0]],
[[1, 2, 2, 0, 0, 2, 1, 1, 1, 2, 1, 0], [0.0]],
[[0, 2, 1, 2, 0, 1, 2, 1, 1, 2, 2, 2], [0.0]],
[[0, 0, 2, 0, 0, 1, 1, 1, 1, 0, 0, 2], [0.0]],
[[0, 2, 1, 2, 2, 0, 0, 0, 0, 0, 1, 2], [0.0]],
[[1, 1, 1, 2, 1, 2, 2, 1, 0, 2, 0, 2], [0.0]],
[[0, 1, 1, 0, 0, 0, 1, 2, 1, 0, 1, 0], [0.0]]]
# i want to get this
l1 = [[0, 1, 1, 0, 0, 0, 2, 1, 2, 0, 1, 2],
[1, 1, 0, 2, 0, 0, 0, 1, 2, 1, 2, 2],
[0, 2, 2, 1, 2, 1, 0, 2, 2, 2, 1, 0],
[1, 1, 0, 1, 2, 2, 2, 2, 0, 0, 2, 2],
[1, 2, 2, 0, 0, 2, 1, 1, 1, 2, 1, 0],
[0, 2, 1, 2, 0, 1, 2, 1, 1, 2, 2, 2],
[0, 0, 2, 0, 0, 1, 1, 1, 1, 0, 0, 2],
[0, 2, 1, 2, 2, 0, 0, 0, 0, 0, 1, 2],
[1, 1, 1, 2, 1, 2, 2, 1, 0, 2, 0, 2],
[0, 1, 1, 0, 0, 0, 1, 2, 1, 0, 1, 0]]
This should work for you:
list(map(lambda x: x[0], l))
Output:
[[0, 1, 1, 0, 0, 0, 2, 1, 2, 0, 1, 2],
[1, 1, 0, 2, 0, 0, 0, 1, 2, 1, 2, 2],
[0, 2, 2, 1, 2, 1, 0, 2, 2, 2, 1, 0],
[1, 1, 0, 1, 2, 2, 2, 2, 0, 0, 2, 2],
[1, 2, 2, 0, 0, 2, 1, 1, 1, 2, 1, 0],
[0, 2, 1, 2, 0, 1, 2, 1, 1, 2, 2, 2],
[0, 0, 2, 0, 0, 1, 1, 1, 1, 0, 0, 2],
[0, 2, 1, 2, 2, 0, 0, 0, 0, 0, 1, 2],
[1, 1, 1, 2, 1, 2, 2, 1, 0, 2, 0, 2],
[0, 1, 1, 0, 0, 0, 1, 2, 1, 0, 1, 0]]
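An equivalent and arguably more idiomatic alternative is a plain list comprehension, shown here on a shortened version of the data:

```python
l = [[[0, 1, 1], [0.0]],
     [[1, 1, 0], [0.0]]]

# take the first element of each row
l1 = [row[0] for row in l]
# → [[0, 1, 1], [1, 1, 0]]
```

The comprehension avoids both the lambda and the `list(...)` wrapper needed with `map` in Python 3.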
I've got an array of objects labeled with scipy.ndimage.measurements.label, called Labels. I've got another array, Data, containing stuff related to Labels. How can I make a third array, Neighbourhoods, in which each cell holds the label of the object nearest to that cell?
Given Labels and Data, how can I use python/numpy/scipy to get Neighbourhoods?
Labels = array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 1, 1, 1, 1, 0, 0, 0, 0, 0],
[0, 1, 1, 1, 1, 0, 0, 0, 0, 0],
[0, 1, 1, 1, 1, 0, 0, 0, 0, 0],
[0, 1, 1, 1, 1, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 2, 2, 2, 0],
[0, 0, 0, 0, 0, 0, 2, 2, 2, 0],
[0, 0, 0, 0, 0, 0, 2, 2, 2, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0]] )
Data = array([[1, 1, 1, 1, 1, 1, 2, 3, 4, 5],
[1, 0, 0, 0, 0, 1, 2, 3, 4, 5],
[1, 0, 0, 0, 0, 1, 2, 3, 4, 4],
[1, 0, 0, 0, 0, 1, 2, 3, 3, 3],
[1, 0, 0, 0, 0, 1, 2, 2, 2, 2],
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
[2, 2, 2, 2, 2, 1, 0, 0, 0, 1],
[3, 3, 3, 3, 2, 1, 0, 0, 0, 1],
[4, 4, 4, 3, 2, 1, 0, 0, 0, 1],
[5, 5, 4, 3, 2, 1, 1, 1, 1, 1]] )
Neighbourhoods = array([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
[1, 0, 0, 0, 0, 1, 1, 1, 1, 1],
[1, 0, 0, 0, 0, 1, 1, 1, 0, 2],
[1, 0, 0, 0, 0, 1, 1, 0, 2, 2],
[1, 0, 0, 0, 0, 1, 0, 2, 2, 2],
[1, 1, 1, 1, 1, 0, 2, 2, 2, 2],
[1, 1, 1, 1, 0, 2, 0, 0, 0, 2],
[1, 1, 1, 0, 2, 2, 0, 0, 0, 2],
[1, 1, 0, 2, 2, 2, 0, 0, 0, 2],
[1, 1, 2, 2, 2, 2, 2, 2, 2, 2]] )
Note: I'm not sure what should happen with ties, so I used zeros in the above Neighbourhoods.
As suggested by David Zaslavsky, this is the job for a Voronoi diagram. Here is a numpy implementation: http://blancosilva.wordpress.com/2010/12/15/image-processing-with-numpy-scipy-and-matplotlibs-in-sage/
The relevant function is scipy.ndimage.distance_transform_edt. It has a return_indices option that can be exploited to do what you need, as well as to calculate the raw distances (Data in your example).
As an example:
import numpy as np
from scipy.ndimage import distance_transform_edt
labels = np.array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 1, 1, 1, 1, 0, 0, 0, 0, 0],
[0, 1, 1, 1, 1, 0, 0, 0, 0, 0],
[0, 1, 1, 1, 1, 0, 0, 0, 0, 0],
[0, 1, 1, 1, 1, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 2, 2, 2, 0],
[0, 0, 0, 0, 0, 0, 2, 2, 2, 0],
[0, 0, 0, 0, 0, 0, 2, 2, 2, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0]] )
i, j = distance_transform_edt(labels == 0, return_distances=False,
                              return_indices=True)
neighborhoods = labels[i, j]
print(neighborhoods)
This yields:
array([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
[1, 1, 1, 1, 1, 1, 1, 1, 1, 2],
[1, 1, 1, 1, 1, 1, 1, 1, 2, 2],
[1, 1, 1, 1, 1, 1, 1, 2, 2, 2],
[1, 1, 1, 1, 1, 1, 2, 2, 2, 2],
[1, 1, 1, 1, 1, 2, 2, 2, 2, 2],
[1, 1, 1, 1, 2, 2, 2, 2, 2, 2],
[1, 1, 1, 2, 2, 2, 2, 2, 2, 2],
[1, 1, 2, 2, 2, 2, 2, 2, 2, 2]])
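The same call can also return the raw Euclidean distances alongside the indices; a sketch of that variant on the same label layout (the layout is rebuilt with slicing here to keep the example short):

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

# same layout as the question: a 4x4 block of 1s and a 3x3 block of 2s
labels = np.zeros((10, 10), dtype=int)
labels[1:5, 1:5] = 1
labels[6:9, 6:9] = 2

# distances: Euclidean distance from each cell to the nearest labeled cell
# (i, j): row/column indices of that nearest labeled cell
distances, (i, j) = distance_transform_edt(
    labels == 0, return_distances=True, return_indices=True)
neighborhoods = labels[i, j]
```

The `distances` array is the continuous analogue of the integer Data array in the question; cells inside a labeled object get distance 0.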