I have a dictionary containing a mapping from color code to class index like the following:
color_to_class_idx = {(0, 0, 0): 0, (180, 120, 120): 1, (80, 50, 50): 2, (140, 140, 140): 3, (4, 250, 7): 4, (150, 6, 51): 5, (0, 102, 200): 6, (233, 255, 7): 7, (255, 31, 0): 8, (120, 120, 120): 9}
Now, I have a list of color code values like the following:
list_ = [(0, 0, 0), (80, 50, 50), (255, 255, 255)]
I would like to get another list with the class_idx values. Note that list_ can contain color codes that are not present in the keys of color_to_class_idx. For these, the output should have a default value (e.g. 0). So the final output would look like [0, 2, 0].
The list_ can contain 345,600 elements, so speed matters to me. The following is my implementation:
values = np.array([color_to_class_idx.get(key, 0) for key in list_])
But it is slow.
TIA
You can do a little better (by ~50%?) by converting most of your data structures to NumPy arrays. Here is an array of dictionary values, indexed by the color coordinates. Note that all elements that are not explicitly set remain 0:
lookup = np.zeros((256, 256, 256), dtype=int)  # ~16.8M entries; a smaller dtype such as np.uint8 would save memory
for color, idx in color_to_class_idx.items():
    lookup[color] = idx
Here is the result array:
result = np.zeros(len(list_), dtype=int)
And here is the lookup loop:
for i, key in enumerate(list_):
    result[i] = lookup[key]
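For a list this large, the remaining Python-level loop still dominates the runtime. A fully vectorized variant (a sketch; it assumes list_ holds RGB tuples with values in 0-255) replaces the loop with a single fancy-indexing call:
keys = np.array(list_)                               # shape (N, 3)
result = lookup[keys[:, 0], keys[:, 1], keys[:, 2]]  # one vectorized lookup, no Python loop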
I have a list of tuples, where each tuple is a datetime and float. I wish to clip the float values so that they are all above a threshold value. For example if I have:
a = [
    (datetime.datetime(2021, 11, 1, 0, 0, tzinfo=tzutc()), 100),
    (datetime.datetime(2021, 11, 1, 1, 0, tzinfo=tzutc()), 9.0),
    (datetime.datetime(2021, 11, 1, 2, 0, tzinfo=tzutc()), 100.0)
]
and if I want to clip at 10.0, this would give me:
b = [
    (datetime.datetime(2021, 11, 1, 0, 0, tzinfo=tzutc()), 100),
    (datetime.datetime(2021, 11, 1, 0, ?, tzinfo=tzutc()), 10.0),
    (datetime.datetime(2021, 11, 1, 1, ?, tzinfo=tzutc()), 10.0),
    (datetime.datetime(2021, 11, 1, 2, 0, tzinfo=tzutc()), 100.0)
]
So if I were to plot the a data (before clipping), I would get a V-shaped graph. However, if I clip the data at 10.0 to give me the b data and plot it, I will have a \_/-shaped graph instead. There is a bit of math involved in calculating the new times, so I'm hoping there is already functionality available to do this kind of thing. The datetimes are sorted in order and are unique. I can fix the data so the difference between consecutive times is equal, should that be necessary.
Apologies for not posting a full answer yesterday; my SO account is still rate-limited.
I have made a slightly more complex custom dataset to showcase several values in a row falling below the threshold.
import pandas as pd
from datetime import datetime
from matplotlib import pyplot as plt
from scipy.interpolate import InterpolatedUnivariateSpline
df = pd.DataFrame([
    (datetime(2021, 10, 31, 23, 0), 0),
    (datetime(2021, 11, 1, 0, 0), 80),
    (datetime(2021, 11, 1, 1, 0), 100),
    (datetime(2021, 11, 1, 2, 0), 6),
    (datetime(2021, 11, 1, 3, 0), 105),
    (datetime(2021, 11, 1, 4, 0), 70),
    (datetime(2021, 11, 1, 5, 0), 200),
    (datetime(2021, 11, 1, 6, 0), 0),
    (datetime(2021, 11, 1, 7, 0), 7),
    (datetime(2021, 11, 1, 8, 0), 0),
    (datetime(2021, 11, 1, 9, 0), 20),
    (datetime(2021, 11, 1, 10, 0), 100),
    (datetime(2021, 11, 1, 11, 0), 0)
], columns=['time', 'whatever'])
THRESHOLD = 10
The first thing to do here is to express the index in terms of timedeltas, so that it behaves like any ordinary number we can do arithmetic with. For convenience, I am also expressing the data as a Series; an even better approach would be to create it as such from the get-go, save the initial timestamp, and reindex.
start_time = df['time'][0]
df.set_index((df['time'] - start_time).dt.total_seconds(), inplace=True)
series = df['whatever']
Then, I've tried InterpolatedUnivariateSpline from scipy:
roots = InterpolatedUnivariateSpline(df.index, series.values - THRESHOLD).roots()
threshold_crossings = pd.Series([THRESHOLD] * len(roots), index=roots)
new_series = pd.concat([series[series > THRESHOLD], threshold_crossings]).sort_index()
Let's test it out:
fig, ax = plt.subplots(figsize=(12, 8))
ax.plot(series)
ax.plot(df.index, [THRESHOLD] * len(df.index), 'k-.', label='threshold')
ax.plot(new_series)
ax.set_xlabel('$t-t_0$, s')
axins = ax.inset_axes([0.6, 0.6, 0.35, 0.3])
axins.plot(series)
axins.plot(df.index, [THRESHOLD] * len(df.index), 'k-.')
axins.plot(new_series)
axins.set_ylim(0, 20)
ax.indicate_inset_zoom(axins, edgecolor="black")
ax.set_ylabel('whatever, a.u.')
ax.legend(loc='upper left')
ax.set_title('Roots from InterpolatedUnivariateSpline')
Not so great. The spline root finding is quite a bit off (after all, it uses a cubic B-spline under the hood, and it cannot find roots at all if the order is set to 1). Ah well. For monotonic functions we could simply invert the interpolation, but that is not the case here. I hope someone finds a better way to do it, but my next step was rolling out a custom function:
def my_interp(series: pd.Series, thr: float) -> pd.Series:
    needs_interp = series > thr
    # XOR with the shifted mask keeps only the transition points
    needs_interp = (needs_interp ^ needs_interp.shift(-1)).fillna(False)
    # The last point will never be interpolated
    x = series.index.to_series()
    # Slope and intercept of the segment from each point to the next
    k = series.diff(periods=-1) / x.diff(periods=-1)
    b = series - k * x
    x_fill = ((thr - b) / k)[needs_interp]
    fill_series = pd.Series(data=[thr] * x_fill.size, index=x_fill.values)
    # NB! needs_interp would be the wrong mask to use for series here
    return pd.concat([series[series > thr], fill_series]).sort_index()
new_series = my_interp(series, THRESHOLD)
It achieves what you want with good precision.
To get back to timestamp representation, one would simply do
new_series.index = (start_time + pd.to_timedelta(new_series.index, unit='s'))
With that said, there are a couple of caveats:
The function above assumes the timestamps are sorted (which can be achieved with sort_index) and that no duplicates are present in the series.
Edge conditions are nasty as usual. I have tested the function a little; the logic seems sound, it does not break if either end of the series is above or below the threshold, and it handles irregular data just fine. Still, watch out for NaNs in your data and consider how you should handle all the edge conditions, sorting, etc.
There is no logic dedicated to handling data points exactly at the threshold, or to ensuring any regularity in the new timestamps. This could lead to bugs too: e.g. if some portion of your code relies on having at least two data points every day, that might not hold after the transformation.
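As a minimal sketch of the pre-conditioning suggested above (dropping NaNs, de-duplicating, and sorting before calling the function):
clean = series.dropna()
clean = clean[~clean.index.duplicated()].sort_index()
new_series = my_interp(clean, THRESHOLD)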
I am trying to obtain, from a list of triplets, the triplet that is closest to a required triplet, in case an exact match is not found.
For example:
# V_s,V_g,V_r
triplets = [(500, 12, 5),
            (400, 15, 2.5),
            (400, 15, 3),
            (450, 12, 3),
            ...,
            (350, 14, 3)]
The triplet that I am looking for is
req_triplet = (450, 15, 2) #(Vreq_s, Vreq_g, Vreq_r)
How can I achieve this in Python? What I need is the most suitable strategy for doing so.
As of now, I am thinking of filtering the list by the nearest V_s, then filtering the result by the nearest V_g, and finally by V_r.
You can compute the Euclidean distance with NumPy yourself, or use numpy.linalg.norm. Try this:
>>> import numpy as np
>>> def dist(x, y):
...     return np.sqrt(np.sum((x - y)**2))
>>> triplets = [(500, 12, 5), (400, 15, 2.5), (400, 15, 3), (450, 12, 3), (350, 14, 3)]
>>> req_triplet = (450, 15, 2)
>>> arr_dst = [np.linalg.norm(np.array(tr) - np.array(req_triplet)) for tr in triplets]
>>> arr_dst = [dist(np.array(tr), np.array(req_triplet)) for tr in triplets]
>>> arr_dst
[50.17967716117751, 50.002499937503124, 50.00999900019995, 3.1622776601683795, 100.00999950005]
>>> idx = np.argmin(arr_dst)
>>> idx
3
>>> triplets[idx]
(450, 12, 3)
You have to define a metric ||.||; the triplet T closest to a fixed triplet F is then the one that minimizes ||T - F||. You can use the classic Euclidean distance:
import numpy as np
def dist(u, v):
    return np.sqrt(np.sum((np.array(u) - np.array(v))**2))
The general strategy is to loop through the list, compute the distance for each element, and keep track of the minimum. In Python, this would look something like this:
triplets = [(500, 12, 5),
            (400, 15, 2.5),
            (400, 15, 3),
            (450, 12, 3),
            ...,
            (350, 14, 3)]
req_triplet = (450, 15, 2)

# abs is a builtin, so no import is needed
def calc_dist(a, b):
    return sum(abs(a[i] - b[i]) for i in range(3))

def find_closest_triplet(req_triplet, triplets):
    min_ind = None
    min_dist = float('inf')
    for i, triplet in enumerate(triplets):
        if triplet == req_triplet:
            return i  # exact match, no need to search further
        dist = calc_dist(req_triplet, triplet)
        if dist < min_dist:
            min_dist = dist
            min_ind = i
    return min_ind
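For reference, the whole search also fits in one call to the built-in min() with a key function (a sketch using the same sum-of-absolute-differences distance; it assumes triplets is a concrete list of 3-tuples):
closest = min(triplets, key=lambda t: sum(abs(a - b) for a, b in zip(t, req_triplet)))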
So I have a function which takes as parameters two PixelAccess objects, which are essentially two images accessed as two-dimensional arrays of pixel tuples (image1pixels[x, y]). It subtracts the pixel tuples pairwise across the width and height of both images and appends each difference c to an array named array; the function then returns the sum of all the tuples in the array.
Here is the function:
from operator import sub

def difference(pix1, pix2):
    width, height = img.size  # img is the source image, taken from the enclosing scope
    result = 0
    array = []
    for x in range(width):
        for y in range(height):
            # per-pixel, channel-wise difference
            c = tuple(map(sub, pix2[x, y], pix1[x, y]))
            array.append(c)
    result = abs(sum(map(sum, array)))  # absolute value of the grand total
    return result
Here, to give an idea, this is what gets printed when I print c:
(0, 0, 0)
(0, 0, 0)
(0, 0, 0)
(-253, -253, -253)
(-210, -210, -210)
(-168, -168, -168)
(-147, -147, -147)
(-48, -48, -48)
(-13, -13, -13)
(-29, -29, -29)
(-48, -48, -48)
(-48, -48, -48)
(0, 0, 0)
(0, 0, 0)
(0, 0, 0)
I have to compare two images using this function; the expected difference should be 17988, but my function returns 9174.
I just want to know whether my logic is wrong or whether I'm coding this the wrong way, knowing that Python is not my primary everyday language.
Thanks in advance.
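One hedged guess, for comparison: if the expected value 17988 is the sum of absolute per-channel differences, then abs must be applied per pixel before summing, not once on the grand total, since positive and negative differences otherwise cancel out. A sketch under that assumption (difference_abs is a hypothetical helper; pix1/pix2 and the image size are as in the question):
def difference_abs(pix1, pix2, width, height):
    total = 0
    for x in range(width):
        for y in range(height):
            # abs applied per channel, per pixel, before accumulating
            total += sum(abs(p2 - p1) for p1, p2 in zip(pix1[x, y], pix2[x, y]))
    return total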
I have an image which I want to divide into tiles of specific size (and cropping tiles that don't fit).
The output of this operation should be a list of coordinates in tuples [(x, y, width, height),...]. For example, dividing a 50x50 image in tiles of size 20 would give: [(0,0,20,20),(20,0,20,20),(40,0,10,20),(0,20,20,20),(20,20,20,20),(40,20,10,20),...] etc.
Given a height, width and tile_size, it seems like I should be able to do this in a single list comprehension, but I can't wrap my head around it. Any help would be appreciated. Thanks!
Got it with the following (here w_tiles and h_tiles are the tile counts per axis, and w_padding and h_padding are the sizes of the last, cropped tiles):
output = [(x, y, w, h)
          for x, w in zip(range(0, width, tile_size), [tile_size] * (w_tiles - 1) + [w_padding])
          for y, h in zip(range(0, height, tile_size), [tile_size] * (h_tiles - 1) + [h_padding])]
import itertools

def tiles(h, w, ts):
    # one list comprehension producing the (x, y, width, height) tuples,
    # clamping the last tile in each direction to what is left of the image
    return [(x, y, ts if w - x > ts else w - x, ts if h - y > ts else h - y)
            for x, y in itertools.product(range(0, w, ts), range(0, h, ts))]

print(tiles(50, 50, 20))
[(0, 0, 20, 20), (0, 20, 20, 20), (0, 40, 20, 10), (20, 0, 20, 20), (20, 20, 20, 20), (20, 40, 20, 10), (40, 0, 10, 20), (40, 20, 10, 20), (40, 40, 10, 10)]
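A simpler equivalent, sketched with min() to clamp the edge tiles (width, height, and tile_size as defined in the question):
tiles_list = [(x, y, min(tile_size, width - x), min(tile_size, height - y))
              for x in range(0, width, tile_size)
              for y in range(0, height, tile_size)]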
I have a working solution to my problem, but after trying different things I was astounded that I could not find a better one. It all boils down to creating a single flexible-dtype value for comparing against and inserting into an array.
I have a 24-bit RGB image array (so 8 bits for each of R, G, and B). It turns out that for some operations it is best to use it as a 3D array of shape HxWx3, and at other times it is best to use it as a structured array with the dtype([('R',uint8),('G',uint8),('B',uint8)]). One example is relabeling the image colors so that every unique color is given a different value. I do this with the following code:
# Given im as an array of HxWx3, dtype=uint8
from numpy import dtype, uint8, unique, insert, searchsorted
rgb_dtype = dtype([('R', uint8), ('G', uint8), ('B', uint8)])
im = im.view(dtype=rgb_dtype).squeeze() # need squeeze to remove the third dim
values = unique(im)
if tuple(values[0]) != (0, 0, 0):
    values = insert(values, 0, 0)  # value 0 needs to always be (0, 0, 0)
labels = searchsorted(values, im)
This works beautifully; however, I tried to make the if statement look nicer and just couldn't find a way. So let's look at the comparison first:
>>> values[0]
(0, 0, 0)
>>> values[0] == 0
False
>>> values[0] == (0, 0, 0)
False
>>> values[0] == array([0, 0, 0])
False
>>> values[0] == array([uint8(0), uint8(0), uint8(0)]).view(dtype=rgb_dtype)[0]
True
>>> values[0] == zeros((), dtype=rgb_dtype)
True
But what if you wanted something besides (0, 0, 0) or (1, 1, 1), expressed in a way that does not look ridiculous? It seems like there should be an easier way to construct such a value, something like a hypothetical rgb_dtype.create((0, 0, 0)).
Next, with the insert statement, you need to insert 0 for (0, 0, 0). For other values this really does not work: for example, inserting (1, 2, 3) actually inserts (1, 1, 1), (2, 2, 2), (3, 3, 3).
So in the end, is there a nicer way? Thanks!
I could make insert() work for your case by doing the following (note that [0] is used instead of 0):
values = insert(values, [0], (1,2,3))
giving (for example):
array([(0, 1, 3), (0, 0, 0), (0, 0, 4), ..., (255, 255, 251), (255, 255, 253), (255, 255, 255)],
dtype=[('R', 'u1'), ('G', 'u1'), ('B', 'u1')])
Regarding another way to do your if, you can do this:
str(values[0]) == str((0,0,0))
or, perhaps more robust:
eval(str(values[0])) == eval(str((0, 0, 0)))
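For what it's worth, a structured scalar can also be built directly by wrapping the tuple in a one-element array, which avoids both the view trick and eval (a sketch, using the rgb_dtype from the question):
from numpy import array
pt = array([(1, 2, 3)], dtype=rgb_dtype)[0]  # a structured scalar, displayed as (1, 2, 3)
values[0] == pt                              # compares field by field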