Segmentation Algorithm


I created a segmentation algorithm that detects levels above a certain threshold and alternates between two categories (red and green), as shown in Figure 1 below. My current algorithm always works from the left: it labels the first segment red, the next one green, and so on.
However, I would like the algorithm to start from the center of the longest gap of zeros (10-01-06 to 10-01-14) and work outwards, so that the first segment detected to the left of the gap is always green and the first segment detected to the right is always red. The longest gap can be anywhere and may not always be at the center of the dataset.
I would like the algorithm to return separate lists for red and green, each holding the start and end indexes of its segments.
The code below reproduces the plot.
Figure 1:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Creating Dummy Dataset
df = pd.DataFrame(np.random.uniform(50,100,size=(100, 1)))
df2 = pd.DataFrame(np.random.uniform(50,100,size=(100, 1)))
zeros1 = pd.DataFrame(np.zeros(80))
zeros2 = pd.DataFrame(np.zeros(50))
zeros3 = pd.DataFrame(np.zeros(500))
zeros4 = pd.DataFrame(np.zeros(80))
zeros5 = pd.DataFrame(np.zeros(50))
df3 = pd.DataFrame(np.random.uniform(50,100,size=(100, 1)))
df4 = pd.DataFrame(np.random.uniform(50,100,size=(100, 1)))
df5=pd.concat([zeros1, df, zeros2, df2, zeros3, df3, zeros4, df4, zeros5 ], ignore_index=True)
times = pd.date_range('2012-10-01', periods=len(df5), freq='1min')
df6 = pd.concat([pd.DataFrame(times), df5], axis = 1, ignore_index=True)
# Mark every sample as above (99) or not above (0) the threshold
segment = []
for i in range(0,len(df6)):
    if df6.iloc[i,1]> 50:
        segment.append(99)
    else:
        segment.append(0)
# Segmentation Algo
state_v,state_p = (0,0) # cycling through states (0,0), (99,0), (0,1), (99,1)
segments = ([],[])
for i,v in enumerate(segment):
    if state_v == 0:
        if v == 99:
            start = i
            state_v = 99
    elif state_v == 99:
        if v == 0:
            end = i
            segments[state_p].append((start, end))
            state_v = 0
            state_p = 1 - state_p
if state_v == 99:
    end = len(segment)
    segments[state_p].append((start, end))
# Plot
door_open, door_close = segments
plt.plot(df6[0], df6[1])
for o in door_open:
    plt.axvline(df6[0][o[0]], linewidth=1, color='r', linestyle= '-')
    plt.axvline(df6[0][o[1]], linewidth=1, color='r', linestyle= '-')
for c in door_close:
    plt.axvline(df6[0][c[0]], linewidth=1, color='g', linestyle= '-')
    plt.axvline(df6[0][c[1]], linewidth=1, color='g', linestyle= '-')
plt.xticks(rotation='vertical')
plt.show()
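
One possible way to get the behaviour described above is sketched below. It is only one reading of the question: it assumes that the colours keep alternating as you move away from the gap on each side, that leading and trailing runs of zeros also count as candidate gaps, and that segment ends are exclusive indexes, as in the code above. The helper names find_segments and split_from_longest_gap are made up for this example.

```python
def find_segments(flags):
    """Return (start, end) pairs for each run of truthy values; end is exclusive."""
    segments, start = [], None
    for i, v in enumerate(flags):
        if v and start is None:
            start = i
        elif not v and start is not None:
            segments.append((start, i))
            start = None
    if start is not None:
        segments.append((start, len(flags)))
    return segments


def split_from_longest_gap(flags):
    """Label segments outwards from the longest run of zeros.

    Assumption: nearest segment on the left is green, nearest on the
    right is red, and colours alternate moving away from the gap.
    """
    segs = find_segments(flags)
    # Gaps are the stretches between segment boundaries, including both ends.
    bounds = [0] + [x for s in segs for x in s] + [len(flags)]
    gaps = [(bounds[i], bounds[i + 1]) for i in range(0, len(bounds), 2)]
    gap_start, gap_end = max(gaps, key=lambda g: g[1] - g[0])

    left = [s for s in segs if s[1] <= gap_start]
    right = [s for s in segs if s[0] >= gap_end]

    red, green = [], []
    for k, s in enumerate(reversed(left)):   # walk leftwards from the gap
        (green if k % 2 == 0 else red).append(s)
    for k, s in enumerate(right):            # walk rightwards from the gap
        (red if k % 2 == 0 else green).append(s)
    return sorted(red), sorted(green)


# Small hand-made example: the longest gap is indexes 8..17.
flags = [0]*2 + [99]*3 + [0]*1 + [99]*2 + [0]*10 + [99]*4 + [0]*2 + [99]*1 + [0]*3
red, green = split_from_longest_gap(flags)
print(red, green)
```

With the segment list built earlier, red, green = split_from_longest_gap(segment) would return the two lists of (start, end) tuples.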

Related

Reorder Sankey diagram vertically based on label value

I'm trying to plot patient flows between 3 clusters in a Sankey diagram. I have a pd.DataFrame counts with from-to values, see below. To reproduce this DF, here is the counts dict that should be loaded into a pd.DataFrame (which is the input for the visualize_cluster_flow_counts function).
     from   to     value
0    C1_1   C1_2     867
1    C1_1   C2_2     405
2    C1_1   C0_2       2
3    C2_1   C1_2      46
4    C2_1   C2_2     458
...  ...    ...      ...
175  C0_20  C0_21    130
176  C0_20  C2_21      1
177  C2_20  C1_21     12
178  C2_20  C0_21      0
179  C2_20  C2_21     96
The from and to values in the DataFrame encode the cluster number (0, 1, or 2) and the day number used for the x-axis (between 1 and 21). If I plot the Sankey diagram with these values, this is the result:
Code:
import plotly.graph_objects as go

def visualize_cluster_flow_counts(counts):
    all_sources = list(set(counts['from'].values.tolist() + counts['to'].values.tolist()))
    froms, tos, vals, labs = [], [], [], []
    for index, row in counts.iterrows():
        froms.append(all_sources.index(row.values[0]))
        tos.append(all_sources.index(row.values[1]))
        vals.append(row[2])
        labs.append(row[3])
    fig = go.Figure(data=[go.Sankey(
        arrangement='snap',
        node = dict(
            pad = 15,
            thickness = 5,
            line = dict(color = "black", width = 0.1),
            label = all_sources,
            color = "blue"
        ),
        link = dict(
            source = froms,
            target = tos,
            value = vals,
            label = labs
        ))])
    fig.update_layout(title_text="Patient flow between clusters over time: 48h (2 days) - 504h (21 days)", font_size=10)
    fig.show()

visualize_cluster_flow_counts(counts)
However, I would like to order the bars vertically so that the C0's are always on top, the C1's are always in the middle, and the C2's are always at the bottom (or the other way around; it doesn't matter). I know that node.x and node.y can be set to assign the coordinates manually. So I set the x-values to the number of days * (1 / range of days), which gives an increment of roughly 0.045, and I set the y-values based on the cluster value: 0, 0.5 or 1. I then obtain the image below. The vertical order is good, but the vertical margins between the bars are obviously way off; they should be similar to the first result.
The code to produce this is:
import plotly.graph_objects as go

def find_node_coordinates(sources):
    x_nodes, y_nodes = [], []
    for s in sources:
        # Shift each x with +- 0.045
        x = float(s.split("_")[-1]) * (1/21)
        x_nodes.append(x)
        # Choose either 0, 0.5 or 1 for the y-value
        cluster_number = s[1]
        if cluster_number == "0": y = 1
        elif cluster_number == "1": y = 0.5
        else: y = 1e-09
        y_nodes.append(y)
    return x_nodes, y_nodes

def visualize_cluster_flow_counts(counts):
    all_sources = list(set(counts['from'].values.tolist() + counts['to'].values.tolist()))
    node_x, node_y = find_node_coordinates(all_sources)
    froms, tos, vals, labs = [], [], [], []
    for index, row in counts.iterrows():
        froms.append(all_sources.index(row.values[0]))
        tos.append(all_sources.index(row.values[1]))
        vals.append(row[2])
        labs.append(row[3])
    fig = go.Figure(data=[go.Sankey(
        arrangement='snap',
        node = dict(
            pad = 15,
            thickness = 5,
            line = dict(color = "black", width = 0.1),
            label = all_sources,
            color = "blue",
            x = node_x,
            y = node_y,
        ),
        link = dict(
            source = froms,
            target = tos,
            value = vals,
            label = labs
        ))])
    fig.update_layout(title_text="Patient flow between clusters over time: 48h (2 days) - 504h (21 days)", font_size=10)
    fig.show()

visualize_cluster_flow_counts(counts)
Question: how do I fix the margins of the bars, so that the result looks like the first result? So, for clarity: the bars should be pushed to the bottom. Or is there another way that the Sankey diagram can vertically re-order the bars automatically based on the label value?

First, I don't think the currently exposed API offers a smooth way to achieve your goal; you can check the source code here.
Try changing your find_node_coordinates function as follows (note that you should pass the counts DataFrame to it as well):
counts = pd.DataFrame(counts_dict)

def find_node_coordinates(sources, counts):
    x_nodes, y_nodes = [], []
    flat_on_top = False
    range = 1 # The y range
    total_margin_width = 0.15
    y_range = 1 - total_margin_width
    margin = total_margin_width / 2 # From number of Cs
    srcs = counts['from'].values.tolist()
    dsts = counts['to'].values.tolist()
    values = counts['value'].values.tolist()
    max_acc = 0

    def _calc_day_flux(d=1):
        _max_acc = 0
        for i in [0,1,2]:
            # The first ones
            from_source = 'C{}_{}'.format(i,d)
            indices = [i for i, val in enumerate(srcs) if val == from_source]
            for j in indices:
                _max_acc += values[j]
        return _max_acc

    def _calc_node_io_flux(node_str):
        c,d = int(node_str.split('_')[0][-1]), int(node_str.split('_')[1])
        _flux_src = 0
        _flux_dst = 0
        indices_src = [i for i, val in enumerate(srcs) if val == node_str]
        indices_dst = [j for j, val in enumerate(dsts) if val == node_str]
        for j in indices_src:
            _flux_src += values[j]
        for j in indices_dst:
            _flux_dst += values[j]
        return max(_flux_dst, _flux_src)

    max_acc = _calc_day_flux()
    graph_unit_per_val = y_range / max_acc
    print("Graph Unit per Acc Val", graph_unit_per_val)
    for s in sources:
        # Shift each x with +- 0.045
        d = int(s.split("_")[-1])
        x = float(d) * (1/21)
        x_nodes.append(x)
        print(s, _calc_node_io_flux(s))
        # Choose either 0, 0.5 or 1 for the y-value
        cluster_number = s[1]
        # Flat on Top
        if flat_on_top:
            if cluster_number == "0":
                y = _calc_node_io_flux('C{}_{}'.format(2, d))*graph_unit_per_val + margin + _calc_node_io_flux('C{}_{}'.format(1, d))*graph_unit_per_val + margin + _calc_node_io_flux('C{}_{}'.format(0, d))*graph_unit_per_val/2
            elif cluster_number == "1": y = _calc_node_io_flux('C{}_{}'.format(2, d))*graph_unit_per_val + margin + _calc_node_io_flux('C{}_{}'.format(1, d))*graph_unit_per_val/2
            else: y = 1e-09
        # Flat On Bottom
        else:
            if cluster_number == "0": y = 1 - (_calc_node_io_flux('C{}_{}'.format(0,d))*graph_unit_per_val / 2)
            elif cluster_number == "1": y = 1 - (_calc_node_io_flux('C{}_{}'.format(0,d))*graph_unit_per_val + margin + _calc_node_io_flux('C{}_{}'.format(1,d)) * graph_unit_per_val /2 )
            elif cluster_number == "2": y = 1 - (_calc_node_io_flux('C{}_{}'.format(0,d))*graph_unit_per_val + margin + _calc_node_io_flux('C{}_{}'.format(1,d)) * graph_unit_per_val + margin + _calc_node_io_flux('C{}_{}'.format(2,d)) * graph_unit_per_val /2 )
        y_nodes.append(y)
    return x_nodes, y_nodes
Sankey diagrams are supposed to weight the width of each connection by its normalized value, right? Here I do the same: the function first calculates the flux of each node, and then places the center of each node at a normalized coordinate derived from those fluxes.
Here is the sample output of your code with the modified function. Note that I tried to adhere to your code as much as possible, so it is a bit unoptimized (for example, one could store the fluxes of the nodes above each source node to avoid recalculating them).
With flag flat_on_top = True
With flag flat_on_top = False
There is a slight inconsistency in the flat_on_top = False (flat-on-bottom) version, which I think is caused by the padding or by other internals of Plotly.
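
The centering arithmetic used in the function above can be looked at in isolation. The sketch below is a simplified stand-in, not the function itself: stacked_centers is a hypothetical helper, the three flux values are made up, and it only shows how bar centers follow from normalized heights and a fixed margin when bars are stacked from one edge.

```python
def stacked_centers(heights, margin):
    """Centers of bars stacked from coordinate 0, separated by `margin`."""
    centers = []
    edge = 0.0  # running position of the next bar's near edge
    for h in heights:
        centers.append(edge + h / 2)
        edge += h + margin
    return centers


# Made-up fluxes for the three clusters of a single day.
fluxes = [130, 867, 458]
usable = 0.85          # share of the unit range left for bars (1 - 0.15)
margin = 0.15 / 2      # two gaps between three bars
unit = usable / sum(fluxes)
heights = [f * unit for f in fluxes]
centers = stacked_centers(heights, margin)
print(centers)
```

Because the heights and the two margins add up to 1, the last bar ends exactly at the far edge of the unit range.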

How to highlight the lowest line on a linegraph in matplotlib?

Currently, my code generates a line graph based on an array of x,y values generated from a function called f(), like so:
T = 0
for i in range(0,10):
    #function f generates array of values based on T to plot x,y
    x,y = f(T)
    plt.plot(x, y, label = "T={}".format(T))
    T += 1
This generates a graph like so:
Is there a streamlined way to make all of the lines grey, while highlighting the line with the lowest endpoint in red and the line with the highest endpoint in green, on the x-axis, regardless of what y is?
So for this example, the line for T=5 would be red, the line for T=3 would be green, and all the other lines would be the same shade of grey.

Simply store all your x and y values in two lists:
X, Y = [], [] # Lists of lists: X[T], Y[T] = f(T)
for T in range(10):
    x, y = f(T)
    X.append(x)
    Y.append(y)
Then find the highest and lowest endpoints:
highest_endpoint, highest_endpoint_indice = Y[0][-1], 0 # Initialisation.
lowest_endpoint, lowest_endpoint_indice = Y[0][-1], 0 # Initialisation.
for i, y in enumerate(Y[1:]): # No need to check for Y[0] = y0 thanks to the initialisations.
    if y[-1] > highest_endpoint: # If current endpoint is superior to temporary highest endpoint.
        highest_endpoint, highest_endpoint_indice = y[-1], i+1
    elif y[-1] < lowest_endpoint:
        lowest_endpoint, lowest_endpoint_indice = y[-1], i+1
# Plot the curves.
for T in range(10):
    if T == highest_endpoint_indice:
        plt.plot(X[T], Y[T], label = "T={}".format(T), color = 'green')
    elif T == lowest_endpoint_indice:
        plt.plot(X[T], Y[T], label = "T={}".format(T), color = 'red')
    else:
        plt.plot(X[T], Y[T], label = "T={}".format(T), color = 'gray')
plt.show()
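
The same selection can be written more compactly. The sketch below is an alternative, not the answer's method: endpoint_colors is a hypothetical helper that returns one colour per line, and the Y values are made up for the demonstration.

```python
def endpoint_colors(Y, low="red", high="green", other="gray"):
    """Return one colour per series, based on the last value of each series."""
    ends = [y[-1] for y in Y]
    lowest = min(range(len(ends)), key=ends.__getitem__)
    highest = max(range(len(ends)), key=ends.__getitem__)
    colors = [other] * len(Y)
    colors[lowest] = low
    colors[highest] = high  # wins if there is only one series
    return colors


# Made-up series; only the last value of each one matters here.
Y = [[0, 3], [0, 1], [0, 5], [0, 2]]
colors = endpoint_colors(Y)
print(colors)

# Assumed usage with the lists built above (not run here):
# for T, (x, y) in enumerate(zip(X, Y)):
#     plt.plot(x, y, label="T={}".format(T), color=colors[T])
```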

spacing nodes at networkx/plotly network and labeling

I created a network using networkx and plotly as following:
edges = df.stack().reset_index()
edges.columns = ['var_1','var_2','correlation']
# Parentheses make the intended precedence explicit: strong correlation AND not a self-pair
edges = edges.loc[ ((edges['correlation'] < -0.6) | (edges['correlation'] > 0.6)) & (edges['var_1'] != edges['var_2']) ].copy()
#create undirected graph with weights corresponding to the correlation magnitude
G0 = nx.from_pandas_edgelist(edges, 'var_1', 'var_2', edge_attr=['correlation'])
mst = G0
# assign colours to edges depending on positive or negative correlation
# assign edge thickness depending on magnitude of correlation
edge_colours = []
edge_width = []
for key, value in nx.get_edge_attributes(mst, 'correlation').items():
    edge_colours.append(assign_colour(value))
    edge_width.append(assign_thickness(value))
node_size = []
degrees = [val for key, val in dict(G0.degree).items()]
max_deg = max(degrees)
min_deg = min(degrees)
for value in degrees:
    node_size.append(assign_node_size(value,min_deg,max_deg))
#draw the network (nx.draw expects node_color and width, not node_colour and edge_width):
nx.draw(mst, pos=nx.fruchterman_reingold_layout(mst),
        node_size=15, edge_color=edge_colours, node_color="black",
        width=0.2)
plt.show()
def get_coordinates(G=mst):
    """Returns the positions of nodes and edges in a format for Plotly to draw the network"""
    # get list of node positions
    pos = nx.fruchterman_reingold_layout(mst)
    Xnodes = [pos[n][0] for n in mst.nodes()]
    Ynodes = [pos[n][1] for n in mst.nodes()]
    Xedges_red = []
    Yedges_red = []
    Xedges_green = []
    Yedges_green = []

    def insert_edge(Xedges, Yedges):
        # x and y coordinates of the nodes defining the edge e
        Xedges.extend([pos[e[0]][0], pos[e[1]][0], None])
        Yedges.extend([pos[e[0]][1], pos[e[1]][1], None])

    search_dict = nx.get_edge_attributes(mst, 'correlation')
    for e in mst.edges():
        correlation = search_dict[(e[0], e[1])]
        if correlation <= 0 : # red_edges
            insert_edge(Xedges_red, Yedges_red)
        else:
            insert_edge(Xedges_green, Yedges_green)
    return Xnodes, Ynodes, Xedges_red, Yedges_red, Xedges_green, Yedges_green
node_label = list(mst.nodes())
node_label = [fix_string(x) for x in node_label]
# get coordinates for nodes and edges
Xnodes, Ynodes, Xedges_red, Yedges_red, Xedges_green, Yedges_green = get_coordinates()
external_data = [list(x) for x in coding_names_df.values]
external_data = {fix_string(x[0]): x[1] for x in external_data}
external_data2 = [list(y) for y in coding_names_df.values]
external_data2 = {fix_string(y[0]): y[2] for y in external_data2}
external_data3 = [list(z) for z in coding_names_df.values]
external_data3 = {fix_string(z[0]): z[3] for z in external_data3}
external_data4 = [list(s) for s in coding_names_df.values]
external_data4 = {fix_string(s[0]): s[4] for s in external_data4}
# =============================================================================
description = [f"<b>{index}) {node}</b>"
"<br><br>Realm: " +
"<br>" + external_data.get(node, 'No external data found') +
"<br><br>Type: " +
"<br>" + external_data2.get(node, 'No external data found')
for index, node in enumerate(node_label)]
# =============================================================================
# def nodes colours:
node_colour = [assign_node_colour(node, external_data3, coding_names_df) for node in node_label]
node_shape = [assign_node_shape(node, external_data4, coding_names_df) for node in node_label]
# edges
# negative:
tracer_red = go.Scatter(x=Xedges_red, y=Yedges_red,
mode='lines',
line= dict(color="#FA0000", width=1),
hoverinfo='none',
showlegend=False)
# positive:
tracer_green = go.Scatter(x=Xedges_green, y=Yedges_green,
mode='lines',
line= dict(color= "#29A401", width=1),
hoverinfo='none',
showlegend=False)
# nodes
tracer_marker = go.Scatter(x=Xnodes, y=Ynodes,
mode='markers+text',
textposition='top center',
marker=dict(size=node_size,
line=dict(width=0.8, color='black'),
color=node_colour,
symbol=node_shape),
hovertext=description,
hoverinfo='text',
textfont=dict(size=7),
showlegend=False)
axis_style = dict(title='',
titlefont=dict(size=20),
showgrid=False,
zeroline=False,
showline=False,
ticks='',
showticklabels=False)
layout = dict(title='',
width=1300,
height=900,
autosize=False,
showlegend=False,
xaxis=axis_style,
yaxis=axis_style,
hovermode='closest',
plot_bgcolor = '#fff')
fig = dict(data=[tracer_red, tracer_green, tracer_marker], layout=layout)
display(HTML("""<p>Node sizes are proportional to the size of annualised returns.<br>
Node colours signify positive or negative returns since beginning of the timeframe.</p> """))
plot(fig)
and I got this plot: network
I want to add labels, but the plot is getting too crowded (especially in the middle),
so I have two questions:
How can I space out the middle (while still keeping the fruchterman_reingold positions)?
How can I add just a few specific labels?
Any help would be great! Thanks :)

Something you could try is setting the k parameter of the layout algorithm, which, as described in the docs, sets:
k: (float (default=None)) – Optimal distance between nodes. If None the distance is set to 1/sqrt(n) where n is the number of nodes. Increase this value to move nodes farther apart.
So by playing a bit with this value and increasing it accordingly, we can get a more spread-out layout and avoid overlap between node labels.
Here's a simple example to illustrate what the behavior is:
import networkx as nx
import matplotlib.pyplot as plt

result_set = {('plant','tree'), ('tree','oak'), ('flower', 'rose'), ('flower','daisy'), ('plant','flower'), ('tree','pine'), ('plant','roots'), ('animal','fish'),('animal','bird'), ('bird','robin'), ('bird','falcon'), ('animal', 'homo'),('homo','homo-sapiens'), ('animal','reptile'), ('reptile','snake'),('fungi','mushroom'), ('fungi','mold'), ('fungi','toadstool'),('reptile','crocodile'), ('mushroom','Portabello'), ('mushroom','Shiitake'),('pine','roig'),('pine','pinyer'), ('tree','eucaliptus'),('rose','Floribunda'),('rose','grandiflora')}
G=nx.from_edgelist(result_set)
pos=nx.fruchterman_reingold_layout(G)
plt.figure(figsize=(8,5))
nx.draw(G, pos=pos,
        with_labels=True,
        node_size=1000,
        node_color='lightgreen')
And if we increase the value of k to say 0.5, we get a nice spreading of the nodes in the layout:
pos_spaced=nx.fruchterman_reingold_layout(G, k=0.5, iterations=100)
plt.figure(figsize=(10,6))
nx.draw(G,
        pos=pos_spaced,
        with_labels=True,
        node_size=1000,
        node_color='lightgreen')
How can I add just a few specific labels?
For this, set the labels parameter of draw to a dictionary containing only the labels you want displayed. If the node names are the same as the labels, just create a dictionary mapping node->node as follows:
show_labels = ['plant', 'tree', 'oak', 'eucaliptus']
pos_spaced=nx.fruchterman_reingold_layout(G, k=0.54, iterations=100)
plt.figure(figsize=(10,6))
nx.draw(G,
        pos=pos_spaced,
        with_labels=True,
        node_size=1000,
        labels=dict(zip(show_labels,show_labels)),
        node_color='lightgreen')
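
The examples above use the matplotlib drawing of networkx, while the question draws the nodes with a Plotly go.Scatter trace. For that trace, a similar effect can probably be had by passing a text list that is empty for every node that should stay unlabelled. The sketch below only builds that list; sparse_labels is a hypothetical helper and the node names are taken from the example graph.

```python
def sparse_labels(nodes, show):
    """Return one label per node: the node name if it is in `show`, else ''."""
    wanted = set(show)
    return [str(n) if n in wanted else "" for n in nodes]


nodes = ["plant", "tree", "oak", "rose", "eucaliptus"]
labels = sparse_labels(nodes, ["plant", "oak"])
print(labels)

# Assumed usage with the question's trace (not run here); `text` is the
# per-point label list that mode='markers+text' draws next to each marker:
# tracer_marker = go.Scatter(x=Xnodes, y=Ynodes, mode='markers+text',
#                            text=sparse_labels(node_label, ['some_node']))
```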

How to plot k-means clustering results in an ordered way

I am using k-means clustering for customer and product segmentation. I found a function on Stack Overflow that takes the cluster results and reorders them based on the average of a target value in the dataframe. This seems to work quite well, but in order to plot the results I first create a string column in the dataframe based on the ordered clustering, to prevent seaborn from binning the hue labels. The first problem I ran into was that, while the plot and labels were generated as intended, the legend was out of order. I added a hue order, but the legend then becomes fixed to this order, so changing the value of K makes the legend confusing. I added a function to address this problem as well, and everything seems to work as intended, but I would like to know if there are better ways of achieving this. The related code blocks are below.
#function for ordering cluster numbers
def order_cluster(cluster_field_name, target_field_name,df,ascending):
    new_cluster_field_name = 'new_' + cluster_field_name
    df_new = df.groupby(cluster_field_name)[target_field_name].mean().reset_index()
    df_new = df_new.sort_values(by=target_field_name,ascending=ascending).reset_index(drop=True)
    df_new['index'] = df_new.index
    df_final = pd.merge(df,df_new[[cluster_field_name,'index']], on=cluster_field_name)
    df_final = df_final.drop([cluster_field_name],axis=1)
    df_final = df_final.rename(columns={"index":cluster_field_name})
    return df_final
#adding column to dataframe based on clustering
kmeans = KMeans(n_clusters=4)
kmeans.fit(data[['ORDERS_PLACED','UNITS_SOLD','AVG_UNIT_PRICE','TOTAL_SALES']])
data['Rank'] = kmeans.predict(data[['ORDERS_PLACED','UNITS_SOLD','AVG_UNIT_PRICE','TOTAL_SALES']])
#ordering the results
data = order_cluster('Rank','TOTAL_SALES',data,True)
top = data['Rank'].max()
#adding string column to dataframe
data['Rank_ID'] = [('Group_A' if x == top else
('Group_B' if x == top - 1 else
('Group_C' if x == top - 2 else
('Group_D' if x == top - 3 else
('Group_E' if x == top - 4 else
('Group_F' if x == top - 5 else
('Group_G' if x == top - 6 else
('Group_H' if x == top - 7 else
('Group_I' if x == top - 8 else
('Group_J' if x == top - 9 else 'Group_Z')))))))))
) for x in data['Rank']]
#function to build the plot legend values
def build_legend(k_value):
    if k_value == 0:
        legend = ['Group_A']
    elif k_value == 1:
        legend = ['Group_A','Group_B']
    elif k_value == 2:
        legend = ['Group_A','Group_B','Group_C']
    elif k_value == 3:
        legend = ['Group_A','Group_B','Group_C','Group_D']
    elif k_value == 4:
        legend = ['Group_A','Group_B','Group_C','Group_D','Group_E']
    elif k_value == 5:
        legend = ['Group_A','Group_B','Group_C','Group_D','Group_E','Group_F']
    elif k_value == 6:
        legend = ['Group_A','Group_B','Group_C','Group_D','Group_E','Group_F','Group_G']
    elif k_value == 7:
        legend = ['Group_A','Group_B','Group_C','Group_D','Group_E','Group_F','Group_G','Group_H']
    elif k_value == 8:
        legend = ['Group_A','Group_B','Group_C','Group_D','Group_E','Group_F','Group_G','Group_H','Group_I']
    elif k_value == 9:
        legend = ['Group_A','Group_B','Group_C','Group_D','Group_E','Group_F','Group_G','Group_H','Group_I','Group_J']
    else:
        legend = ['Group_A','Group_B','Group_C','Group_D','Group_E','Group_F','Group_G','Group_H','Group_I','Group_J','Group_Z']
    return legend
#plotting the results
orderHue = build_legend(top)
fig, ax = plt.subplots(figsize=(12,5))
plot = sns.scatterplot(x='ORDERS_PLACED', y='TOTAL_SALES', hue='Rank_ID', size='Rank_ID',
hue_order=orderHue, size_order=orderHue, data=report, ax=ax)
ytick = plot.get_yticks()
plot.set_yticklabels(['{:,.0f}'.format(x) for x in ytick])
plot.set_title('80/20 Customer Segmentation Using K-Means Clustering, Plot on Orders Placed & Total Sales',fontsize=12)
plt.legend(bbox_to_anchor=(1.05, 1), loc=2)
plt.show()
This seems like a lot of code to achieve what might be quite simple.
Here is a quick sample of the data, as requested:
CUSTOMER_ID  ORDERS_PLACED  UNITS_SOLD  AVG_UNIT_PRICE  TOTAL_SALES
A                        2          59         21553.9      1271680
B                      106         184          6295.9    1158445.7
C                       13          78           14290      1114620
D                       43        2034          245.38       499102
E                       53         582          760.92       442856
F                        1           6           15000        90000
G                        3          60             967        58020
H                        1           1            1807         1807
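
One way the label and legend code above might be shortened is to generate the group names from the alphabet instead of spelling out every case. The sketch below is an assumption-laden alternative, not a tested drop-in: rank_to_label and rank_labels are hypothetical helpers, and unlike the original code they support up to 26 groups and have no 'Group_Z' fallback.

```python
from string import ascii_uppercase


def rank_to_label(rank, top):
    """Map the highest rank to Group_A, the next one to Group_B, and so on."""
    return "Group_{}".format(ascii_uppercase[top - rank])


def rank_labels(k):
    """Legend order for k clusters: Group_A, Group_B, ... (k <= 26)."""
    return ["Group_{}".format(c) for c in ascii_uppercase[:k]]


# With 4 clusters the ranks are 0..3, so top is 3.
top = 3
labels = [rank_to_label(r, top) for r in [3, 2, 1, 0]]
legend = rank_labels(top + 1)
print(labels, legend)

# Assumed usage with the question's dataframe (not run here):
# data['Rank_ID'] = [rank_to_label(x, top) for x in data['Rank']]
# orderHue = rank_labels(top + 1)
```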

How can I change the colormap of an existing plot given an image file?

How can a figure using a rainbow colormap, such as figure 1, be converted so that the same data are displayed using a different color map, such as a perceptually uniform sequential map?
Assume that the underlying data from which the original image was generated are not accessible and the image itself must be recolored using only information within the image.
Background information: rainbow color maps tend to produce visual artifacts. See the cyan line near z = -1.15 m? It looks like there's a sharp edge there. But look at the colorbar itself! Even the color bar has an edge there. There's another fake edge in the yellow band that goes vertically near R = 1.45 m. The horizontal yellow stripe may be a real edge in the underlying data, although it's difficult to distinguish that case from a rainbow artifact.
More information:
http://ieeexplore.ieee.org/abstract/document/4118486/
http://matplotlib.org/users/colormaps.html

Here is my best solution so far:
import numpy as np
import os
import matplotlib
import copy
import matplotlib.pyplot as plt
from matplotlib.pyplot import imread, imsave

def_colorbar_loc = [[909, 22], [953 - 20, 959]]
def_working_loc = [[95, 189], [857, 708]]


def recolor_image(
    filename='image.png',
    colorbar_loc=def_colorbar_loc,
    working_loc=def_working_loc,
    colorbar_orientation='auto',
    colorbar_direction=-1,
    new_cmap='viridis',
    normalize_before_compare=False,
    max_rgb='auto',
    threshold=0.4,
    saturation_threshold=0.25,
    compare_hue=True,
    show_plot=True,
    debug=False,
):
    """
    This script reads in an image file (like .png), reads the image's color bar (you have to tell it where), interprets
    the color map used in the image to convert colors to values, then recolors those values with a new color map and
    regenerates the figure. Useful for fixing figures that were made with rainbow color maps.

    Parameters
    -----------
    :param filename: Full path and filename of the image file.
    :param colorbar_loc: Location of color bar, which will be used to analyze the image and convert colors into values.
        Pixels w/ 0,0 at top left corner: [[left, top], [right, bottom]]
    :param working_loc: Location of the area to recolor. You don't have to recolor the whole image.
        Pixels w/ 0,0 at top left corner: [[left, top], [right, bottom]], set to [[0, 0], [-1, -1]] to do everything.
    :param colorbar_orientation: Set to 'x', 'y', or 'auto' to specify whether color map is horizontal, vertical,
        or should be determined based on the dimensions of the colorbar_loc
    :param colorbar_direction: Controls direction of ascending value
        +1: colorbar goes from top to bottom or left to right.
        -1: colorbar goes from bottom to top or right to left.
    :param new_cmap: String describing the new color map to use in the recolored image.
    :param normalize_before_compare: Divide r, g, and b each by (r+g+b) before comparing.
    :param max_rgb: Do the values of r, g, and b range from 0 to 1 or from 0 to 255? Set to 1, 255, or 'auto'.
    :param threshold: Sum of absolute differences in r, g, b values must be less than threshold to be valid
        (0 = perfect, 3 = impossibly bad). Higher numbers = less chance of missing pixels but more chance of recoloring
        plot axes, etc.
    :param saturation_threshold: Minimum color saturation below which no replacement will take place
    :param compare_hue: Use differences in HSV instead of RGB to determine with which index each pixel should be
        associated.
    :param show_plot: T/F: Open a plot to explain what is going on. Also helpful for checking your aim on the colorbar
        coordinates and debugging.
    :param debug: T/F: Print debugging information.
    """
    def printd(string_in):
        """
        Prints debugging statements
        :param string_in: String to print only if debug is on.
        :return: None
        """
        if debug:
            print(string_in)
        return

    print('Recoloring image: {:} ...'.format(filename))
    # Determine tag name and load original file into the tree
    fn1 = filename.split(os.sep)[-1] # Filename without path
    fn2 = fn1.split(os.extsep)[0] # Filename without extension (so new filename can be built later)
    ext = fn1.split(os.extsep)[-1] # File extension
    path = os.sep.join(filename.split(os.sep)[0:-1]) # Path; used later to save results.
    a = imread(filename).astype(float)
    printd(f'Read image; shape = {np.shape(a)}')
    if max_rgb == 'auto':
        # Determine if values of R, G, and B range from 0 to 1 or from 0 to 255
        if a.max() > 1:
            max_rgb = 255.0
        else:
            max_rgb = 1.0
    # Normalize a so RGB values go from 0 to 1 and are floats.
    a /= max_rgb
    # Extract the colorbar
    x = np.array([colorbar_loc[0][0], colorbar_loc[1][0]])
    y = np.array([colorbar_loc[0][1], colorbar_loc[1][1]])
    cb = a[y[0]:y[1], x[0]:x[1]]
    # Take just the working area, not the whole image
    xw = np.array([working_loc[0][0], working_loc[1][0]])
    yw = np.array([working_loc[0][1], working_loc[1][1]])
    a1 = a[yw[0]:yw[1], xw[0]:xw[1]]
    # Pick color bar orientation
    if colorbar_orientation == 'auto':
        if np.diff(x) > np.diff(y):
            colorbar_orientation = 'x'
        else:
            colorbar_orientation = 'y'
        printd('Auto selected colorbar_orientation')
    printd('Colorbar orientation is {:}'.format(colorbar_orientation))
    # Analyze the colorbar
    if colorbar_orientation == 'y':
        cb = np.nanmean(cb, axis=1)
    else:
        cb = np.nanmean(cb, axis=0)
    if colorbar_direction < 0:
        cb = cb[::-1]
    # Compress colorbar to only count unique colors
    # If the array gets too big, it will fill memory and crash python: https://github.com/numpy/numpy/issues/14136
    dcb = np.append(1, np.sum(abs(np.diff(cb[:, 0:3], axis=0)), axis=1))
    cb = cb[dcb > 0]
    # Find and mask special colors that should not be recolored
    n1a = np.sum(a1[:, :, 0:3], axis=2)
    replacement_mask = np.ones(np.shape(n1a), bool)
    for col in [0, 3]: # Black and white will come out as 0 and 3.
        mask_update = n1a != col
        if mask_update.max() == 0:
            print('Warning: masking to protect special colors prevented all changes to the image!')
        else:
            printd('Good: Special color mask {:} allowed at least some changes'.format(col))
        replacement_mask *= mask_update
        if replacement_mask.max() == 0:
            print('Warning: replacement mask will prevent all changes to the image! '
                  '(Reached this point during special color protection)')
        printd('Sum(replacement_mask) = {:} (after considering special color {:})'
               .format(np.sum(np.atleast_1d(replacement_mask)), col))
    # Also apply limits to total r+g+b
    replacement_mask *= n1a > 0.75
    replacement_mask *= n1a < 2.5
    if replacement_mask.max() == 0:
        print('Warning: replacement mask will prevent all changes to the image! '
              '(Reached this point during total r+g+b+ limits)')
    printd('Sum(replacement_mask) = {:} (after considering r+g+b upper threshold)'
           .format(np.sum(np.atleast_1d(replacement_mask))))
    if saturation_threshold > 0:
        hsv1 = matplotlib.colors.rgb_to_hsv(a1[:, :, 0:3])
        sat = hsv1[:, :, 1]
        printd('Saturation ranges from {:} <= sat <= {:}'.format(sat.min(), sat.max()))
        sat_mask = sat > saturation_threshold
        if sat_mask.max() == 0:
            print('Warning: saturation mask will prevent all changes to the image!')
        else:
            printd('Good: Saturation mask will allow at least some changes')
        replacement_mask *= sat_mask
        if replacement_mask.max() == 0:
            print('Warning: replacement mask will prevent all changes to the image! '
                  '(Reached this point during saturation threshold)')
    printd(f'shape(a1) = {np.shape(a1)}')
    printd(f'shape(cb) = {np.shape(cb)}')
    # Find where on the colorbar each pixel sits
    if compare_hue:
        # Difference in hue
        hsv1 = matplotlib.colors.rgb_to_hsv(a1[:, :, 0:3])
        hsv_cb = matplotlib.colors.rgb_to_hsv(cb[:, 0:3])
        d2 = abs(hsv1[:, :, :, np.newaxis] - hsv_cb.T[np.newaxis, np.newaxis, :, :])
        # d2 = d2[:, :, 0, :] # Take hue only
        d2 = np.sum(d2, axis=2)
        printd(' shape(d2) = {:} (hue version)'.format(np.shape(d2)))
    else:
        # Difference in RGB
        if normalize_before_compare:
            # Difference of normalized RGB arrays
            n1 = n1a[:, :, np.newaxis]
            n2 = np.sum(cb[:, 0:3], axis=1)[:, np.newaxis]
            w1 = n1 == 0
            w2 = n2 == 0
            n1[w1] = 1
            n2[w2] = 1
            d = (a1/n1)[:, :, 0:3, np.newaxis] - (cb/n2).T[np.newaxis, np.newaxis, 0:3, :]
        else:
            # Difference of non-normalized RGB arrays
            d = (a1[:, :, 0:3, np.newaxis] - cb.T[np.newaxis, np.newaxis, 0:3, :])
        printd(f'Shape(d) = {np.shape(d)}')
        d2 = np.sum(np.abs(d[:, :, 0:3, :]), axis=2) # 0:3 excludes the alpha channel from this calculation
    printd('Processed colorbar')
    index = d2.argmin(axis=2)
    md2 = d2.min(axis=2)
    index_valid = md2 < threshold
    if index_valid.max() == 0:
        print('Warning: minimum difference is greater than threshold: all changes rejected!')
    else:
        printd('Good: Minimum difference filter is lower than threshold for at least one pixel.')
    printd('Sum(index_valid) = {:} (before *= replacement_mask)'.format(np.sum(np.atleast_1d(index_valid))))
    printd('Sum(replacement_mask) = {:} (final, before combining w/ index_valid)'
           .format(np.sum(np.atleast_1d(replacement_mask))))
    index_valid *= replacement_mask
    if index_valid.max() == 0:
        print('Warning: index_valid mask prevents all changes to the image after combination w/ replacement_mask.')
    else:
        printd('Good: Mask will allow at least one pixel to change.')
    printd('Sum(index_valid) = {:}'.format(np.sum(np.atleast_1d(index_valid))))
    value = index/(len(cb)-1.0)
    printd('Index ranges from {:} to {:}'.format(index.min(), index.max()))
    # Make a new image with replaced colors
    b = matplotlib.cm.ScalarMappable(cmap=new_cmap).to_rgba(value) # Remap everything
    printd('shape(b) = {:}, min(b) = {:}, max(b) = {:}'.format(np.shape(b), b.min(), b.max()))
    c = copy.copy(a1) # Copy original
    c[index_valid] = b[index_valid] # Transfer only pixels where color was close to colormap
    # Transfer working area to full image
    c2 = copy.copy(a) # Copy original full image
    c2[yw[0]:yw[1], xw[0]:xw[1], :] = c # Replace working area
    c2[:, :, 3] = a[:, :, 3] # Preserve original alpha channel
    # Save the image in the same path as the original but with _recolored added to the filename.
    # os.path.join avoids a leading separator when filename has no directory part.
    new_filename = os.path.join(path, '{:}_recolored{:}{:}'.format(fn2, os.extsep, ext))
    imsave(new_filename, c2)
    print('Done recoloring. Result saved to {:} .'.format(new_filename))
    if show_plot:
        # Setup figure for showing things to the user
        f, axs = plt.subplots(2, 3)
        axo = axs[0, 0] # Axes for original figure
        axoc = axs[0, 1] # Axes for original color bar
        axf = axs[0, 2] # Axes for final figure
        axm = axs[1, 1] # Axes for mask
        axre = axs[1, 2] # Axes for recolored section only (it might not be the whole figure)
        axraw = axs[1, 0] # Axes for raw recoloring result before masking
        for ax in axs.flatten():
            ax.set_xlabel('x pixel')
            ax.set_ylabel('y pixel')
        axo.set_title('Original image w/ colorbar ID overlay')
        axoc.set_title('Color progression from original colorbar')
        axm.set_title('Mask')
        axre.set_title('Recolored section')
        axraw.set_title('Raw recolor result (no masking)')
        axf.set_title('Final image')
        axoc.set_xlabel('Index')
        axoc.set_ylabel('Value')
        # Show the user where they placed the color bar and working location
        axo.imshow(a)
        xx = x[np.array([0, 0, 1, 1, 0])]
        yy = y[np.array([0, 1, 1, 0, 0])]
        axo.plot(xx, yy, '+-', label='colorbar')
        xxw = xw[np.array([0, 0, 1, 1, 0])]
        yyw = yw[np.array([0, 1, 1, 0, 0])]
        axo.plot(xxw, yyw, '+-', label='target')
        tots = np.sum(cb[:, 0:3], axis=1)
        if normalize_before_compare:
            # Normalized version
            axoc.plot(cb[:, 0] / tots, 'r', label='r/(r+g+b)', lw=2)
            axoc.plot(cb[:, 1] / tots, 'g', label='g/(r+g+b)', lw=2)
            axoc.plot(cb[:, 2] / tots, 'b', label='b/(r+g+b)', lw=2)
            axoc.set_ylabel('Normalized value')
        else:
            axoc.plot(cb[:, 0], 'r', label='r', lw=2)
            axoc.plot(cb[:, 1], 'g', label='g', lw=2)
            axoc.plot(cb[:, 2], 'b', label='b', lw=2)
            axoc.plot(cb[:, 3], color='gray', linestyle='--', label='$\\alpha$')
            axoc.plot(tots, 'k', label='r+g+b')
        # Display the new colors with no mask, the mask, and the recolored section
        axraw.imshow(b)
        axm.imshow(index_valid)
        axre.imshow(c)
        # Display the final result
        axf.imshow(c2)
        # Finishing touches on plots
        axo.legend(loc=0).set_draggable(True)
        axoc.legend(loc=0).set_draggable(True)
        plt.show()
    return
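
The core step of the script, matching each pixel to the closest colour on the colorbar and turning the matched index into a value between 0 and 1, can be shown on a single pixel. This is a stripped-down illustration in plain Python rather than the vectorised NumPy version above; nearest_colorbar_value is a hypothetical helper, and the three-entry colorbar is made up.

```python
def nearest_colorbar_value(pixel, colorbar):
    """Return (value, distance) for the colorbar entry closest to `pixel`.

    Distance is the sum of absolute RGB differences, the same measure the
    script uses when compare_hue is False. `value` is the entry's position
    scaled to the range 0..1. The colorbar needs at least two entries.
    """
    dists = [sum(abs(p - c) for p, c in zip(pixel, entry)) for entry in colorbar]
    index = min(range(len(dists)), key=dists.__getitem__)
    return index / (len(colorbar) - 1), dists[index]


# Made-up colorbar: blue (low), green (middle), red (high).
colorbar = [(0.0, 0.0, 1.0), (0.0, 1.0, 0.0), (1.0, 0.0, 0.0)]
value, dist = nearest_colorbar_value((0.1, 0.9, 0.0), colorbar)
print(value, dist)
```

A pixel would then be recoloured only if its distance is below the threshold; otherwise it is treated as something other than data, such as an axis line or a label.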
