I need to build a custom seaborn heatmap-like plot according to these requirements:
import pandas as pd
df = pd.DataFrame({"A": [0.3, 0.8, 1.3],
"B": [4, 9, 15],
"C": [650, 780, 900]})
df_info = pd.DataFrame({"id": ["min", "max"],
"A": [0.5, 0.9],
"B": [6, 10],
"C": [850, 880]})
df_info = df_info.set_index('id')
df
A B C
0 0.3 4 650
1 0.8 9 780
2 1.3 15 900
df_info
id A B C
min 0.5 6 850
max 0.9 10 880
Each value within df is supposed to be within a range defined in df_info.
For example, the values for column A are considered normal if they are between 0.5 and 0.9. Values outside the range should be colorized using a custom heatmap.
In particular:
Values that fall within the range defined for each column should not be colorized: plain black text on a white background.
Values lower than the min for that column should be colorized, for example in blue. The further a value falls below the min, the darker the shade of blue.
Values higher than the max for that column should be colorized, for example in red. The further a value rises above the max, the darker the shade of red.
Q: I wouldn't know how to approach this with a standard heatmap, I'm not even sure I can accomplish this with a heatmap plot. Any suggestion?
As far as I know, a heatmap can only have one scale of values. I would suggest normalizing the data you have in the df dataframe so the values in every column follow:
between 0 and 1 if the value is between df_info's min max
below 0 if the value is below df_info's min
above 1 if the value is above df_info's max
To normalize your dataframe use:
for col in df:
    df[col] = (df[col] - df_info[col]['min']) / (df_info[col]['max'] - df_info[col]['min'])
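The same normalization can also be written without a loop. This is only a minimal sketch, assuming df and df_info are defined as in the question; the names df_norm, below and above are illustrative, and the result goes into a new frame so the original values stay available for annotations:

```python
import pandas as pd

df = pd.DataFrame({"A": [0.3, 0.8, 1.3],
                   "B": [4, 9, 15],
                   "C": [650, 780, 900]})
df_info = pd.DataFrame({"id": ["min", "max"],
                        "A": [0.5, 0.9],
                        "B": [6, 10],
                        "C": [850, 880]}).set_index("id")

# Row-wise Series align with the columns of df, so this scales every column
# by its own min/max: 0 maps to the column's min, 1 to the column's max
lo, hi = df_info.loc["min"], df_info.loc["max"]
df_norm = (df - lo) / (hi - lo)

# Boolean masks of the cells that need coloring
below = df_norm < 0
above = df_norm > 1
```

Keeping df untouched means the heatmap can later be annotated with the real values rather than the normalized ones.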
Finally, to create the color-coded heatmap use:
import seaborn as sns
from matplotlib.colors import LinearSegmentedColormap
vmin = df.min().min()
vmax = df.max().max()
colors = [[0, 'darkblue'],
          [- vmin / (vmax - vmin), 'white'],
          [(1 - vmin) / (vmax - vmin), 'white'],
          [1, 'darkred']]
cmap = LinearSegmentedColormap.from_list('', colors)
sns.heatmap(df, cmap=cmap, vmin=vmin, vmax=vmax)
The additional calculations with vmin and vmax place the two white nodes at the normalized values 0 and 1, so the colormap scales dynamically with how far the data strays below the minimums and above the maximums.
Using your input dataframe we have the following heatmap:
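One caveat with the node positions: they only form a valid colormap when the normalized data really extends below 0 and above 1. The following is a hedged sketch, not the answer's own code: range_cmap is a hypothetical helper that widens the span to always contain [0, 1], and the plot uses plain matplotlib (imshow plus text) so that the cells can show the original values.

```python
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.colors import LinearSegmentedColormap

def range_cmap(vmin, vmax):
    """Hypothetical helper: white on [0, 1], blue below, red above."""
    # Widen the span so it always contains [0, 1]; node positions must
    # start at 0, end at 1 and never decrease.
    vmin, vmax = min(vmin, 0.0), max(vmax, 1.0)
    span = vmax - vmin
    lo, hi = (0.0 - vmin) / span, (1.0 - vmin) / span
    nodes = []
    if lo > 0:                      # only add blue if something can be below min
        nodes.append((0.0, 'darkblue'))
    nodes += [(lo, 'white'), (hi, 'white')]
    if hi < 1:                      # only add red if something can be above max
        nodes.append((1.0, 'darkred'))
    return LinearSegmentedColormap.from_list('range_cmap', nodes), vmin, vmax

df = pd.DataFrame({"A": [0.3, 0.8, 1.3], "B": [4, 9, 15], "C": [650, 780, 900]})
df_info = pd.DataFrame({"A": [0.5, 0.9], "B": [6, 10], "C": [850, 880]},
                       index=["min", "max"])
df_norm = (df - df_info.loc["min"]) / (df_info.loc["max"] - df_info.loc["min"])

cmap, vmin, vmax = range_cmap(df_norm.min().min(), df_norm.max().max())

fig, ax = plt.subplots()
ax.imshow(df_norm.to_numpy(), cmap=cmap, vmin=vmin, vmax=vmax, aspect='auto')
ax.set_xticks(range(df.shape[1]))
ax.set_xticklabels(df.columns)
ax.set_yticks(range(df.shape[0]))
ax.set_yticklabels(df.index)
# Annotate every cell with the original value, not the normalized one
for i in range(df.shape[0]):
    for j in range(df.shape[1]):
        ax.text(j, i, f'{df.iat[i, j]:g}', ha='center', va='center')
```

The same cmap, vmin and vmax can be passed to sns.heatmap(df_norm, cmap=cmap, vmin=vmin, vmax=vmax, annot=df, fmt='g') to keep seaborn's layout while still annotating with the original values.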
I have a sample of a dataframe as shown below.
data = {'Date':['2021-07-18','2021-07-19','2021-07-20','2021-07-21','2021-07-22','2021-07-23'],
'Invalid':["NaN", 1, 1, "NaN", "NaN", "NaN"],
'Negative':[23, 24, 17, 24, 20, 23],
'Positive':["NaN", 1, 1, 1, "NaN", 1]}
df_sample = pd.DataFrame(data)
df_sample
The code for displaying a stacked bar graph is given below and also the graph produced by it.
temp = Graph1_df.set_index(['Dates', 'Results']).sort_index(0).unstack()
temp.columns = temp.columns.get_level_values(1)
f, ax = plt.subplots(figsize=(20, 5))
temp.plot.bar(ax=ax, stacked=True, width = 0.3, color=['blue','green','red'])
ax.title.set_text('Total Test Count vs Dates')
plt.show()
Using the code above or with any new approach, I want just the values for 'positive' to be displayed on the chart.
Note: 3rd column in the dataframe snippet is the 'Positive' column.
Any help is greatly appreciated.
Thanks.
Plotting with pandas.DataFrame.plot with kind='bar'
Use .bar_label to add annotations
See this answer for other links and options related to .bar_label
Stacked bar plots are plotted in order from left to right and bottom to top, based on the order of the columns and rows, respectively.
Since 'Positive' is column index 2, we only want labels for i == 2
Tested in pandas 1.3.0; requires matplotlib >=3.4 and python >=3.8
The list comprehension for labels uses an assignment expression, :=, which is only available from python 3.8
labels = [f'{v.get_height():.0f}' if ((v.get_height()) > 0) and (i == 2) else '' for v in c] is the option without :=
.bar_label was added in matplotlib 3.4
This answer shows how to add annotations for older matplotlib versions
import pandas as pd
import numpy as np # used for nan
# test dataframe
data = {'Date':['2021-07-18','2021-07-19','2021-07-20','2021-07-21','2021-07-22','2021-07-23'],
'Invalid':[np.nan, 1, 1, np.nan, np.nan, np.nan],
'Negative':[23, 24, 17, 24, 20, 23],
'Positive':[np.nan, 1, 1, 1, np.nan, 1]}
df = pd.DataFrame(data)
# convert the Date column to a datetime format and use the dt accessor to get only the date component
df.Date = pd.to_datetime(df.Date).dt.date
# set Date as index
df.set_index('Date', inplace=True)
# create multi-index column to match OP image
top = ['Size']
current = df.columns
df.columns = pd.MultiIndex.from_product([top, current], names=['', 'Result'])
# display(df)
Size
Result Invalid Negative Positive
Date
2021-07-18 NaN 23 NaN
2021-07-19 1.0 24 1.0
2021-07-20 1.0 17 1.0
2021-07-21 NaN 24 1.0
2021-07-22 NaN 20 NaN
2021-07-23 NaN 23 1.0
# reset the top index to a column
df = df.stack(level=0).rename_axis(['Date', 'Size']).reset_index(level=1)
# if there are many top levels that are reset as a column, then select the data to be plotted
sel = df[df.Size.eq('Size')]
# plot
ax = sel.iloc[:, 1:].plot(kind='bar', stacked=True, figsize=(20, 5), title='Total Test Count vs Dates', color=['blue','green','red'])
# add annotations
for i, c in enumerate(ax.containers):
    # format the labels
    labels = [f'{w:.0f}' if ((w := v.get_height()) > 0) and (i == 2) else '' for v in c]
    # annotate with custom labels
    ax.bar_label(c, labels=labels, label_type='center', fontsize=10)
# pad the spacing between the number and the edge of the figure
ax.margins(y=0.1)
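For matplotlib versions without .bar_label, the labels can be placed by hand. This is only a rough sketch under a few assumptions: the frame uses flat columns with the dates as a plain string index (not the multi-index built above), and the names positive_bars and texts are illustrative.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

dates = ['2021-07-18', '2021-07-19', '2021-07-20',
         '2021-07-21', '2021-07-22', '2021-07-23']
df = pd.DataFrame({'Invalid': [np.nan, 1, 1, np.nan, np.nan, np.nan],
                   'Negative': [23, 24, 17, 24, 20, 23],
                   'Positive': [np.nan, 1, 1, 1, np.nan, 1]},
                  index=pd.Index(dates, name='Date'))

ax = df.plot(kind='bar', stacked=True, figsize=(20, 5),
             title='Total Test Count vs Dates',
             color=['blue', 'green', 'red'])

# One BarContainer is created per column, in column order
positive_bars = ax.containers[df.columns.get_loc('Positive')]

texts = []
for bar in positive_bars:
    height = bar.get_height()
    if height > 0:  # skips zero-height and NaN segments
        x = bar.get_x() + bar.get_width() / 2   # horizontal center of the bar
        y = bar.get_y() + height / 2            # vertical center of the segment
        texts.append(ax.text(x, y, f'{height:.0f}',
                             ha='center', va='center', fontsize=10))
```

Looking the container up by column name avoids hard-coding the index 2 if the column order changes.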
I'm plotting a scatter plot from a Pandas dataframe in Matplotlib. Here is what the dataframe looks like:
X Y R
0 1 945 1236.334519
0 1 950 212.809352
0 1 950 290.663847
0 1 961 158.156856
And here is how I'm plotting the dataframe:
ax1.scatter(myDF.X, myDF.Y, s=20, c='red', marker='s', alpha=0.5)
My problem is that I want to change how the marker is plotted according to how high or low the value of R is.
Example: if R is higher than 1000 (as it is in the first row of my example), the color should be yellow instead of red and alpha should be 0.8 instead of 0.5. If R is lower than 1000, the color should be blue and alpha should be 0.4, and so on.
Is there any way to do that, or can I only use different dataframes with different data? Thanks in advance!
You can do a custom RGBA color array:
colors = [(1,1,0,0.8) if x>1000 else (0,0,1,0.4) for x in df.R]
plt.scatter(df.X,df.Y, c=colors)
Output:
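For larger frames the RGBA array can be built without a Python loop. A minimal sketch, assuming the yellow/blue choices and the 1000 threshold from the question; the names high, low and is_high are illustrative:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.colors import to_rgba

df = pd.DataFrame({'X': [1, 1, 1, 1],
                   'Y': [945, 950, 950, 961],
                   'R': [1236.334519, 212.809352, 290.663847, 158.156856]})

high = to_rgba('yellow', alpha=0.8)   # used where R > 1000
low = to_rgba('blue', alpha=0.4)      # used everywhere else

# Column-shaped condition broadcasts against the two RGBA tuples,
# giving one RGBA row per data point
is_high = (df['R'] > 1000).to_numpy()[:, None]
colors = np.where(is_high, high, low)

fig, ax = plt.subplots()
ax.scatter(df['X'], df['Y'], s=20, c=colors, marker='s')
```

Because the alpha value lives in the fourth column of each row, no separate alpha argument is needed. More than two categories could be handled the same way with np.select.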
I want to plot a box plot with my DataFrame:
A B C
max 10 11 14
min 3 4 10
q1 5 6 12
q3 9 7 13
How can I plot a box plot with these fixed values?
You can use the Axes.bxp method in matplotlib, based on this helpful answer. The input is a list of dictionaries containing the relevant values, but the median is a required key in these dictionaries. Since the data you provided does not include medians, I have made up medians in the code below (but you will need to calculate them from your actual data).
import matplotlib.pyplot as plt
import pandas as pd
# reproducing your data
df = pd.DataFrame({'A':[10,3,5,9],'B':[11,4,6,7],'C':[14,10,12,13]})
# add a row for median, you need median values!
sample_medians = {'A':7, 'B':6.5, 'C':12.5}
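# DataFrame.append was removed in pandas 2.0, so concatenate a one-row frame instead
df = pd.concat([df, pd.DataFrame([sample_medians])], ignore_index=True)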
df.index = ['max','min','q1','q3','med']
Here is the modified df with medians included:
>>> df
A B C
max 10.0 11.0 14.0
min 3.0 4.0 10.0
q1 5.0 6.0 12.0
q3 9.0 7.0 13.0
med 7.0 6.5 12.5
Now we transform the df into a list of dictionaries:
labels = list(df.columns)
# create dictionaries for each column as items of a list
bxp_stats = df.apply(lambda x: {'med':x.med, 'q1':x.q1, 'q3':x.q3, 'whislo':x['min'], 'whishi':x['max']}, axis=0).tolist()
# add the column names as labels to each dictionary entry
for index, item in enumerate(bxp_stats):
    item.update({'label':labels[index]})
_, ax = plt.subplots()
ax.bxp(bxp_stats, showfliers=False);
plt.show()
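If the raw observations are available, the medians do not have to be made up. The sketch below assumes hypothetical raw data (chosen so that its quartiles match the table in the question) and uses matplotlib.cbook.boxplot_stats to build the list of dictionaries that bxp expects:

```python
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib import cbook

# Hypothetical raw observations; replace with your actual data
raw = pd.DataFrame({'A': [3, 5, 7, 9, 10],
                    'B': [4, 6, 6.5, 7, 11],
                    'C': [10, 12, 12.5, 13, 14]})

# whis=(0, 100) puts the whiskers at the min and max of each column
stats = cbook.boxplot_stats(raw.to_numpy(), whis=(0, 100))

# Attach the column names so they show up as tick labels
for item, name in zip(stats, raw.columns):
    item['label'] = name

_, ax = plt.subplots()
ax.bxp(stats, showfliers=False)
```

Each dictionary then carries real med, q1, q3, whislo and whishi values, so nothing has to be typed in by hand.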
Unfortunately the median is a required key, so it must be specified for every box. If you have no real medians, you can draw the median line as thin as possible and in the same color as the box so that it is virtually invisible.
If you want each box to be drawn with different specifications, they will have to be in different subplots. I understand if this looks kind of ugly, so you can play around with the spacing between subplots or consider removing some of the y-axes.
fig, axes = plt.subplots(nrows=1, ncols=3, sharey=True)
# specify list of background colors, median line colors same as background with as thin of a width as possible
colors = ['LightCoral', '#FEF1B5', '#EEAEEE']
medianprops = [dict(linewidth = 0.1, color='LightCoral'), dict(linewidth = 0.1, color='#FEF1B5'), dict(linewidth = 0.1, color='#EEAEEE')]
# create a list of boxplots of length 3
bplots = [axes[i].bxp([bxp_stats[i]], medianprops=medianprops[i], patch_artist=True, showfliers=False) for i in range(len(df.columns))]
# set each boxplot a different color
for i, bplot in enumerate(bplots):
    for patch in bplot['boxes']:
        patch.set_facecolor(colors[i])
plt.show()
How can I force the min and max of the value range when styling a dataframe column?
Example:
df = pd.DataFrame(columns = ['column'], data = [-1, 0, 1, 2])
df.style.background_gradient(cmap='coolwarm_r')
This produces a column where the value 0 is red; however, I would like 0 to be white, i.e. the middle of the 'coolwarm' colormap.
In this example, my understanding of the problem is that the default colormap interval is mapped onto the range of my data, [-1, 2].
I would like to set it to [-2, 2].
I've created a dummy dataframe which is similar to the one I'm using.
The dataframe consists of Fare prices, Cabin type, and Survival (1 = alive, 0 = dead).
The first plot creates several graphs via factorplot (renamed catplot in newer seaborn versions), with each graph representing a Cabin type. The x-axis is the Fare price and the y-axis is a count of the number of occurrences at that Fare price.
I then created another series via a groupby on [Cabin, Fare] and took the mean of Survived to get the survival rate at each Cabin and Fare price.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
df = pd.DataFrame(dict(
Fare=[20, 10, 30, 40, 40, 10, 20, 30, 40 ,30, 20, 30, 30],
Cabin=list('AAABCDBDCDDDC'),
Survived=[1, 0, 0, 0 ,0 ,1 ,1 ,0 ,1 ,1 , 0, 1, 1]
))
# factorplot was renamed to catplot (and its size argument to height) in seaborn 0.9
g = sns.catplot(x='Fare', col='Cabin', kind='count', data=df,
                col_wrap=3, height=3, aspect=1.3, palette='muted')
plt.show()
x =df.groupby(['Cabin','Fare']).Survived.mean()
What I would like to do is draw a line plot on top of the count graphs above (so the x-axis is the same, and each graph still represents a Cabin type), but with the y-axis being the survival mean calculated with the groupby series x in the code above, which is the third column of the output below.
Cabin Fare
A 10 0.000000
20 1.000000
30 0.000000
B 20 1.000000
40 0.000000
C 30 1.000000
40 0.500000
D 10 1.000000
20 0.000000
30 0.666667
The y-axis for the line plot should be on the right side, and the range I would like is [0, .20, .40, .60, .80, 1.0, 1.2]
I looked through the seaborn docs for a while, but I couldn't figure out how to properly do this.
My desired output looks something like this image. I'm sorry my writing looks horrible, I don't know how to use paint well. So the ticks and numbers are on the right side of each graph. The line plot will be connected via dots at each x,y point. So for Cabin A, the first x,y point is (10,0) with 0 corresponding to the right y-axis. The second point is (20,1) and so on.
Data operations:
Compute frequency counts:
df_counts = pd.crosstab(df['Fare'], df['Cabin'])
Compute means across the groups and unstack the result to obtain a DataFrame. The NaNs are left as they are rather than replaced by zeros, so that the line plot shows a break where a Cabin has no data at a given Fare; otherwise the line would be continuous, which wouldn't make much sense here.
df_means = df.groupby(['Cabin','Fare']).Survived.mean().unstack().T
Prepare the x-axis labels as strings:
df_counts.index = df_counts.index.astype(str)
df_means.index = df_means.index.astype(str)
Plotting:
fig, ax = plt.subplots(1, 4, figsize=(10,4))
df_counts.plot.bar(ax=ax, ylim=(0,5), cmap=plt.cm.Spectral, subplots=True,
legend=None, rot=0)
# Use secondary y-axis(right side)
df_means.plot(ax=ax, secondary_y=True, marker='o', color='r', subplots=True,
legend=None, xlim=(0,4))
# Adjust spacing between subplots
plt.subplots_adjust(wspace=0.5, hspace=0.5)
plt.show()
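The question also asks for right-hand ticks from 0 to 1.2, which the pandas call does not set. The following is an alternative sketch, not the code above: it loops over the cabins and creates the secondary axis explicitly with twinx, so the right-hand limits and ticks can be controlled per subplot. The bar color and figure size are arbitrary choices.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame(dict(
    Fare=[20, 10, 30, 40, 40, 10, 20, 30, 40, 30, 20, 30, 30],
    Cabin=list('AAABCDBDCDDDC'),
    Survived=[1, 0, 0, 0, 0, 1, 1, 0, 1, 1, 0, 1, 1]
))

df_counts = pd.crosstab(df['Fare'], df['Cabin'])
df_means = df.groupby(['Cabin', 'Fare'])['Survived'].mean().unstack('Cabin')

fig, axes = plt.subplots(1, df_counts.shape[1], figsize=(12, 3), sharey=True)
x = np.arange(len(df_counts.index))          # one bar position per Fare value

for ax, cabin in zip(axes, df_counts.columns):
    ax.bar(x, df_counts[cabin], color='steelblue')
    ax.set_xticks(x)
    ax.set_xticklabels(df_counts.index)
    ax.set_title(f'Cabin = {cabin}')

    right = ax.twinx()                       # secondary y-axis on the right
    # NaN entries are kept, so the line breaks where a cabin has no data
    right.plot(x, df_means[cabin].reindex(df_counts.index),
               marker='o', color='r')
    right.set_ylim(0, 1.2)
    right.set_yticks(np.arange(0, 1.21, 0.2))

fig.tight_layout()
```

Plotting both layers against the same integer positions keeps the bars and the line aligned on the shared x-axis.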