Get the height of the rectangles in a plot - python

I have the following graph [1], obtained with the code below [2]. As you can see from the first line inside the for loop, I set the height of the rectangles from the standard deviation value. But I can't figure out how to get the height of each rectangle back in axis units. For example, given the blue rectangle, I would like to return the two bounds of the interval it covers, which are approximately 128.8 and 130.6. How can I do this?
[2] The code I used is the following:
import pandas as pd
import matplotlib.ticker as ticker
import matplotlib.pyplot as plt
import numpy as np
dfLunedi = pd.read_csv( "0.lun.csv", encoding = "ISO-8859-1", sep = ';')
dfSlotMean = dfLunedi.groupby('slotID', as_index=False).agg( NLunUn=('date', 'nunique'),NLunTot = ('date', 'count'), MeanBPM=('tempo', 'mean'), std = ('tempo','std') )
#print(dfSlotMean)
dfSlotMean.drop(dfSlotMean[dfSlotMean.NLunUn < 3].index, inplace=True)
df = pd.DataFrame(dfSlotMean)
df.to_csv('1.silLunedi.csv', sep = ';', index=False)
print(df)
bpmMattino = df['MeanBPM']
std = df['std']
listBpm = bpmMattino.tolist()
limInf = df['MeanBPM'] - df['std']
limSup = df['MeanBPM'] + df['std']
tick_spacing = 1
fig, ax = plt.subplots(1, 1)
for _, r in df.iterrows():
    ax.plot([r['slotID'], r['slotID']+1], [r['MeanBPM']]*2, linewidth = r['std'] )
ax.xaxis.grid(True)
ax.yaxis.grid(True)
ax.yaxis.set_major_locator(ticker.MultipleLocator(tick_spacing))
ax.xaxis.set_major_locator(ticker.MultipleLocator(tick_spacing))
This is the content of the csv:
    slotID  NMonUnique  NMonTot     MeanBPM        std
0        7          11       78  129.700564  29.323091
2       11           6       63  123.372397  24.049397
3       12           6       33  120.625667  24.029006
4       13           5       41  124.516341  30.814985
5       14           4       43  118.904512  26.205309
6       15           3       13  116.380538  24.336491
7       16           3       42  119.670881  27.416843
8       17           5       40  125.424125  32.215865
9       18           6       45  130.540578  24.437559
10      19           9       58  128.180172  32.099529
11      20           5       44  125.596045  28.060657

I would advise against using linewidth to show anything related to your data. The reason is that linewidth is measured in "points" (see the matplotlib documentation), the size of which is not related to the xy-space that you plot your data in. To see this in action, try plotting with different linewidths and resizing the plot window. The linewidth will not change with the axes.
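To make the point about units concrete, here is a minimal sketch that converts a linewidth in points into a height in y-axis data units. The helper name linewidth_to_data_height is made up for this illustration (it is not a matplotlib function), it assumes a linear y-axis, and the result is only valid for the current figure size, dpi and y-limits.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; remove for a window
import matplotlib.pyplot as plt


def linewidth_to_data_height(ax, linewidth_pts):
    """Approximate height, in y-data units, covered by a horizontal line
    of the given width (hypothetical helper, assumes a linear y-axis)."""
    # 1 point = 1/72 inch, so convert points -> pixels using the figure dpi
    height_px = linewidth_pts * ax.figure.dpi / 72.0
    # pixels per data unit along the y-axis for the current limits
    y_min, y_max = ax.get_ylim()
    px_per_unit = ax.bbox.height / abs(y_max - y_min)
    return height_px / px_per_unit


fig, ax = plt.subplots()
ax.set_ylim(115, 135)
mean, std = 129.700564, 29.323091  # first row of the question's table
ax.plot([7, 8], [mean, mean], linewidth=std)

# the line is centred on the mean, so half the height goes on each side
half = linewidth_to_data_height(ax, std) / 2
print(f"line covers roughly [{mean - half:.2f}, {mean + half:.2f}]")
```

Resizing the figure or changing the y-limits changes the printed interval, even though the data is the same, which is why linewidth is a fragile way to encode a data value.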
Instead, if you do indeed want a rectangle, I suggest using matplotlib.patches.Rectangle. There is a good example of how to do that in the documentation, and I've also added an even shorter example below.
To give the rectangles different colors, you can do as shown here and simply take a random tuple with 3 elements and use that as the color. Another option is to take a list of colors, for example TABLEAU_COLORS from matplotlib.colors, and take consecutive colors from that list. The latter may be better for testing, as the rectangles get the same color on each run, but note that there are just 10 colors in TABLEAU_COLORS, so you will have to cycle if you have more than 10 rectangles.
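import matplotlib.pyplot as plt
import matplotlib.patches as ptc
import matplotlib.colors as mcolors  # only needed for the commented-out option
import random
x = 3
y = 4.5
y_std = 0.3
fig, ax = plt.subplots()
for i in range(10):
    # random RGB tuple; all ten rectangles share the same x and y in this short example
    c = tuple(random.random() for _ in range(3))
    # The other option as comment here
    #c = mcolors.TABLEAU_COLORS[list(mcolors.TABLEAU_COLORS.keys())[i]]
    # the rectangle spans [y - y_std, y + y_std] on the y-axis
    rect = ptc.Rectangle(xy=(x, y-y_std), width=1, height=2*y_std, color=c)
    ax.add_patch(rect)
ax.set_xlim((0,10))
ax.set_ylim((0,5))
plt.show()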
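Once the rectangles are real patches, their bounds can be read back in data coordinates. The following is a minimal sketch, assuming each rectangle should span mean ± std (the same convention as the example above); the interval helper and the rects dictionary are names invented for this illustration, and the three rows are rounded values from the question's table.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line to get an interactive window
import matplotlib.pyplot as plt
import matplotlib.patches as ptc
import pandas as pd

# Three rows taken from the question's table (values rounded).
df = pd.DataFrame({
    "slotID": [7, 11, 12],
    "MeanBPM": [129.70, 123.37, 120.63],
    "std": [29.32, 24.05, 24.03],
})

fig, ax = plt.subplots()
rects = {}  # slotID -> Rectangle, so each patch can be looked up later
for _, r in df.iterrows():
    rect = ptc.Rectangle(xy=(r["slotID"], r["MeanBPM"] - r["std"]),
                         width=1, height=2 * r["std"], alpha=0.5)
    ax.add_patch(rect)
    rects[int(r["slotID"])] = rect
ax.autoscale_view()  # patches do not rescale the view on their own


def interval(rect):
    """Return the (lower, upper) y-bounds of a rectangle in data coordinates."""
    return rect.get_y(), rect.get_y() + rect.get_height()


lower, upper = interval(rects[7])
print(lower, upper)
```

Because the bounds are stored in data units, they do not change when the figure is resized. The same numbers are also available directly from the DataFrame as MeanBPM - std and MeanBPM + std.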

If you define the height as the standard deviation and the center is at the mean, then the interval should be [mean-(std/2) ; mean+(std/2)] for each rectangle, right? Is it intentional that the rectangles overlap? If not, I think your use of linewidth to size the rectangles is at fault. If the plot is meant to visualize the mean and variance of the different categories, something like a boxplot or raincloud plot might be better.

Related

How do I plot stacked barplots side by side in python? (preferentially seaborn)

I'm looking for a way to plot side-by-side stacked barplots to compare the host composition of positive (Condition==True) and total cases in each country from my dataframe.
Here is a sample of the DataFrame.
id  Location       Host             genus_name    #ofGenes  Condition
1   Netherlands    Homo sapiens     Escherichia   4.0       True
2   Missing        Missing          Klebsiella    3.0       True
3   Missing        Missing          Aeromonas     2.0       True
4   Missing        Missing          Glaciecola    2.0       True
5   Antarctica     Missing          Alteromonas   2.0       True
6   Indian Ocean   Missing          Epibacterium  2.0       True
7   Missing        Missing          Klebsiella    2.0       True
8   China          Homo sapiens     Escherichia   0         False
9   Missing        Missing          Escherichia   2.0       True
10  China          Plantae kingdom  Pantoea       0         False
11  China          Missing          Escherichia   2.0       True
12  Pacific Ocean  Missing          Halomonas     0         False
I need something similar to the image below, but I want to plot percentages.
Can anyone help me?
I guess what you want is a stacked categorical bar plot, which cannot be directly plotted using seaborn. But you can achieve it by customizing one.
Import some necessary packages.
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from matplotlib.patches import Patch
Read the dataset. Since your sample data is too small, I randomly generate some more to make the plot look good.
def gen_fake_data(data, size=400):
    unique_values = []
    for c in data.columns:
        unique_values.append(data[c].unique())
    new_data = pd.DataFrame({c: np.random.choice(unique_values[i], size=size)
                             for i, c in enumerate(data.columns)})
    new_data = pd.concat([data, new_data])
    new_data['id'] = new_data.index + 1
    return new_data
data = pd.read_csv('data.csv')
new_data = gen_fake_data(data)
Define the stacked categorical bar plot
def stack_catplot(x, y, cat, stack, data, palette=sns.color_palette('Reds')):
    ax = plt.gca()
    # pivot the data based on categories and stacks
    df = data.pivot_table(values=y, index=[cat, x], columns=stack,
                          dropna=False, aggfunc='sum').fillna(0)
    ncat = data[cat].nunique()
    nx = data[x].nunique()
    nstack = data[stack].nunique()
    range_x = np.arange(nx)
    width = 0.8 / ncat  # width of each bar
    for i, c in enumerate(data[cat].unique()):
        # iterate over categories, i.e., Conditions
        # calculate the location of each bar
        loc_x = (0.5 + i - ncat / 2) * width + range_x
        bottom = 0
        for j, s in enumerate(data[stack].unique()):
            # iterate over stacks, i.e., Hosts
            # obtain the height of each stack of a bar
            height = df.loc[c][s].values
            # plot the bar, you can customize the color yourself
            ax.bar(x=loc_x, height=height, bottom=bottom, width=width,
                   color=palette[j + i * nstack], zorder=10)
            # change the bottom attribute to achieve a stacked barplot
            bottom += height
    # make xlabel
    ax.set_xticks(range_x)
    ax.set_xticklabels(data[x].unique(), rotation=45)
    ax.set_ylabel(y)
    # make legend
    plt.legend([Patch(facecolor=palette[i]) for i in range(ncat * nstack)],
               [f"{c}: {s}" for c in data[cat].unique() for s in data[stack].unique()],
               bbox_to_anchor=(1.05, 1), loc='upper left', borderaxespad=0.)
    plt.grid()
    plt.show()
Let's plot!
plt.figure(figsize=(6, 3), dpi=300)
stack_catplot(x='Location', y='#ofGenes', cat='Condition', stack='Host', data=new_data)
If you want to plot percentages, calculate them in the raw dataset.
total_genes = new_data.groupby(['Location', 'Condition'], as_index=False)['#ofGenes'].sum().rename(
columns={'#ofGenes': 'TotalGenes'})
new_data = new_data.merge(total_genes, how='left')
new_data['%ofGenes'] = new_data['#ofGenes'] / new_data['TotalGenes'] * 100
plt.figure(figsize=(6, 3), dpi=300)
stack_catplot(x='Location', y='%ofGenes', cat='Condition', stack='Host', data=new_data)
You didn't specify how you would like to stack the bars, but you should be able to do something like this...
df = pd.read_csv('data.csv')
agg_df = df.pivot_table(index='Location', columns='Host', values='Condition', aggfunc='count')
agg_df.plot(kind='bar', stacked=True)
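For the percentage version with the two groups side by side, one possible sketch uses pd.crosstab with normalize="index" and two subplots. It assumes that "percentage" means the share of rows per Host within each Location (it counts cases, not #ofGenes); the host_share helper and the small DataFrame are made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove to see the figure interactively
import matplotlib.pyplot as plt
import pandas as pd

# A few rows in the shape of the question's sample (only the columns used here).
df = pd.DataFrame({
    "Location": ["Netherlands", "China", "China", "China", "Missing", "Missing"],
    "Host": ["Homo sapiens", "Homo sapiens", "Plantae kingdom",
             "Missing", "Missing", "Missing"],
    "Condition": [True, False, False, True, True, True],
})


def host_share(frame):
    """Percentage of rows per Host within each Location (each row sums to 100)."""
    return pd.crosstab(frame["Location"], frame["Host"], normalize="index") * 100


total = host_share(df)                     # all cases
positive = host_share(df[df["Condition"]])  # only Condition == True

fig, axes = plt.subplots(1, 2, sharey=True, figsize=(8, 3))
total.plot(kind="bar", stacked=True, ax=axes[0], title="All cases")
positive.plot(kind="bar", stacked=True, ax=axes[1], title="Condition == True")
axes[0].set_ylabel("% of cases")
fig.tight_layout()
```

Two subplots keep the pandas code short. If the two groups must share a single axis, the custom stack_catplot approach above is closer to what is needed.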

Matplotlib: How to plot Time Series on top of Scatter Plot

I have found solutions to similar questions, but they all produce odd results.
I have a plot that looks like this:
generated using this code:
ax1 = dft.plot(kind='scatter',x='end_date',y='pct',c='fte_grade',colormap='Reds',colorbar=False,edgecolors='red',vmin=4,vmax=10)
ax1.set_xticklabels([datetime.datetime.fromtimestamp(ts / 1e9).strftime('%Y-%m-%d') for ts in ax1.get_xticks()])
dfb.plot(kind='scatter',x='end_date',y='pct',c='fte_grade',colormap='Blues',title='%s Polls'%state,ax=ax1,colorbar=False,edgecolors='blue',vmin=4,vmax=10)
plt.ylim(30,70)
plt.axhline(50,ls='--',alpha=0.5,color='grey')
plt.xticks(rotation=20)
Now, whenever I try to plot a line on top of this, I get something like the following:
import matplotlib.pyplot as plt
import numpy as np
x = dft['pct']
u = dft['Trump Odds']
t = list(pd.to_datetime(dft['end_date']))
plt.hold(True)
plt.subplot2grid((1, 1), (0, 0))
plt.plot(t,x)
plt.scatter(t, u)
plt.show()
If it's not clear, this is not what I want. These dots represent individual polls, and I have data representing a line that aggregates the individual polls. I think this has something to do with datetimes and the possibility of multiple polls for a particular date. I think the plotter is getting confused because I have duplicate values for the same date, so it assumes this is not a time series, and when I plot a line, it keeps the assumption that we don't need any continuity.
There must be something within Python that can handle drawing a time series on top of a scatter plot with a time x-axis, right?
dft data:
end_date pct fte_grade Trump Odds
0 1598054400000000000 32.0 6 32.000000
1 1588550400000000000 32.0 7 32.000000
2 1582156800000000000 39.0 8 34.666667
3 1585180800000000000 33.0 8 34.206897
4 1587600000000000000 29.0 8 33.081081
5 1590019200000000000 32.0 8 33.025641
6 1559779200000000000 36.0 8 33.800000
7 1593043200000000000 32.0 8 32.400000
Isn't your strange line due to the fact that you didn't sort the DataFrame before plotting it?
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
# sort by date so the line connects the points in chronological order
dft=dft.sort_values(by=['end_date'])
x = dft['pct']
u = dft['Trump Odds']
t = list(pd.to_datetime(dft['end_date']))
# plt.hold(True) is omitted: it was removed in Matplotlib 3.0, and axes hold by default
plt.subplot2grid((1, 1), (0, 0))
plt.plot(t,x)
plt.scatter(t, u)
plt.show()
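As a self-contained sketch of the same idea, the snippet below rebuilds the dft sample from the question, converts the nanosecond timestamps, sorts by date, and draws the polls as dots with the aggregate as a line. Treating pct as the individual polls and Trump Odds as the aggregate line is an assumption based on the question's description.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove to see the figure interactively
import matplotlib.pyplot as plt
import pandas as pd

# Sample rows copied from the question (end_date is nanoseconds since the epoch).
dft = pd.DataFrame({
    "end_date": [1598054400000000000, 1588550400000000000, 1582156800000000000,
                 1585180800000000000, 1587600000000000000, 1590019200000000000,
                 1559779200000000000, 1593043200000000000],
    "pct": [32.0, 32.0, 39.0, 33.0, 29.0, 32.0, 36.0, 32.0],
    "Trump Odds": [32.0, 32.0, 34.666667, 34.206897,
                   33.081081, 33.025641, 33.8, 32.4],
})

# Integers are interpreted as nanoseconds by default.
dft["end_date"] = pd.to_datetime(dft["end_date"])
# Without this sort the line would zig-zag back and forth in time.
dft = dft.sort_values("end_date")

fig, ax = plt.subplots()
ax.scatter(dft["end_date"], dft["pct"], label="individual polls")
ax.plot(dft["end_date"], dft["Trump Odds"], label="aggregate")
ax.legend()
fig.autofmt_xdate()  # rotate the date labels so they do not overlap
```

Duplicate dates are not a problem for the line itself; only the row order matters, because plot connects the points in the order they appear.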

How to get two legends using pandas plot, one for the colors of the stacked bars and one for the hatches of the bars?

I have been trying to understand the answer of this post in order to populate two different legends.
I create a clustered stacked bar plot with different hatches for each bar and my code below is a bit different from the answer of the aforementioned post.
But I have not been able to figure out how to get one legend with the colors and one legend with the hatches.
The color legend should correspond to A, B, C, D, E and the hatch legend should indicate "with" if bar is hatched and "without" if non-hatched.
import matplotlib.pyplot as plt
from matplotlib.colors import LinearSegmentedColormap as coloring
# copy the dfs below and use pd.read_clipboard() to reproduce
df_1
A B C D E
Mg 10 15 23 25 27
Ca 30 33 0 20 17
df_2
A B C D E
Mg 20 12 8 40 10
Ca 7 26 12 22 16
hatches=(' ', '//')
colors_ABCDE=['tomato', 'gold', 'greenyellow', 'forestgreen', 'palevioletred']
dfs=[df_1,df_2]
for each_df, df in enumerate(dfs):
    df.plot(ax=plt.subplot(111), kind="barh", \
            stacked=True, hatch=hatches[each_df], \
            colormap=coloring.from_list("my_colormap", colors_ABCDE), \
            figsize=(7,2.5), position=len(dfs)-each_df-1, \
            align='center', width=0.2, edgecolor="darkgrey")
plt.legend(loc='center left', bbox_to_anchor=(1.0, 0.5), fontsize=12)
The plot I manage to get is:
Any ideas how to create two legends and place them one next to the other or one below the other? Thanks in advance ^_^
Since adding legends in matplotlib is a complex, extensive step, consider using the function solution by @jrjc from the very post you cite. However, you will need to adjust the function to your horizontal bar graph needs. Specifically:
Add an argument for color map and in DataFrame.plot call
Adjust bar plot from kind='bar' to kind='barh' for horizontal version
Swap x for y in line: rect.set_y(rect.get_y() + 1 / float(n_df + 1) * i / float(n_col))
Swap width for height in line: rect.set_height(1 / float(n_df + 1))
Adjust axe.set_xticks and axe.set_xticklabels for np.arange(0, 120, 20) values
Function
import numpy as np
import pandas as pd
import matplotlib.cm as cm
import matplotlib.pyplot as plt
from matplotlib.colors import LinearSegmentedColormap as coloring
def plot_clustered_stacked(dfall, labels=None, title="multiple stacked bar plot", H="//",
                           colors_ABCDE=['tomato', 'gold', 'greenyellow', 'forestgreen', 'palevioletred'], **kwargs):
    """
    CREDIT: @jrjc (https://stackoverflow.com/a/22845857/1422451)
    Given a list of dataframes, with identical columns and index, create a clustered stacked bar plot.
    labels is a list of the names of the dataframe, used for the legend
    title is a string for the title of the plot
    H is the hatch used for identification of the different dataframe
    """
    n_df = len(dfall)
    n_col = len(dfall[0].columns)
    n_ind = len(dfall[0].index)
    axe = plt.subplot(111)
    for df in dfall : # for each data frame
        axe = df.plot(kind="barh",
                      linewidth=0,
                      stacked=True,
                      ax=axe,
                      legend=False,
                      grid=False,
                      colormap=coloring.from_list("my_colormap", colors_ABCDE),
                      edgecolor="darkgrey",
                      **kwargs) # make bar plots
    h,l = axe.get_legend_handles_labels() # get the handles we want to modify
    for i in range(0, n_df * n_col, n_col): # len(h) = n_col * n_df
        for j, pa in enumerate(h[i:i+n_col]):
            for rect in pa.patches: # for each index
                rect.set_y(rect.get_y() + 1 / float(n_df + 2) * i / float(n_col))
                rect.set_hatch(H * int(i / n_col)) #edited part
                rect.set_height(1 / float(n_df + 2))
    axe.set_xticks(np.arange(0, 125, 20))
    axe.set_xticklabels(np.arange(0, 125, 20).tolist(), rotation = 0)
    axe.margins(x=0, tight=None)
    axe.set_title(title)
    # Add invisible data to add another legend
    n=[]
    for i in range(n_df):
        n.append(axe.bar(0, 0, color="gray", hatch=H * i, edgecolor="darkgrey"))
    l1 = axe.legend(h[:n_col], l[:n_col], loc=[1.01, 0.5])
    if labels is not None:
        l2 = plt.legend(n, labels, loc=[1.01, 0.1])
    axe.add_artist(l1)
    return axe
Call
plt.figure(figsize=(10, 4))
plot_clustered_stacked([df_1, df_2],["df_1", "df_2"])
plt.show()
plt.clf()
plt.close()
Output
I found the function solution by @jrjc rather perplexing, so I preferred to adjust my own code a little instead.
It took me some time to understand that when a second legend is created for a plot, matplotlib automatically replaces the first one, and this is when add_artist() must be used.
The other prerequisite for adding the second legend is to keep a reference to the object returned by the plot call and apply the .add_artist() method to it, so that matplotlib knows where to attach the extra legend.
In short, this is how I managed to create the plot I had in mind, and I hope the comments make it clearer and useful for anyone.
import matplotlib.pyplot as plt
from matplotlib.colors import LinearSegmentedColormap as coloring
import matplotlib.patches as mpatches
# copy the dfs below and use pd.read_clipboard() to reproduce
df_1
A B C D E
Mg 10 15 23 25 27
Ca 30 33 0 20 17
df_2
A B C D E
Mg 20 12 8 40 10
Ca 7 26 12 22 16
hatches=(' ', '//')
colors_ABCDE=['tomato', 'gold', 'greenyellow', 'forestgreen', 'palevioletred']
dfs=[df_1,df_2]
for each_df, df in enumerate(dfs):
    #I name the plot as "figure"
    figure=df.plot(ax=plt.subplot(111), kind="barh", \
                   stacked=True, hatch=hatches[each_df], \
                   colormap=coloring.from_list("my_colormap", colors_ABCDE), \
                   figsize=(7,2.5), position=len(dfs)-each_df-1, \
                   align='center', width=0.2, edgecolor="darkgrey", \
                   legend=False) #I had to False the legend too
legend_1=plt.legend(df_1.columns, loc='center left', bbox_to_anchor=(1.0, 0.5), fontsize=12)
patch_hatched = mpatches.Patch(facecolor='beige', hatch='///', edgecolor="darkgrey", label='hatched')
patch_unhatched = mpatches.Patch(facecolor='beige', hatch=' ', edgecolor="darkgrey", label='non-hatched')
legend_2=plt.legend(handles=[patch_hatched, patch_unhatched], loc='center left', bbox_to_anchor=(1.15, 0.5), fontsize=12)
# as soon as a second legend is made, the first disappears and needs to be added back again
figure.add_artist(legend_1) #python now knows that "figure" must take the "legend_1" along with "legend_2"
I am pretty sure that it can be even more elegant and automated.
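Stripped of the bar-plot details, the two-legend pattern can be sketched as follows. The handles are built by hand from Patch objects, so the snippet does not depend on df_1 or df_2; the labels "with" and "without" follow the question, and the legend positions are arbitrary choices.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove to see the figure interactively
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches

colors_ABCDE = ['tomato', 'gold', 'greenyellow', 'forestgreen', 'palevioletred']

fig, ax = plt.subplots(figsize=(7, 2.5))

# Proxy handles: one coloured patch per column, one patch per hatch style.
color_handles = [mpatches.Patch(facecolor=c, label=name)
                 for name, c in zip("ABCDE", colors_ABCDE)]
hatch_handles = [
    mpatches.Patch(facecolor="white", edgecolor="darkgrey", hatch="//", label="with"),
    mpatches.Patch(facecolor="white", edgecolor="darkgrey", hatch="", label="without"),
]

# The first legend is created normally, then re-added as a plain artist ...
first = ax.legend(handles=color_handles, loc="upper left", bbox_to_anchor=(1.0, 1.0))
ax.add_artist(first)
# ... so that the second call to legend() does not replace it.
second = ax.legend(handles=hatch_handles, loc="lower left", bbox_to_anchor=(1.0, 0.0))
fig.tight_layout()
```

An Axes only tracks one "current" legend, which is why the first one has to be attached again with add_artist, either before or after the second legend is created.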

Plot gets shifted when using secondary_y

I want to plot temperature and precipitation from a weather station in the same plot with two y-axes. However, when I try this, one of the plots gets shifted, seemingly for no reason. This is my code (I have only tried it with two precipitation measurements so far, but you get the idea):
ax = m_prec_ra.plot()
ax2 = m_prec_po.plot(kind='bar',secondary_y=True,ax=ax)
ax.set_xlabel('Times')
ax.set_ylabel('Left axes label')
ax2.set_ylabel('Right axes label')
This returns the following plot:
My plot is to be found here
I saw someone asking the same question, but I can't seem to figure out how to manually shift one of my datasets.
Here is my data:
print(m_prec_ra,m_prec_po)
Time
1 0.593436
2 0.532058
3 0.676219
4 1.780795
5 4.956048
6 11.909394
7 17.820051
8 14.225257
9 10.261061
10 2.628336
11 0.240568
12 0.431227
Name: Precipitation (mm), dtype: float64 Time
1 0.704339
2 1.225169
3 1.905223
4 4.156270
5 11.531221
6 22.246230
7 30.133800
8 27.634639
9 20.693056
10 5.282412
11 0.659365
12 0.622562
Name: Precipitation (mm), dtype: float64
The explanation for this behaviour is found in this Q & A.
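Here, the solution would be to shift the line one position to the left, i.e. to plot it against an index which starts at 0 instead of 1. (The pandas bar plot places its bars at positions 0, 1, 2, ... regardless of the index values, while the line plot uses the index values themselves.)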
import numpy as np; np.random.seed(42)
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame({"A" : np.arange(1,11),
"B" : np.random.rand(10),
"C" : np.random.rand(10)})
df.set_index("A", inplace=True)
ax = df.plot(y='B', kind = 'bar', legend = False)
df2 = df.reset_index()
df2.plot(ax = ax, secondary_y = True, y = 'B', kind = 'line')
plt.show()
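An alternative sketch that sidesteps the mismatch is to call matplotlib directly, so that both the bars and the line are placed at the actual index values (months 1 to 12). This swaps pandas' secondary_y for Axes.twinx; the series values are rounded from the question's printout.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove to see the figure interactively
import matplotlib.pyplot as plt
import pandas as pd

months = list(range(1, 13))
# Values rounded from the question's printed series.
m_prec_ra = pd.Series([0.59, 0.53, 0.68, 1.78, 4.96, 11.91,
                       17.82, 14.23, 10.26, 2.63, 0.24, 0.43], index=months)
m_prec_po = pd.Series([0.70, 1.23, 1.91, 4.16, 11.53, 22.25,
                       30.13, 27.63, 20.69, 5.28, 0.66, 0.62], index=months)

fig, ax = plt.subplots()
# Line on the left axis, positioned at the real index values.
(line,) = ax.plot(m_prec_ra.index, m_prec_ra.values, color="tab:blue")
ax.set_xlabel("Times")
ax.set_ylabel("Left axes label")

# Bars on a twin axis that shares the same x-axis, also at the real index values.
ax2 = ax.twinx()
bars = ax2.bar(m_prec_po.index, m_prec_po.values, alpha=0.4, color="tab:orange")
ax2.set_ylabel("Right axes label")
ax.set_xticks(months)
```

Since both artists use the same x values, no manual shifting is needed.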
What version of pandas are you using for this plotting?
Using 0.23.4 running this code:
df1 = pd.DataFrame({'Data_1':[1,2,4,8,16,12,8,4,1]})
df2 = pd.DataFrame({'Data_2':[1,2,4,8,16,12,8,4,1]})
ax = df1.plot()
ax2 = df2.plot(kind='bar',secondary_y=True,ax=ax)
ax.set_xlabel('Times')
ax.set_ylabel('Left axes label')
ax2.set_ylabel('Right axes label')
I get:
If you want to add sample data we could look at that.

creating ternary plots in pandas

I have data that is arranged like the following. This is an example from a dataset with 100s of loci.
loci head(%) tail(%) wing(%)
1 20 40 40
2 10 50 40
3 12 48 40
4 22 38 40
I wish to make a ternary plot for these, with head, tail, and wing making the three points of the triangle. The edges of the triangle would represent the percentages. How can I begin to do this using pandas? Any guidance would be useful.
Using matplotlib and a couple functions from the radar_chart example, we can create a radar chart directly from a dataframe.
Before we read the dataframe, you'll want to copy the imports, radar_factory and unit_poly_verts functions from the example matplotlib provides. You also need pandas, obviously.
Your imports will look like this:
import matplotlib.pyplot as plt
from matplotlib.path import Path
from matplotlib.spines import Spine
from matplotlib.projections.polar import PolarAxes
from matplotlib.projections import register_projection
import pandas as pd
import numpy as np
Since you want only the head, tail and wing, and it looks like loci is an index, I imported the data set with index_col="loci". This means the dataframe looks like this upon import:
      head(%)  tail(%)  wing(%)
loci
1          20       40       40
2          10       50       40
3          12       48       40
4          22       38       40
Finally, you want to create a function that operates similarly to the code in the example, but instead reads the dataframe. The code below should do that and is based on the code in the '__main__' block. I stripped out some of the code that isn't required for this example and unhardcoded the colors:
def nColors(k=2, cmap='nipy_spectral'):
    # 'spectral' was renamed 'nipy_spectral' in newer Matplotlib releases
    if isinstance(cmap, str):
        cm = plt.get_cmap(cmap)
        # max(k-1, 1) avoids dividing by zero when k == 1
        colors = [cm(1.*i/max(k-1, 1)) for i in range(k)]
    elif cmap is None:
        colors = ['k']
    else:
        colors = cmap
    return colors
def plot_radar(data):
    N = data.shape[1]
    theta = radar_factory(N, frame='circle')
    spoke_labels = data.columns.tolist()
    fig = plt.figure(figsize=(9, 9))
    fig.subplots_adjust(wspace=0.25, hspace=0.20, top=0.85, bottom=0.05)
    ax = fig.add_subplot(111, projection='radar')
    colors = nColors(len(data), cmap='nipy_spectral')
    # one outline and one translucent fill per row (locus)
    for i, (index, d) in enumerate(data.iterrows()):
        ax.plot(theta, d.tolist(), color=colors[i])
        ax.fill(theta, d.tolist(), facecolor=colors[i], alpha=0.25)
    ax.set_varlabels(spoke_labels)
    plt.show()
Call this function and pass your dataframe:
plot_radar(df)
This code uses the spectral color map (called nipy_spectral in newer Matplotlib releases), but you can change that by passing a valid color map name as the second parameter in the colors = nColors(len(data)) line.
You can either have a circle or a polygon (triangle in this case since there are 3 dimensions).
The above code results in a chart like this:
If you change the frame parameter in the line theta = radar_factory(N, frame='circle') to be polygon, you get a chart like this:
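Separately from the radar chart, a true ternary plot (one point per locus inside a triangle) can be sketched with plain matplotlib by converting the three percentages to barycentric coordinates. This is a different technique from the radar-chart answer above. The ternary_to_xy helper and the corner layout (head bottom-left, tail bottom-right, wing top) are choices made for this illustration.

```python
import math
import matplotlib
matplotlib.use("Agg")  # headless backend; remove to see the figure interactively
import matplotlib.pyplot as plt
import pandas as pd

# Sample rows from the question, with loci as the index.
df = pd.DataFrame({"head(%)": [20, 10, 12, 22],
                   "tail(%)": [40, 50, 48, 38],
                   "wing(%)": [40, 40, 40, 40]},
                  index=pd.Index([1, 2, 3, 4], name="loci"))

SQRT3_2 = math.sqrt(3) / 2  # height of an equilateral triangle with side 1


def ternary_to_xy(head, tail, wing):
    """Map three components to 2-D points inside a unit equilateral triangle.
    Corners: head -> (0, 0), tail -> (1, 0), wing -> (0.5, sqrt(3)/2)."""
    total = head + tail + wing  # normalise, so inputs need not sum to 100
    x = (tail + 0.5 * wing) / total
    y = SQRT3_2 * wing / total
    return x, y


x, y = ternary_to_xy(df["head(%)"], df["tail(%)"], df["wing(%)"])

fig, ax = plt.subplots(figsize=(5, 5))
corners = [(0, 0), (1, 0), (0.5, SQRT3_2), (0, 0)]  # closed triangle outline
ax.plot(*zip(*corners), color="k")
ax.scatter(x, y)
for label, (cx, cy) in zip(["head", "tail", "wing"], corners):
    ax.annotate(label, (cx, cy))
ax.set_aspect("equal")
ax.axis("off")
```

A locus that is 100% head lands exactly on the head corner, and mixtures fall inside the triangle in proportion to their components. Grid lines and tick labels along the edges are left out of this sketch.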
