How to plot two barh in one axis? - python

I have a pandas Series with more than 500 items. I pick the top 10 and bottom 10 to plot in one matplotlib axis. Here is the picture I drew manually:
Here is the data:
bottom10
Out[12]:
0 -9.823127e+08
1 -8.069270e+08
2 -6.030317e+08
3 -5.709379e+08
4 -5.224355e+08
5 -4.755464e+08
6 -4.095561e+08
7 -3.989287e+08
8 -3.885740e+08
9 -3.691114e+08
Name: amount, dtype: float64
top10
Out[13]:
0 9.360520e+08
1 9.078776e+08
2 6.603838e+08
3 4.967611e+08
4 4.409362e+08
5 3.914972e+08
6 3.547471e+08
7 3.538894e+08
8 3.368558e+08
9 3.189895e+08
Name: amount, dtype: float64
import matplotlib.pyplot as plt
# both calls draw on the same (current) axis
plt.barh(top10.index, top10.values, color='red', align='edge')
plt.barh(bottom10.index, bottom10.values, color='green', align='edge')
Now it shows like this, which is not what I want:
[image: current output]
What is the right way to plot?

You can do this by creating a twin Axes with twiny() and plotting the bottom10 DataFrame on it.
For example:
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
# Random data
bottom10 = pd.DataFrame({'amount':-np.sort(np.random.rand(10))})
top10 = pd.DataFrame({'amount':np.sort(np.random.rand(10))[::-1]})
# Create figure and axes for top10
fig,axt = plt.subplots(1)
# Plot top10 on axt
top10.plot.barh(color='red',edgecolor='k',align='edge',ax=axt,legend=False)
# Create twin axes
axb = axt.twiny()
# Plot bottom10 on axb
bottom10.plot.barh(color='green',edgecolor='k',align='edge',ax=axb,legend=False)
# Set some sensible axes limits
axt.set_xlim(0,1.5)
axb.set_xlim(-1.5,0)
# Add some axes labels
axt.set_ylabel('Best items')
axb.set_ylabel('Worst items')
# Need to manually move axb label to right hand side
axb.yaxis.set_label_position('right')
plt.show()
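If separate top and bottom x-axes are not required, a simpler alternative (a different technique from twiny, sketched here under that assumption) is to draw both groups with ax.barh on a single Axes, placing the bottom10 rows below the top10 rows so the bars diverge from a shared zero line. The helper name plot_top_bottom is hypothetical; the values are the top10 and bottom10 numbers printed in the question.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Values from the question (both Series are indexed 0..9)
top10 = pd.Series([9.360520e+08, 9.078776e+08, 6.603838e+08, 4.967611e+08,
                   4.409362e+08, 3.914972e+08, 3.547471e+08, 3.538894e+08,
                   3.368558e+08, 3.189895e+08], name='amount')
bottom10 = pd.Series([-9.823127e+08, -8.069270e+08, -6.030317e+08, -5.709379e+08,
                      -5.224355e+08, -4.755464e+08, -4.095561e+08, -3.989287e+08,
                      -3.885740e+08, -3.691114e+08], name='amount')


def plot_top_bottom(top, bottom, ax=None):
    """Hypothetical helper: draw both Series as horizontal bars on ONE Axes."""
    if ax is None:
        _, ax = plt.subplots()
    # bottom items occupy rows 0..len(bottom)-1, top items the rows above them
    y_bottom = np.arange(len(bottom))
    y_top = np.arange(len(top)) + len(bottom)
    ax.barh(y_bottom, bottom.values, color='green', edgecolor='k')
    ax.barh(y_top, top.values, color='red', edgecolor='k')
    # vertical reference line at zero so the two groups visibly diverge
    ax.axvline(0, color='k', linewidth=0.8)
    ax.set_yticks(np.concatenate([y_bottom, y_top]))
    ax.set_yticklabels([f'bottom {i}' for i in bottom.index] +
                       [f'top {i}' for i in top.index])
    return ax


ax = plot_top_bottom(top10, bottom10)
# call plt.show() to display the figure
```

Because everything shares one x-axis, no manual xlim juggling is needed; the trade-off is that the two groups no longer get independent scales.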

Related

Get the height of the rectangles in a plot

I have the following graph [1], obtained with the code below [2]. As you can see from the first line inside the for loop, I set the height of each rectangle from the standard deviation. But I can't figure out how to get the height of the corresponding rectangle back. For example, given the blue rectangle, I would like to return the two bounds of the interval it covers, which are approximately 128.8 and 130.6. How can I do this?
[2] The code I used is the following:
import pandas as pd
import matplotlib.ticker as ticker
import matplotlib.pyplot as plt
import numpy as np
dfLunedi = pd.read_csv( "0.lun.csv", encoding = "ISO-8859-1", sep = ';')
dfSlotMean = dfLunedi.groupby('slotID', as_index=False).agg( NLunUn=('date', 'nunique'),NLunTot = ('date', 'count'), MeanBPM=('tempo', 'mean'), std = ('tempo','std') )
#print(dfSlotMean)
dfSlotMean.drop(dfSlotMean[dfSlotMean.NLunUn < 3].index, inplace=True)
df = pd.DataFrame(dfSlotMean)
df.to_csv('1.silLunedi.csv', sep = ';', index=False)
print(df)
bpmMattino = df['MeanBPM']
std = df['std']
listBpm = bpmMattino.tolist()
limInf = df['MeanBPM'] - df['std']
limSup = df['MeanBPM'] + df['std']
tick_spacing = 1
fig, ax = plt.subplots(1, 1)
for _, r in df.iterrows():
    ax.plot([r['slotID'], r['slotID']+1], [r['MeanBPM']]*2, linewidth=r['std'])
ax.xaxis.grid(True)
ax.yaxis.grid(True)
ax.yaxis.set_major_locator(ticker.MultipleLocator(tick_spacing))
ax.xaxis.set_major_locator(ticker.MultipleLocator(tick_spacing))
This is the content of the csv:
   slotID  NMonUnique  NMonTot     MeanBPM        std
0       7          11       78  129.700564  29.323091
2      11           6       63  123.372397  24.049397
3      12           6       33  120.625667  24.029006
4      13           5       41  124.516341  30.814985
5      14           4       43  118.904512  26.205309
6      15           3       13  116.380538  24.336491
7      16           3       42  119.670881  27.416843
8      17           5       40  125.424125  32.215865
9      18           6       45  130.540578  24.437559
10     19           9       58  128.180172  32.099529
11     20           5       44  125.596045  28.060657
I would advise against using linewidth to show anything related to your data. The reason is that linewidth is measured in points (see the matplotlib documentation), whose size is unrelated to the data coordinates you plot in. To see this in action, try plotting with different linewidths and resizing the plotting window: the line width does not scale with the axes.
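To make the point about units concrete, here is a sketch (the helper name linewidth_in_data_units is hypothetical) that converts a linewidth in points into the y-data span it covers, using the figure dpi (1 point = 1/72 inch) and the Axes' transData transform. The same 29.3-point line covers a different data range when only the figure height changes.

```python
import matplotlib.pyplot as plt


def linewidth_in_data_units(ax, linewidth_pt):
    """Hypothetical helper: y-data span of a line `linewidth_pt` points thick.

    Only valid for the current figure size, dpi and y-limits of `ax`.
    """
    # points -> pixels (1 point = 1/72 inch)
    lw_px = linewidth_pt * ax.figure.dpi / 72.0
    # pixels per one y-data unit, read from the data -> display transform
    (_, y0), (_, y1) = ax.transData.transform([(0, 0), (0, 1)])
    return lw_px / abs(y1 - y0)


spans = []
for height_in in (3, 6):
    fig, ax = plt.subplots(figsize=(4, height_in))
    ax.set_ylim(100, 150)  # identical data range in both figures
    spans.append(linewidth_in_data_units(ax, 29.3))
    plt.close(fig)

# the taller figure gives a smaller data span for the same linewidth
print(spans)
```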
Instead, if you do indeed want a rectangle, I suggest using matplotlib.patches.Rectangle. There is a good example of how to do that in the documentation, and I've also added an even shorter example below.
To give the rectangles different colors, you can do as shown here and simply generate a random tuple of 3 elements to use as the color. Another option is to take a list of colors, for example TABLEAU_COLORS from matplotlib.colors, and use consecutive colors from it. The latter may be better for testing, because the rectangles get the same colors on every run, but note that TABLEAU_COLORS has only 10 colors, so you will have to cycle through them if you have more than 10 rectangles.
import random
import matplotlib.pyplot as plt
import matplotlib.patches as ptc
import matplotlib.colors as mcolors
y = 4.5      # mean: vertical centre of each rectangle
y_std = 0.3  # standard deviation
fig, ax = plt.subplots()
for i in range(10):
    # random RGB colour
    c = tuple(random.random() for _ in range(3))
    # The other option: consecutive Tableau colours (only 10, so cycle with i % 10)
    # c = list(mcolors.TABLEAU_COLORS.values())[i % 10]
    # one unit wide at x = i, spanning [y - y_std, y + y_std] in data units
    rect = ptc.Rectangle(xy=(i, y - y_std), width=1, height=2*y_std, color=c)
    ax.add_patch(rect)
ax.set_xlim((0, 10))
ax.set_ylim((0, 5))
plt.show()
If you define the height as the standard deviation and put the center at the mean, then the interval for each rectangle should be [mean - std/2, mean + std/2], right? Is it intentional that the rectangles overlap? If not, I think your use of linewidth to size the rectangles is at fault. If the plot is meant to visualize the mean and variance of the different categories, something like a boxplot or raincloud plot might be better.
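Assuming, as in the comment above, that each rectangle's total height is one standard deviation centred on the mean, a sketch that makes the intervals explicit: compute the bounds as columns, then draw the rectangles with ax.bar(bottom=...), whose heights are in data units, so the drawn rectangle is exactly [low, high]. The rows are the first four from the table in the question; the column names low and high are my own.

```python
import pandas as pd
import matplotlib.pyplot as plt

# First four rows of the table in the question
df = pd.DataFrame({
    'slotID': [7, 11, 12, 13],
    'MeanBPM': [129.700564, 123.372397, 120.625667, 124.516341],
    'std': [29.323091, 24.049397, 24.029006, 30.814985],
})

# interval covered by a rectangle of total height std centred on the mean
df['low'] = df['MeanBPM'] - df['std'] / 2
df['high'] = df['MeanBPM'] + df['std'] / 2
print(df[['slotID', 'low', 'high']])

fig, ax = plt.subplots()
# bar heights are in data units, spanning slotID..slotID+1 like the original lines
bars = ax.bar(df['slotID'], df['std'], bottom=df['low'], width=1,
              align='edge', alpha=0.5)
# call plt.show() to display the figure
```

Each Rectangle in bars also reports its bounds directly: get_y() is the low value and get_y() + get_height() the high value.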

How do I plot stacked barplots side by side in python? (preferably seaborn)

I'm looking for a way to plot side-by-side stacked bar plots to compare the host composition of positive (Condition==True) and total cases in each country from my DataFrame.
Here is a sample of the DataFrame.
id  Location       Host             genus_name    #ofGenes  Condition
1   Netherlands    Homo sapiens     Escherichia   4.0       True
2   Missing        Missing          Klebsiella    3.0       True
3   Missing        Missing          Aeromonas     2.0       True
4   Missing        Missing          Glaciecola    2.0       True
5   Antarctica     Missing          Alteromonas   2.0       True
6   Indian Ocean   Missing          Epibacterium  2.0       True
7   Missing        Missing          Klebsiella    2.0       True
8   China          Homo sapiens     Escherichia   0         False
9   Missing        Missing          Escherichia   2.0       True
10  China          Plantae kingdom  Pantoea       0         False
11  China          Missing          Escherichia   2.0       True
12  Pacific Ocean  Missing          Halomonas     0         False
I need something similar to the image below, but plotted as percentages.
Can anyone help me?
I guess what you want is a stacked categorical bar plot, which seaborn cannot draw directly. You can, however, build one yourself.
Import some necessary packages.
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from matplotlib.patches import Patch
Read the dataset. Because your sample data is small, I randomly generate more rows so the plot looks reasonable.
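def gen_fake_data(data, size=400):
    # collect the unique values of every column
    unique_values = []
    for c in data.columns:
        unique_values.append(data[c].unique())
    # sample `size` new rows by drawing each column independently
    new_data = pd.DataFrame({c: np.random.choice(unique_values[i], size=size)
                             for i, c in enumerate(data.columns)})
    # ignore_index=True gives a fresh 0..n-1 index, so the ids below are unique
    new_data = pd.concat([data, new_data], ignore_index=True)
    new_data['id'] = new_data.index + 1
    return new_data

data = pd.read_csv('data.csv')
new_data = gen_fake_data(data)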
Define the stacked categorical bar plot
def stack_catplot(x, y, cat, stack, data, palette=None):
    ax = plt.gca()
    # pivot the data based on categories and stacks
    df = data.pivot_table(values=y, index=[cat, x], columns=stack,
                          dropna=False, aggfunc='sum').fillna(0)
    ncat = data[cat].nunique()
    nx = data[x].nunique()
    nstack = data[stack].nunique()
    if palette is None:
        # one colour per (category, stack) pair
        palette = sns.color_palette('Reds', ncat * nstack)
    x_order = data[x].unique()  # bars and tick labels both follow this order
    range_x = np.arange(nx)
    width = 0.8 / ncat  # width of each bar
    for i, c in enumerate(data[cat].unique()):
        # iterate over categories, i.e., Conditions
        # calculate the location of each bar
        loc_x = (0.5 + i - ncat / 2) * width + range_x
        bottom = 0
        for j, s in enumerate(data[stack].unique()):
            # iterate over stacks, i.e., Hosts
            # height of this stack for every x, aligned with x_order
            height = df.loc[c].reindex(x_order)[s].fillna(0).values
            # plot the bar, you can customize the color yourself
            ax.bar(x=loc_x, height=height, bottom=bottom, width=width,
                   color=palette[j + i * nstack], zorder=10)
            # change the bottom attribute to achieve a stacked barplot
            bottom += height
    # make xlabel
    ax.set_xticks(range_x)
    ax.set_xticklabels(x_order, rotation=45)
    ax.set_ylabel(y)
    # make legend
    plt.legend([Patch(facecolor=palette[i]) for i in range(ncat * nstack)],
               [f"{c}: {s}" for c in data[cat].unique() for s in data[stack].unique()],
               bbox_to_anchor=(1.05, 1), loc='upper left', borderaxespad=0.)
    plt.grid()
    plt.show()
Let's plot!
plt.figure(figsize=(6, 3), dpi=300)
stack_catplot(x='Location', y='#ofGenes', cat='Condition', stack='Host', data=new_data)
If you want to plot percentages, compute them in the raw dataset first: divide each row's #ofGenes by the total #ofGenes of its (Location, Condition) group.
total_genes = new_data.groupby(['Location', 'Condition'], as_index=False)['#ofGenes'].sum().rename(
columns={'#ofGenes': 'TotalGenes'})
new_data = new_data.merge(total_genes, how='left')
new_data['%ofGenes'] = new_data['#ofGenes'] / new_data['TotalGenes'] * 100
plt.figure(figsize=(6, 3), dpi=300)
stack_catplot(x='Location', y='%ofGenes', cat='Condition', stack='Host', data=new_data)
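The merge step above can also be written with groupby(...).transform('sum'), which returns each group's total aligned row by row, so no TotalGenes helper column or merge is needed. A sketch on the twelve sample rows from the question; note that groups whose total is 0 (here China/False and Pacific Ocean/False) give NaN (0/0), which you may want to fillna(0).

```python
import pandas as pd

# The sample rows from the question
data = pd.DataFrame({
    'id': range(1, 13),
    'Location': ['Netherlands', 'Missing', 'Missing', 'Missing', 'Antarctica',
                 'Indian Ocean', 'Missing', 'China', 'Missing', 'China', 'China',
                 'Pacific Ocean'],
    'Host': ['Homo sapiens', 'Missing', 'Missing', 'Missing', 'Missing', 'Missing',
             'Missing', 'Homo sapiens', 'Missing', 'Plantae kingdom', 'Missing',
             'Missing'],
    'genus_name': ['Escherichia', 'Klebsiella', 'Aeromonas', 'Glaciecola',
                   'Alteromonas', 'Epibacterium', 'Klebsiella', 'Escherichia',
                   'Escherichia', 'Pantoea', 'Escherichia', 'Halomonas'],
    '#ofGenes': [4.0, 3.0, 2.0, 2.0, 2.0, 2.0, 2.0, 0, 2.0, 0, 2.0, 0],
    'Condition': [True, True, True, True, True, True, True, False, True, False,
                  True, False],
})

# total #ofGenes of each row's (Location, Condition) group, aligned to the rows
totals = data.groupby(['Location', 'Condition'])['#ofGenes'].transform('sum')
# each row's share of its group, in percent (0/0 groups become NaN)
data['%ofGenes'] = data['#ofGenes'] / totals * 100

print(data.groupby(['Location', 'Condition'])['%ofGenes'].sum())
```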
You didn't specify how you would like to stack the bars, but you should be able to do something like this...
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv('data.csv')
agg_df = df.pivot_table(index='Location', columns='Host', values='Condition', aggfunc='count')
agg_df.plot(kind='bar', stacked=True)
plt.show()
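Since the question asks for percentages, the counts from this pivot can be normalized per row so that each Location's stacked bar sums to 100. A sketch using DataFrame.div with axis=0; it builds the sample rows from the question instead of reading data.csv, and fills Location/Host combinations that never occur with 0 first.

```python
import pandas as pd

# Location, Host and Condition columns of the 12 sample rows in the question
df = pd.DataFrame({
    'Location': ['Netherlands', 'Missing', 'Missing', 'Missing', 'Antarctica',
                 'Indian Ocean', 'Missing', 'China', 'Missing', 'China', 'China',
                 'Pacific Ocean'],
    'Host': ['Homo sapiens', 'Missing', 'Missing', 'Missing', 'Missing', 'Missing',
             'Missing', 'Homo sapiens', 'Missing', 'Plantae kingdom', 'Missing',
             'Missing'],
    'Condition': [True, True, True, True, True, True, True, False, True, False,
                  True, False],
})

# count rows per (Location, Host); combinations that never occur become 0
agg_df = df.pivot_table(index='Location', columns='Host', values='Condition',
                        aggfunc='count').fillna(0)
# divide every row by its row total -> percentage of each Host within a Location
pct_df = agg_df.div(agg_df.sum(axis=1), axis=0) * 100

ax = pct_df.plot(kind='bar', stacked=True)
ax.set_ylabel('% of cases')
# call matplotlib.pyplot.show() to display the figure
```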

Python Pandas - Plotting multiple Bar plots by category from dataframe

I have a DataFrame that looks like this:
df = pd.DataFrame(data={'ID':[1,1,1,2,2,2], 'Value':[13, 12, 15, 4, 2, 3]})
Index ID Value
0 1 13
1 1 12
2 1 15
3 2 4
4 2 2
5 2 3
and I want to plot it by ID (category) so that each category gets its own bar plot. In this case I would have two figures: one with the bar plot for ID=1 and a separate one with the bar plot for ID=2.
Can I do it (preferably without loops) with something like df.plot(y='Value', kind='bar')?
Two options are possible: one using matplotlib and the other using seaborn, which you should absolutely know, as it works well with pandas.
Pandas with matplotlib
Create subplots with the number of rows and columns you need. plt.subplots returns the axes as a 1-D array if either nrows or ncols is 1 (a single Axes if both are 1), and as a 2-D array otherwise. Then pass each Axes to the pandas plot method through its ax argument.
If the number of categories is not known or high, you need to use a loop.
import pandas as pd
import matplotlib.pyplot as plt
fig, axes = plt.subplots( nrows=1, ncols=2, sharey=True )
df.loc[ df["ID"] == 1, 'Value' ].plot.bar( ax=axes[0] )
df.loc[ df["ID"] == 2, 'Value' ].plot.bar( ax=axes[1] )
plt.show()
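For the case mentioned above where the number of categories is unknown or large, a loop over df.groupby('ID') gives one separate figure per category, which is what the question asked for. A sketch (the function name plot_each_category is hypothetical); each group gets its own plt.subplots() call so the Series plot never reuses an existing Axes.

```python
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame(data={'ID': [1, 1, 1, 2, 2, 2], 'Value': [13, 12, 15, 4, 2, 3]})


def plot_each_category(df, cat='ID', value='Value'):
    """Hypothetical helper: one bar-plot figure per distinct value of `cat`."""
    figures = {}
    for key, group in df.groupby(cat):
        fig, ax = plt.subplots()  # a new, separate figure for this category
        group[value].plot.bar(ax=ax, title=f'{cat}={key}')
        figures[key] = fig
    return figures


figures = plot_each_category(df)
# call plt.show() to display all figures
```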
Pandas with seaborn
Seaborn is the most amazing graphical tool that I know. The catplot function draws a grid of plots, one per value of a column, when you set the col argument. You choose the type of plot with kind.
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_style('white')
df['index'] = [1,2,3] * 2
sns.catplot(kind='bar', data=df, x='index', y='Value', col='ID')
plt.show()
I added an index column so that the result can be compared with df.plot.bar. If you don't want it, remove x='index' and each panel will show a single bar (the mean of Value) with an error bar.

Plot gets shifted when using secondary_y

I want to plot temperature and precipitation from a weather station in the same plot with two y-axes. However, when I try this, one of the plots gets shifted, seemingly for no reason. This is my code (I have only tried it with two precipitation measurements so far, but you get the idea):
ax = m_prec_ra.plot()
ax2 = m_prec_po.plot(kind='bar',secondary_y=True,ax=ax)
ax.set_xlabel('Times')
ax.set_ylabel('Left axes label')
ax2.set_ylabel('Right axes label')
This returns the following plot:
My plot is to be found here
I saw someone asking the same question, but I can't seem to figure out how to manually shift one of my datasets.
Here is my data:
print(m_prec_ra,m_prec_po)
Time
1 0.593436
2 0.532058
3 0.676219
4 1.780795
5 4.956048
6 11.909394
7 17.820051
8 14.225257
9 10.261061
10 2.628336
11 0.240568
12 0.431227
Name: Precipitation (mm), dtype: float64
Time
1 0.704339
2 1.225169
3 1.905223
4 4.156270
5 11.531221
6 22.246230
7 30.133800
8 27.634639
9 20.693056
10 5.282412
11 0.659365
12 0.622562
Name: Precipitation (mm), dtype: float64
The explanation for this behaviour is found in this Q & A.
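Here, the solution is to shift the line one position to the left, i.e. plot it against an index that starts at 0 instead of 1. A pandas bar plot places its bars at positions 0, 1, ..., n-1 and only uses the index values as tick labels, while a line plot uses the index values themselves as x coordinates, so a line indexed from 1 ends up one position to the right of the bars.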
import numpy as np; np.random.seed(42)
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame({"A" : np.arange(1,11),
"B" : np.random.rand(10),
"C" : np.random.rand(10)})
df.set_index("A", inplace=True)
ax = df.plot(y='B', kind = 'bar', legend = False)
df2 = df.reset_index()
df2.plot(ax = ax, secondary_y = True, y = 'B', kind = 'line')
plt.show()
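Applied to the two Series from the question, the same fix means plotting the line against positions 0..11 (via reset_index(drop=True)) so it lines up with the bars, which are drawn at 0..11 and labelled 1..12. A sketch that rebuilds the Series from the printed values; the variable names and axis labels are the asker's.

```python
import pandas as pd

time = pd.Index(range(1, 13), name='Time')
m_prec_ra = pd.Series([0.593436, 0.532058, 0.676219, 1.780795, 4.956048, 11.909394,
                       17.820051, 14.225257, 10.261061, 2.628336, 0.240568, 0.431227],
                      index=time, name='Precipitation (mm)')
m_prec_po = pd.Series([0.704339, 1.225169, 1.905223, 4.156270, 11.531221, 22.246230,
                       30.133800, 27.634639, 20.693056, 5.282412, 0.659365, 0.622562],
                      index=time, name='Precipitation (mm)')

# line at x = 0..11 instead of 1..12, matching the bar positions
ax = m_prec_ra.reset_index(drop=True).plot()
# bars are drawn at 0..11 and labelled with the Time values 1..12
ax2 = m_prec_po.plot(kind='bar', secondary_y=True, ax=ax)
ax.set_xlabel('Times')
ax.set_ylabel('Left axes label')
ax2.set_ylabel('Right axes label')
# call plt.show() to display the figure
```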
What version of pandas are you using for this plotting?
Using pandas 0.23.4 and running this code:
df1 = pd.DataFrame({'Data_1':[1,2,4,8,16,12,8,4,1]})
df2 = pd.DataFrame({'Data_2':[1,2,4,8,16,12,8,4,1]})
ax = df1.plot()
ax2 = df2.plot(kind='bar',secondary_y=True,ax=ax)
ax.set_xlabel('Times')
ax.set_ylabel('Left axes label')
ax2.set_ylabel('Right axes label')
I get:
If you want to add sample data we could look at that.

Manipulating Dates in x-axis Pandas Matplotlib

I have a pretty simple set of data, shown below. I am looking for a way to plot it as a stacked bar chart and format the x-axis (dates) so that it starts at 1996-31-12 and ends at 2016-31-12 in increments of 365 days. The code I have written plots every single date, so the x-axis is very bunched up and unreadable.
DataFrame:
Date A B
1996-31-12 10 3
1997-31-03 5 6
1997-31-07 7 5
1997-30-11 3 12
1997-31-12 4 10
1998-31-03 5 8
.
.
.
2016-31-12 3 9
This is a similar question: Pandas timeseries plot setting x-axis major and minor ticks and labels
You can manage this using matplotlib itself instead of pandas.
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
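# if your dates are strings you need this step; the sample dates are YYYY-DD-MM
df.Date = pd.to_datetime(df.Date, format='%Y-%d-%m')
fig,ax = plt.subplots()
# 'o' draws markers only, like the default of the deprecated plot_date
ax.plot(df.Date, df.A, 'o')
ax.plot(df.Date, df.B, 'o')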
ax.xaxis.set_major_locator(mdates.YearLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter('%b\n%Y'))
plt.show()
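The question asks for a stacked bar chart rather than markers. Assuming the dates are in the YYYY-DD-MM form shown in the table, a sketch that parses them with an explicit format and stacks B on top of A with ax.bar(bottom=...), keeping one major tick per year. It uses only the rows visible in the question (the elided rows are not reproduced), and the bar width of 60 days is an arbitrary choice.

```python
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

# Only the rows shown in the question; dates are written YYYY-DD-MM
df = pd.DataFrame({
    'Date': ['1996-31-12', '1997-31-03', '1997-31-07', '1997-30-11',
             '1997-31-12', '1998-31-03', '2016-31-12'],
    'A': [10, 5, 7, 3, 4, 5, 3],
    'B': [3, 6, 5, 12, 10, 8, 9],
})
# explicit format, so day and month are neither swapped nor rejected
df['Date'] = pd.to_datetime(df['Date'], format='%Y-%d-%m')

fig, ax = plt.subplots()
# on a date axis, a numeric bar width is interpreted in days
bars_a = ax.bar(df['Date'], df['A'], width=60, label='A')
# stack B on top of A
bars_b = ax.bar(df['Date'], df['B'], width=60, bottom=df['A'], label='B')
ax.xaxis.set_major_locator(mdates.YearLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter('%Y'))
ax.legend()
fig.autofmt_xdate()  # rotate the year labels so they do not overlap
# call plt.show() to display the figure
```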
