Matplotlib line and bar in the same chart - python

I have a pandas dataframe with this simple structure:
Month      Energy  Percentage
Jan        10      0.5
Feb        11      0.6
March      13      0.71
April      15      0.73
May        18      0.81
June       20      0.85
July       24      0.91
August     28      0.93
September  24      0.81
November   17      0.71
December   15      0.6
I want to plot the energy as bars and the percentage as a line in the same chart, with two y-axes: one for the energy and the other for the percentage. The final result should look like the picture below:
I would also like the x-axis to be fixed and show all the months, even if a month doesn't have values yet.
Hope you can help me.
Thanks!

The code below solves your problem. Basically, the twinx() method lets you create a second y-axis on the same plot. I copied the data from your question into an Excel file named Book1.xlsx.
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_excel("Book1.xlsx")
# convert the fraction to a percentage
df['Percentage'] = df['Percentage'] * 100
# create figure and axis objects with subplots()
fig, ax = plt.subplots(figsize=(10, 6))
# draw the bars on the first (left) y-axis
ax.bar(df.Month, df.Energy, color="blue")
# set x-axis label
ax.set_xlabel("Months", fontsize=14)
# set y-axis label
ax.set_ylabel("Energy", color="blue", fontsize=14)
# twin object for a second y-axis on the same plot
ax2 = ax.twinx()
# draw the line on the second (right) y-axis so it sits on top of the bars
ax2.plot(df.Month, df.Percentage, marker="o", color="red")
ax2.set_ylabel("Percentage", color="red", fontsize=14)
plt.setp(ax.get_xticklabels(), rotation=40, horizontalalignment='right')
plt.show()
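The question also asks for an x-axis that shows every month, including months with no data (October is absent from the table). One possible approach, sketched here under the assumption that the month names are spelled exactly as in the question, is to reindex the frame on a full list of months before plotting. The names MONTHS and full are made up for illustration.

```python
import pandas as pd

# Calendar order for the x-axis (an assumption: spelling copied from the question).
MONTHS = ["Jan", "Feb", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]

# The data from the question; October is missing.
df = pd.DataFrame({
    "Month": ["Jan", "Feb", "March", "April", "May", "June", "July",
              "August", "September", "November", "December"],
    "Energy": [10, 11, 13, 15, 18, 20, 24, 28, 24, 17, 15],
    "Percentage": [0.5, 0.6, 0.71, 0.73, 0.81, 0.85, 0.91, 0.93, 0.81, 0.71, 0.6],
})

# Reindex on the full month list: months without data become NaN rows.
full = (
    df.set_index("Month")
      .reindex(MONTHS)
      .rename_axis("Month")
      .reset_index()
)
print(full)
```

Passing full["Month"] and full["Energy"] to ax.bar() should then leave an empty slot for October instead of skipping it, and the line on the second axis should break at the missing month.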

Related

Create one boxplot per cluster for each column of information for a dataframe

Consider the following pandas DataFrame:
value  other_value  cluster
1382   2.1          0
10     3.9          1
104    5.9          1
82     -1.1         0
100    0.9          2
1003   0.85         2
232    4.1          0
19     0.6          3
1434   0.3          3
23     1.6          3
Using the seaborn module, I want to display a set of boxplots for each column of values, showing the comparative information per value of the cluster column.
That is, for the above DataFrame, it would show a first graph for the 'value' column with 4 boxplots, one for each cluster value. The second graph would include information for the 'other_value' column also showing 1 boxplot for each cluster.
My idea is to do the same as in this R question, but in Python: Boxplots of different variables by cluster assigned on one graph in ggplot
My code only shows one graph at a time; I would like to get a joint figure with all the graphs, as in the link above:
sns.boxplot(y='value', x='cluster',
data=df,
palette="colorblind",
hue='cluster')
Thanks for the help offered.
Most seaborn functions work best with the data in "long form".
Here is how the code could look:
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
df = pd.read_html('https://stackoverflow.com/questions/72301993/')[0]
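# reshape to long form: one row per (cluster, variable, value)
df_long = df.melt(id_vars='cluster', value_vars=df.columns[:-1], var_name='variable', value_name='values')
# one panel per variable; the keyword is col_wrap, not colwrap
sns.catplot(kind='box', data=df_long,
            col='variable', y='values', x='cluster', hue='cluster', palette="colorblind", sharey=False, col_wrap=2)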
plt.tight_layout()
plt.show()
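To make "long form" concrete, here is a small sketch that melts the first four rows of the question's table, typed in by hand instead of read from the web page. It only illustrates the reshaping step; the plotting call is unchanged.

```python
import pandas as pd

# First four rows of the question's table, entered by hand.
df = pd.DataFrame({
    "value": [1382, 10, 104, 82],
    "other_value": [2.1, 3.9, 5.9, -1.1],
    "cluster": [0, 1, 1, 0],
})

# Wide -> long: every measurement column becomes rows of (cluster, variable, values).
df_long = df.melt(id_vars="cluster", value_vars=["value", "other_value"],
                  var_name="variable", value_name="values")
print(df_long)
```

Each original row now appears twice, once per measured column, which is the shape the col='variable' argument of catplot relies on.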

Why are bars missing in my stacked bar chart -- Python w/matplotlib

Hi all,
I am trying to create a stacked bar chart built using time series data. My issue -- if I plot my data as time series (using lines) then everything works fine and I get a (messy) time series graph that includes correct dates. However, if I instead try to plot this as a stacked bar chart, my dates disappear and none of my bars appear.
I have tried messing with the indexing, height, and width of the bars. No luck.
Here is my code:
import pylab
import pandas as pd
import matplotlib.pyplot as plt
df1= pd.read_excel('pathway/filename.xls')
df1.set_index('TIME', inplace=True)
ax = df1.plot(kind="Bar", stacked=True)
ax.set_xlabel("Date")
ax.set_ylabel("Change in Yield")
df1.sum(axis=1).plot( ax=ax, color="k", title='Historical Decomposition -- 1 year -- One-Quarter Revision')
plt.axhline(y=0, color='r', linestyle='-')
plt.show()
If I change
ax = df1.plot(kind="Bar", stacked=True)
to ax = df1.plot(kind="line", stacked=False)
I get:
If instead I use ax = df1.plot(kind="Bar", stacked=True)
I get:
Any thoughts here?
Without knowing what the data looks like, I'd try something like this:
#Import data here and generate DataFrame
print(df.head(5))
A B C D
DATE
2020-01-01 -0.01 0.06 0.40 0.45
2020-01-02 -0.02 0.05 0.39 0.42
2020-01-03 -0.03 0.04 0.38 0.39
2020-01-04 -0.04 0.03 0.37 0.36
2020-01-05 -0.05 0.02 0.36 0.33
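f, ax = plt.subplots()
# A is negative in this sample, so it hangs below zero; B starts at zero and C sits on top of B
ax.bar(df.index, df['A'], label='A')
ax.bar(df.index, df['B'], label='B')
ax.bar(df.index, df['C'], bottom=df['B'], label='C')
ax.plot(df.index, df['D'], color='black', linewidth=2, label='D')
ax.set_xlabel('Date')
ax.set_ylabel('Change in Yield')
ax.axhline(y=0, color='r')
ax.set_xticks([])
# the label= arguments above give the legend something to show
ax.legend()
plt.show()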
Edit: OK, I've found a way, based on this post:
Plot Pandas DataFrame as Bar and Line on the same one chart
Try resetting the index so that it becomes a separate column. In my example, it is called 'DATE'. Then try:
ax = df[['DATE','D']].plot(x='DATE',color='black')
df[['DATE','A','B','C']].plot(x='DATE', kind='bar',stacked=True,ax=ax)
ax.axhline(y=0, color='r')
ax.set_xticks([])
ax.set_xlabel('Date')
ax.set_ylabel('Change in Yield')
ax.legend()
plt.show()
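A different route, offered only as a sketch and not as the method used above: pandas draws bars at integer positions 0..n-1, so a line plotted with use_index=False lands on the same positions and the two layers line up. The data below is made up in the shape of the sample frame, and the "total" label is an illustrative choice.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line for an interactive window
import matplotlib.pyplot as plt
import pandas as pd

# Made-up data in the shape of the sample frame above.
dates = pd.date_range("2020-01-01", periods=5, freq="D")
df = pd.DataFrame(
    {"A": [-0.01, -0.02, -0.03, -0.04, -0.05],
     "B": [0.06, 0.05, 0.04, 0.03, 0.02],
     "C": [0.40, 0.39, 0.38, 0.37, 0.36]},
    index=dates,
)

# Bars are placed at integer positions 0..n-1.
ax = df.plot(kind="bar", stacked=True)

# use_index=False plots the line against the same 0..n-1 positions.
total = df.sum(axis=1)
total.plot(ax=ax, use_index=False, color="k", label="total")

# Show short date strings at the bar positions.
ax.set_xticks(range(len(df)))
ax.set_xticklabels(df.index.strftime("%Y-%m-%d"), rotation=45, ha="right")
ax.axhline(y=0, color="r", linestyle="-")
ax.legend()
# Call plt.show() or ax.figure.savefig(...) to view the result.
```

This keeps the date labels on the x-axis instead of hiding the ticks, which was part of the original complaint.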

plot data with different scale on same y axis on subplots

I have a dataframe whose columns are on very different scales, and I am trying to plot it with subplots, something like this:
raw_data = {'strike_date': ['2019-10-31', '2019-11-31','2019-12-31','2020-01-31', '2020-02-31'],
'strike': [100.00, 113.00, 125.00, 126.00, 135.00],
'lastPrice': [42, 32, 36, 18, 23],
'volume': [4, 24, 31, 2, 3],
'openInterest': [166, 0, 0, 62, 12]}
ploty_df = pd.DataFrame(raw_data, columns = ['strike_date', 'strike', 'lastPrice', 'volume', 'openInterest'])
ploty_df
strike_date strike lastPrice volume openInterest
0 2019-10-31 100.0 42 4 166
1 2019-11-31 113.0 32 24 0
2 2019-12-31 125.0 36 31 0
3 2020-01-31 126.0 18 2 62
4 2020-02-31 135.0 23 3 12
This is what I have tried so far with twinx. As you can see, the output is flat, without any difference in scale between strike and volume.
fig, ax = plt.subplots()
fig.subplots_adjust(right=0.75)
mm = ax.twinx()
yy = ax.twinx()
for col in ploty_df.columns:
mm.plot(ploty_df.index,ploty_df[[col]],label=col)
mm.set_ylabel('volume')
yy.set_ylabel('strike')
yy.spines["right"].set_position(("axes", 1.2))
yy.set_ylim(mm.get_ylim()[0]*12, mm.get_ylim()[1]*12)
plt.tick_params(axis='both', which='major', labelsize=16)
handles, labels = mm.get_legend_handles_labels()
mm.legend(fontsize=14, loc=6)
plt.show()
and the output
The main problem with your script is that you create three axes but plot on only one of them. Think of each axes as a separate object with its own y-scale, y-limits and so on. For example, when you call fig, ax = plt.subplots() you create the first axes, ax (the standard y-axis with its scale on the left side of the plot). To plot on this axes you should call ax.plot(), but in your script everything is plotted on the axes you called mm.
I think you should go through the matplotlib documentation to understand these concepts better. For plotting on multiple y-axes, I recommend having a look at this example.
Below you can find a basic example to plot your data on 3 different y-axis, you can take it as a starting point to produce the graph you are looking for.
# convert the index of your dataframe to datetime
ploty_df.index = pd.DatetimeIndex(ploty_df.strike_date)
fig, ax = plt.subplots(figsize=(15, 7))
# leave room on the right for the third y-axis
fig.subplots_adjust(right=0.75)
l1, = ax.plot(ploty_df['strike'], 'r')
ax.set_ylabel('Strike')
# second y-axis, sharing the same x-axis
ax2 = ax.twinx()
l2, = ax2.plot(ploty_df['lastPrice'], 'g')
ax2.set_ylabel('lastPrice')
# third y-axis, with its spine shifted outward so it does not overlap the second
ax3 = ax.twinx()
l3, = ax3.plot(ploty_df['volume'], 'b')
ax3.set_ylabel('volume')
ax3.spines["right"].set_position(("axes", 1.2))
ax3.spines["right"].set_visible(True)
ax.legend((l1, l2, l3), ('Strike', 'lastPrice', 'volume'), loc='center left')
plt.show()
Here is the result:
P.S. Your example dataframe contains non-existent dates (31 November 2019 and 31 February 2020), so you have to fix those before you can convert the index to datetime.
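A quick way to spot such dates, sketched here with the strike_date values from the question, is to parse them with errors="coerce" and look for the entries that come back as NaT. The variable names parsed and bad are illustrative.

```python
import pandas as pd

# strike_date values copied from the question.
strike_dates = ['2019-10-31', '2019-11-31', '2019-12-31', '2020-01-31', '2020-02-31']

# errors="coerce" turns unparseable dates into NaT instead of raising.
parsed = pd.to_datetime(pd.Series(strike_dates), errors="coerce")

# Collect the original strings that failed to parse.
bad = [d for d, ok in zip(strike_dates, parsed.notna()) if not ok]
print(bad)
```

Once the strings listed in bad are corrected in the source data, the DatetimeIndex conversion should go through.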

plot dataframe with two y-axes

I have the following dataframe:
land_cover 1 2 3 4 5 6 size
0 20 19.558872 6.856950 3.882243 1.743048 1.361306 1.026382 16.520265
1 30 9.499454 3.513521 1.849498 0.836386 0.659660 0.442690 8.652517
2 40 10.173790 3.123167 1.677257 0.860317 0.762718 0.560290 11.925280
3 50 10.098777 1.564575 1.280729 0.894287 0.884028 0.887448 12.647710
4 60 6.166109 1.588687 0.667839 0.230659 0.143044 0.070628 2.160922
5 110 17.846565 3.884678 2.202129 1.040551 0.843709 0.673298 30.406541
I want to plot the data in the way that:
. land_cover is the x-axis
. cols 1 - 6 should be stacked bar plots per land_cover class (row)
. and the column 'size' should be a second y-axis and could be a simple point symbol for every row and additionally a smooth line connecting the points
Any ideas?
Your code is nearly there. I only added two more lines:
import matplotlib.pyplot as plt
df.plot(x="land_cover", y=[1, 2, 3, 4, 5, 6], stacked=True, kind="bar")
ax = df['size'].plot(secondary_y=True, color='k', marker='o')
ax.set_ylabel('size')
plt.show()
In general just add one extra argument to your plot call: secondary_y=['size'].
In this case a separate plot is easier though, because of line vs bars etc.
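For the general case mentioned above, a minimal sketch of a single plot call with secondary_y could look like this. It uses a cut-down, rounded version of the question's frame with all columns drawn as lines, and the left-axis label "share" is an assumption made for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line for an interactive window
import pandas as pd

# Cut-down, rounded stand-in for the question's frame.
df = pd.DataFrame({
    "land_cover": [20, 30, 40],
    1: [19.6, 9.5, 10.2],
    2: [6.9, 3.5, 3.1],
    "size": [16.5, 8.7, 11.9],
})

# Columns named in secondary_y are drawn against a second y-axis on the right.
ax = df.plot(x="land_cover", y=[1, 2, "size"], secondary_y=["size"])
ax.set_ylabel("share")          # left axis (label is an assumption)
ax.right_ax.set_ylabel("size")  # right axis created by secondary_y
```

pandas exposes the secondary axis as ax.right_ax, so both axes can be labelled from the one returned object.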

Scatter plot with custom ticks

I want to do a scatter plot of a wavelength (float) in y-axis and spectral class (list of character/string) in x-axis, labels = ['B','A','F','G','K','M']. Data are saved in pandas dataframe, df.
df['Spec Type Index']
0 NaN
1 A
2 G
. .
. .
167 K
168 NaN
169 G
Then,
df['Disk Major Axis "']
0 4.30
1 4.50
2 22.00
. .
. .
167 1.32
168 0.28
169 25.00
Thus, I thought this should be done simply with
plt.scatter(df['Spec Type Index'], df['Disk Major Axis "'])
But I get this annoying error
could not convert string to float: 'G'
After fixing this, I want to set custom x-ticks as follows. How can I do that?
labels = ['B','A','F','G','K','M']
ticks = np.arange(len(labels))
plt.xticks(ticks, labels)
First, I think you have to map those strings to integers so that matplotlib can decide where to place the points.
labels = ['B','A','F','G','K','M']
mapping = {'B': 0,'A': 1,'F': 2,'G': 3,'K': 4,'M': 5}
df = df.replace({'Spec Type Index': mapping})
Then plot the scatter,
fig, ax = plt.subplots()
ax.scatter(df['Spec Type Index'], df['Disk Major Axis "'])
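Finally, fix the tick positions to match the mapping before setting the labels; otherwise the labels are attached to whatever ticks matplotlib picked automatically,
ax.set_xticks(range(len(labels)))
ax.set_xticklabels(labels)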
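Put together, the steps above might look like the following sketch. The six rows are made up to stand in for the real catalogue, and Series.map is used here in place of replace; rows with no spectral class are dropped before plotting.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line for an interactive window
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

labels = ["B", "A", "F", "G", "K", "M"]

# Made-up rows standing in for the real data.
df = pd.DataFrame({
    "Spec Type Index": [np.nan, "A", "G", "K", np.nan, "G"],
    'Disk Major Axis "': [4.30, 4.50, 22.00, 1.32, 0.28, 25.00],
})

# Position of each class in `labels`; missing classes stay NaN.
codes = df["Spec Type Index"].map({name: i for i, name in enumerate(labels)})
valid = codes.notna()

fig, ax = plt.subplots()
ax.scatter(codes[valid], df.loc[valid, 'Disk Major Axis "'])

# One tick per class, in the order given by `labels`.
ax.set_xticks(range(len(labels)))
ax.set_xticklabels(labels)
ax.set_xlim(-0.5, len(labels) - 0.5)
# Call plt.show() or fig.savefig(...) to view the result.
```

Setting the x-limits explicitly keeps classes with no data points (B, F and M in this sample) visible on the axis.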
