plot data with different scale on same y axis on subplots - python

I have a dataframe whose columns are on very different scales, and I am trying to plot them together using subplots, something like this:
raw_data = {'strike_date': ['2019-10-31', '2019-11-31','2019-12-31','2020-01-31', '2020-02-31'],
'strike': [100.00, 113.00, 125.00, 126.00, 135.00],
'lastPrice': [42, 32, 36, 18, 23],
'volume': [4, 24, 31, 2, 3],
'openInterest': [166, 0, 0, 62, 12]}
ploty_df = pd.DataFrame(raw_data, columns = ['strike_date', 'strike', 'lastPrice', 'volume', 'openInterest'])
ploty_df
strike_date strike lastPrice volume openInterest
0 2019-10-31 100.0 42 4 166
1 2019-11-31 113.0 32 24 0
2 2019-12-31 125.0 36 31 0
3 2020-01-31 126.0 18 2 62
4 2020-02-31 135.0 23 3 12
This is what I have tried so far with twinx. As you can see, the output is flat, without any scale difference between strike and volume.
fig, ax = plt.subplots()
fig.subplots_adjust(right=0.75)
mm = ax.twinx()
yy = ax.twinx()
for col in ploty_df.columns:
    mm.plot(ploty_df.index,ploty_df[[col]],label=col)
mm.set_ylabel('volume')
yy.set_ylabel('strike')
yy.spines["right"].set_position(("axes", 1.2))
yy.set_ylim(mm.get_ylim()[0]*12, mm.get_ylim()[1]*12)
plt.tick_params(axis='both', which='major', labelsize=16)
handles, labels = mm.get_legend_handles_labels()
mm.legend(fontsize=14, loc=6)
plt.show()
and the output

The main problem with your script is that you create three axes but plot on only one of them. Think of each axes as a separate object with its own y-scale, y-limits and so on. For example, when you call fig, ax = plt.subplots() you create the first axes, which you call ax (this is the standard y-axis with the scale on the left side of your plot). To plot something on this axes you should call ax.plot(), but in your script you plot everything on the axes that you called mm.
I recommend going through the matplotlib documentation to understand these concepts better. For plotting on multiple y-axes, have a look at this example.
Below is a basic example that plots your data on three different y-axes; you can take it as a starting point for the graph you are looking for.
# convert the index of your dataframe to datetime
ploty_df.index = pd.DatetimeIndex(ploty_df.strike_date)
fig, ax = plt.subplots(figsize=(15, 7))
fig.subplots_adjust(right=0.75)
# first y-axis (left): strike
l1, = ax.plot(ploty_df['strike'], 'r')
ax.set_ylabel('Strike')
# second y-axis (right): lastPrice
ax2 = ax.twinx()
l2, = ax2.plot(ploty_df['lastPrice'], 'g')
ax2.set_ylabel('lastPrice')
# third y-axis: volume, with its spine shifted outwards so it does not overlap the second
ax3 = ax.twinx()
l3, = ax3.plot(ploty_df['volume'], 'b')
ax3.set_ylabel('volume')
ax3.spines["right"].set_position(("axes", 1.2))
ax3.spines["right"].set_visible(True)
ax.legend((l1, l2, l3), ('Strike', 'lastPrice', 'volume'), loc='center left')
Here is the result:
P.S. Your example dataframe contains non-existent dates (31 November 2019 and 31 February 2020), so you have to correct those before you can convert the index to datetime.
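To make the point about the dates concrete, here is a minimal, self-contained sketch of the three-axis plot. It assumes that month-end dates were intended (30 November 2019 and 29 February 2020 replace the invalid strings); that substitution is a guess, not something stated in the question. The Agg backend is selected only so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

raw_data = {
    # Assumption: month-end dates were intended. The original strings
    # '2019-11-31' and '2020-02-31' are not valid calendar dates.
    "strike_date": ["2019-10-31", "2019-11-30", "2019-12-31", "2020-01-31", "2020-02-29"],
    "strike": [100.00, 113.00, 125.00, 126.00, 135.00],
    "lastPrice": [42, 32, 36, 18, 23],
    "volume": [4, 24, 31, 2, 3],
}
df = pd.DataFrame(raw_data)
df.index = pd.DatetimeIndex(df["strike_date"])

fig, ax = plt.subplots(figsize=(15, 7))
fig.subplots_adjust(right=0.75)  # leave room for the third y-axis
ax2 = ax.twinx()
ax3 = ax.twinx()
ax3.spines["right"].set_position(("axes", 1.2))  # push the third spine outwards

# Each column goes on its own axes, so each keeps its own y-scale.
lines = []
for axis, column, color in [(ax, "strike", "r"), (ax2, "lastPrice", "g"), (ax3, "volume", "b")]:
    (line,) = axis.plot(df.index, df[column], color, label=column)
    axis.set_ylabel(column)
    lines.append(line)

ax.legend(handles=lines, loc="center left")
fig.canvas.draw()
```

Because every series is drawn on a different axes object, the y-limits of strike and volume are autoscaled independently instead of sharing one flat scale.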

Related

Sort boxplot and colour by pairs

I have some data for conditions that go together by pairs, structured like this:
mydata = {
"WT_before": [11,12,13],
"WT_after": [16,17,18],
"MRE11_before": [21,22,23,24,25],
"MRE11_after": [26,27,28,29,30],
"NBS1_before": [31,32,33,34],
"NBS1_after": [36,37,38,39]
}
(my real data has more conditions and more values per condition, this is just an example)
I looked into colouring the boxplots by pairs to help reading the figure, but it seemed quite convoluted to do in matplotlib.
For the moment I'm doing it this way:
bxplt_labels, bxplt_data = mydata.keys(), mydata.values()
bxplt_colors = ["pink", "pink", "lightgreen", "lightgreen", "lightblue", "lightblue"]
fig2, ax = plt.subplots(figsize=(20, 10), dpi=500)
bplot = plt.boxplot(bxplt_data, vert=False, showfliers=False, notch=False, patch_artist=True,)
for patch, color in zip(bplot['boxes'], bxplt_colors):
    patch.set_facecolor(color)
plt.yticks(range(1, len(bxplt_labels) + 1), bxplt_labels)
fig2.show()
which produces the figure:
I would like:
to sort the condition names, so that I can order them to my choosing, and
to get a more elegant way of choosing the colours used, in particular because I will need to reuse this data for more figures afterwards (like scatterplot before/after for each condition)
If needed, I can rearrange the data structure, but the conditions don't all have the same number of values, so a dictionary seemed like the best option to me. Alternatively, I can use seaborn, which I saw offers quite a few possibilities, but I'm not familiar with it, so I would need more time to understand it.
Could you help me figure this out?
Seaborn works easiest with a dataframe in "long form". In this case, there would be rows with the condition repeated for every value with that condition.
Seaborn's boxplot accepts an order= keyword, where you can change the order of the x-values. E.g. order=sorted(mydata.keys()) to sort the values alphabetically. Or list(mydata.keys())[::-1] to use the original order, but reversed. The default order would be how the values appear in the dataframe.
For a horizontal boxplot, you can use x='value', y='condition'. The order will apply to either x or y, depending on which column contains strings.
For coloring, you can use the palette= keyword. This can be either a string naming one of matplotlib's or seaborn's colormaps, or a list of colors. Many more options are possible.
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
mydata = {
"WT_before": [11, 12, 13],
"WT_after": [16, 17, 18],
"MRE11_before": [21, 22, 23, 24, 25],
"MRE11_after": [26, 27, 28, 29, 30],
"NBS1_before": [31, 32, 33, 34],
"NBS1_after": [36, 37, 38, 39]
}
df = pd.DataFrame([[k, val] for k, vals in mydata.items() for val in vals],
                  columns=['condition', 'value'])
fig, ax = plt.subplots(figsize=(12, 5))
sns.boxplot(data=df, x='condition', y='value',
            order=['WT_before', 'WT_after', 'MRE11_before', 'MRE11_after', 'NBS1_before', 'NBS1_after'],
            palette='turbo', ax=ax)
plt.tight_layout()
plt.show()
Here is an example with horizontal boxes:
sns.boxplot(data=df, x='value', y='condition', palette='Paired')
sns.despine()
plt.xlabel('')
plt.ylabel('')
plt.tight_layout()
plt.show()
The dataframe would look like:
    condition     value
0   WT_before     11
1   WT_before     12
2   WT_before     13
3   WT_after      16
4   WT_after      17
5   WT_after      18
6   MRE11_before  21
7   MRE11_before  22
8   MRE11_before  23
9   MRE11_before  24
10  MRE11_before  25
11  MRE11_after   26
12  MRE11_after   27
13  MRE11_after   28
14  MRE11_after   29
15  MRE11_after   30
16  NBS1_before   31
17  NBS1_before   32
18  NBS1_before   33
19  NBS1_before   34
20  NBS1_after    36
21  NBS1_after    37
22  NBS1_after    38
23  NBS1_after    39
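Since the question also asks for a reusable way to choose colours by pair, here is a small sketch of one option: derive the order and a colour lookup from the condition names themselves. The `group_colors` dictionary and the `group`/`timepoint` column names are hypothetical choices made for illustration, and the sketch assumes every condition name has the form `<group>_<timepoint>`.

```python
import pandas as pd

mydata = {
    "WT_before": [11, 12, 13],
    "WT_after": [16, 17, 18],
    "MRE11_before": [21, 22, 23, 24, 25],
    "MRE11_after": [26, 27, 28, 29, 30],
    "NBS1_before": [31, 32, 33, 34],
    "NBS1_after": [36, 37, 38, 39],
}

# Long form: one row per measurement.
df = pd.DataFrame(
    [(name, value) for name, values in mydata.items() for value in values],
    columns=["condition", "value"],
)

# Split "WT_before" into group "WT" and timepoint "before"
# (assumes the last underscore separates the two parts).
df[["group", "timepoint"]] = df["condition"].str.rsplit("_", n=1, expand=True)

# Hypothetical colour choice: one colour per group, shared by both members of a pair.
group_colors = {"WT": "pink", "MRE11": "lightgreen", "NBS1": "lightblue"}
palette = {cond: group_colors[cond.rsplit("_", 1)[0]] for cond in mydata}

# Explicit display order, built from the groups and timepoints you care about.
order = [f"{g}_{t}" for g in ["WT", "MRE11", "NBS1"] for t in ["before", "after"]]
```

The resulting `order` list and `palette` dictionary can then be reused across figures, for example as `order=order, palette=palette` in a seaborn boxplot call; depending on the seaborn version, you may also need to pass `hue='condition'` for the palette to be applied without a warning.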

Matplotlib line and bar in the same chart

I have a pandas dataframe with this easy structure:
Month      Energy  Percentage
Jan        10      0.5
Feb        11      0.6
March      13      0.71
April      15      0.73
May        18      0.81
June       20      0.85
July       24      0.91
August     28      0.93
September  24      0.81
November   17      0.71
December   15      0.6
I want to plot the energy as bars and the percentage as a line, both in the same chart with two y-axes: one for the energy and the other for the percentage. The final result should look like the picture below:
I would also like the x-axis to show all the months, even if a month doesn't have values yet.
Hope you can help me
Thanks!
The code below solves your problem. Basically, the twinx() function lets you create a second y-axis on the same plot. I copied the data from your question into an Excel file named Book1.xlsx.
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_excel("Book1.xlsx")
df['Percentage'] = df['Percentage'] * 100
# create figure and axis objects with subplots()
fig, ax = plt.subplots(figsize=(10, 6))
# draw the bars on the primary (left) y-axis
ax.bar(df.Month, df.Energy, color="blue")
# set x-axis label
ax.set_xlabel("Months", fontsize=14)
# set y-axis label
ax.set_ylabel("Energy", color="blue", fontsize=14)
# twin object for a second y-axis on the same plot
ax2 = ax.twinx()
# draw the line on the secondary (right) y-axis so it sits on top of the bars
ax2.plot(df.Month, df.Percentage, marker="o", color="red")
ax2.set_ylabel("Percentage", color="red", fontsize=14)
plt.setp(ax.get_xticklabels(), rotation=40, horizontalalignment='right')
plt.show()
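The question also asks for a fixed x-axis that shows every month, even one without data. One possible sketch, assuming the data is typed in directly instead of read from Excel, is to reindex the dataframe against a full list of months and plot at integer positions. The label "October" is an assumption, since that month does not appear in the question's table.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    "Month": ["Jan", "Feb", "March", "April", "May", "June", "July",
              "August", "September", "November", "December"],
    "Energy": [10, 11, 13, 15, 18, 20, 24, 28, 24, 17, 15],
    "Percentage": [0.5, 0.6, 0.71, 0.73, 0.81, 0.85, 0.91, 0.93, 0.81, 0.71, 0.6],
})

# Full list of months; "October" is assumed, it has no row in the data.
all_months = ["Jan", "Feb", "March", "April", "May", "June", "July",
              "August", "September", "October", "November", "December"]

# Reindexing inserts a NaN row for every month that is missing.
full = df.set_index("Month").reindex(all_months)
positions = list(range(len(all_months)))

fig, ax = plt.subplots(figsize=(10, 6))
# Missing months get a zero-height bar, so the slot stays visible but empty.
ax.bar(positions, full["Energy"].fillna(0), color="blue")
ax.set_ylabel("Energy", color="blue")
ax.set_xticks(positions)
ax.set_xticklabels(all_months, rotation=40, ha="right")

ax2 = ax.twinx()
# NaN values simply leave a gap in the line at the missing month.
ax2.plot(positions, full["Percentage"] * 100, marker="o", color="red")
ax2.set_ylabel("Percentage", color="red")
fig.canvas.draw()
```

Plotting against integer positions and labelling the ticks afterwards keeps the axis layout independent of which months actually have data.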

Plotting line plot on top of bar plot in Python / matplotlib from dataframe

I am trying to plot a line plot on top of a stacked bar plot in matplotlib, but cannot get them both to show up.
I have the combined dataframe already set up by pulling various information from other dataframes and with the datetime index. I am trying to plot a stacked bar plot from the activity columns (LightlyActive, FairlyActive, VeryActive) and several line plots from the minutes in each sleep cycle (wake, light, deep, rem) on one set of axes (ax1). I am then trying to plot the efficiency column as a line plot on a separate set of axes (ax2).
I cannot get both the stacked bar plot and the line plots to show up simultaneously. If I plot the bar plot second, that is the only one that shows up. If I plot the line plots first (activity and efficiency) those are the only ones that show up. It seems like whichever style of plot I plot second covers up the first one.
LightlyActive FairlyActive VeryActive efficiency wake light deep rem
dateTime
2018-04-10 314 34 123 93.0 55.0 225.0 72.0 99.0
2018-04-11 253 22 102 96.0 44.0 260.0 50.0 72.0
2018-04-12 282 26 85 93.0 47.0 230.0 60.0 97.0
2018-04-13 292 35 29 96.0 43.0 205.0 81.0 85.0
fig, ax1 = plt.subplots(figsize = (10, 10))
temp_df[['LightlyActive', 'FairlyActive', 'VeryActive']].plot(kind = 'bar', stacked = True, ax = ax1)
ax2 = plt.twinx(ax = ax1)
temp_df[['wake', 'light', 'deep', 'rem']].plot(ax = ax1)
temp_df['efficiency'].plot(ax = ax2)
plt.show()
I would like to have one single plot with a stacked bar plot of activity levels ('LightlyActive', 'FairlyActive', 'VeryActive') and sleep cycles ('wake', 'light', 'deep', 'rem') on one set of axes, and sleep efficiency on a second set of axes.
EDIT
I am not even getting it to display as Trenton did in the edited version below (designated as "Edited by Trenton M"). The 2 plots immediately below this are the versions that display for me.
This is what I get so far (Edited by Trenton M):
Note the circled areas.
Figured it out! By leaving the dates as a column (i.e. not setting them as the index), I can plot both the line plot and bar plot. I can then go back and adjust labels accordingly.
@ScottBoston your x-axis tipped me off. Thanks for looking into this.
date1 = pd.Timestamp(2018, 4, 10)  # pd.datetime is no longer available in recent pandas
data = {'LightlyActive': [314, 253, 282, 292],
'FairlyActive': [34, 22, 26, 35],
'VeryActive': [123, 102, 85, 29],
'efficiency': [93.0, 96.0, 93.0, 96.0],
'wake': [55.0, 44.0, 47.0, 43.0],
'light': [225.0, 260.0, 230.0, 205.0],
'deep': [72.0, 50.0, 60.0, 81.0],
'rem': [99.0, 72.0, 97.0, 85.0],
'date': [date1 + pd.Timedelta(days = i) for i in range(4)]}
temp_df = pd.DataFrame(data)
fig, ax1 = plt.subplots(figsize = (10, 10))
ax2 = plt.twinx(ax = ax1)
temp_df[['LightlyActive', 'FairlyActive', 'VeryActive']].\
    plot(kind = 'bar', stacked = True, ax = ax1)
temp_df[['wake', 'light', 'deep', 'rem']].plot(ax = ax1, alpha = 0.5)
temp_df['efficiency'].plot(ax = ax2)
ax1.set_xticklabels(labels = temp_df['date'])
plt.show()
What about using alpha?
fig, ax1 = plt.subplots(figsize = (10, 10))
temp_df[['LightlyActive', 'FairlyActive', 'VeryActive']].plot(kind = 'bar', stacked = True, ax = ax1, alpha=.3)
ax2 = plt.twinx(ax = ax1)
temp_df[['wake', 'light', 'deep', 'rem']].plot(ax = ax1, zorder=10)
temp_df['efficiency'].plot(ax = ax2)
plt.show()
Output:
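Another way to keep the datetime index, sketched here under the assumption that the mismatch comes from pandas drawing bars at integer positions 0..n-1 while line plots on a DatetimeIndex use date-valued x-coordinates: draw the lines with matplotlib directly at the same integer positions as the bars. The colour, marker and date format are illustrative choices.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

temp_df = pd.DataFrame(
    {
        "LightlyActive": [314, 253, 282, 292],
        "FairlyActive": [34, 22, 26, 35],
        "VeryActive": [123, 102, 85, 29],
        "efficiency": [93.0, 96.0, 93.0, 96.0],
        "wake": [55.0, 44.0, 47.0, 43.0],
        "light": [225.0, 260.0, 230.0, 205.0],
        "deep": [72.0, 50.0, 60.0, 81.0],
        "rem": [99.0, 72.0, 97.0, 85.0],
    },
    index=pd.date_range("2018-04-10", periods=4, name="dateTime"),
)

fig, ax1 = plt.subplots(figsize=(10, 10))
# pandas places the bars at x = 0, 1, 2, 3 regardless of the index type.
temp_df[["LightlyActive", "FairlyActive", "VeryActive"]].plot(
    kind="bar", stacked=True, ax=ax1, alpha=0.3
)

# Draw the lines at the same integer positions so both plots share x-coordinates.
x = list(range(len(temp_df)))
sleep_lines = [
    ax1.plot(x, temp_df[col].to_numpy(), label=col)[0]
    for col in ["wake", "light", "deep", "rem"]
]

ax2 = ax1.twinx()
(eff_line,) = ax2.plot(x, temp_df["efficiency"].to_numpy(), color="k", marker="o")
ax2.set_ylabel("efficiency")

# Replace the default tick labels with short date strings.
ax1.set_xticklabels(list(temp_df.index.strftime("%Y-%m-%d")), rotation=45)
ax1.legend(loc="upper left")
fig.canvas.draw()
```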

plot dataframe with two y-axes

I have the following dataframe:
land_cover 1 2 3 4 5 6 size
0 20 19.558872 6.856950 3.882243 1.743048 1.361306 1.026382 16.520265
1 30 9.499454 3.513521 1.849498 0.836386 0.659660 0.442690 8.652517
2 40 10.173790 3.123167 1.677257 0.860317 0.762718 0.560290 11.925280
3 50 10.098777 1.564575 1.280729 0.894287 0.884028 0.887448 12.647710
4 60 6.166109 1.588687 0.667839 0.230659 0.143044 0.070628 2.160922
5 110 17.846565 3.884678 2.202129 1.040551 0.843709 0.673298 30.406541
I want to plot the data in the way that:
. land_cover is the x-axis
. cols 1 - 6 should be stacked bar plots per land_cover class (row)
. and the column 'size' should be a second y-axis and could be a simple point symbol for every row and additionally a smooth line connecting the points
Any ideas?
Your code is fine as it is; I only added two more lines:
import matplotlib.pyplot as plt
df.plot(x="land_cover", y=[1, 2, 3, 4, 5, 6], stacked=True, kind="bar")
ax = df['size'].plot(secondary_y=True, color='k', marker='o')
ax.set_ylabel('size')
plt.show()
In general, you can just add one extra argument to your plot call: secondary_y=['size'].
In this case, though, a separate plot call is easier, because the bars and the line need different plot kinds.
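For the general case mentioned above, a short sketch of the single-call variant may help. It uses only the first three rows and two of the numeric columns from the question, and plots everything as lines, since a single call cannot mix bars and lines. The "value" axis label is a placeholder name.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Subset of the question's data (first three rows, columns 1, 2 and size).
df = pd.DataFrame({
    "land_cover": [20, 30, 40],
    1: [19.558872, 9.499454, 10.173790],
    2: [6.856950, 3.513521, 3.123167],
    "size": [16.520265, 8.652517, 11.925280],
})

# One call: columns 1 and 2 use the left y-axis, 'size' uses the right one.
ax = df.plot(x="land_cover", secondary_y=["size"])
ax.set_ylabel("value")          # placeholder label for the left axis
ax.right_ax.set_ylabel("size")  # pandas exposes the secondary axes as right_ax
ax.figure.canvas.draw()
```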

Scatter plot with custom ticks

I want to make a scatter plot with a wavelength (float) on the y-axis and the spectral class (a list of characters/strings) on the x-axis, labels = ['B','A','F','G','K','M']. The data are saved in a pandas dataframe, df.
df['Spec Type Index']
0 NaN
1 A
2 G
. .
. .
167 K
168 Nan
169 G
Then,
df['Disk Major Axis "']
0 4.30
1 4.50
2 22.00
. .
. .
167 1.32
168 0.28
169 25.00
Thus, I thought this should be done simply with
plt.scatter(df['Spec Type Index'], df['Disk Major Axis "'])
But I get this annoying error
could not convert string to float: 'G'
After fixing this, I want to make custom xticks as follows. However, how can I
labels = ['B','A','F','G','K','M']
ticks = np.arange(len(labels))
plt.xticks(ticks, labels)
First, I think you have to map those strings to integers so that matplotlib can decide where to place the points.
labels = ['B','A','F','G','K','M']
mapping = {'B': 0,'A': 1,'F': 2,'G': 3,'K': 4,'M': 5}
df = df.replace({'Spec Type Index': mapping})
Then plot the scatter,
fig, ax = plt.subplots()
ax.scatter(df['Spec Type Index'], df['Disk Major Axis "'])
Finally,
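ax.set_xticks(list(range(len(labels))))  # fix the tick positions first, so the labels line up with 0..5
ax.set_xticklabels(labels)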
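Put together, the steps could look like the sketch below. The six-row dataframe is a made-up stand-in for the real 170-row data, and missing spectral types are kept as NaN, which matplotlib leaves out of the scatter.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

labels = ["B", "A", "F", "G", "K", "M"]
# Position of each spectral class along the x-axis: B -> 0, A -> 1, ...
mapping = {label: i for i, label in enumerate(labels)}

# Hypothetical sample standing in for the real dataframe.
df = pd.DataFrame({
    "Spec Type Index": [np.nan, "A", "G", "K", np.nan, "G"],
    'Disk Major Axis "': [4.30, 4.50, 22.00, 1.32, 0.28, 25.00],
})

# Strings become integer positions; missing values stay NaN.
x = df["Spec Type Index"].map(mapping)

fig, ax = plt.subplots()
ax.scatter(x, df['Disk Major Axis "'])
# Set the tick positions before the labels so each label sits on its own class.
ax.set_xticks(list(mapping.values()))
ax.set_xticklabels(labels)
fig.canvas.draw()
```

Setting the tick positions explicitly matters: without it, the labels are attached to whatever ticks matplotlib picked automatically, which need not be 0 to 5.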
