Matplotlib x-axis limited range - python

I've been trying to plot some data, but the x-axis only covers two years. My question is pretty simple: can someone explain why the x-axis is limited to the range 2015Q1 - 2017Q1 when the available data runs from 2015Q1 to 2020Q1? Is something missing or incorrect in my code?
dd2
qtr median count
0 2015Q1 1290000.0 27
1 2015Q2 1330000.0 43
2 2015Q3 1570000.0 21
3 2015Q4 1371000.0 20
4 2016Q1 1386500.0 20
5 2016Q2 1767500.0 22
6 2016Q3 1427500.0 32
7 2016Q4 1501000.0 31
8 2017Q1 1700000.0 29
9 2017Q2 1630000.0 15
10 2017Q3 1687500.0 24
11 2017Q4 1450000.0 15
12 2018Q1 1505000.0 13
13 2018Q2 1494000.0 14
14 2018Q3 1415000.0 21
15 2018Q4 1150000.0 15
16 2019Q1 1228000.0 15
17 2019Q2 1352500.0 12
18 2019Q3 1237500.0 12
19 2019Q4 1455000.0 26
20 2020Q1 1468000.0 9
code
x = dd2['qtr']
y1 = dd2['count']
y2 = dd2['median']
fig, ax = plt.subplots(figsize=(40,10))
ax = plt.subplot(111)
ax2 = ax.twinx()
y1_plot = y1.plot(ax=ax2, color='green', legend=True, marker='*', label="median")
y2_plot = y2.plot(ax=ax, color='red', legend=True, linestyle='--', marker='x', label="count")
plt.title('Price trend analysis')
ax.set_xticklabels(x, rotation='vertical',color='k', size=20)
ax.set_xlabel('year')
ax.set_ylabel('sold price')
ax2.set_ylabel('number of sales')
y1_patch = mpatches.Patch(color='red', label='median sold price')
y2_patch = mpatches.Patch(color='green', label='count')
plt.legend(handles=[y2_patch,y1_patch],loc='upper right')
plt.savefig('chart.png', dpi=300,bbox_inches ='tight')
plt.show()

Use matplotlib.ticker (imported as mtick) to place a tick at every x position, so that all of the x-axis data gets a label:
import matplotlib.ticker as mtick
ax.xaxis.set_major_locator(mtick.IndexLocator(base=1, offset=0))
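A likely reason the labels stop at 2017Q1: set_xticklabels only renames the ticks that already exist, and the automatic locator places far fewer ticks than there are rows, so only the first few quarter names get used. Here is a minimal sketch of setting the tick positions and the labels together. The `quarters` and `counts` lists are rebuilt from the table in the question and are only illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs headless
import matplotlib.pyplot as plt

# Data rebuilt from the question's table (illustrative names).
quarters = [f"{year}Q{q}" for year in range(2015, 2020) for q in range(1, 5)] + ["2020Q1"]
counts = [27, 43, 21, 20, 20, 22, 32, 31, 29, 15, 24, 15, 13, 14, 21, 15, 15, 12, 12, 26, 9]

fig, ax = plt.subplots(figsize=(12, 4))
positions = range(len(quarters))
ax.plot(positions, counts, color="green", marker="*")

# One tick per data point first, then one label per tick.
ax.set_xticks(positions)
ax.set_xticklabels(quarters, rotation="vertical")

tick_texts = [t.get_text() for t in ax.get_xticklabels()]
```

With the ticks fixed first, every quarter from 2015Q1 to 2020Q1 gets its own label.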

Instead of going through Pandas' Series plotting methods, I'd use pyplot to plot your x and y data together, like this:
# everything is the same up to 'ax2 = ax.twinx()'
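# plot on your axes, save a reference to the line
# y2 is the median price (left axis, 'sold price'); y1 is the count (right axis, 'number of sales')
line1 = ax.plot(x, y2, color="red", linestyle='--', label="median sold price", marker='x')
line2 = ax2.plot(x, y1, color="green", label="count", marker='*')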
# no need for messing with patches
lines = line1 + line2
labels = [l.get_label() for l in lines]
ax.legend(lines, labels, loc='upper right')
# this is the same as before again
plt.title('Price trend analysis')
ax.xaxis.set_tick_params(rotation=90, labelcolor='k', labelsize=20)  # values taken from the question's set_xticklabels call
ax.set_xlabel('year')
ax.set_ylabel('sold price')
ax2.set_ylabel('number of sales')
plt.savefig('chart.png', dpi=300,bbox_inches ='tight')
plt.show()

Related

How to align the x position of the dots in seaborn scatterplot to a nested bar plot

I am trying to plot a scatter plot on top of a bar plot using sns.scatterplot() and df.plot(kind='bar'). The figure turns out fine, but it would be even nicer if I could align each scatter point with the bar that has the same label.
I have read in the matplotlib documentation that Rectangle has a get_x() method that can "Return the left coordinate of the rectangle".
Is there a way for me to assign these coordinates to the scatter points plotted by seaborn?
Code
fig, ax = plt.subplots(nrows=1, ncols=1)
fig.set_size_inches(9, 9)
fig.set_dpi(300)
bar_df.plot(kind='bar', ax=ax)
ax2 = ax.twinx()
sns.scatterplot(data=line_df, ax=ax2)
Dataframes
bar_df
year apple banana citrus ...
2020 12 34 56 78
2025 12 34 56 78
2030 12 34 56 78
2035 12 34 56 78
line_df
year apple banana citrus ...
2020 23 45 67 89
2025 23 45 67 89
2030 23 45 67 89
2035 23 45 67 89
It'd be really nice if I could put the points on the same vertical line as the bar with the same header.
sns.scatterplot interprets the x-axis as numeric. As such, it doesn't align well with a bar plot, nor does it have a dodge= parameter.
You can use sns.stripplot instead.
Seaborn works easiest with its data in "long form", which can be achieved via pandas pd.melt.
Here is some example code:
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
bar_df, line_df = pd.read_html('https://stackoverflow.com/questions/73191315')
bar_df_long = bar_df.melt(id_vars='year', var_name='fruit', value_name='bar_value')
line_df_long = line_df.melt(id_vars='year', var_name='fruit', value_name='line_value')
fig, ax = plt.subplots(nrows=1, ncols=1, figsize=(6,6), dpi=300)
sns.barplot(data=bar_df_long, x='year', y='bar_value', hue='fruit', dodge=True, ax=ax)
ax2 = ax.twinx()
sns.stripplot(data=line_df_long, x='year', y='line_value', hue='fruit', dodge=True, jitter=False,
edgecolor='black', linewidth=1, ax=ax2)
ax2.legend_.remove() # remove the second legend
plt.tight_layout()
plt.show()
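To make the "long form" step concrete, here is a small sketch of what melt does to one of the tables. It assumes the frame has only the year, apple, banana and citrus columns (the elided "..." column is left out), with values copied from the question.

```python
import pandas as pd

# Wide form: one column per fruit (values copied from the question's bar_df).
bar_df = pd.DataFrame({
    "year": [2020, 2025, 2030, 2035],
    "apple": [12, 12, 12, 12],
    "banana": [34, 34, 34, 34],
    "citrus": [56, 56, 56, 56],
})

# Long form: one row per (year, fruit) pair, which is what hue= and dodge= work with.
bar_df_long = bar_df.melt(id_vars="year", var_name="fruit", value_name="bar_value")
print(bar_df_long.head())
```

Each of the 4 years now appears once per fruit, so seaborn can use the fruit column for hue and dodge the points the same way it dodges the bars.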

How can I plot with multiple colors based on values on the x-axis

Let's say I have a pandas DataFrame:
Year Delta_T
0 2000 23
1 2001 25
2 2002 22
2 2002 22
4 2004 30
5 2005 21
I want to plot Delta_T as a function of time, using one color for the dates from 2000 to 2003 and another color for 2004 to 2005.
Can someone please tell me how I can do it?
I have tried this:
plt.figure()
plt.scatter(delta_T_all.iloc[:,0].pd.Timestamp('2010-04-17'),
pd.Timestamp('2016-01-01'),delta_T_all.iloc[:,1], label= '220-250m'),
plt.scatter(delta_T_all.iloc[:,0].pd.Timestamp('2016-01-01'),
pd.Timestamp('2019-09-14'),delta_T_all.iloc[:,1], label= '220-250m')
plt.xlabel('Time')
plt.setp(plt.gca().get_xticklabels(), rotation=60, ha="right")
plt.ylabel('Delta_T')
plt.legend()
plt.title('Delta_T in function of time')
This is the answer
plt.figure()
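# rows 0-3 hold the years up to 2003, rows 4-5 hold 2004 and 2005
plt.scatter(delta_T_all.iloc[0:4,0],delta_T_all.iloc[0:4,1],color='r',label='2000-2003')
plt.scatter(delta_T_all.iloc[4:6,0],delta_T_all.iloc[4:6,1],color='b',label='2004-2005')
plt.xlabel('Time')
plt.ylabel('Delta_T')
plt.legend()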
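Positional slices depend on the row order, so an alternative is to select the rows by their Year value instead. This is only a sketch: the frame is rebuilt from the table in the question, and the 2003 cutoff is taken from the question's wording.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs headless
import matplotlib.pyplot as plt
import pandas as pd

# Frame rebuilt from the question's table.
delta_T_all = pd.DataFrame({
    "Year": [2000, 2001, 2002, 2002, 2004, 2005],
    "Delta_T": [23, 25, 22, 22, 30, 21],
})

# Boolean mask: True for the rows up to and including 2003.
early = delta_T_all["Year"] <= 2003

fig, ax = plt.subplots()
ax.scatter(delta_T_all.loc[early, "Year"], delta_T_all.loc[early, "Delta_T"],
           color="r", label="2000-2003")
ax.scatter(delta_T_all.loc[~early, "Year"], delta_T_all.loc[~early, "Delta_T"],
           color="b", label="2004-2005")
ax.set_xlabel("Time")
ax.set_ylabel("Delta_T")
ax.legend()
```

The mask keeps working if rows are added or reordered, which fixed iloc ranges do not.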

How to show chart type of each Y-axis to distinguish compared factors

When comparing two different Y variables, there is no real way of knowing which chart type belongs to which Y-Axis. I need a legend that says which chart type belongs to which data set.
With help from this site itself I've been able to plot different categorized factors using different chart types, but as you can see there is no way to tell which chart type belongs to which factor/variable
This is the data table(tm_daily_df), and the current code
report_date shift UTL_R Head_Count
0 2019-03-17 A 0.669107 39
1 2019-03-18 A 0.602197 69
2 2019-03-19 A 0.568741 72
3 2019-03-20 A 0.552013 78
4 2019-03-21 A 0.585469 57
5 2019-03-22 A 0.635652 61
6 2019-03-23 A 0.602197 51
7 2019-03-17 1 0.828020 16
8 2019-03-17 2 0.585469 8
9 2019-03-17 3 0.526922 15
10 2019-03-18 1 0.618924 30
11 2019-03-18 2 0.610560 20
12 2019-03-18 3 0.577105 19
13 2019-03-19 1 0.610560 28
14 2019-03-19 2 0.602197 26
15 2019-03-19 3 0.468375 18
16 2019-03-20 1 0.543650 33
17 2019-03-20 2 0.552013 26
18 2019-03-20 3 0.552013 19
19 2019-03-21 1 0.577105 22
20 2019-03-21 2 0.585469 19
21 2019-03-21 3 0.602197 16
22 2019-03-22 1 0.593833 26
23 2019-03-22 2 0.685835 20
24 2019-03-22 3 0.635652 15
25 2019-03-23 1 0.577105 23
26 2019-03-23 2 0.627288 16
27 2019-03-23 3 0.602197 12
fig, ax = plt.subplots(figsize=(15,6))
g = sns.lineplot(x='report_date', y='UTL_R', data=tm_daily_df, ax=ax, hue = 'shift', legend = None,
marker='o', markersize=10)
ax2 = ax.twinx()
g = sns.barplot(x='report_date', y='Head_Count', data=tm_daily_df, ax=ax2, hue='shift',alpha=.5)
ax.set_title('Utilization Ratio vs HeadCount')
plt.show()
I want a legend that says which chart type belongs to which data set. In this case, there would be a secondary legend that shows a line next to the word "UTL_R" and a square (or whatever would represent a bar graph) next to the word "Head_Count".
I'm also open to any other ideas that can identify the applied chart types. Keep in mind this graph is one of many from a huge set of variables; it's not a single instance.
If this is not possible, is there a way I can just put an image or a small table into the figure?
tl;dr at the bottom
I recently needed to implement two legends on a project as well. My code is something like:
import matplotlib as mpl
import matplotlib.pyplot as plt

def plot_my_data(ax, local_zerog, local_oneg, local_maxg):
    # local_zerog list looks like: [local_zerog_dcmdcl_names, local_zerog_dcmdcl_values, local_zerog_time2double_names, local_zerog_time2double_values]
    # the others are structured the same way as well
    mpl.rcParams["lines.markersize"] = 7
    dcmdcl = ax.scatter(local_zerog[0], local_zerog[1], label='Zero G', facecolors='none', edgecolors='b')  # dcmdcl
    ax.scatter(local_oneg[0], local_oneg[1], label="One G", facecolors='none', edgecolors='g')
    ax.scatter(local_maxg[0], local_maxg[1], label="Max G", facecolors='none', edgecolors='r')
    ax.tick_params(axis="x", direction="in", top=False, labeltop=False, labelbottom=True)
    ax.tick_params(axis="y", direction="in", right=True)
    labels = ax.get_xticklabels()
    plt.setp(labels, rotation=90, horizontalalignment='center')
    # first legend: built from the labels assigned above
    legend1 = ax.legend(loc=1)
    time2double = ax.scatter(local_zerog[2], local_zerog[3], label='Zero G', marker='s', color='b')  # time2double
    ax.scatter(local_oneg[2], local_oneg[3], label="One G", marker='s', color='g')
    ax.scatter(local_maxg[2], local_maxg[3], label="Max G", marker='s', color='r')
    ax.plot(local_oneg[0], [0 for _ in local_oneg[0]], color='k')  # line at 0
    ax.plot(local_oneg[2], [0 for _ in local_oneg[2]], color='k')
    # second legend: one handle per marker shape
    ax.legend([dcmdcl, time2double], ["dcmdcl [%]", "time2double [s]"], loc=2)
    # re-add the first legend, which the second ax.legend() call replaced
    plt.gca().add_artist(legend1)
Where I had basically 6 sets of data: 3 for dcmdcl and 3 for time2double. Each has a different color/shape so basically I plotted all of one shape in the lines
dcmdcl = ax.scatter(local_zerog[0], local_zerog[1], label='Zero G', facecolors='none', edgecolors='b') #dcmdcl
ax.scatter(local_oneg[0], local_oneg[1], label="One G", facecolors='none', edgecolors='g')
ax.scatter(local_maxg[0], local_maxg[1], label="Max G", facecolors='none', edgecolors='r')
ax.tick_params(axis="x", direction="in", top=False, labeltop=False, labelbottom=True)
ax.tick_params(axis="y", direction="in", right=True)
labels = ax.get_xticklabels()
plt.setp(labels, rotation=90, horizontalalignment='center')
legend1 = ax.legend(loc=1)
where the last line generates a legend based off the various labels I've assigned. Now to differentiate between the shapes I took one dcmdcl and one time2double and made another legend. The relevant code is:
dcmdcl = ax.scatter(local_zerog[0], local_zerog[1], label='Zero G', facecolors='none', edgecolors='b') #dcmdcl
time2double = ax.scatter(local_zerog[2], local_zerog[3], label='Zero G', marker='s', color='b') #time2double
ax.legend([dcmdcl, time2double], ["dcmdcl [%]", "time2double [s]"], loc=2)
where I basically feed it two specific instances and tell it to create another legend from this information and place it at another location.
tl;dr
It looks like you already have the legend you want for one of the data sets, so now you basically need to run:
legend1 = ax.legend(['put a series of items you want to describe here'], ['put how you would like to title them (needs to be in same order as previous list)'], loc=2)
plt.gca().add_artist(legend1)
I think the order might be important here (I don't remember from when I made it), but if you'll notice my order is:
plot some stuff
legend1 = ax.legend(loc=1) to make a legend (not plotted yet, just a variable)
plot more stuff
ax.legend([dcmdcl, time2double], ["dcmdcl [%]", "time2double [s]"], loc=2) (note this is not assigned to a variable this time)
plt.gca().add_artist(legend1) now I use the variable made earlier to plot it via add_artist()
My code to generate each ax that is passed into my function above:
fig = plt.figure(figsize=(15, 15))
ax = fig.add_subplot(1, 3, 1)
zerog, oneg, maxg = build_plot_data(lower_mach)
plot_my_data(ax, zerog, oneg, maxg)
ax.set_title("Mach < .7")
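Applied to the line-plus-bar case from the question, the same two-legend idea might look roughly like the sketch below. It uses plain matplotlib rather than seaborn, only the first three shift "A" rows of the table, and proxy handles (a Line2D and a Patch) for the chart-type legend. The names `days`, `utl_r` and `head_count` are made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs headless
import matplotlib.pyplot as plt
from matplotlib.lines import Line2D
from matplotlib.patches import Patch

# First three shift "A" rows from the question's table (illustrative names).
days = [0, 1, 2]
utl_r = [0.669107, 0.602197, 0.568741]
head_count = [39, 69, 72]

fig, ax = plt.subplots()
ax.plot(days, utl_r, marker="o", label="A")
ax2 = ax.twinx()
ax2.bar(days, head_count, alpha=0.5)

# Legend 1: the per-shift legend, built from the labelled line on ax.
shift_legend = ax.legend(title="shift", loc="upper left")

# Legend 2: proxy artists that only describe the chart type.
type_handles = [
    Line2D([], [], color="k", marker="o"),
    Patch(facecolor="gray", alpha=0.5),
]
ax.legend(type_handles, ["UTL_R", "Head_Count"], loc="upper right")

# The second ax.legend() call replaced the first legend, so add it back.
ax.add_artist(shift_legend)
```

Because the proxy handles are not tied to any plotted data, the same chart-type legend can be reused across many figures.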

Seaborn stripplot deletes plot

I have a boxplot:
fig, ax = plt.subplots(1,1)
bp = df.boxplot(column='transaction_value',
by='store_type', grid=True,
ax=ax, showfliers=True)
plt.tight_layout(rect=[0, 0.03, 1, 0.95])
ax.set_ylim([0, 800])
ax.set_ylabel('transaction_value')
plt.show()
I have a seaborn stripplot:
bplot=sns.stripplot(y='transaction_value', x='store_type',
data=df,
jitter=True,
marker='o',
alpha=0.1,
color='black')
When I try to overlay the stripplot on the boxplot, it deletes the first boxplot (on the very far left).
fig, ax = plt.subplots(1,1)
bp = df.boxplot(column='transaction_value',
by='store_type', grid=True,
ax=ax, showfliers=True)
bplot=sns.stripplot(y='transaction_value', x='store_type',
data=df,
jitter=True,
marker='o',
alpha=0.1,
color='black')
plt.tight_layout(rect=[0, 0.03, 1, 0.95])
ax.set_ylim([0, 500])
ax.set_ylabel('transaction_value')
plt.show()
How can I stop this from happening?
Added data example:
a
transaction_value store_type
0 30.927648 express
1 20.356693 extra
2 48.201950 metro
3 77.213957 metro
4 15.482211 superstore
5 85.794876 superstore
6 16.199844 extra
7 0.007816 superstore
8 50.925737 metro
9 81.393811 metro
10 7.616312 superstore
11 82.172441 metro
12 49.608503 extra
13 71.907878 metro
14 85.833738 superstore
15 88.131029 express
16 11.541427 extra
17 89.759724 metro
18 96.435902 superstore
19 91.984656 superstore
20 67.795293 metro
21 39.806654 superstore
22 39.565823 metro
23 37.507718 superstore
24 37.918300 metro
25 18.599158 metro
26 3.815219 extra
27 83.210068 express
28 3.988503 extra
29 94.298953 superstore
a = pd.read_clipboard()
fig, ax = plt.subplots(1,1)
bp = a.boxplot(column='transaction_value',
by='store_type', grid=True,
ax=ax, showfliers=True)
bplot=sns.stripplot(y='transaction_value', x='store_type',
data=a,
jitter=True,
marker='o',
alpha=0.1,
color='black')
plt.tight_layout(rect=[0, 0.03, 1, 0.95])
ax.set_ylim([0, 500])
ax.set_ylabel('transaction_value')
plt.show()
@ImportanceOfBeingErnest provided a solution in the comments while I was typing, but I was going to suggest something else:
For better consistency, I would recommend using seaborn for the boxplots as well. This should ensure that both plots are laid out the same way:
fig, ax = plt.subplots(1,1)
sns.boxplot(y='transaction_value', x='store_type', data=df, ax=ax,
color='w')
sns.stripplot(y='transaction_value', x='store_type', data=df, ax=ax,
jitter=True,
marker='o',
alpha=0.1,
color='black')
ax.set_ylabel('transaction_value')
plt.show()

Plotting a bar chart comparing years in pandas

I have the following dataframe
Date_x BNF Chapter_x VTM_NM Occurance_x Date_y BNF Chapter_y Occurance_y
0 2016-12-01 1 Not Specified 2994 2015-12-01 1 3212
1 2016-12-01 1 Mesalazine 2543 2015-12-01 1 2397
2 2016-12-01 1 Omeprazole 2307 2015-12-01 1 2370
3 2016-12-01 1 Esomeprazole 1535 2015-12-01 1 1516
4 2016-12-01 1 Lansoprazole 1511 2015-12-01 1 1547
I have plotted a bar chart with 2 bars one representing 2015 and the other 2016 using this code
fig = plt.figure() # Create matplotlib figure
ax = fig.add_subplot(111) # Create matplotlib axes
width = 0.4
df.Occurance_x.plot(kind='bar', color='red', ax=ax, width=width, position=1)
df.Occurance_y.plot(kind='bar', color='blue', ax=ax, width=width, position=0)
ax.set_ylabel('Occurance')
plt.legend(['Date_x', 'Date_y'], loc='upper right')
ax.set_title('BNF Chapter 1 Top 5 drugs prescribed')
plt.show()
However, the x-axis shows the index 0 1 2 3 4. I want it to show the drug names.
How would I go about doing this?
You can start from this:
import pandas as pd
# In the question, Occurance_x belongs to Date_x (2016) and Occurance_y to Date_y (2015)
df = pd.DataFrame({"date_x": [2016]*5,
                   "Occurance_x": [2994, 2543, 2307, 1535, 1511],
                   "VTM_NM": ["Not Specified", "Mesalazine", "Omeprazole",
                              "Esomeprazole", "Lansoprazole"],
                   "date_y": [2015]*5,
                   "Occurance_y": [3212, 2397, 2370, 1516, 1547]})
# passing x='VTM_NM' puts the drug names on the x-axis
ax = df[["VTM_NM", "Occurance_x", "Occurance_y"]].plot(x='VTM_NM',
                                                       kind='bar',
                                                       color=["g", "b"],
                                                       rot=45)
ax.legend(["2016", "2015"])
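If you would rather keep the two separate Series.plot calls from the question, another option is to relabel the ticks afterwards. This is a sketch with a frame rebuilt from the question's table; the explicit set_xlim call is an addition so that no bar gets clipped.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs headless
import matplotlib.pyplot as plt
import pandas as pd

# Frame rebuilt from the question's table.
df = pd.DataFrame({
    "VTM_NM": ["Not Specified", "Mesalazine", "Omeprazole", "Esomeprazole", "Lansoprazole"],
    "Occurance_x": [2994, 2543, 2307, 1535, 1511],  # 2016
    "Occurance_y": [3212, 2397, 2370, 1516, 1547],  # 2015
})

fig, ax = plt.subplots()
width = 0.4
df["Occurance_x"].plot(kind="bar", color="red", ax=ax, width=width, position=1)
df["Occurance_y"].plot(kind="bar", color="blue", ax=ax, width=width, position=0)

# Replace the default 0..4 index labels with the drug names.
ax.set_xticks(range(len(df)))
ax.set_xticklabels(df["VTM_NM"].tolist(), rotation=45, ha="right")
ax.set_xlim(-0.5, len(df) - 0.5)

ax.set_ylabel("Occurance")
ax.legend(["2016", "2015"], loc="upper right")

tick_texts = [t.get_text() for t in ax.get_xticklabels()]
```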
