Sorting values in plt.bar - python

I have been searching online for hours without solving this problem, and I hope someone can help. I want the bars in this plot sorted by value:
plt.bar(att_new['player'], att_new['shots'].groupby(att_new['player']).transform('sum'))
plt.axhline(y=att_shots_leauge_average, color='r')
plt.xticks(rotation=90)
plt.figure(figsize=(10,30))
My dataframe looks like this:
att_new = att[['id','player','date','team_name','fixture_name','position_new', 'goals','shots',
'shots_on_target', 'xg', 'attacking_pen_area_touches',
'aerials_won', 'final_third_entry_passes', 'dribbles_completed']]
I have been going through https://datavizpyr.com/sort-bars-in-barplot-using-seaborn-in-python/, but the groupby I am doing seems to cause problems, and I need it to get the summed values.
Hope you can help! Thanks!
------EDITED CODE------
import pandas as pd
import seaborn as sns
# groupby and sort
dfg = att_new.groupby('player', as_index=False).shots.sum().sort_values('shots', ascending=False)
# get the mean value for everything
mean = att_shots_leauge_average
# plot
ax = dfg.plot.bar('player', 'shots', figsize=(9, 7), legend=False)
ax.axhline(y=mean, color='gray', lw=3)
ax.text(1.5, mean + 0.2, f'mean{mean:0.2f}', weight='bold')

You must sort the values with .sort_values().
plt.bar(att_new['player'], att_new['shots'].groupby(att_new['player']).transform('sum')) is convoluted. Do the .groupby separately, then plot the result, as shown below.
import pandas as pd
import seaborn as sns # only used for importing the data
# sample data
tips = sns.load_dataset('tips')
# groupby and sort
dfg = tips.groupby('day', as_index=False).total_bill.sum().sort_values('total_bill', ascending=False)
# get the mean value for everything
mean_tips = tips.total_bill.mean()
# plot
ax = dfg.plot.bar('day', 'total_bill', figsize=(9, 7), legend=False)
ax.axhline(y=mean_tips, color='gray', lw=3)
ax.text(1.5, mean_tips + 0.2, f'Mean Tips: ${mean_tips:0.2f}', weight='bold')
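To see why the separate groupby matters, here is a minimal sketch on made-up data (the rows are hypothetical; only the player and shots column names come from the question). transform('sum') returns one value per original row, so each player's total repeats, whereas groupby(...).sum() collapses to one row per player that sort_values can then order.

```python
import pandas as pd

# Hypothetical per-match rows; only the column names mirror the question.
att_new = pd.DataFrame({
    "player": ["A", "B", "A", "C", "B", "A"],
    "shots": [2, 5, 1, 4, 3, 2],
})

# transform('sum') keeps the original length: each row gets its player's total.
per_row = att_new.groupby("player")["shots"].transform("sum")

# groupby + sum yields one row per player; sort_values fixes the bar order.
totals = (
    att_new.groupby("player", as_index=False)["shots"]
    .sum()
    .sort_values("shots", ascending=False)
)

print(per_row.tolist())
print(totals)
```

Passing totals to plot.bar then draws one bar per player, already in descending order.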

Related

How do you add two df to a plot map?

I want to combine df1.plot() and df0.plot() onto one plot. Currently, running both gives me two separate figures, and I don't know how to join them.
import pandas as pd
import matplotlib.pyplot as plt
plt.rcParams['figure.figsize'] = [25, 10]
df = df.sort_values('datetime', ascending=True)
df1.plot()
df0.plot()
plt.show()
Replace these lines of your code:
df1.plot()
df0.plot()
with the following:
plt.plot(df1["X"], df1["Y"])
plt.plot(df0["X"], df0["Y"])
plt.show()
You can either switch entirely to Matplotlib code as Mohit Mehlawat suggested, or keep using pandas.DataFrame.plot() by storing the returned axes in an ax variable and placing both plots on it, like so:
df1 = pd.DataFrame({'Value':[1,2,3,4,5]})
df2 = pd.DataFrame({'Value':[5,4,3,2,1]})
# Relevant code
ax = df1.plot()
df2.plot(ax=ax)
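A slightly fuller sketch of the same idea, assuming both frames share an index: create the axes once with plt.subplots() and pass it to every plot call. The column names df1 and df0 used for the legend are illustrative, not from the question.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import pandas as pd

# Renamed columns only so the legend can tell the two frames apart.
df1 = pd.DataFrame({"Value": [1, 2, 3, 4, 5]}).rename(columns={"Value": "df1"})
df0 = pd.DataFrame({"Value": [5, 4, 3, 2, 1]}).rename(columns={"Value": "df0"})

# One figure, one axes; both DataFrames draw onto the same axes.
fig, ax = plt.subplots(figsize=(8, 4))
df1.plot(ax=ax)
df0.plot(ax=ax)
ax.legend()
```

Calling plt.show() afterwards displays a single figure containing both lines.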

How to add multiple custom ticks to seaborn boxplot

I generated a boxplot using seaborn. On the x axis, I would like to have both the number of days (20, 25, 32) and the actual dates they refer to (2022-05-08, 2022-05-13, 2022-05-20).
I found a potential solution in the question "add custom tick with matplotlib". I'm trying to adapt it to my problem, but I could only get the number of days or the dates, not both.
I really would appreciate any help. Thank you in advance for your time.
Please, find below my code and the desired output.
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
df = pd.DataFrame({'nb_days':[20,20,20,25,25,20,32,32,25,32,32],
'Dates':['2022-05-08','2022-05-08','2022-05-08','2022-05-13','2022-05-13','2022-05-08','2022-05-20','2022-05-20','2022-05-13','2022-05-20','2022-05-20'],
'score':[3,3.5,3.4,2,2.2,3,5,5.2,4,4.3,5]})
df['Dates'] = df['Dates'].apply(pd.to_datetime)
tick_label = dict(zip(df['nb_days'],df['Dates'].apply(lambda x: x.strftime('%Y-%m-%d')))) #My custom xtick label
#Plot
fig,ax = plt.subplots(figsize=(6,6))
ax = sns.boxplot(x='nb_days',y='score',data=df,color=None)
# iterate over boxes to change color
for i,box in enumerate(ax.artists):
    box.set_edgecolor('red')
    box.set_facecolor('white')
sns.stripplot(x='nb_days',y='score',data=df,color='black')
ticks = sorted(df['nb_days'].unique())
labels = [tick_label.get(t, ticks[i]) for i,t in enumerate(ticks)]
ax.set_xticklabels(labels)
plt.tight_layout()
plt.show()
plt.close()
Here is the desired output.
You can do that by adding these lines in place of ax.set_xticklabels(labels)
new_labels=["{}\n{}".format(a_, b_) for a_, b_ in zip(ticks, labels)]
ax.set_xticklabels(new_labels)
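The label construction can be checked on its own, without drawing anything. The sketch below uses a shortened copy of the question's data and builds the two-line labels directly from the unique (nb_days, Dates) pairs; the variable name pairs is an assumption for illustration.

```python
import pandas as pd

# Shortened version of the question's data (same columns, fewer rows).
df = pd.DataFrame({
    "nb_days": [20, 20, 25, 32, 25, 32],
    "Dates": pd.to_datetime([
        "2022-05-08", "2022-05-08", "2022-05-13",
        "2022-05-20", "2022-05-13", "2022-05-20",
    ]),
})

# One row per distinct day count, in the order the boxplot shows them.
pairs = df.drop_duplicates("nb_days").sort_values("nb_days")

# "\n" puts the date on a second line under the day count.
new_labels = [
    f"{n}\n{d:%Y-%m-%d}" for n, d in zip(pairs["nb_days"], pairs["Dates"])
]
print(new_labels)
```

The resulting list can be passed to ax.set_xticklabels in the same way as in the answer above.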
Try this:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
df = pd.DataFrame({'nb_days':[20,20,20,25,25,20,32,32,25,32,32],
'Dates':['2022-05-08','2022-05-08','2022-05-08','2022-05-13','2022-05-13','2022-05-08','2022-05-20','2022-05-20','2022-05-13','2022-05-20','2022-05-20'],
'score':[3,3.5,3.4,2,2.2,3,5,5.2,4,4.3,5]})
df['Dates'] = df['Dates'].apply(pd.to_datetime)
tick_label = dict(zip(df['nb_days'],df['Dates'].apply(lambda x: x.strftime('%Y-%m-%d')))) #My custom xtick label
#Plot
fig,ax = plt.subplots(figsize=(6,6))
ax = sns.boxplot(x='nb_days',y='score',data=df,color=None)
# iterate over boxes to change color
for i,box in enumerate(ax.artists):
    box.set_edgecolor('red')
    box.set_facecolor('white')
sns.stripplot(x='nb_days',y='score',data=df,color='black')
ticks = sorted(df['nb_days'].unique())
labels = ["{}\n".format(t)+tick_label.get(t, ticks[i]) for i, t in enumerate(ticks)]
ax.set_xticklabels(labels)
plt.tight_layout()
plt.show()
plt.close()

date and graph alignment - Economic analysis

I am running a fundamental economic analysis, and when I get to visualising and charting I am not able to align the dates with the graph.
I want the most recent date entry to show on the right and the rest of the dates to show every two years.
I have tried everything I can think of and can't find the solution.
Here is my code:
%matplotlib inline
import pandas as pd
from matplotlib import pyplot
import matplotlib.dates as mdates
df = pd.read_csv('https://fred.stlouisfed.org/graph/fredgraph.csvbgcolor=%23e1e9f0&chart_type=line&drp=0&fo=open%20sans&graph_bgcolor=%23ffffff&height=450&mode=fred&recession_bars=off&txtcolor=%23444444&ts=12&tts=12&width=1168&nt=0&thu=0&trc=0&show_legend=yes&show_axis_titles=yes&show_tooltip=yes&id=NAEXKP01EZQ657S&scale=left&cosd=1995-04-01&coed=2020-04-01&line_color=%234572a7&link_values=false&line_style=solid&mark_type=none&mw=3&lw=2&ost=-99999&oet=99999&mma=0&fml=a&fq=Quarterly&fam=avg&fgst=lin&fgsnd=2020-02-01&line_index=1&transformation=lin&vintage_date=2020-09-21&revision_date=2020-09-21&nd=1995-04-01')
df = df.set_index('DATE')
df['12MonthAvg'] = df.rolling(window=12).mean().dropna(how='all')
df['9MonthAvg'] = df['12MonthAvg'].rolling(window=12).mean().dropna(how='all')
df['Spread'] = df['12MonthAvg'] - df['9MonthAvg']
pyplot.style.use("seaborn")
pyplot.subplots(figsize=(10, 5), dpi=85)
df['Spread'].plot().set_title('EUROPE: GDP Q Growth Rate (12M/12M Avg Spread)', fontsize=16)
df['Spread'].plot().axhline(0, linestyle='-', color='r',alpha=1, linewidth=2, marker='')
df['Spread'].plot().spines['left'].set_position(('outward', 10))
df['Spread'].plot().spines['bottom'].set_position(('outward', 10))
df['Spread'].plot().spines['right'].set_visible(False)
df['Spread'].plot().spines['top'].set_visible(False)
df['Spread'].plot().yaxis.set_ticks_position('left')
df['Spread'].plot().xaxis.set_ticks_position('bottom')
df['Spread'].plot().text(0.50, 0.02, "Crossing red line downwards / Crossing red line Upwards",
transform=pyplot.gca().transAxes, fontsize=14, ha='center', color='blue')
df['Spread'].plot().fmt_xdata = mdates.DateFormatter('%Y-%m-%d')
print(df['Spread'].tail(3))
pyplot.autoscale()
pyplot.show()
And the output:
This is the raw data:
There are a couple of corrections to make to your code.
In your URL, insert "?" after fredgraph.csv. It starts the so-called query string, where bgcolor is the first parameter.
Read your DataFrame with additional parameters:
df = pd.read_csv('...', parse_dates=[0], index_col=[0])
The aim is to:
read the DATE column as datetime values,
set it as the index.
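As a quick check of what those two parameters do, here is a sketch that reads a tiny in-memory CSV instead of the FRED URL. The three values are made up; only the column names follow the question's file.

```python
import io

import pandas as pd

# Made-up rows standing in for the downloaded file.
csv_text = (
    "DATE,NAEXKP01EZQ657S\n"
    "1995-04-01,0.6\n"
    "1995-07-01,0.4\n"
    "1995-10-01,0.5\n"
)

# parse_dates=[0] converts the first column to datetimes,
# index_col=[0] makes that same column the index.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=[0], index_col=[0])

print(df.index)
```

With a datetime index in place, Matplotlib can treat the x axis as dates rather than as a sequence of strings.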
Create additional columns as:
df['12MonthAvg'] = df.NAEXKP01EZQ657S.rolling(window=12).mean()
df['9MonthAvg'] = df.NAEXKP01EZQ657S.rolling(window=9).mean()
df['Spread'] = df['12MonthAvg'] - df['9MonthAvg']
Corrections:
9MonthAvg should (I think) be computed from the source column, not from 12MonthAvg, and with window=9.
dropna is not needed here, as you create the whole column anyway.
Now is the place to use dropna() on the Spread column and save the result in a dedicated variable:
spread = df['Spread'].dropna()
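A small sketch of how the two rolling windows and dropna() interact, on a synthetic linear series (the value column and its contents are assumptions, not the FRED data). Note that window counts rows, so on quarterly data window=12 spans twelve quarters rather than twelve months.

```python
import numpy as np
import pandas as pd

# Synthetic quarterly series 0, 1, ..., 19 standing in for the real column.
idx = pd.date_range("2015-01-01", periods=20, freq="QS")
df = pd.DataFrame({"value": np.arange(20, dtype=float)}, index=idx)

# Both averages are computed from the source column.
df["12MonthAvg"] = df["value"].rolling(window=12).mean()
df["9MonthAvg"] = df["value"].rolling(window=9).mean()
df["Spread"] = df["12MonthAvg"] - df["9MonthAvg"]

# The first 11 rows have no 12-row average yet, so Spread is NaN there.
spread = df["Spread"].dropna()
print(len(df), len(spread))
```

Dropping the NaN rows once, on the final series, keeps the intermediate columns aligned with the original index.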
Draw your figure the following way:
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
plt.style.use("seaborn")  # renamed "seaborn-v0_8" in Matplotlib 3.6+
fig, ax = plt.subplots(figsize=(10, 5), dpi=85)
plt.plot_date(spread.index, spread, fmt='-')
ax.set_title('EUROPE: GDP Q Growth Rate (12M/12M Avg Spread)', fontsize=16)
ax.axhline(0, linestyle='-', color='r',alpha=1, linewidth=2, marker='')
ax.spines['left'].set_position(('outward', 10))
ax.spines['bottom'].set_position(('outward', 10))
ax.spines['right'].set_visible(False)
ax.spines['top'].set_visible(False)
ax.text(0.50, 0.02, "Crossing red line downwards / Crossing red line Upwards",
transform=ax.transAxes, fontsize=14, ha='center', color='blue')
ax.xaxis.set_major_formatter(mdates.DateFormatter(fmt='%Y-%m-%d'))
plt.show()
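Corrections:
plt.subplots returns fig and ax, so I saved them (actually, only ax is needed).
When one axis contains dates, plot_date is a convenient choice (recent Matplotlib releases deprecate it in favor of plain plot, which handles datetime values directly).
I changed the way DateFormatter is set.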
Using the above code I got the following picture:
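The question also asked for a tick every two years, which the code above does not set explicitly. One possible way, sketched here on synthetic quarterly data (the sine series is a stand-in, not the FRED data), is to pair mdates.YearLocator with a year-only formatter and pin the x-limits to the data range so the latest observation sits at the right edge.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic quarterly series covering the same period as the question.
idx = pd.date_range("1995-04-01", "2020-04-01", freq="QS")
spread = pd.Series(np.sin(np.arange(len(idx)) / 6.0), index=idx)

fig, ax = plt.subplots(figsize=(10, 5))
ax.plot(spread.index, spread.values)
ax.axhline(0, color="r", linewidth=2)

# One major tick every second year, labelled with the year only.
ax.xaxis.set_major_locator(mdates.YearLocator(base=2))
ax.xaxis.set_major_formatter(mdates.DateFormatter("%Y"))

# Pin the limits to the data so the last point touches the right edge.
ax.set_xlim(spread.index[0], spread.index[-1])
```

If the final date itself needs a label, it would have to be added as an extra tick; this sketch only places it at the right edge of the axis.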

How can I plot slice of certain DataFrame for each row with different color?

I would like to plot certain slices of my Pandas DataFrame for each row (based on row index) with different colors.
My data look like the following:
I already tried with the help of this tutorial to find a way but I couldn't - probably due to a lack of skills.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
df = pd.read_csv(r"D:\SOF10.csv", header=None)  # raw string keeps the backslash literal
df.head()
#Slice interested data
C = df.iloc[:, 2::3]
#Plot Temp base on row index colorfully
C.apply(lambda x: plt.scatter(x.index, x, c='g'))
plt.show()
Following is my expected plot:
I was also wondering whether I could display the mean of each row of the sliced data (each row contains 480 values) somewhere in the plot, or in a legend beside it. Is it feasible, as in the following picture, to calculate the mean and display it in the legend, or in a small font next to its own data in the graph?
Data sample: data
This gives the plot without legend
C = df.iloc[:,2::3].stack().reset_index()
C.columns = ['level_0', 'level_1', 'Temperature']
fig, ax = plt.subplots(1,1)
C.plot('level_0', 'Temperature',
ax=ax, kind='scatter',
c='level_0', colormap='tab20',
colorbar=False, legend=True)
ax.set_xlabel('Cycles')
plt.show()
Edit to reflect modified question:
stack() transforms your (sliced) dataframe into a series indexed by (row, col).
reset_index() turns that two-level index into the columns level_0 (row) and level_1 (col).
set_xlabel sets the x-axis label to whatever you want.
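To make the reshaping concrete, here is a minimal sketch on a two-row frame with invented numbers, using the same slice and column names as the code above.

```python
import pandas as pd

# Invented 2 x 6 frame; every third column (2 and 5) holds a temperature.
df = pd.DataFrame([
    [1, 2, 10, 3, 4, 11],
    [5, 6, 20, 7, 8, 21],
])

# Slice, stack into a (row, col)-indexed series, then flatten the index.
C = df.iloc[:, 2::3].stack().reset_index()
C.columns = ["level_0", "level_1", "Temperature"]

print(C)
```

Each temperature now sits on its own line next to its original row number, which is what lets level_0 drive both the x position and the color.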
Edit 2: The following produces scatter with legend:
CC = df.iloc[:,2::3]
fig, ax = plt.subplots(1,1, figsize=(16,9))
labels = CC.mean(axis=1)
for i in CC.index:
    ax.scatter([i]*len(CC.columns[1:]), CC.iloc[i,1:], label=labels[i])
ax.legend()
ax.set_xlabel('Cycles')
ax.set_ylabel('Temperature')
plt.show()
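A variation on the same loop, sketched on random data (the three rows and their values are assumptions): each row takes its color from a colormap and its legend entry carries the rounded row mean, which keeps the legend readable.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Random stand-in for the sliced temperature columns: 3 rows, 8 values each.
rng = np.random.default_rng(0)
CC = pd.DataFrame(rng.normal(loc=[[20], [25], [30]], scale=1.0, size=(3, 8)))

fig, ax = plt.subplots(figsize=(8, 4))
cmap = plt.get_cmap("tab20")

for pos, (idx, row) in enumerate(CC.iterrows()):
    # One scatter call per row: its own color, and the mean in the label.
    ax.scatter(
        [idx] * len(row),
        row.values,
        color=cmap(pos % cmap.N),
        label=f"row {idx}: mean {row.mean():.2f}",
    )

# Place the legend outside the axes so it does not cover the points.
ax.legend(title="Row means", loc="center left", bbox_to_anchor=(1.0, 0.5))
ax.set_xlabel("Cycles")
ax.set_ylabel("Temperature")
```

With many rows the legend grows long, so a smaller font size or several legend columns may be needed.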
This may be only an approximate answer. The c= and cmap= arguments of scatter can be used to get the desired coloring.
import pandas as pd
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
import matplotlib.cm as cm
import itertools
df = pd.DataFrame({'a':[34,22,1,34]})
fig, subplot_axes = plt.subplots(1, 1, figsize=(20, 10)) # width, height
colors = ['red','green','blue','purple']
cmap=matplotlib.colors.ListedColormap(colors)
for col in df.columns:
    subplot_axes.scatter(df.index, df[col].values, c=df.index, cmap=cmap, alpha=.9)

Combining FacetGrid and dual Y-axis in Pandas

I am trying to plot two different variables (linked by a relation of causality), delai_jour and date_sondage on a single FacetGrid. I can do it with this code:
g = sns.FacetGrid(df_verif_sum, col="prefecture", col_wrap=2, aspect=2, sharex=True,)
g = g.map(plt.plot, "date_sondage", "delai_jour", color="m", linewidth=2)
g = g.map(plt.bar, "date_sondage", "impossible")
which gives me this:
(There are 33 of them in total).
I'm interested in comparing the patterns across the various prefecture, but due to the difference in magnitude I cannot see the changes in the line chart.
For this specific work, the best way to do it is to create a secondary y axis, but I can't seem to make anything work: it doesn't look like it's possible with FacetGrid, and I neither understood the code nor was able to replicate the examples I've seen with pure matplotlib.
How should I go about it?
I got this to work by iterating through the axes and plotting a secondary axis as in a typical Seaborn graph.
Using the OP example:
g = sns.FacetGrid(df_verif_sum, col="prefecture", col_wrap=2, aspect=2, sharex=True)
g = g.map(plt.plot, "date_sondage", "delai_jour", color="m", linewidth=2)
# assumes the groupby order matches the order of the facets
for ax, (_, subdata) in zip(g.axes, df_verif_sum.groupby('prefecture')):
    ax2 = ax.twinx()
    subdata.plot(x='date_sondage', y='impossible', ax=ax2, legend=False, color='r')
If you do any formatting to the x-axis, you may have to do it to both ax and ax2.
Here's an example where you apply a custom mapping function to the dataframe of interest. Within the function, you can call plt.gca() to get the current axis at the facet being currently plotted in FacetGrid. Once you have the axis, twinx() can be called just like you would in plain old matplotlib plotting.
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import seaborn as sns
def facetgrid_two_axes(*args, **kwargs):
    # map_dataframe passes the facet's subset as `data` plus a default `color`
    data = kwargs.pop('data')
    dual_axis = kwargs.pop('dual_axis')
    alpha = kwargs.pop('alpha', 0.2)
    kwargs.pop('color', None)
    ax = plt.gca()
    if dual_axis:
        ax2 = ax.twinx()
        ax2.set_ylabel('Second Axis!')
    ax.plot(data['x'], data['y1'], **kwargs, color='red', alpha=alpha)
    if dual_axis:
        # plot the facet's own subset, not the global df
        ax2.plot(data['x'], data['y2'], **kwargs, color='blue', alpha=alpha)
df = pd.DataFrame()
df['x'] = np.arange(1,5,1)
df['y1'] = 1 / df['x']
df['y2'] = df['x'] * 100
df['facet'] = 'foo'
df2 = df.copy()
df2['facet'] = 'bar'
df3 = pd.concat([df,df2])
win_plot = sns.FacetGrid(df3, col='facet', height=6)  # `size` was renamed `height` in seaborn 0.9
(win_plot.map_dataframe(facetgrid_two_axes, dual_axis=True)
    .set_axis_labels("X", "First Y-axis"))
plt.show()
This isn't the prettiest plot as you might want to adjust the presence of the second y-axis' label, the spacing between plots, etc. but the code suffices to show how to plot two series of differing magnitudes within FacetGrids.
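For comparison, the same dual-axis layout can be sketched without FacetGrid, using plain Matplotlib subplots and one twinx() per panel. The data below are invented; only the column names are taken from the question. Unlike the question's code, this sketch puts the bars on the left axis and the line on the right.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import pandas as pd

# Invented data: two prefectures with very different magnitudes.
df_verif_sum = pd.DataFrame({
    "prefecture": ["P1"] * 4 + ["P2"] * 4,
    "date_sondage": list(pd.date_range("2021-01-01", periods=4, freq="W")) * 2,
    "delai_jour": [10, 12, 11, 15, 40, 42, 45, 41],
    "impossible": [1, 0, 2, 1, 5, 7, 6, 8],
})

groups = list(df_verif_sum.groupby("prefecture"))
fig, axes = plt.subplots(1, len(groups), figsize=(10, 4), squeeze=False)

twins = []
for ax, (name, sub) in zip(axes.flat, groups):
    # Left axis: bars (width is in days because the x axis holds dates).
    ax.bar(sub["date_sondage"], sub["impossible"], width=5, color="lightgray")
    ax.set_ylabel("impossible")
    ax.set_title(name)
    # Right axis: the line, scaled independently of the bars.
    ax2 = ax.twinx()
    ax2.plot(sub["date_sondage"], sub["delai_jour"], color="m", linewidth=2)
    ax2.set_ylabel("delai_jour")
    twins.append(ax2)

fig.tight_layout()
```

Because each panel owns both of its axes, every prefecture gets its own scale for the line, which is the point of the secondary axis.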
