I'd like to create a stacked barplot of asset weights representing a financial portfolio over time. I tried several approaches, but got the most pleasing results with matplotlib's stackplot function. However, I am not able to display negative asset weights in my stackplot, so the resulting figures are wrong. I am using Python (3.8.3) and Matplotlib (3.3.2).
The following displays the head of the asset weights dataframe to plot:
w_minvar1nc.head()
SMACAP GROWTH MOMTUM MINVOL QUALITY
Date
2015-02-20 0.012942 0.584273 -0.114441 0.387773 0.129454
2015-02-23 0.013129 0.584528 -0.115836 0.386448 0.131732
2015-02-24 0.013487 0.584404 -0.116585 0.386364 0.132330
2015-02-25 0.015145 0.572256 -0.117796 0.387583 0.142811
2015-02-26 0.015113 0.567198 -0.114580 0.387807 0.144462
The following displays a simple code snippet of my current approach to the stackplot:
import matplotlib.pyplot as plt
# initialize stackplot
fig, ax = plt.subplots(nrows=1, ncols=1, facecolor="#F0F0F0")
# create and format stackplot
ax.stackplot(w_minvar1nc.index, w_minvar1nc.SMACAP, w_minvar1nc.GROWTH, w_minvar1nc.MOMTUM, w_minvar1nc.MINVOL, w_minvar1nc.QUALITY)
ax.set_xlabel("Time")
ax.set_ylabel("Weight")
ax.set_ylim(bottom=-0.5, top=1.5)
ax.grid(which="major", color="grey", linestyle="--", linewidth=0.5)
# save stackplot
fig.savefig(fname=(plotpath + "test.png"))
plt.clf()
plt.close()
Here is the corresponding stackplot, in which you can see that the negative asset weights don't show up:
Does anyone know how to deal with that problem? Any ideas would be much appreciated.
PS: Of course, I've already tried other approaches, such as stacking the data manually and then creating a regular barplot. In that case the positive and negative asset weights are displayed correctly, but the approach leads to even bigger problems with formatting the x-axis because of the daily data.
If the columns are separated into positive and negative weights, you can plot them separately:
from matplotlib import pyplot as plt
import pandas as pd
#fake data
import numpy as np
np.random.seed(123)
n = 100
df = pd.DataFrame({"Dates": pd.date_range("20180101", periods=n, freq="10d"),
"A": 0.2 + np.random.random(n)/10,
"B": -np.random.random(n)/10,
"C": -0.1-np.random.random(n)/10,
"D": 0.3+ np.random.random(n)/10})
df.set_index("Dates", inplace=True)
df["E"] = 1 - df.A - df.D - df.B - df.C
fig, ax = plt.subplots(nrows=1, ncols=1, facecolor="#F0F0F0")
ax.stackplot(df.index, df.A, df.D, df.E)
ax.stackplot(df.index, df.B, df.C)
ax.set_xlabel("Time")
ax.set_ylabel("Weight")
ax.set_ylim(bottom=-0.5, top=1.5)
ax.grid(which="major", color="grey", linestyle="--", linewidth=0.5)
plt.show()
Sample output:
Here is the solution to the problem, with huge credit to @Mr. T:
# split data into negative and positive values
w_minvar1nc_pos = w_minvar1nc[w_minvar1nc >= 0].fillna(0)
w_minvar1nc_neg = w_minvar1nc[w_minvar1nc < 0].fillna(0)
# initialize stackplot
fig, ax = plt.subplots(nrows=1, ncols=1, facecolor="#F0F0F0")
# create and format stackplot
ax.stackplot(w_minvar1nc_pos.index, w_minvar1nc_pos.SMACAP, w_minvar1nc_pos.GROWTH, w_minvar1nc_pos.MOMTUM, w_minvar1nc_pos.MINVOL, w_minvar1nc_pos.QUALITY)
ax.stackplot(w_minvar1nc_neg.index, w_minvar1nc_neg.SMACAP, w_minvar1nc_neg.GROWTH, w_minvar1nc_neg.MOMTUM, w_minvar1nc_neg.MINVOL, w_minvar1nc_neg.QUALITY)
ax.set_xlabel("Time")
ax.set_ylabel("Weight")
ax.set_ylim(bottom=-0.5, top=1.5)
ax.grid(which="major", color="grey", linestyle="--", linewidth=0.5)
# save stackplot
fig.savefig(fname=(plotpath + "test.png"))
plt.clf()
plt.close()
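The split above works, but the second stackplot call continues the color cycle, so an asset's negative band gets a different color from its positive band. A minimal sketch, assuming a DataFrame shaped like w_minvar1nc (the `weights` frame below is made-up data, not the real portfolio), that splits the signs with DataFrame.clip, passes all columns in one call per side, and reuses one color list for both stacks:

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Made-up weights shaped like w_minvar1nc (one column per asset, one row per day)
rng = np.random.default_rng(0)
dates = pd.date_range("2015-02-20", periods=60, freq="B")
weights = pd.DataFrame(
    {
        "SMACAP": 0.01 + rng.random(60) / 100,
        "GROWTH": 0.58 + rng.random(60) / 100,
        "MOMTUM": -0.11 - rng.random(60) / 100,
        "MINVOL": 0.39 + rng.random(60) / 100,
    },
    index=dates,
)
weights["QUALITY"] = 1 - weights.sum(axis=1)

# clip() sends each value to exactly one side: negatives become 0 in `pos`,
# positives become 0 in `neg`
pos = weights.clip(lower=0)
neg = weights.clip(upper=0)

# One color per asset, reused for both stacks
colors = plt.rcParams["axes.prop_cycle"].by_key()["color"][: len(weights.columns)]

fig, ax = plt.subplots(facecolor="#F0F0F0")
# A (n_assets, n_dates) array passes all series in one argument
ax.stackplot(pos.index, pos.T.values, labels=pos.columns, colors=colors)
ax.stackplot(neg.index, neg.T.values, colors=colors)  # no labels: avoids duplicate legend entries
ax.axhline(0, color="black", linewidth=0.8)
ax.set_xlabel("Time")
ax.set_ylabel("Weight")
ax.legend(loc="upper left")
```

Because clip maps every value to exactly one side, pos + neg reproduces the original weights, and an asset that is long on some dates and short on others keeps its color throughout.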
I have been working on pandas data that has too many time points/ticks on the x-axis. I have found several solutions to reduce them, but my problem is that I'm using two different data sets with different time points. So how can I both reduce the x-ticks and align the time points of these two data sets?
Yes, I know sharex.
My plot-generating code is the following:
fig, (ax1, ax2,ax3,ax4) = plt.subplots(4, 1,figsize=(10,7), sharex="all")
fig.subplots_adjust(bottom=0.2)
ax1.plot( df.time, df['B'], color='k')
ax1.plot( df.time, df['Bx'], color='r')
ax1.plot( df.time, df['By'], color='b')
ax1.plot( df.time, df['Bz'], color='g')
ax1.xaxis.grid(True,alpha=0.3)
ax1.set_ylabel('Bx,By,Bz,B[nT]')
ax2.plot(df1.time, df1['v_total'],color='k')
ax2.plot(df1.time, df1['Vx'],color='r')
ax2.plot(df1.time, df1['Vy'],color='b')
ax2.plot(df1.time, df1['Vz'],color='g')
ax2.xaxis.grid(True,alpha=0.3)
ax2.set_ylabel('Vx,Vy,Vz,V[km/s]')
ax3.plot(df1.time, df1['n'],color='k')
ax3.xaxis.grid(True,alpha=0.3)
ax3.set_ylabel('Np[1/cm^3]')
ax4.plot(df1.time, df1['T'],color='k')
ax4.xaxis.grid(True,alpha=0.3)
ax4.set_ylabel('T[k]')
#loc = mdates.MinuteLocator([0,30])
#ax2.xaxis.set_major_locator(loc)
#ax2.xaxis.set_major_formatter(mdates.AutoDateFormatter(loc))
#ax3.xaxis.set_major_locator(loc)
#ax3.xaxis.set_major_formatter(mdates.AutoDateFormatter(loc))
#ax4.xaxis.set_major_locator(loc)
#ax4.xaxis.set_major_formatter(mdates.AutoDateFormatter(loc))
#ax2 = plt.gca()
#ax2.xaxis.set_major_locator(mdates.MinuteLocator(interval=10))
#ax2.xaxis.set_major_formatter(mdates.DateFormatter('%H:%M:%S'))
#ax3 = plt.gca()
#ax3.xaxis.set_major_locator(mdates.MinuteLocator(interval=10))
#ax3.xaxis.set_major_formatter(mdates.DateFormatter('%H:%M:%S'))
#ax4 = plt.gca()
#ax4.xaxis.set_major_locator(mdates.MinuteLocator(interval=10))
#ax4.xaxis.set_major_formatter(mdates.DateFormatter('%H:%M:%S'))
fig.suptitle('Shock format')
plt.savefig('plot.png')
plt.savefig('plot1.pdf')
plt.show()
Here df is the data set with many points, and I want to reduce the x-ticks/time points of df as well as align df1 to df. The commented-out lines are my attempts, but they take too long and give me this warning: "Locator attempting to generate 359569 ticks ([-113.5, ..., 2383.5]), which exceeds Locator.MAXTICKS (1000)."
The graph output is the following:
My goal should look like this:
Since it seems difficult to provide the real data, I created sample data of my own.
The main point is that byminute lists the minutes of the hour at which ticks are placed; by default it is every minute, range(60). interval specifies how often those marks are used (interval=2 marks every second occurrence). So for 15-minute increments I used np.arange(0,60,15) together with interval=2. The result is ticks at the 00 and 30 minute marks.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
date_rng = pd.date_range('2022-05-31 00:00:00', freq='1s', periods=43200)
df = pd.DataFrame({'datetime': pd.to_datetime(date_rng), 'value':np.random.randn(43200).cumsum()})
fig, (ax1,ax2,ax3,ax4) = plt.subplots(4, 1,figsize=(12,6), sharex="all")
fig.subplots_adjust(bottom=0.2)
ax1.plot(df.datetime, df['value'], color='b')
ax2.plot(df.datetime, df['value'], color='g')
ax3.plot(df.datetime, df['value'], color='r')
ax4.plot(df.datetime, df['value'], color='k')
minutes = mdates.MinuteLocator(byminute=np.arange(0,60,15),interval=2)
minutes_fmt = mdates.DateFormatter('%d %H:%M')
ax4.xaxis.set_major_locator(minutes)
ax4.xaxis.set_major_formatter(minutes_fmt)
ax4.tick_params(axis='x', labelrotation=45)
plt.show()
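The range in the question's warning, [-113.5, ..., 2383.5], looks like category positions rather than dates, which suggests (an assumption, since the data is not shown) that df.time and df1.time hold strings; matplotlib then gives every distinct string its own slot, so the two data sets cannot line up. A minimal sketch with made-up data: after pd.to_datetime both frames share one real time axis, and with sharex=True a single locator set on one subplot applies to all of them. Listing the wanted minutes in byminute=[0, 30] gives half-hour ticks directly, without relying on interval to skip marks:

```python
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Made-up stand-ins for the question's data: df is sampled every second,
# df1 every minute, and both store their timestamps as strings
t_fast = pd.date_range("2022-05-31 00:00", periods=7200, freq="s")
t_slow = pd.date_range("2022-05-31 00:00", periods=120, freq="min")
df = pd.DataFrame({"time": t_fast.astype(str), "B": rng.standard_normal(7200).cumsum()})
df1 = pd.DataFrame({"time": t_slow.astype(str), "n": rng.random(120)})

# Convert the string timestamps to datetimes so both data sets land on
# the same date axis instead of on separate category positions
df["time"] = pd.to_datetime(df["time"])
df1["time"] = pd.to_datetime(df1["time"])

fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(10, 5), sharex=True)
ax1.plot(df["time"], df["B"], color="k")
ax1.set_ylabel("B [nT]")
ax2.plot(df1["time"], df1["n"], color="k")
ax2.set_ylabel("Np [1/cm^3]")

# Shared x-axes share one locator/formatter, so setting them once is enough
ax2.xaxis.set_major_locator(mdates.MinuteLocator(byminute=[0, 30]))
ax2.xaxis.set_major_formatter(mdates.DateFormatter("%H:%M"))
ax2.tick_params(axis="x", labelrotation=45)
```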
I am running a fundamental economic analysis, and when I get to visualising and charting I am not able to align the dates with the graph.
I want the most recent date entry to show on the right and the rest of the dates to show every two years.
I have tried literally everything and can't find the solution.
Here is my code:
%matplotlib inline
import pandas as pd
from matplotlib import pyplot
import matplotlib.dates as mdates
df = pd.read_csv('https://fred.stlouisfed.org/graph/fredgraph.csvbgcolor=%23e1e9f0&chart_type=line&drp=0&fo=open%20sans&graph_bgcolor=%23ffffff&height=450&mode=fred&recession_bars=off&txtcolor=%23444444&ts=12&tts=12&width=1168&nt=0&thu=0&trc=0&show_legend=yes&show_axis_titles=yes&show_tooltip=yes&id=NAEXKP01EZQ657S&scale=left&cosd=1995-04-01&coed=2020-04-01&line_color=%234572a7&link_values=false&line_style=solid&mark_type=none&mw=3&lw=2&ost=-99999&oet=99999&mma=0&fml=a&fq=Quarterly&fam=avg&fgst=lin&fgsnd=2020-02-01&line_index=1&transformation=lin&vintage_date=2020-09-21&revision_date=2020-09-21&nd=1995-04-01')
df = df.set_index('DATE')
df['12MonthAvg'] = df.rolling(window=12).mean().dropna(how='all')
df['9MonthAvg'] = df['12MonthAvg'].rolling(window=12).mean().dropna(how='all')
df['Spread'] = df['12MonthAvg'] - df['9MonthAvg']
pyplot.style.use("seaborn")
pyplot.subplots(figsize=(10, 5), dpi=85)
df['Spread'].plot().set_title('EUROPE: GDP Q Growth Rate (12M/12M Avg Spread)', fontsize=16)
df['Spread'].plot().axhline(0, linestyle='-', color='r',alpha=1, linewidth=2, marker='')
df['Spread'].plot().spines['left'].set_position(('outward', 10))
df['Spread'].plot().spines['bottom'].set_position(('outward', 10))
df['Spread'].plot().spines['right'].set_visible(False)
df['Spread'].plot().spines['top'].set_visible(False)
df['Spread'].plot().yaxis.set_ticks_position('left')
df['Spread'].plot().xaxis.set_ticks_position('bottom')
df['Spread'].plot().text(0.50, 0.02, "Crossing red line downwards / Crossing red line Upwards",
transform=pyplot.gca().transAxes, fontsize=14, ha='center', color='blue')
df['Spread'].plot().fmt_xdata = mdates.DateFormatter('%Y-%m-%d')
print(df['Spread'].tail(3))
pyplot.autoscale()
pyplot.show()
And the output:
This is the raw data:
There are a couple of corrections to make in your code.
In your URL, insert "?" after fredgraph.csv. It starts the so-called query string,
in which bgcolor is the first parameter.
Read your DataFrame with additional parameters:
df = pd.read_csv('...', parse_dates=[0], index_col=[0])
The aim is to:
read the DATE column as datetime,
set it as the index.
Create additional columns as:
df['12MonthAvg'] = df.NAEXKP01EZQ657S.rolling(window=12).mean()
df['9MonthAvg'] = df.NAEXKP01EZQ657S.rolling(window=9).mean()
df['Spread'] = df['12MonthAvg'] - df['9MonthAvg']
Corrections:
9MonthAvg should (I think) be computed from the source column, not from 12MonthAvg.
dropna is not needed here, as you create the whole column anyway.
Now is the place to call dropna() on the Spread column and save the result in
a dedicated variable:
spread = df['Spread'].dropna()
Draw your figure the following way:
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
plt.style.use("seaborn")
fig, ax = plt.subplots(figsize=(10, 5), dpi=85)
plt.plot_date(spread.index, spread, fmt='-')
ax.set_title('EUROPE: GDP Q Growth Rate (12M/12M Avg Spread)', fontsize=16)
ax.axhline(0, linestyle='-', color='r',alpha=1, linewidth=2, marker='')
ax.spines['left'].set_position(('outward', 10))
ax.spines['bottom'].set_position(('outward', 10))
ax.spines['right'].set_visible(False)
ax.spines['top'].set_visible(False)
ax.text(0.50, 0.02, "Crossing red line downwards / Crossing red line Upwards",
transform=ax.transAxes, fontsize=14, ha='center', color='blue')
ax.xaxis.set_major_formatter(mdates.DateFormatter(fmt='%Y-%m-%d'))
plt.show()
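Corrections:
plt.subplots returns fig and ax, so I saved them (actually, only ax is needed).
All settings are applied to ax, instead of calling df['Spread'].plot() again for each setting (every such call draws the series once more).
When one axis contains dates, it is better to use plot_date.
I changed the way DateFormatter is set.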
Using the above code I got the following picture:
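The code above formats the dates but does not yet put the most recent date on the right with a tick every two years, as the question asked. A minimal sketch under these assumptions: the spread series has a DatetimeIndex (as produced by read_csv with parse_dates and index_col), the quarterly data below is made up in place of the FRED download, and two_year_ticks and plot_spread are hypothetical helper names. It counts tick positions backwards from the last observation and fixes them with set_xticks. It uses plain ax.plot, which accepts datetime x-values directly (recent Matplotlib versions discourage plot_date), and omits plt.style.use("seaborn"), a style name that Matplotlib 3.6 renamed to "seaborn-v0_8".

```python
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def two_year_ticks(index, years=2):
    """Hypothetical helper: tick dates every `years` years, counted back from the last date."""
    first, last = index[0], index[-1]
    ticks = []
    current = last
    while current >= first:
        ticks.append(current)
        current = current - pd.DateOffset(years=years)
    return ticks[::-1]  # oldest first


def plot_spread(spread):
    """Hypothetical helper: plot the spread with the latest date at the right edge."""
    fig, ax = plt.subplots(figsize=(10, 5), dpi=85)
    ax.plot(spread.index, spread.values, "-")
    ax.axhline(0, linestyle="-", color="r", linewidth=2)
    ax.set_xticks(mdates.date2num(two_year_ticks(spread.index)))
    ax.xaxis.set_major_formatter(mdates.DateFormatter("%Y-%m-%d"))
    # End the x-axis exactly at the most recent observation
    ax.set_xlim(mdates.date2num(spread.index[0]), mdates.date2num(spread.index[-1]))
    fig.autofmt_xdate()
    return fig, ax


# Made-up quarterly series standing in for the FRED data
idx = pd.date_range("1995-04-01", "2020-04-01", freq="QS")
spread = pd.Series(np.sin(np.arange(len(idx)) / 4), index=idx)
fig, ax = plot_spread(spread)
```

If the ticks do not need to line up with the last observation, mdates.YearLocator(2) is a simpler choice; it places ticks on January 1 of every second year instead.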
In Pandas, I am doing:
bp = p_df.groupby('class').plot(kind='kde')
p_df is a dataframe object.
However, this is producing two plots, one for each class.
How do I force a single plot with both classes in it?
Version 1:
You can create your axis, and then use the ax keyword of DataFrameGroupBy.plot to add everything to these axes:
import matplotlib.pyplot as plt
import pandas as pd
p_df = pd.DataFrame({"class": [1,1,2,2,1], "a": [2,3,2,3,2]})
fig, ax = plt.subplots(figsize=(8,6))
bp = p_df.groupby('class').plot(kind='kde', ax=ax)
This is the result:
Unfortunately, the labeling of the legend does not make too much sense here.
Version 2:
Another way would be to loop through the groups and plot the curves manually:
classes = ["class 1"] * 5 + ["class 2"] * 5
vals = [1,3,5,1,3] + [2,6,7,5,2]
p_df = pd.DataFrame({"class": classes, "vals": vals})
fig, ax = plt.subplots(figsize=(8,6))
for label, df in p_df.groupby('class'):
    df.vals.plot(kind="kde", ax=ax, label=label)
plt.legend()
This way you can easily control the legend. This is the result:
Alternatively, pass the current axes directly:
import matplotlib.pyplot as plt
p_df.groupby('class').plot(kind='kde', ax=plt.gca())
Another approach is to use the seaborn module. It plots the two density estimates on the same axes without needing a variable to hold the axes (using a data frame setup similar to the other answer):
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline
# data to create an example data frame
classes = ["c1"] * 5 + ["c2"] * 5
vals = [1,3,5,1,3] + [2,6,7,5,2]
# the data frame
df = pd.DataFrame({"cls": classes, "vals": vals})
# this is to plot the kde
sns.kdeplot(df.vals[df.cls == "c1"],label='c1');
sns.kdeplot(df.vals[df.cls == "c2"],label='c2');
# beautifying the labels
plt.xlabel('value')
plt.ylabel('density')
plt.legend()
plt.show()
This results in the following image.
There are two easy methods to plot each group in the same plot:
When using pandas.DataFrame.groupby, specify the column to be plotted (i.e. the aggregation column).
Use seaborn.kdeplot or seaborn.displot and specify the hue parameter.
Using pandas v1.2.4, matplotlib 3.4.2, seaborn 0.11.1
The OP is specific to plotting the kde, but the steps are the same for many plot types (e.g. kind='line', sns.lineplot, etc.).
Imports and Sample Data
For the sample data, the groups are in the 'kind' column, and the kde of 'duration' will be plotted, ignoring 'waiting'.
import pandas as pd
import seaborn as sns
df = sns.load_dataset('geyser')
# display(df.head())
duration waiting kind
0 3.600 79 long
1 1.800 54 short
2 3.333 74 long
3 2.283 62 short
4 4.533 85 long
Plot with pandas.DataFrame.plot
Reshape the data using .groupby or .pivot
.groupby
Specify the aggregation column, ['duration'], and kind='kde'.
ax = df.groupby('kind')['duration'].plot(kind='kde', legend=True)
.pivot
ax = df.pivot(columns='kind', values='duration').plot(kind='kde')
Plot with seaborn.kdeplot
Specify hue='kind'
ax = sns.kdeplot(data=df, x='duration', hue='kind')
Plot with seaborn.displot
Specify hue='kind' and kind='kde'
fig = sns.displot(data=df, kind='kde', x='duration', hue='kind')
Plot
Maybe you can try this:
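import matplotlib.pyplot as plt
fig, ax = plt.subplots(figsize=(10,8))
# 'class' is a Python keyword, so df.class is a syntax error; use df['class']
classes = list(df['class'].unique())
for c in classes:
    df2 = df.loc[df['class'] == c]
    df2.vals.plot(kind="kde", ax=ax, label=c)
plt.legend()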
I was trying to compare the runtime of naive matrix multiplication and Strassen's algorithm. For this, I recorded the runtime for different matrix dimensions. Then I tried to plot the results in the same graph for comparison.
But the problem is that the plot does not show the correct result.
Here is the data:
2 3142
3 3531
4 4756
5 5781
6 8107
The left column is n, the dimension, and the right column is the execution time.
The data above is for the naive method; the Strassen data follows the same pattern.
I load this data into a pandas DataFrame, and after plotting, the image looks like this:
Here blue is for Naive and green is for Strassen's.
This is certainly not right, as the naive runtime cannot be constant. But my code is correct, so I decided to plot them separately, and these are the results:
Naive
Strassen
As you can see, the y-axis scales are not the same.
Could that be the reason?
The code I'm using for plotting is:
import matplotlib.pyplot as plt
import pandas as pd
fig = plt.figure()
data_naive = pd.read_csv('naive.txt', sep="\t", header=None)
data_naive.columns = ["n", "time"]
plt.plot(data_naive['n'], data_naive['time'], 'g')
data_strassen = pd.read_csv('strassen.txt', sep="\t", header=None)
data_strassen.columns = ["n", "time"]
plt.plot(data_strassen['n'], data_strassen['time'], 'b')
plt.show()
fig.savefig('figure.png')
What I tried:
fig = plt.figure()
data_naive = pd.read_csv('naive.txt', sep="\t", header=None)
data_naive.columns = ["n", "time"]
data_strassen = pd.read_csv('strassen.txt', sep="\t", header=None)
data_strassen.columns = ["n", "time"]
ax = data_naive.plot(x='n', y='time', c='blue', figsize=(20,10))
data_strassen.plot(x='n', y='time', c='green', figsize=(20,10), ax=ax)
plt.savefig('comparison.png')
plt.show()
But no luck!
How can I plot them in the same figure without altering their actual orientation?
If I understand correctly, here is a solution using twinx:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame(np.random.randint(10, 100, (12,2)))
df[1] = np.random.dirichlet(np.ones(12)*1000., size=1)[0]
fig, ax1 = plt.subplots()
ax1.plot(df[0], color='r')
#Plot the secondary axis in the right side
ax2 = ax1.twinx()
ax2.plot(df[1], color='k')
fig.tight_layout()
plt.show()
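The twinx answer above uses random data. A minimal sketch of the same idea applied to the question's naive/Strassen comparison, assuming the naive timings shown in the question and placeholder Strassen values (not real measurements, since strassen.txt is not shown). It also collects both lines into one legend and colors each y-axis label, so it is clear which scale belongs to which method:

```python
import matplotlib.pyplot as plt
import pandas as pd

data_naive = pd.DataFrame({"n": [2, 3, 4, 5, 6],
                           "time": [3142, 3531, 4756, 5781, 8107]})
# PLACEHOLDER values, not real measurements: a much larger scale
# to mimic the mismatch described in the question
data_strassen = pd.DataFrame({"n": data_naive["n"],
                              "time": data_naive["time"] * 40})

fig, ax1 = plt.subplots()
line1, = ax1.plot(data_naive["n"], data_naive["time"], color="blue", label="Naive")
ax1.set_xlabel("n (matrix dimension)")
ax1.set_ylabel("Naive time", color="blue")

# Second y-axis on the right, sharing the x-axis but with its own scale
ax2 = ax1.twinx()
line2, = ax2.plot(data_strassen["n"], data_strassen["time"], color="green", label="Strassen")
ax2.set_ylabel("Strassen time", color="green")

# Gather handles from both axes so a single legend describes both lines
ax1.legend(handles=[line1, line2], loc="upper left")
fig.tight_layout()
```

Note that twin axes give each curve its own scale, so the curves' heights are no longer directly comparable; plotting both on one axis with ax.set_yscale('log') is an alternative if you want a common scale.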
Result produced:
I am trying to plot two different variables (linked by a causal relationship), delai_jour and impossible, against date_sondage on a single FacetGrid. I can do it with this code:
g = sns.FacetGrid(df_verif_sum, col="prefecture", col_wrap=2, aspect=2, sharex=True,)
g = g.map(plt.plot, "date_sondage", "delai_jour", color="m", linewidth=2)
g = g.map(plt.bar, "date_sondage", "impossible")
which gives me this:
FacetGrid
(There are 33 of them in total).
I'm interested in comparing the patterns across the various prefectures, but due to the difference in magnitude I cannot see the changes in the line chart.
For this specific work, the best way to do it is to create a secondary y-axis, but I can't seem to make anything work: it doesn't look like it's possible with FacetGrid, and I neither understood nor managed to replicate the examples I've seen with pure matplotlib.
How should I go about it?
I got this to work by iterating through the axes and plotting a secondary axis as in a typical Seaborn graph.
Using the OP example:
g = sns.FacetGrid(df_verif_sum, col="prefecture", col_wrap=2, aspect=2, sharex=True)
g = g.map(plt.plot, "date_sondage", "delai_jour", color="m", linewidth=2)
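# g.col_names follows the facet order, which may differ from groupby's sorted order
for ax, name in zip(g.axes.flat, g.col_names):
    subdata = df_verif_sum[df_verif_sum['prefecture'] == name]
    ax2 = ax.twinx()
    subdata.plot(x='date_sondage', y='impossible', ax=ax2, legend=False, color='r')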
If you do any formatting to the x-axis, you may have to do it to both ax and ax2.
Here's an example in which you apply a custom mapping function to the dataframe of interest. Within the function, you can call plt.gca() to get the axes of the facet currently being plotted by FacetGrid. Once you have the axes, you can call twinx() just as you would in plain matplotlib.
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import seaborn as sns
def facetgrid_two_axes(*args, **kwargs):
    data = kwargs.pop('data')
    dual_axis = kwargs.pop('dual_axis')
    alpha = kwargs.pop('alpha', 0.2)
    kwargs.pop('color', None)  # FacetGrid passes a color; we set our own below
    ax = plt.gca()
    if dual_axis:
        ax2 = ax.twinx()
        ax2.set_ylabel('Second Axis!')
    ax.plot(data['x'], data['y1'], **kwargs, color='red', alpha=alpha)
    if dual_axis:
        # use this facet's subset (data), not the global df
        ax2.plot(data['x'], data['y2'], **kwargs, color='blue', alpha=alpha)
df = pd.DataFrame()
df['x'] = np.arange(1,5,1)
df['y1'] = 1 / df['x']
df['y2'] = df['x'] * 100
df['facet'] = 'foo'
df2 = df.copy()
df2['facet'] = 'bar'
df3 = pd.concat([df,df2])
win_plot = sns.FacetGrid(df3, col='facet', height=6)
(win_plot.map_dataframe(facetgrid_two_axes, dual_axis=True)
.set_axis_labels("X", "First Y-axis"))
plt.show()
This isn't the prettiest plot: you might want to adjust the second y-axis label, the spacing between plots, etc., but the code suffices to show how to plot two series of differing magnitudes within a FacetGrid.