Creating a histogram of Yearly Returns - python

I am trying to finish a task for a project: creating a histogram of the Dow Jones historical yearly returns. I have uploaded a picture of the task and my progress below. My problem is that I can't find a way to show the individual years inside the histogram bars, as in the task, and I don't know how to modify the y-axis and the legend to show the information that appears in the first picture.
Any help is appreciated.
What I am trying to make and My progress so far
Here is my code:
# Importing packages
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import datetime
#setting the order
order=[-60,-50,-40,-30,-20,-10,
0,10,20,30,40,50,60,70]
#getting the data
dow_jones_returns = pd.read_csv('data/dow-jones-by-year-historical-annual-returns (2).csv')
dow_jones=pd.DataFrame(data=dow_jones_returns)
dow_jones['date']=pd.to_datetime(dow_jones['date'])
dow_jones['date']=pd.DatetimeIndex(dow_jones['date']).year
dow_jones['value'] = pd.to_numeric(dow_jones['value'])  # assign the result; to_numeric does not convert in place
up_to_2019=dow_jones.iloc[0:99]
lastyear= dow_jones.iloc[-1]
#ploting the histogram
fig = plt.figure()
up_to_2019['value'].plot.hist(bins = order)
plt.show()

Hi, just to give you some further directions.
Regarding the textbox
The textbox looks like it contains the summary statistics from DataFrame.describe() plus a few additional ones. You can create a textbox by combining plt.subplots() with Axes.text().
I found this guide to be very useful for creating a textbox in a plot.
Since we don't have the data, here is some pseudocode:
import numpy as np
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
textstr = str(up_to_2019['value'].describe())
ax.hist(up_to_2019['value'], bins = order)
# these are matplotlib.patch.Patch properties
props = dict(boxstyle='round', facecolor='wheat', alpha=0.5)
# place a text box in upper left in axes coords
ax.text(0.05, 0.95, textstr, transform=ax.transAxes, fontsize=10,
verticalalignment='top', bbox=props)
plt.show()
Regarding the y-axis:
1) Here is how you set the right label: plt.ylabel("Number of Observations\n(Probability in %)")
2) Then add the ticks: plt.yticks(np.arange(1,27))
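To make the y-axis show both the count and its share of all observations, the tick labels can be built by hand. This is only a minimal sketch with made-up returns: the `returns` array, the seed, and the clipping range are placeholders, not the Dow Jones data.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Made-up yearly returns in percent (placeholder for the real CSV column)
rng = np.random.default_rng(0)
returns = np.clip(rng.normal(7, 20, size=99), -59.9, 69.9)
bins = np.arange(-60, 80, 10)  # same edges as `order` in the question

fig, ax = plt.subplots()
counts, edges, patches = ax.hist(returns, bins=bins, edgecolor="black")

# One tick per observation count, labelled "count (share of all years)"
ticks = np.arange(0, int(counts.max()) + 2)
ax.set_yticks(ticks)
ax.set_yticklabels([f"{t}\n({100 * t / len(returns):.1f}%)" for t in ticks])
ax.set_ylabel("Number of Observations\n(Probability in %)")
```

Call plt.show() (with an interactive backend) or fig.savefig(...) to look at the result.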
Regarding the labels inside the bins
That's rather tricky. One option, though definitely not advised, would be to also add the labels via the .text() method. I don't know if it helps, but here is how you do this in R.
These two links might also be helpful:
how-to-add-a-text-into-a-rectangle
Change color for the patches in a hist
Apparently plt.hist() returns three values, one of which is called patches. You can iterate over patches and, e.g., change their color (see the link above); however, I couldn't figure out how to attach text to them.
import numpy as np
import matplotlib.pyplot as plt
x = [21,22,23,4,5,6,77,8,9,10,31,32,33,34,35,36,37,18,49,50,100]
num_bins = 5
n, bins, patches = plt.hist(x, num_bins, facecolor='blue', alpha=0.5)
for i, pat in enumerate(patches):
    pat.set_test("Test")  # this doesn't work, sadly: the patches have no such method
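One possible way around this is to read each bar's geometry from its patch and place the text with ax.text(). The sketch below stacks one year label per observation inside its bar. It uses made-up data: `years`, `returns`, the seed, and the styling are all assumptions for illustration, not the asker's dataset.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(1)
years = np.arange(1921, 2020)  # 99 hypothetical years
returns = np.clip(rng.normal(7, 20, size=years.size), -59.9, 69.9)
bins = np.arange(-60, 80, 10)

fig, ax = plt.subplots(figsize=(10, 8))
counts, edges, patches = ax.hist(returns, bins=bins,
                                 facecolor="lightblue", edgecolor="black")

# Bin index of every observation (0-based, matching `patches`)
bin_index = np.digitize(returns, edges) - 1

# Stack one label per year inside its bar, from the bottom up
stack_height = np.zeros(len(patches), dtype=int)
for year, idx in zip(years, bin_index):
    bar = patches[idx]
    x_center = bar.get_x() + bar.get_width() / 2
    ax.text(x_center, stack_height[idx] + 0.5, str(year),
            ha="center", va="center", fontsize=6)
    stack_height[idx] += 1
```

Each label sits at half-integer height, so the k-th year in a bin occupies the slot between counts k-1 and k. The font size probably needs tuning to the figure size.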

Related

How to draw a 3D grid using matplotlib based on three columns of data?

I'm having trouble making a 3D plot. I want to build a 3D surface plot like the one below from three columns of data.
Expected graphic case
This is what I have so far:
Current picture case
But I still don't know how to make it look like a grid, as in the first picture. Does anyone know how to achieve this? Part of the code and the full data are as follows.
import numpy as np
import pandas as pd
import matplotlib
import matplotlib.pyplot as plt
import os
import warnings
from mpl_toolkits.mplot3d import Axes3D
warnings.filterwarnings('ignore')
os.chdir(r"E:\SoftwareFile\stataFile")
matplotlib.use('TkAgg')
plt.figure(figsize=(10,6))
data = pd.read_stata(r"E:\SoftwareFile\stataFile\demo.dta")
ax = plt.axes(projection="3d")
ax.plot_trisurf(data["age"], data["weight"], data["pr_highbp"],
cmap=plt.cm.Spectral_r)
ax.set_xticks(np.arange(20, 90, step=10))
ax.set_yticks(np.arange(40, 200, step=40))
ax.set_zticks(np.arange( 0, 1.2, step=0.2))
ax.set_title("Probability of Hypertension by Age and Weight")
ax.set_xlabel("Age (years)")
ax.set_ylabel("Weight (kg)")
ax.zaxis.set_rotate_label(False)
ax.set_zlabel("Probability of Hypertension", rotation=90)
ax.view_init(elev=30, azim=240)
plt.savefig("demo.png", dpi=1200)
Download all data
Sincerely appreciate your help
Remove the colormap and make the faces fully transparent in the plot_trisurf call, keeping only black edges, like so:
ax.plot_trisurf(
data["age"],
data["weight"],
data["pr_highbp"],
color=None,
linewidth=1,
antialiased=True,
edgecolor="Black",
alpha=0,
)
That should result in:
You could also take a look at plot_wireframe(). For that I think you have to start with
x = data["age"].to_list()
y = data["weight"].to_list()
X, Y = np.meshgrid(x, y)
But I'm not sure how to create the z coordinate. It seems you may need interpolation from what I read.
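The interpolation step could be sketched with matplotlib's own triangulation tools: build a regular grid, interpolate the scattered z values onto it, and hand the result to plot_wireframe(). The data here are made up (random points on a plane stand in for age, weight, and pr_highbp), and the grid size and ranges are arbitrary assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.tri import Triangulation, LinearTriInterpolator

# Made-up scattered samples standing in for the three data columns
rng = np.random.default_rng(0)
corners = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
xy = np.vstack([corners, rng.uniform(0, 1, size=(200, 2))])
x, y = xy[:, 0], xy[:, 1]
z = 0.3 * x + 0.5 * y  # a plane, so linear interpolation reproduces it

# Regular grid strictly inside the sampled region
X, Y = np.meshgrid(np.linspace(0.05, 0.95, 15), np.linspace(0.05, 0.95, 15))

# Interpolate the scattered z values onto the grid;
# grid points outside the triangulation would become NaN
interpolate = LinearTriInterpolator(Triangulation(x, y), z)
Z = np.ma.filled(interpolate(X, Y), np.nan)

fig = plt.figure()
ax = fig.add_subplot(projection="3d")
ax.plot_wireframe(X, Y, Z, color="black", linewidth=0.5)
```

With real data, the grid would be built from the ranges of the age and weight columns rather than from every observed value, which keeps the meshgrid small.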

How to add multiple histograms in a figure using Matplotlib?

I'm using matplotlib to plot many histograms in one plot successfully:
import matplotlib.pyplot as plt
import numpy as np
np.random.seed(1)
for i in range(0, 6):
    data = np.random.normal(size=1000)
    plt.hist(data, bins=30, alpha = 0.5)
plt.show()
However, I wish to export this plot in a pdf, using PdfPages. I want to add each histogram in a separate page, which I successfully do like this:
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.backends.backend_pdf import PdfPages
np.random.seed(1)
fig = []
with PdfPages("exported_data.pdf") as pdf:
    for i in range(0, 6):
        data = np.random.normal(size=1000)
        fig.append(plt.hist(data, bins=30, alpha = 0.5))
        pdf.savefig(fig[i])
        plt.close()
But I want another, 7th page with all the plots in one figure as shown above. How do I add many histograms in the same figure (so I can then add in the pdf page)? I see many tutorials on how to plot a grid of histograms within a figure but I haven't found one with all the histograms in one plot added to a figure.
Thanks,
Stam
You can run the loop to plot all histograms together (your first code snippet) after having run the loop to plot them separately (your second code snippet). Here is an example where the random arrays are saved in the datasets list during the first loop to be able to plot them together in the second loop. This solution works by using plt.gcf() which returns the current figure.
import numpy as np # v 1.19.2
import matplotlib.pyplot as plt # v 3.3.4
from matplotlib.backends.backend_pdf import PdfPages
np.random.seed(1)
datasets = []
with PdfPages("exported_data.pdf") as pdf:
    # Plot each histogram, adding each figure to the pdf
    for i in range(6):
        datasets.append(np.random.normal(size=1000))
        plt.hist(datasets[i], bins=30, alpha = 0.5)
        pdf.savefig(plt.gcf())
        plt.close()
    # Plot histograms together using a loop then add the completed figure to the pdf
    for data in datasets:
        plt.hist(data, bins=30, alpha = 0.5)
    pdf.savefig(plt.gcf())
    plt.close()
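The same idea can also be written with explicit Figure and Axes objects instead of relying on the current figure. This is only a sketch of that variant: the temporary output directory and the `page_count` check are additions for illustration.

```python
import os
import tempfile
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.backends.backend_pdf import PdfPages

rng = np.random.default_rng(1)
datasets = [rng.normal(size=1000) for _ in range(6)]
# Hypothetical output location; any writable path works
pdf_path = os.path.join(tempfile.mkdtemp(), "exported_data.pdf")

with PdfPages(pdf_path) as pdf:
    # Pages 1-6: one figure per dataset
    for data in datasets:
        fig, ax = plt.subplots()
        ax.hist(data, bins=30, alpha=0.5)
        pdf.savefig(fig)
        plt.close(fig)

    # Page 7: all histograms overlaid on a single Axes
    fig, ax = plt.subplots()
    for data in datasets:
        ax.hist(data, bins=30, alpha=0.5)
    pdf.savefig(fig)
    plt.close(fig)

    page_count = pdf.get_pagecount()
```

Passing the figure explicitly to pdf.savefig() and plt.close() avoids depending on which figure happens to be current.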

pyplot order of magnitude fontsize modification when using scientific ticks python [duplicate]

I am attempting to plot differential cross-sections of nuclear decays, so the magnitudes on the y-axis are around 10^-38 (m^2). By default, pylab labels the axis 0.0, 0.2, 0.4, ... and puts a '1e-38' at the top of the y-axis.
I need to increase the font size of just this little bit. I have tried adjusting the label size:
py.tick_params(axis='y', labelsize=20)
but this only adjusts the labels 0.0,0.2,0.4....
Many thanks for all help
You can access the text object using the ax.yaxis.get_offset_text().
import numpy as np
import matplotlib.pyplot as plt
# Generate some data
N = 10
x = np.arange(N)
y = np.array([i*(10**-38) for i in x])
fig, ax = plt.subplots()
# Plot the data
ax.plot(x,y)
# Get the text object
text = ax.yaxis.get_offset_text()
# Set the size.
text.set_size(30) # Overkill!
plt.show()
I've written the solution above using matplotlib.pyplot rather than pylab. If you absolutely have to use pylab, the code can be adapted, though I'd recommend matplotlib.pyplot in any case: the two are pretty much identical, and you can do a lot more with pyplot more easily.
Edit
If you were to use pylab then the code would be:
pylab.plot(x, y)
ax = pylab.gca() # Get the current Axes object
text = ax.yaxis.get_offset_text() # Get the text object
text.set_size(30) # Set the size.
pylab.show()
An example plot with an (overkill!) offset text.

Remove grid lines, but keep frame (ggplot2 style in matplotlib)

Using Matplotlib I'd like to remove the grid lines inside the plot, while keeping the frame (i.e. the axes lines). I've tried the code below and other options as well, but I can't get it to work. How do I simply keep the frame while removing the grid lines?
I'm doing this to reproduce a ggplot2 plot in matplotlib. I've created a MWE below. Be aware that you need a relatively new version of matplotlib to use the ggplot2 style.
import matplotlib
matplotlib.use('Agg')
import matplotlib.pyplot as plt
import pylab as P
import numpy as np
if __name__ == '__main__':
    values = np.random.uniform(size=20)
    plt.style.use('ggplot')
    fig = plt.figure()
    _, ax1 = P.subplots()
    weights = np.ones_like(values)/len(values)
    plt.hist(values, bins=20, weights=weights)
    ax1.set_xlabel('Value')
    ax1.set_ylabel('Probability')
    ax1.grid(False)  # originally grid(b=False); newer matplotlib renamed the b keyword to visible
    #ax1.yaxis.grid(False)
    #ax1.xaxis.grid(False)
    ax1.set_facecolor('white')  # originally set_axis_bgcolor, which later matplotlib releases removed
    ax1.set_xlim([0,1])
    P.savefig('hist.pdf', bbox_inches='tight')
OK, I think this is what you are asking (but correct me if I misunderstood):
You need to change the colour of the spines. You need to do this for each spine individually, using the set_color method:
for spine in ['left','right','top','bottom']:
    ax1.spines[spine].set_color('k')
You can see this example and this example for more about using spines.
However, if you have removed the grey background and the grid lines, and added the spines, this is not really in the ggplot style any more; is that really the style you want to use?
EDIT
To make the edge of the histogram bars touch the frame, you need to either:
Change your binning, so the bin edges go to 0 and 1:
n, bins, patches = plt.hist(values, bins=np.linspace(0,1,21), weights=weights)
# Check, by printing bins:
print(bins[0], bins[-1])
# 0.0 1.0
Or, if you really want the bins to run between values.min() and values.max(), change your plot limits so they are no longer 0 and 1:
n, bins, patches = plt.hist(values, bins=20, weights=weights)
ax1.set_xlim(bins[0], bins[-1])
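Putting the pieces of this answer together, a self-contained sketch might look like the following. It assumes a matplotlib version that ships the ggplot style, and it uses plt.style.context() so the style does not leak into other figures; the seed and variable names are arbitrary.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
values = rng.uniform(size=20)
weights = np.ones_like(values) / len(values)  # bar heights sum to 1

with plt.style.context("ggplot"):
    fig, ax = plt.subplots()
    # Bin edges run exactly from 0 to 1, so the bars can touch the frame
    ax.hist(values, bins=np.linspace(0, 1, 21), weights=weights)
    ax.grid(False)            # remove the grid lines
    ax.set_facecolor("white")  # replace the grey ggplot background
    for spine in ax.spines.values():
        spine.set_color("black")  # make the frame visible again
    ax.set_xlim(0, 1)
    ax.set_xlabel("Value")
    ax.set_ylabel("Probability")
```

What remains of the ggplot style after these changes is mostly the color cycle and the tick styling.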

Getting legend in seaborn jointplot

I'm interested in using the seaborn joint plot for visualizing correlation between two numpy arrays. I like the visual distinction that the kind='hex' parameter gives, but I would also like to know the actual count that different shades correspond to. Does anyone know how to put this legend on the side or even on the plot? I tried looking at the documentation and couldn't find it.
Thanks!
EDIT: updated to work with new Seaborn ver.
You need to do it manually: create a new axes with add_axes and pass it to plt.colorbar() through the cax argument.
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
x = np.random.normal(0.0, 1.0, 1000)
y = np.random.normal(0.0, 1.0, 1000)
hexplot = sns.jointplot(x=x, y=y, kind="hex")  # recent seaborn releases require x and y as keywords
plt.subplots_adjust(left=0.2, right=0.8, top=0.8, bottom=0.2) # shrink fig so cbar is visible
# make new ax object for the cbar
cbar_ax = hexplot.fig.add_axes([.85, .25, .05, .4]) # x, y, width, height
plt.colorbar(cax=cbar_ax)
plt.show()
Sources: I almost gave up after I read a dev say that the
"work/benefit ratio [to implement colorbars] is too high"
but then I eventually found this solution in another issue.
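The mechanism can be seen more directly in plain matplotlib, without seaborn: hexbin() returns the collection that stores the per-hexagon counts, and passing that collection to colorbar() ties the color scale to those counts. This is a sketch of the underlying idea, not of the jointplot itself. With seaborn, the corresponding collection should be reachable through the joint axes, but that is an assumption about seaborn's internals.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, 1000)
y = rng.normal(0.0, 1.0, 1000)

fig, ax = plt.subplots()
hb = ax.hexbin(x, y, gridsize=20, cmap="Blues")  # collection holding the counts

# Make room on the right and add a dedicated axes for the colorbar
fig.subplots_adjust(right=0.8)
cbar_ax = fig.add_axes([0.85, 0.25, 0.05, 0.4])  # x, y, width, height
cbar = fig.colorbar(hb, cax=cbar_ax)
cbar.set_label("Count per hexagon")
```

Passing the mappable explicitly avoids relying on whichever artist pyplot currently considers the "current image".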
The following has worked for me:
t1 = sns.jointplot(data=df, x="originalestimate_hours", y="working_hours_per_day_created_target", hue="status")
t1.ax_joint.legend_.set_visible(False)
t1.fig.legend(bbox_to_anchor=(1, 1), loc=2)
