I was given the following task: "Make a histogram plot with 30 bins for the selected daily mean temperatures using all data of the years 1981-2010. Then repeat this and add a histogram into the same plot for the daily mean temperatures of the years 1961-1990."
I have my avgt and yearly data imported. I know how to plot avgt.
plt.hist(avgt, bins=30, edgecolor='black')
plt.xlabel('Average T')
plt.ylabel('Times Occurred')
But how do I restrict it to just the years 1981-2010? I believe they are part of the same dataset.
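One possible approach is a boolean mask on the year of each observation. The sketch below assumes the data is held in two equal-length NumPy arrays, `years` and `avgt`, with one entry per day; the question does not show how the data is stored, so the array `years`, the helper `select_years`, and the random placeholder values are all assumptions made for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

# Placeholder data, NOT the real dataset: one year label and one temperature per day.
rng = np.random.default_rng(0)
years = np.repeat(np.arange(1961, 2011), 365)
avgt = rng.normal(loc=5.0, scale=8.0, size=years.size)

def select_years(values, years, first, last):
    """Return the values whose year lies in [first, last], inclusive."""
    mask = (years >= first) & (years <= last)
    return values[mask]

recent = select_years(avgt, years, 1981, 2010)
older = select_years(avgt, years, 1961, 1990)

fig, ax = plt.subplots()
# alpha < 1 keeps both histograms visible where they overlap
ax.hist(recent, bins=30, edgecolor='black', alpha=0.5, label='1981-2010')
ax.hist(older, bins=30, edgecolor='black', alpha=0.5, label='1961-1990')
ax.set_xlabel('Average T')
ax.set_ylabel('Times Occurred')
ax.legend()
```

Call plt.show() afterwards to display the figure. Note that each call to hist with bins=30 computes its own bin edges from the data it is given, so the two sets of bars will generally not line up exactly.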
Hi, I have a data frame containing students and their graduation years. Column 1 is the student name, such as Peter, Henry, ... and column 2 is the student's graduation year, such as 2023, 2024.
I tried to build a histogram to count the number of students in each graduation year, and display the year on the x-axis.
I tried this code:
'''
import matplotlib.pyplot as plt
plt.figure()
plt.hist(df['Student Grad Year'])
'''
But it doesn't give the right result, and I'm not sure why the last two bars are connected. I also want to show the year value in the middle of each bar. Note that the values in the 'Grad Year' column are ints. Should they be converted to a datetime type first?
The default number of bins is 10. Since the data is discrete, you can just set the bins to the years that appear in the data set. However, you have to specify the bins' boundaries, which would be half years, e.g. like this:
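import matplotlib.pyplot as plt
import numpy as np
plt.figure()
# bin edges at half years, so each bar is centered on a whole year
years = np.unique(df['Student Grad Year'].values)
bins = np.append(years[0] - 0.5, years + 0.5)
plt.hist(df['Student Grad Year'], bins=bins)
plt.xticks(years)  # put a tick with the year value in the middle of each bar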
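The bins above are built from the years that actually occur, so a missing year would make one bin wider than the others. A minimal sketch of a variant that gives every integer between the smallest and largest value its own bin follows; the helper name `integer_bin_edges` and the sample years are hypothetical.

```python
import numpy as np

def integer_bin_edges(values):
    """Edges at half-integers so every integer from min to max gets its own bin."""
    values = np.asarray(values)
    return np.arange(values.min() - 0.5, values.max() + 1.5, 1.0)

# Hypothetical graduation years with a gap (nobody graduates in 2025).
grad_years = [2023, 2024, 2024, 2026]
edges = integer_bin_edges(grad_years)
counts, _ = np.histogram(grad_years, bins=edges)
```

The same edges can be passed to plt.hist via its bins argument; the empty year then shows up as a gap rather than being merged into a neighbouring bar.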
I'm trying to plot against time, but the time in my dataframe has become the index.
Is there a way to plot against the index?
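One way this might be done is to pass the index itself as the x data, or to turn it back into a regular column first. The frame below, including the column name 'value' and the index name 'time', is a hypothetical stand-in since the question shows no data.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical frame whose index holds the time values.
df = pd.DataFrame(
    {'value': [1.0, 3.0, 2.0]},
    index=pd.to_datetime(['2020-01-01', '2020-01-02', '2020-01-03']),
)
df.index.name = 'time'

fig, ax = plt.subplots()
line, = ax.plot(df.index, df['value'])  # df.index is used directly as the x data
ax.set_xlabel(df.index.name)

# Alternative: move the index back into an ordinary column named 'time'.
flat = df.reset_index()
```

Calling df['value'].plot() would also use the index as the x axis by default.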
In an experiment I conducted, I gathered the temperature data every 2 seconds. The experiment lasted over 3000s.
I tried plotting my findings with matplotlib with this sample code, after previously having imported each csv column into separate lists.
plt.plot(time, temperature)
plt.xlabel('Time' + r'$\left(s\right)$')
plt.ylabel('Temperature' + r'$\left(C\right)$')
# plt.xticks(np.arange(0, 3500, 500.0))
# plt.yticks(np.arange(0, 20, 2))
# plt.style.use('fivethirtyeight')
plt.show()
My result is this:
Graph
How can I improve this:
in order to make it smoother (maybe by changing the experiment design, e.g. collecting data every 1 second)
in order to make it more scientific (adding a legend and writing the Celsius symbol instead of C for the temperature units)
Any other helpful suggestions are welcome.
Edit: Sample Data
Time,Temperature
0,19.77317518
2,19.77317518
4,19.77317518
6,19.77317518
8,19.77317518
10,19.77317518
12,19.77317518
14,19.77317518
16,19.77317518
18,19.77317518
...
40,19.36848822
42,19.36848822
44,20.379735
46,20.17760174
48,20.379735
In order to add Celsius to the y label (note the raw string, so that \c is not treated as an escape sequence):
plt.ylabel(r'Temperature ($^\circ$C)')
In order to smooth it, you should first use only markers:
plt.plot(time, temperature, '.')
Matplotlib draws straight line segments between consecutive points (linear interpolation), which is why you see those "jumps".
If you want to fit a smooth line to the data, check the following link:
How to smooth a curve in the right way?
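As a rough illustration of what smoothing could look like, here is a sketch using a plain moving average. This is a simple stand-in technique chosen for the example, not necessarily what the linked question recommends; the helper name `moving_average` and the window size are assumptions. The values are the last five rows of the sample data above.

```python
import numpy as np

def moving_average(y, window):
    """Moving average in 'valid' mode; returns len(y) - window + 1 points."""
    kernel = np.ones(window) / window
    return np.convolve(y, kernel, mode='valid')

# Last five rows of the sample data from the question.
time = np.array([40, 42, 44, 46, 48])
temperature = np.array([19.36848822, 19.36848822, 20.379735, 20.17760174, 20.379735])

smooth = moving_average(temperature, window=3)
smooth_time = time[1:-1]  # centres of the 3-point windows
```

The smoothed series can then be drawn on top of the markers with plt.plot(smooth_time, smooth). A larger window smooths more but also shortens the result and flattens real features.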
Relatively new Python user here.
I'm trying to build a histogram of stock index annual returns.
My goal is to have a graph which looks like this:
Can someone help me come up with this sort of stacked bars in the histogram, with labels for the years?
So far I have the following:
plt.figure()
plt.hist(returns, bins=[-0.6,-0.5,-0.4,-0.3,-0.2,-0.1,0,0.1,0.2,0.3,0.4,0.5,0.6,0.7])
plt.xticks(np.arange(-0.6, 0.7, 0.1))
plt.xlabel("Returns DJIA in %")
plt.ylabel("Number of observations (Probability in %)")
Thanks in advance!
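One possible way to get stacked, labelled boxes is to draw one unit-height bar per observation and stack the bars that fall into the same bin. The sketch below is only an illustration: the helper `stack_levels` is hypothetical, and the years and returns are placeholder numbers, not real DJIA data.

```python
import numpy as np
import matplotlib.pyplot as plt

def stack_levels(values, edges):
    """For each value, return its bin index and its height within that bin's stack."""
    # np.digitize returns 1-based bin numbers for values inside the edges
    bin_idx = np.digitize(values, edges) - 1
    seen = {}
    levels = []
    for b in bin_idx:
        levels.append(seen.get(b, 0))
        seen[b] = seen.get(b, 0) + 1
    return bin_idx, np.array(levels)

# Placeholder numbers, NOT real DJIA returns.
years = np.array([2001, 2002, 2003, 2004, 2005])
returns = np.array([-0.07, -0.17, 0.25, 0.03, -0.01])
edges = np.arange(-6, 8) / 10  # -0.6, -0.5, ..., 0.7

bin_idx, levels = stack_levels(returns, edges)

fig, ax = plt.subplots()
for year, b, lvl in zip(years, bin_idx, levels):
    # one unit-height box per observation, stacked inside its bin
    ax.bar(edges[b], 1, width=0.1, bottom=lvl, align='edge',
           edgecolor='black', color='C0')
    ax.text(edges[b] + 0.05, lvl + 0.5, str(year), ha='center', va='center')
ax.set_xlabel("Returns DJIA in %")
ax.set_ylabel("Number of observations")
```

Boxes are stacked in the order the observations appear, so sorting by year beforehand puts the oldest year at the bottom of each stack.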
I'm trying to make a density plot of the hourly demand:
data
The 'hr' means different hours, 'cnt' means demand.
I know how to make a density plot such as:
sns.kdeplot(bike['hr'])
However, this only works when the demand for each hour is not given, so that the number of rows per hour stands in for its demand. Now that I know the demand count for each hour, how can I make a density plot of such data?
A density plot aims to show an estimate of a distribution. To make a graph showing the density of hourly demand, we would really expect to see many iid samples of demand, with time-stamps, i.e. one row per sample. Then a density plot would make sense.
In the type of data here, however, where the demand ('cnt') is sampled regularly and aggregated over the sample period (the hour), a density plot is not directly meaningful. A bar graph used as a histogram does make sense, with the hours as the bins.
Below I show how to use pandas functions to produce such a plot -- really simple. For reference I also show how we might produce a density plot, through a sort of reconstruction of "original" samples.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
df = pd.read_csv("../data/hour.csv") # load dataset, inc cols hr, cnt, no NaNs
# using the bar plotter built in to pandas objects
fig, ax = plt.subplots(1,2)
df.groupby('hr').agg({'cnt':sum}).plot.bar(ax=ax[0])
# reconstructed samples - has df.cnt.sum() rows, each one containing an hour of a rental.
samples = np.hstack([ np.repeat(h, df.cnt.iloc[i]) for i, h in enumerate(df.hr)])
# plot a density estimate (bw_method replaces the bw argument, deprecated since seaborn 0.11)
sns.kdeplot(samples, bw_method=0.5, linewidth=3, color="r", ax=ax[1])
# to make a useful comparison with a density estimate, we need to have our bar areas
# sum up to 1, so we use groupby.apply to divide by the total of all counts.
tot = float(df.cnt.sum())
df.groupby('hr').apply(lambda x: x['cnt'].sum()/tot).plot.bar(ax=ax[1], color='C0')
Demand for bikes seems to be low during the night. It is also apparent that they are probably used for commuting, with peaks at 8am and 5-6pm.
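For reference, the reconstruction of "original" samples used in the code above can be sketched with a single np.repeat call, which accepts an array of repeat counts. The tiny `hr` and `cnt` arrays here are hypothetical values for illustration, not taken from the dataset.

```python
import numpy as np

# Hypothetical aggregated data: hour of day and the demand counted in that hour.
hr = np.array([0, 1, 2])
cnt = np.array([2, 0, 3])

# One entry per rental, each holding the hour in which it happened.
samples = np.repeat(hr, cnt)
```

With the real data frame this would presumably read np.repeat(df.hr, df.cnt), giving the same samples as the list comprehension with np.hstack.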