Switching month numbers to month names on x-axis of histogram matplotlib - python

I have a histogram with data points over one year. Everything works well, except that the months are displayed as numbers. I would like to display them with their names: instead of "01" it should say "January" (also slightly rotated, but I can do that myself).
How do I tell matplotlib to use the month names?
I already looked in the documentation and other posts, but couldn't make it work.
My code:
fig = plt.figure(figsize=(12,6))
s = fig.add_subplot(111)
s.hist(mydata,bins=120,stacked=True, color=mycolors, alpha=1)
s.xaxis.set_major_locator(mdates.MonthLocator())
s.xaxis.set_major_formatter(mdates.DateFormatter('%m'))
s.legend(legend)
With mydata being a list of dataframes of shape (n,1).
I am new to matplotlib and don't fully understand what is happening in s.xaxis.set_major_formatter(mdates.DateFormatter('%m')) but my guess is this line needs to be modified?
My Code results in the following Graph:
Any help is greatly appreciated :)

Use %B in the DateFormatter to get month names instead of the month numbers that %m gives. You can also use %b to get abbreviated names.

You just need to replace the month format in mdates.DateFormatter:
#fig = plt.figure(figsize=(12,6))
#s = fig.add_subplot(111)
#s.hist(mydata,bins=120,stacked=True, color=mycolors, alpha=1)
#s.xaxis.set_major_locator(mdates.MonthLocator())
s.xaxis.set_major_formatter(mdates.DateFormatter('%b'))
OR
s.xaxis.set_major_formatter(mdates.DateFormatter('%B'))
#s.legend(legend)
%B = full month name (e.g. January)
%b = abbreviated month name (e.g. Jan)
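For reference, here is a minimal self-contained sketch of the same idea. The sample dates, the bin count and the 45-degree rotation are assumptions made for illustration; they are not taken from the question.

```python
from datetime import datetime, timedelta

import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line to get an interactive window
import matplotlib.dates as mdates
import matplotlib.pyplot as plt

# Hypothetical sample data: one timestamp per day of 2021.
dates = [datetime(2021, 1, 1) + timedelta(days=i) for i in range(365)]

fig, ax = plt.subplots(figsize=(12, 6))
# Convert to matplotlib date numbers so the date locator and formatter apply.
ax.hist(mdates.date2num(dates), bins=12)

# '%m' gives 01, '%b' gives Jan, '%B' gives January (in an English locale).
month_fmt = mdates.DateFormatter('%b')
ax.xaxis.set_major_locator(mdates.MonthLocator())
ax.xaxis.set_major_formatter(month_fmt)
ax.tick_params(axis='x', labelrotation=45)  # slightly rotated tick labels
```

Call plt.show() or fig.savefig(...) afterwards to look at the result. The month names follow the active locale, so they may not be English on every system.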

Related

How to create a line plot using the mean of a column and extracting year from a Date column?

Update: I've now managed to solve this. For extracting the year this is what I used,
df['year'] = pd.DatetimeIndex(df['Date']).year
this allowed me to add a new column for the year and then use that column to plot the chart.
sns.lineplot(y="Class", x="year", data=df)
plt.xlabel("Year",fontsize=20)
plt.ylabel("Success Rate",fontsize=20)
plt.show()
Now I get the correct chart.
I'm trying to get a line plot using the mean of a column and linking that to a value (the year) extracted from the date column. However, I can't seem to get the right outcome.
Here's how I extracted the Year value from the date column,
year=[]
def Extract_year(date):
    for i in df["Date"]:
        year.append(i.split("-")[0])
    return year
And here's how I plotted the values to create a line plot,
sns.lineplot(y=df['Class'].mean(), x=Extract_year(df))
plt.xlabel("Year",fontsize=20)
plt.ylabel("Success Rate",fontsize=20)
plt.show()
But instead of seeing a trend (see screenshot-1), it only displays a straight line (see screenshot-2) for the mean value. Could someone please explain to me, what I am doing wrong and how can I correct it?
Thanks!
What you are plotting is df['Class'].mean(), which of course is a single fixed value. I don't know which time format you're using, but you probably need to calculate a separate mean for each year.
EDIT:
Yes there is:
df = pd.DataFrame({'Date':['2020-01-20','2019-01-20','2022-01-20','2021-01-20','2012-01-20','2013-01-20','2016-01-20','2018-01-20']})
years = pd.to_datetime(df['Date'],format='%Y-%m-%d').dt.year.sort_values().tolist()
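To make the "separate mean for each year" idea concrete, here is a small sketch. The sample rows are made up, and the column names Date and Class are only assumed to match the question.

```python
import pandas as pd

# Hypothetical data: 'Class' is 1 for a success and 0 for a failure.
df = pd.DataFrame({
    'Date': ['2019-01-20', '2019-06-01', '2020-03-01', '2020-07-01'],
    'Class': [0, 1, 1, 1],
})

# Extract the year, then compute one mean per year instead of one global mean.
df['year'] = pd.to_datetime(df['Date'], format='%Y-%m-%d').dt.year
yearly_mean = df.groupby('year')['Class'].mean()
print(yearly_mean)
```

Plotting yearly_mean (for example with yearly_mean.plot()) then shows a trend over the years rather than a single flat line.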

The matplotlib chart changes when I change the index in python pandas dataframe

I have a dataset of S&P500 historical prices with the date, the price and other data that I don't need right now to solve my problem.
Date Price
0 1981.01 6.19
1 1981.02 6.17
2 1981.03 6.24
3 1981.04 6.25
. . .
and so on till 2020
The date is a float with the year, a dot and the month.
I tried to plot all historical prices with matplotlib.pyplot as plt.
plt.plot(df["Price"].tail(100))
plt.title("S&P500 Composite Historical Data")
plt.xlabel("Date")
plt.ylabel("Price")
This is the result. I used df["Price"].tail(100) so that you can better see the difference between the first and the second graph (which you'll see in a moment).
But then I tried changing the index from the default one (0, 1, 2, etc.) to the df["Date"] column of the DataFrame, in order to see the dates on the x-axis.
df = df.set_index("Date")
plt.plot(df["Price"].tail(100))
plt.title("S&P500 Composite Historical Data")
plt.xlabel("Date")
plt.ylabel("Price")
This is the result, and it's quite disappointing.
I have the Date where it should be in the x axis but the problem is that the graph is different from the one before which is the right one.
If you need the dataset to try out the problem here you can find it.
It is called U.S. Stock Markets 1871-Present and CAPE Ratio.
Hope you've understood everything.
Thanks in advance
UPDATE
I found something that could be causing the problem. In the original dataset, month 10 of each year is written as a float with a single decimal digit; for example, October 1884 appears as 1884.1. The problem occurs when pd.to_datetime() converts the float Date series to datetimes: 1884.1 becomes 1884-01-01, the first month of the year rather than the tenth, and that distorts the final plot.
SOLUTION
Finally, I solved my problem!
Yes, the error was the one I explained in the UPDATE paragraph, so I decided to append a "0" to the Date (as a string) wherever its length is 6, in order to change, for example: 1884.1 ==> 1884.10
df["len"] = df["Date"].apply(len)
df["Date"] = df["Date"].where(df["len"] == 7, df["Date"] + "0")
Then I drop the len column I've just created.
df.drop(columns="len", inplace=True)
At the end I changed the "Date" to a Datetime with pd.to_datetime
df["Date"] = pd.to_datetime(df["Date"], format='%Y.%m')
df = df.set_index("Date")
And then I plot
df["Price"].tail(100).plot()
plt.title("S&P500 Composite Historical Data")
plt.xlabel("Date")
plt.ylabel("Price")
plt.show()
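The pitfall described in the UPDATE can be reproduced in a few lines. This is a sketch with made-up values, and formatting the floats with two decimals is an alternative to the length check above, not the method used in the post.

```python
import pandas as pd

# Hypothetical float dates: January, October and December 1884.
dates = pd.Series([1884.01, 1884.1, 1884.12])

# Plain string conversion keeps "1884.1", which %m reads as month 1.
naive = pd.to_datetime(dates.astype(str), format='%Y.%m')

# Formatting with two decimals turns 1884.1 into "1884.10" before parsing.
padded = pd.to_datetime(dates.map('{:.2f}'.format), format='%Y.%m')

print(naive.dt.month.tolist())
print(padded.dt.month.tolist())
```

Comparing the two printed month lists shows which rows were silently shifted from October to January.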
The easiest way would be to transform the date into an actual datetime index. This way matplotlib will automatically pick it up and plot it accordingly. For example, given your date format, you could do:
df["Date"] = pd.to_datetime(df["Date"].map("{:.2f}".format), format='%Y.%m')  # two decimals, so 1884.1 is read as 1884.10 (October)
df = df.set_index("Date")
plt.plot(df["Price"].tail(100))
Currently, the first plot you showed is actually plotting the Price column against the index, which seems to be a regular range index running from 0 to 1800 and something. Each observation is therefore evenly spaced on the x-axis (at an interval of 1, which is the jump from one index value to the next). That's why the chart looks reasonable, even though the x-axis values don't correspond to dates.
Now when you set the Date (as a float) to be the index, note that you're not evenly covering the interval between, for example, 1981 and 1982. You have evenly spaced values from 1981.01 to 1981.12, but nothing between 1981.12 and 1982.01. So the second chart is also plotted exactly as the data dictates; it is just not what you want. Setting the index to a DatetimeIndex as described above should remove this issue, as Matplotlib will know how to space the dates evenly along the x-axis.
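The uneven spacing is easy to see by looking at the steps between consecutive float dates. The values below are illustrative, chosen to cross a year boundary.

```python
import numpy as np

# Float-encoded months around the 1981/1982 boundary (illustrative values).
float_dates = np.array([1981.10, 1981.11, 1981.12, 1982.01, 1982.02])

# Step between consecutive x values when the floats are used as the x-axis.
steps = np.diff(float_dates)
print(steps.round(2))
```

Within a year each step is about 0.01, but the step from December to January is about 0.89, so the line jumps across a wide empty stretch once per year.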
I think your problem is that your Date is of float type and taking it as an x-axis does exactly what is expected for taking an array of the kind ([2012.01, 2012.02, ..., 2012.12, 2013.01....]) as x-axis. You might convert the Date column to a DateTimeIndex first and then use the built-in pandas plot method:
df["Price"].tail(100).plot()
It is not a good idea to treat df['Date'] as float. It should be converted into pandas datetime64[ns]. This can be achieved using pandas pd.to_datetime method.
Try this:
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv('ie_data.csv')
df=df[['Date','Price']]
df.dropna(inplace=True)
#converting to pandas datetime format
df['Date'] = df['Date'].astype(float).map('{:.2f}'.format) #two decimals, so 1884.1 becomes '1884.10' (October, not January)
df['Date'] = pd.to_datetime(df['Date'], format='%Y.%m')
df.set_index(['Date'],inplace=True)
#plotting
df.plot() #full data plot
df.tail(100).plot() #plotting just the tail
plt.title("S&P500 Composite Historical Data")
plt.xlabel("Date")
plt.ylabel("Price")
plt.show()
Output:

String to Date Conversion in Python

In the following piece of code:
df['Year']=pd.DatetimeIndex(df['Date']).year
df['Month']=pd.DatetimeIndex(df['Date']).month
df['Day']=pd.DatetimeIndex(df['Date']).day
df['MM_DD_str']=df['Month'].astype(str).str.zfill(2)+'-'+df['Day'].astype(str).str.zfill(2)
Since I want only MM-DD, I did it this way, and it is a string now. But later in the program I want it in date format. In particular, I need the month in order to plot a graph. Can I get the month by parsing this string as a date?
Edited:
I want to plot a graph in which the x-ticks show the months Jan, Feb, Mar, up to Dec. I have to extract the month from df['MM_DD_str'] and use it for the tick labels of the graph.
This is the final code I have written for plotting the graph:
md_str = df['MM_DD_str']
get_month =md_str.apply(lambda d: pd.to_datetime(d, format='%m-%d').month)
#print(get_month)
plt.xticks(get_month,('Jan','Feb','Mar','Apr','May','Jun','Jul','Aug','Sep','Oct','Nov','Dec'))
plt.show()
I am getting neither output nor an error.
If I understand correctly, you currently have a date string, like "06-23" for example, and you later want to extract the month from it as a datetime object:
md_col = df['MM_DD_str']
get_month = lambda d: pd.to_datetime(d, format='%m-%d').month
md_col.apply(get_month)
get_month is a lambda function that takes a string, converts it to a datetime object, and then extracts the month.
.apply() takes a dataframe column and applies a function to all the rows in the column
Note that if your column contains NaNs or strings that cannot be converted to dates, you can pass the errors argument to pd.to_datetime. With errors='coerce', unparseable values become NaT, whose .month is NaN:
get_month = lambda d: pd.to_datetime(d, errors='coerce', format='%m-%d').month
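As a sketch of the same conversion done on the whole column at once (the sample strings are made up, and errors='coerce' is the assumption here), pd.to_datetime can parse the Series directly and the .dt accessor then gives the month number or its abbreviated name:

```python
import pandas as pd

# Hypothetical MM-DD strings, including one that cannot be parsed.
md = pd.Series(['06-23', 'not a date', '12-01'])

# Unparseable entries become NaT instead of raising an error.
parsed = pd.to_datetime(md, format='%m-%d', errors='coerce')

months = parsed.dt.month          # month numbers, NaN where parsing failed
labels = parsed.dt.strftime('%b') # abbreviated month names such as Jun, Dec
print(months.tolist())
print(labels.tolist())
```

Because no year is present in the strings, the parsed dates carry a default year; only the month and day are meaningful.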
I did not understand the question properly but
the df['date'] column could be used to plot the graph since it is already in date-time format
pd.to_datetime() can be used
so lets say
date='2019-05'
date=pd.to_datetime(date)
date.month
EDIT:
Matplotlib needs numeric values to plot on the x-axis.
If you pass string values as tick positions to plt.xticks(), the graph can't be plotted; however, you can change the labels shown at numeric positions. Here is an example; adjust it to your labels:
import pandas as pd
import matplotlib.pyplot as plt
figure=plt.figure()
ax=plt.axes()
df=pd.DataFrame()
months=['june','july','august','september']
dates=['2019-06','2019-07','2019-08','2019-09']
df['dates']=dates
df['values']=[1,4,7,10]
df['dates']=pd.to_datetime(df['dates']) #pd is for pandas
df['values'].plot(ax=ax)
ax.set_xticks([0,1,2,3]) #numerical values that get plotted, one per label
ax.set_xticklabels(months) #actual labels for those numerical values
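For the twelve-month case from the question, a sketch along the same lines could look like this. The y-values are invented, and taking the names from calendar.month_abbr is just one convenient option.

```python
import calendar

import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line to get an interactive window
import matplotlib.pyplot as plt

month_numbers = list(range(1, 13))            # numeric x positions 1..12
month_labels = list(calendar.month_abbr)[1:]  # index 0 is an empty string, so skip it
values = [3, 5, 2, 8, 6, 7, 9, 4, 5, 6, 2, 1]  # hypothetical data, one value per month

fig, ax = plt.subplots()
ax.plot(month_numbers, values)
ax.set_xticks(month_numbers)      # where the ticks go (numbers)
ax.set_xticklabels(month_labels)  # what the ticks say (names)
```

The key point is that the number of tick positions and the number of labels must match, which is not the case when the positions come from a column with one entry per row.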

Sanitizing Time Series whose plots shows erratic graph lines

I want to plot timelines, my dates are formatted as day/month/year.
When building the index, I take care of that:
# format Date
test['DATA'] = pd.to_datetime(test['DATA'], format='%d/%m/%Y')
test.set_index('DATA', inplace=True)
and with a double check I see months and days are correctly interpreted:
#the number of month reflect the month, not the day : correctly imported!
test['Year'] = test.index.year
test['Month'] = test.index.month
test['Weekday Name'] = test.index.day_name()  # weekday_name was removed in pandas 1.0
However, when I plot, I see datapoints get connected erratically (although their distribution seems to be correct, since I expect a seasonality):
# Start and end of the date range to extract
start, end = '2018-01', '2018-04'
# Plot daily, weekly resampled, and 7-day rolling mean time series together
fig, ax = plt.subplots()
ax.plot(test.loc['2018', 'TMIN °C'],
        marker='.', linestyle='-', linewidth=0.5, label='Daily')
I suspect it may have to do with misinterpreted dates, or that dates are not put in the right sequence, but could not find a way to verify where an error may be.
Could you help validating how to import correctly my timeseries ?
Oh, it was super simple. I assumed a DatetimeIndex was sorted automatically, but one must sort it explicitly:
test.loc['2018-01':'2018-03'].sort_index().index #sorted
test.loc['2018-01':'2018-03'].index #not sorted
This question may be deleted or marked as a duplicate; I'll leave that to the moderators:
Pandas - Sorting a dataframe by using datetimeindex
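A quick way to check for this problem and fix it is sketched below. The three sample rows are made up; only the column name and the date format are borrowed from the question.

```python
import pandas as pd

# Hypothetical day/month/year data deliberately given out of order.
test = pd.DataFrame(
    {'TMIN °C': [3.0, 1.0, 2.0]},
    index=pd.to_datetime(['03/01/2018', '01/01/2018', '02/01/2018'],
                         format='%d/%m/%Y'),
)

# A line plot connects points in row order, so an unsorted index zigzags.
was_sorted = test.index.is_monotonic_increasing

test = test.sort_index()  # put the rows in chronological order
is_sorted = test.index.is_monotonic_increasing
print(was_sorted, is_sorted)
```

Checking is_monotonic_increasing before plotting is a cheap way to verify the order of a time series.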

Why the plot appears differently between the x-axis use date and the x-axis use list of numbers on matplotlib?

I have stock data containing the OHLC attributes, and I want to plot an RSI indicator calculated from the close values. Because the stock data is sorted by date, the dates must be converted to numbers using date2num. But when the RSI values calculated from the close attribute are plotted against those dates, the lines overlap.
I suspected that the length of the RSI result was not the same as the length of the dates, but len(rsi) == len(df['date']) shows that they are equal. I then tried replacing the date x-axis with a list of numbers made by range(0, len(df['date'])), and the plot looks as I expected.
#get data
df = df.tail(1000)
#convert date
df['date'] = pd.to_datetime(df['date'])
df['date'] = df['date'].apply(mdates.date2num)
#make indicator with TA-Lib
rsi = ta.RSI(df['close'], timeperiod=14)
#plot rsi indicator with TA-Lib
ax1.plot(df['date'], rsi)
ax2.plot(range(0, len(df['date'])), rsi)
#show chart
plt.show()
I expect the output using the x-axis date to be the same as the x-axis list of numbers
Image that shows the difference
It seems that matplotlib chooses the x-ticks to display (when chosen automatically) to show "round" numbers. So in your case of integers, a tick every 200; in your case of dates, every two months.
You seem to expect the dates to follow the same tick steps as the integers, but this will cause the graph to show arbitrary dates in the middle of the month, which isn't a good default behavior.
If that's the behavior you want, try something of this sort:
rng = range(len(df['date']))
ax2.plot(rng, rsi) # Same as in your example
ax2.set_xlim((rng[0], rng[-1])) # Make sure no ticks outside of range
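ticks = [int(t) for t in ax2.get_xticks() if rng[0] <= t <= rng[-1]] # get_xticks() returns floats, iloc needs in-range ints
ax2.set_xticks(ticks)
ax2.set_xticklabels(df['date'].iloc[ticks]) # Show respective dates in the locations of the integers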
This behavior can of course be reversed if you wish to show numbers instead of dates, using the same ticks as the dates, but I'll leave that to you.
After several attempts, I found the core of the problem. No data is recorded on weekends, so there are gaps in the dates. A matplotlib date x-axis still reserves space for the weekend days even though there is no data for them, so the line plot appears to overlap.
I haven't found a solution yet, so for the time being I use the list of numbers.
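One commonly used workaround, sketched here with invented data, is to keep plotting against integer positions and let a FuncFormatter translate each position back into its date. The helper name format_date and the sample series are assumptions for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line to get an interactive window
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.ticker import FuncFormatter

# Hypothetical series recorded on business days only (no weekends).
dates = pd.bdate_range("2024-01-01", periods=10)
values = list(range(10))
positions = range(len(dates))

def format_date(x, pos=None):
    """Map an integer x position back to the date stored at that row."""
    i = int(round(x))
    if 0 <= i < len(dates):
        return dates[i].strftime("%Y-%m-%d")
    return ""  # ticks outside the data get no label

fig, ax = plt.subplots()
ax.plot(positions, values)  # evenly spaced, so weekends leave no gap
ax.xaxis.set_major_formatter(FuncFormatter(format_date))
```

The trade-off is that the x-axis is no longer a true time scale: every recorded day takes the same width, regardless of how much calendar time lies between observations.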
