How to regulate the number of ticks on a plot? - python

I have a dataframe with shape (2000, 2). There are two columns: value and date. I want to plot it with date on x axis. But since there are 2000 days there, I want to keep only 10 ticks on x axis. I tried this:
plt.plot(data["Date"], data["Value"])
plt.locator_params(axis='x', nbins=10)
plt.show()
But plot looks like this:
How to fix it?

From your plot, I'm going to assume the problem is that your "Date" column holds strings rather than datetimes (or pandas Timestamps), so matplotlib treats each value as a category. If the column were datetime-like, matplotlib would automatically select a reasonably suitable tick spacing:
You would need to convert those strings back to datetimes, for example with dateutil.parser
from dateutil import parser
data['Date_dt'] = data['Date'].apply(parser.parse)
or via strptime (the formatting string in args could change depending on your date format)
from datetime import datetime
data['Date_dt'] = data['Date'].apply(datetime.strptime, args=['%Y-%m-%d %H:%M:%S'])
If, for some obscure reason, you really want EXACTLY 10 ticks, you could do something along these lines (using the converted datetime column, not the original strings):
plt.xticks(pd.date_range(data['Date_dt'].min(), data['Date_dt'].max(), periods=10))
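For completeness, here is a minimal sketch of the whole approach. The sample data is made up (2000 daily rows with string dates, mimicking the question), and it assumes the strings follow the '%Y-%m-%d' format. Instead of forcing exactly 10 ticks, it caps the tick count with AutoDateLocator, which still places ticks on "nice" date boundaries:

```python
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for the asker's data: 2000 daily rows, dates stored as strings.
days = pd.date_range("2015-01-01", periods=2000, freq="D")
data = pd.DataFrame({"Date": days.strftime("%Y-%m-%d"), "Value": range(2000)})

# Convert the strings to real datetimes so matplotlib uses a date axis
# instead of treating every string as a separate category.
data["Date_dt"] = pd.to_datetime(data["Date"], format="%Y-%m-%d")

fig, ax = plt.subplots()
ax.plot(data["Date_dt"], data["Value"])

# Limit the number of major ticks while keeping them on sensible date boundaries.
locator = mdates.AutoDateLocator(maxticks=10)
ax.xaxis.set_major_locator(locator)
ax.xaxis.set_major_formatter(mdates.ConciseDateFormatter(locator))
fig.autofmt_xdate()  # rotate the labels so they do not overlap
```

Call plt.show() afterwards to display the figure.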

Related

Datetime to Time/HH:MM format – investigating events on multiple dates by the time of day

I have a pandas dataframe with a column "Datetime" which has values in pd.Timestamp / np.datetime64 format. How should I extract the hours and minutes while keeping the status of this "HH:MM" as "continuous plottable values?"
I want to plot a histogram of the dataframe column (pd.Series) based on the frequency in "HH:MM sense" in which case the x-axis would range from 00:00 to 23:59 etc.
import pandas as pd
# ...
new_df["Datetime"][0]
> Timestamp('2022-08-08 16:58:00')
I saw examples of extracting the time as a string. Not good enough. I could also use groupby hour and then e.g. plot a bar chart by count but that's not exactly what I was looking for, either...
...or I could convert each row to a string and then immediately back to pd.Timestamp with the same date. It's not ideal, but works. Any better ideas?
I battled with this a bit longer and got it working decently. Is this really the most straightforward way of doing it? The lambda stuff feels always a bit far-fetched, and this one still keeps the full date which isn't a problem per se but not necessary, either (and requires extra formatting on the xaxis).
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
fig, ax = plt.subplots()
plt.xticks(rotation=45)
ax.xaxis.set_major_formatter(mdates.DateFormatter('%H:%M'))
# pd.Timestamp converts the date automatically to "today" if YYYYMMDD is not specified
new_df["Datetime"].apply(lambda t:pd.Timestamp(f'{t.hour:02d}:{t.minute:02d}')).hist(ax=ax)
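One possible alternative, sketched below with made-up sample data, is to skip the string round trip and turn the time of day into fractional hours (hour + minute / 60). The values stay continuous and plottable, and the "HH:MM" look is restored through tick labels. This is a different technique from the lambda above, not a drop-in replacement:

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample data standing in for new_df from the question.
new_df = pd.DataFrame({
    "Datetime": pd.to_datetime([
        "2022-08-08 16:58:00",
        "2022-08-09 09:15:00",
        "2022-08-10 16:05:00",
        "2022-08-11 23:59:00",
    ])
})

# Time of day as a float in [0, 24): the date part is dropped entirely.
t = new_df["Datetime"].dt
hours = t.hour + t.minute / 60.0

fig, ax = plt.subplots()
ax.hist(hours, bins=24, range=(0, 24))  # one bin per hour

# Label every third hour as HH:MM.
tick_hours = list(range(0, 25, 3))
ax.set_xticks(tick_hours)
ax.set_xticklabels([f"{h:02d}:00" for h in tick_hours], rotation=45)
ax.set_xlabel("Time of day")
```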

Using datetime object for a scatter plot?

I would like to construct a scatter plot using datetime objects on both axes. Namely, dates (formatted as %YYYY-MM-DD) will be placed on one axis, and the second axis will display a 24-hour scale (i.e. from 0 to 24) and contain timestamps of events (formatted as %HH:MM in 24-hour format), such as a user logging into the server, that occurred on a given date. There could be several events on a particular date, for example, a user logging in 2 or 3 times.
My questions: how do I use such datetime objects when creating a plot with matplotlib? Do I need to convert them in order to feed them into matplotlib?
As in https://stackoverflow.com/a/1574146/12540580 :
You must first convert your timestamps to Python datetime objects (use datetime.strptime). Then use date2num to convert the dates to matplotlib format.
Plot the dates and values using plot_date:
dates = matplotlib.dates.date2num(list_of_datetimes)
matplotlib.pyplot.plot_date(dates, values)
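In recent Matplotlib releases plot_date is, to my knowledge, deprecated (since 3.9), and the regular plotting functions accept datetime objects directly. A minimal sketch for the date-versus-time-of-day scatter plot follows; the login events are invented, and the reference date 2000-01-01 is an arbitrary choice used only to put every time of day on the same scale:

```python
from datetime import datetime

import matplotlib.dates as mdates
import matplotlib.pyplot as plt

# Hypothetical login events (several can fall on the same date).
events = [
    datetime(2023, 1, 1, 8, 30),
    datetime(2023, 1, 1, 17, 45),
    datetime(2023, 1, 2, 9, 5),
    datetime(2023, 1, 3, 22, 10),
]

# x axis: the calendar date of each event.
days = [e.date() for e in events]

# y axis: the time of day, moved onto one arbitrary reference date so that
# only the clock time determines the vertical position.
ref = datetime(2000, 1, 1)
times = [datetime.combine(ref.date(), e.time()) for e in events]

fig, ax = plt.subplots()
ax.scatter(days, times)
ax.yaxis.set_major_formatter(mdates.DateFormatter("%H:%M"))
ax.set_ylim(datetime(2000, 1, 1, 0, 0), datetime(2000, 1, 2, 0, 0))  # 00:00 to 24:00
fig.autofmt_xdate()
```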

The matplotlib chart changes when I change the index in python pandas dataframe

I have a dataset of S&P500 historical prices with the date, the price and other data that I don't need right now to solve my problem.
Date Price
0 1981.01 6.19
1 1981.02 6.17
2 1981.03 6.24
3 1981.04 6.25
. . .
and so on till 2020
The date is a float with the year, a dot and the month.
I tried to plot all historical prices with matplotlib.pyplot as plt.
plt.plot(df["Price"].tail(100))
plt.title("S&P500 Composite Historical Data")
plt.xlabel("Date")
plt.ylabel("Price")
This is the result. I used df["Price"].tail(100) so you can better see the difference between the first and the second graph (you are going to see it in a sec).
But then I tried to change the index from the previous one (0, 1, 2, etc.) to the df["Date"] column of the DataFrame in order to see the date on the x axis.
df = df.set_index("Date")
plt.plot(df["Price"].tail(100))
plt.title("S&P500 Composite Historical Data")
plt.xlabel("Date")
plt.ylabel("Price")
This is the result, and it's quite disappointing.
I have the Date where it should be, on the x axis, but the problem is that the graph is different from the one before, which is the right one.
If you need the dataset to try out the problem here you can find it.
It is called U.S. Stock Markets 1871-Present and CAPE Ratio.
Hope you've understood everything.
Thanks in advance
UPDATE
I found something that could cause the problem. If you look closely at the dates, you can see that month #10 is written as a float in the original dataset like this: for the year 1884, 1884.1. The problem occurs when you use pd.to_datetime() to transform the Date float series into datetimes: month #10 becomes (for the example above) 1884-01-01, which is the first month of the year, and that affects the final plot.
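The pitfall can be reproduced in isolation. The sketch below uses the standard library's datetime.strptime rather than pandas (so it does not depend on a particular pandas version), and parse_year_month is a hypothetical helper name. Formatting the float with two decimals keeps the trailing zero of October:

```python
from datetime import datetime


def parse_year_month(value: float) -> datetime:
    """Parse a YYYY.MM float, keeping the trailing zero of month 10."""
    text = f"{value:.2f}"  # 1884.1 -> "1884.10", 1981.01 -> "1981.01"
    return datetime.strptime(text, "%Y.%m")


# str() drops the trailing zero, so October is read as January.
naive = datetime.strptime(str(1884.1), "%Y.%m")
fixed = parse_year_month(1884.1)

print(naive.month, fixed.month)
```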
SOLUTION
Finally, I solved my problem!
Yes, the error was the one I explained in the UPDATE paragraph, so I decided to append a 0 (as a string) wherever the length of the Date (as a string) is 6, in order to change, for example: 1884.1 ==> 1884.10
df["Date"] = df["Date"].astype(str)  # the column is read as float; work on strings
df["len"] = df["Date"].str.len()
df["Date"] = df["Date"].where(df["len"] == 7, df["Date"] + "0")
Then I drop the len column I've just created.
df.drop(columns="len", inplace=True)
At the end I changed the "Date" to a Datetime with pd.to_datetime
df["Date"] = pd.to_datetime(df["Date"], format='%Y.%m')
df = df.set_index("Date")
And then I plot
df["Price"].tail(100).plot()
plt.title("S&P500 Composite Historical Data")
plt.xlabel("Date")
plt.ylabel("Price")
plt.show()
The easiest way would be to transform the date into an actual datetime index. This way matplotlib will automatically pick it up and plot it accordingly. For example, given your date format, you could do:
df["Date"] = pd.to_datetime(df["Date"].map("{:.2f}".format), format='%Y.%m')  # "{:.2f}" keeps October as 1884.10 instead of 1884.1
df = df.set_index("Date")
plt.plot(df["Price"].tail(100))
Currently, the first plot you showed is actually plotting the Price column against the index, which seems to be a regular range index running from 0 to a bit over 1800. You suggested your data starts in 1981, but on this plot each observation is simply evenly spaced on the x axis (at an interval of 1, which is the jump from one index value to the next). That's why the chart looks reasonable, yet the x-axis values don't.
Now when you set the Date (as float) to be the index, note that you're not evenly covering the interval between, for example, 1981 and 1982. You have evenly spaced values from 1981.01 to 1981.12, but nothing from 1981.12 to 1982. That's why the second chart is also plotted as expected: the twelve monthly points are squeezed into the first 12% of each year's interval. Setting the index to a DatetimeIndex as described above should remove this issue, as Matplotlib will know how to evenly space the dates along the x-axis.
I think your problem is that your Date is of float type, and using it as the x-axis does exactly what is expected for an array of the kind ([2012.01, 2012.02, ..., 2012.12, 2013.01, ...]). You might convert the Date column to a DatetimeIndex first and then use the built-in pandas plot method:
df["Price"].tail(100).plot()
It is not a good idea to treat df['Date'] as float. It should be converted into pandas datetime64[ns]. This can be achieved using pandas pd.to_datetime method.
Try this:
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv('ie_data.csv')
df=df[['Date','Price']]
df.dropna(inplace=True)
#converting to pandas datetime format; '{:.2f}' pads October (e.g. 1884.1) to '1884.10'
df['Date'] = df['Date'].astype(float).map('{:.2f}'.format)
df['Date'] = pd.to_datetime(df['Date'], format='%Y.%m')
df.set_index(['Date'],inplace=True)
#plotting
df.plot() #full data plot
df.tail(100).plot() #plotting just the tail
plt.title("S&P500 Composite Historical Data")
plt.xlabel("Date")
plt.ylabel("Price")
plt.show()

String to Date Conversion in Python

In the following piece of code:
df['Year']=pd.DatetimeIndex(df['Date']).year
df['Month']=pd.DatetimeIndex(df['Date']).month
df['Day']=pd.DatetimeIndex(df['Date']).day
df['MM_DD_str']=df['Month'].astype(str).str.zfill(2)+'-'+df['Day'].astype(str).str.zfill(2)
Since I want only MM-DD, I did it this way, and it is a string now. But later in the program I want them in date format. In particular, I need the month in order to plot a graph. Can I convert it back to a date and extract the month from it?
Edited:
I want to plot a graph in which the x ticks show the months, like Jan, Feb, Mar, up to Dec. I have to extract the month from the dataframe column df['MM_DD_str'] and use the months as tick labels for the graph.
This is the final code I have written for plotting the graph:
md_str = df['MM_DD_str']
get_month =md_str.apply(lambda d: pd.to_datetime(d, format='%m-%d').month)
#print(get_month)
plt.xticks(get_month,('Jan','Feb','Mar','Apr','May','Jun','Jul','Aug','Sep','Oct','Nov','Dec'))
plt.show()
I am getting neither output nor an error.
If I understand correctly, you currently have a date string, like "06-23" for example, and you later want to extract the month from it by parsing it as a datetime object:
md_col = df['MM_DD_str']
get_month = lambda d: pd.to_datetime(d, format='%m-%d').month
md_col.apply(get_month)
get_month is a lambda function that takes a string, converts it to a datetime object, and then extracts the month.
.apply() takes a dataframe column and applies a function to all the rows in the column
Note that if your column contains NaNs or strings that cannot be converted to dates, you could pass errors='coerce' to pd.to_datetime, so that such values become NaT (whose .month is NaN) instead of raising an error:
get_month = lambda d: pd.to_datetime(d, errors='coerce', format='%m-%d').month
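The same idea can be applied to the whole column at once, without a per-row lambda. The sketch below uses made-up sample values (including one deliberately invalid string) and maps the month numbers to the abbreviations the asker wants as tick labels via the standard library's calendar.month_abbr (which depends on the locale; "Jan" ... "Dec" in the default one):

```python
import calendar

import pandas as pd

# Hypothetical MM-DD strings; "not a date" stands in for a bad value.
md_col = pd.Series(["06-23", "01-05", "not a date", "12-31"])

# Vectorised parsing; unparseable entries become NaT instead of raising.
parsed = pd.to_datetime(md_col, format="%m-%d", errors="coerce")

# Month numbers (NaN where parsing failed).
months = parsed.dt.month

# Month abbreviations, leaving the missing entries untouched.
labels = months.map(lambda m: calendar.month_abbr[int(m)], na_action="ignore")
print(labels.tolist())
```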
I did not fully understand the question, but:
the df['Date'] column could be used to plot the graph directly, since it is already in datetime format
pd.to_datetime() can be used to parse a date string
So let's say
date='2019-05'
date=pd.to_datetime(date)
date.month
EDIT:
Matplotlib needs numeric values to plot on the x axis.
If you pass string values to plt.xticks() as tick positions, you can't plot the graph; however, you can change the labels shown at numeric positions. Here is an example; adjust it to your labels:
import matplotlib.pyplot as plt
import pandas as pd
figure=plt.figure()
ax=plt.axes()
df=pd.DataFrame()
months=['june','july','august','september']
dates=['2019-06','2019-07','2019-08','2019-09']
df['dates']=dates
df['values']=[1,4,7,10]
df['dates']=pd.to_datetime(df['dates']) #pd is for pandas
df['values'].plot(ax=ax)
ax.set_xticks([0,1,2,3]) #numerical positions that get plotted (one per row, matching the four labels)
ax.set_xticklabels(months) #actual labels for those numerical values

Set xticks interval with date values

Hello, I have a list of about 150 dates that are stored in string format. I would like to set an interval so that there are only 10 ticks along the x-axis, but I am not sure how to do this without changing the type.
'1980-06',
'1980-09',
'1980-12',
'1981-03',
'1981-06',
'1981-09',
'1981-12',
...
You can provide the list of ticks you want to show. To get that list, divide your date list into ten parts to find how far apart the ticks should be, and then use Python's slicing to pick the tick values. See below:
import math
dates = ['1980-06', '1980-09', '1980-12', '1981-12', ...] # Your date list
date_len = len(dates)
step = int(math.ceil(date_len/10))
ticks = dates[::step] # ticks to show on graph
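To actually apply those ticks, one option is sketched below. The 150 quarterly labels and the y values are generated purely for illustration. Because matplotlib treats the strings as categories placed at integer positions 0, 1, 2, ..., the ticks are set by position and labelled with the matching strings:

```python
import math

import matplotlib.pyplot as plt

# Hypothetical data: 150 quarterly "YYYY-MM" labels starting at 1980-06.
dates = [f"{1980 + (5 + 3 * i) // 12}-{(5 + 3 * i) % 12 + 1:02d}" for i in range(150)]
values = list(range(150))  # made-up y values

step = math.ceil(len(dates) / 10)
positions = list(range(0, len(dates), step))  # categorical strings sit at 0, 1, 2, ...
ticks = dates[::step]                         # the labels shown at those positions

fig, ax = plt.subplots()
ax.plot(dates, values)
ax.set_xticks(positions)
ax.set_xticklabels(ticks, rotation=45)
```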
