Using datetime object for a scatter plot? - python

I would like to construct a scatter plot using datetime objects on both axes. Namely, dates (formatted as YYYY-MM-DD) will be placed on one axis; the second axis will display a 24-hour scale (i.e. from 0 to 24) and contain the timestamps of events (formatted as HH:MM in 24-hour format), such as a user logging into the server, that occurred on a given date. There could be several events on a particular date, for example, a user logging in 2 or 3 times.
My questions: how do I use such datetime objects when creating a plot with matplotlib? Do I need to convert them before feeding them into matplotlib?

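As described in https://stackoverflow.com/a/1574146/12540580:
You must first convert your timestamps to Python datetime objects (use datetime.strptime). Then use date2num to convert the dates to matplotlib format.
Plot the dates and values using plot_date:
import matplotlib.dates
import matplotlib.pyplot
dates = matplotlib.dates.date2num(list_of_datetimes)
matplotlib.pyplot.plot_date(dates, values)
(In recent Matplotlib versions plot_date is discouraged; plot and scatter accept datetime objects directly, so the date2num step is no longer required.)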
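A minimal sketch of the scatter plot described in the question, assuming the events arrive as "YYYY-MM-DD HH:MM" strings (the sample events and the helper name split_event are hypothetical). Rather than putting a datetime on the y-axis, it splits each event into its calendar date and its time of day as fractional hours, so the y-axis is a plain numeric scale from 0 to 24 whose ticks are then relabelled as HH:00. The x-axis receives datetime.date objects directly, which matplotlib's built-in date converter handles.

```python
from datetime import datetime

import matplotlib.pyplot as plt

# Hypothetical login events; several events on the same day are fine.
events = ["2023-03-01 08:15", "2023-03-01 17:40", "2023-03-02 09:05",
          "2023-03-03 07:55", "2023-03-03 12:30", "2023-03-03 22:10"]


def split_event(s):
    """Return (calendar date, fractional hour of day) for one event string."""
    dt = datetime.strptime(s, "%Y-%m-%d %H:%M")
    return dt.date(), dt.hour + dt.minute / 60


days, hours = zip(*(split_event(e) for e in events))

fig, ax = plt.subplots()
ax.scatter(days, hours)          # x: datetime.date objects, y: hours as floats
ax.set_ylim(0, 24)               # full 24-hour scale
ax.set_yticks(range(0, 25, 4))   # one tick every 4 hours
ax.set_yticklabels([f"{h:02d}:00" for h in range(0, 25, 4)])
ax.set_ylabel("time of day")
fig.autofmt_xdate()              # rotate the date labels on the x-axis
```

Keeping the time of day as a float keeps the y-axis continuous, so two logins a few minutes apart land close together instead of on separate categories.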

Related

Datetime to Time/HH:MM format – investigating events on multiple dates by the time of day

I have a pandas DataFrame with a column "Datetime" whose values are pd.Timestamp / np.datetime64. How should I extract the hours and minutes while keeping these "HH:MM" values continuous and plottable?
I want to plot a histogram of the DataFrame column (pd.Series) by frequency in the "HH:MM sense", in which case the x-axis would range from 00:00 to 23:59.
import pandas as pd
# ...
new_df["Datetime"][0]
> Timestamp('2022-08-08 16:58:00')
I have seen examples of extracting the time as a string, which is not good enough. I could also group by hour and then, e.g., plot a bar chart of the counts, but that's not exactly what I was looking for either...
...or I could convert each row to a string and then immediately back to a pd.Timestamp with the same date. It's not ideal, but it works. Any better ideas?
I battled with this a bit longer and got it working decently. Is this really the most straightforward way of doing it? The lambda always feels a bit far-fetched, and this still keeps the full date, which isn't a problem per se but isn't necessary either (and requires extra formatting on the x-axis).
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import pandas as pd
fig, ax = plt.subplots()
plt.xticks(rotation=45)
ax.xaxis.set_major_formatter(mdates.DateFormatter('%H:%M'))
# pd.Timestamp sets the date to "today" automatically if YYYY-MM-DD is not specified
new_df["Datetime"].apply(lambda t: pd.Timestamp(f'{t.hour:02d}:{t.minute:02d}')).hist(ax=ax)
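A sketch of an alternative that avoids both the string round trip and the lambda, assuming the column already has a datetime64 dtype. Subtracting each timestamp's own midnight (Series.dt.normalize()) leaves a timedelta, and dividing that by one hour gives a continuous float from 0 to 24 that can go straight into a histogram. The sample timestamps and the helper name hours_of_day are hypothetical, and the HH:00 tick labels are just one formatting choice.

```python
import matplotlib.pyplot as plt
import pandas as pd


def hours_of_day(s: pd.Series) -> pd.Series:
    """Fractional hour of day (0 <= h < 24) for a datetime64 Series."""
    # Timestamp minus its own midnight is a Timedelta; dividing by one hour
    # turns it into a float, e.g. 16:58 -> 16.9666...
    return (s - s.dt.normalize()) / pd.Timedelta(hours=1)


# Hypothetical stand-in for new_df["Datetime"]
times = pd.Series(pd.to_datetime(["2022-08-08 16:58",
                                  "2022-08-09 08:30",
                                  "2022-08-10 23:45"]))
hours = hours_of_day(times)

fig, ax = plt.subplots()
ax.hist(hours, bins=24, range=(0, 24))  # one bin per hour of the day
ax.set_xlim(0, 24)
ax.set_xticks(range(0, 25, 3))
ax.set_xticklabels([f"{h:02d}:00" for h in range(0, 25, 3)])
ax.set_xlabel("time of day")
```

Because the date part is dropped entirely, events from different days share one axis without any dummy "today" date.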

How do I convert an unusual time string into a datetime?

I measured the seeing index and I need to plot it as a function of time, but the time I received from the measurement is a string in the format 02-09-2022_time_11-53-51,045. How can I convert it into something Python can read and I can use in my plot?
Using pandas, I extracted the time and seeing_index columns from the txt file produced by the measurement. Python correctly plotted the seeing index values on the Y axis, but instead of plotting the time values on the X axis, it just numbered the rows and plotted the index against the row number. What can I do so that it plots the seeing index against time?
You may try this:
df.time = pd.to_datetime(df.time, format='%d-%m-%Y_time_%H-%M-%S,%f')
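A sketch of the full path from the raw file to a time-based plot, assuming a whitespace-separated file with a header row naming the columns time and seeing_index (the sample rows below are made up; the real file's separator and column names may differ). The format string is the one from the answer, which reads 02-09-2022 as day-month-year; swap %d and %m if the instrument writes month first. Once the column is datetime64, passing it explicitly as x makes matplotlib use it instead of the row number.

```python
import io

import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample in the shape described in the question.
raw = io.StringIO(
    "time seeing_index\n"
    "02-09-2022_time_11-53-51,045 1.21\n"
    "02-09-2022_time_11-54-21,512 1.35\n"
)
df = pd.read_csv(raw, sep=r"\s+")

# Assumes day-month-year order; ',%f' reads the fractional seconds (045 -> 0.045 s).
df["time"] = pd.to_datetime(df["time"], format="%d-%m-%Y_time_%H-%M-%S,%f")

fig, ax = plt.subplots()
ax.plot(df["time"], df["seeing_index"], marker="o")  # x = parsed timestamps
ax.set_xlabel("time")
ax.set_ylabel("seeing index")
```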

How to regulate the number of ticks in a plot?

I have a dataframe with shape (2000, 2). There are two columns: value and date. I want to plot it with date on x axis. But since there are 2000 days there, I want to keep only 10 ticks on x axis. I tried this:
plt.plot(data["Date"], data["Value"])
plt.locator_params(axis='x', nbins=10)
plt.show()
But the plot looks like this: [plot image not included]
How can I fix it?
From your plot, I assume the problem is that your "Date" column contains strings, not datetimes (or pandas Timestamps), so matplotlib treats each value as a separate category and draws a tick for every one. If the column were datetime-like, matplotlib would automatically select a reasonably suitable tick spacing.
You would need to convert those strings back to datetimes, for example with dateutil.parser:
from dateutil import parser
data['Date_dt'] = data['Date'].apply(parser.parse)
or via strptime (the format string in args may need to change depending on your date format):
from datetime import datetime
data['Date_dt'] = data['Date'].apply(datetime.strptime, args=('%Y-%m-%d %H:%M:%S',))
Then plot data['Date_dt'] instead of data['Date']. If for some obscure reason you really want EXACTLY 10 ticks, you could do something along the lines of:
plt.xticks(pd.date_range(data['Date_dt'].min(), data['Date_dt'].max(), periods=10))
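A sketch of the same fix with a single vectorized pd.to_datetime call instead of a row-by-row apply, followed by capping the automatic date ticks with matplotlib.dates.AutoDateLocator(maxticks=...) rather than pinning exactly 10 positions. The 2000-row frame is synthetic and assumes "YYYY-MM-DD" strings; adjust the format argument to your data.

```python
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in for the question's (2000, 2) frame with string dates.
data = pd.DataFrame({
    "Date": pd.date_range("2015-01-01", periods=2000, freq="D").strftime("%Y-%m-%d"),
    "Value": np.random.default_rng(0).normal(size=2000).cumsum(),
})

# Parse the whole column at once; the format matches the sample strings above.
data["Date_dt"] = pd.to_datetime(data["Date"], format="%Y-%m-%d")

fig, ax = plt.subplots()
ax.plot(data["Date_dt"], data["Value"])
# Let matplotlib pick round date intervals, but with at most about 10 ticks.
ax.xaxis.set_major_locator(mdates.AutoDateLocator(maxticks=10))
fig.autofmt_xdate()  # rotate the labels so they do not overlap
```

AutoDateLocator keeps ticks on calendar-friendly boundaries (months, years), which usually reads better than 10 evenly spaced but arbitrary dates.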

String to Date Conversion in Python

In the following piece of code:
df['Year']=pd.DatetimeIndex(df['Date']).year
df['Month']=pd.DatetimeIndex(df['Date']).month
df['Day']=pd.DatetimeIndex(df['Date']).day
df['MM_DD_str']=df['Month'].astype(str).str.zfill(2)+'-'+df['Day'].astype(str).str.zfill(2)
Since I only want MM-DD, I did it this way, and the result is now a string. But later in the program I need the values in date format; in particular, I need the month in order to plot a graph. Can I get a date back by extracting the month from it?
Edit:
I want to plot a graph whose x-ticks are the months Jan, Feb, Mar, up to Dec. I have to extract the month from df['MM_DD_str'] and use it for the tick labels of the graph.
This is the final code I have written for plotting the graph:
md_str = df['MM_DD_str']
get_month = md_str.apply(lambda d: pd.to_datetime(d, format='%m-%d').month)
#print(get_month)
plt.xticks(get_month, ('Jan','Feb','Mar','Apr','May','Jun','Jul','Aug','Sep','Oct','Nov','Dec'))
plt.show()
I get neither output nor an error.
If I understand correctly, you currently have a date string, like "06-23" for example, and you later want to extract the month from it as a datetime object:
md_col = df['MM_DD_str']
get_month = lambda d: pd.to_datetime(d, format='%m-%d').month
md_col.apply(get_month)
get_month is a lambda function that takes a string, converts it to a datetime object, and then extracts the month.
.apply() calls a function on every element of the column (a pandas Series).
Note that if your column contains NaNs or strings that cannot be converted to dates, you can pass the errors argument to pd.to_datetime. With errors='coerce', such values become NaT, whose .month is NaN (errors='ignore' would return the original string, and .month would then fail):
get_month = lambda d: pd.to_datetime(d, errors='coerce', format='%m-%d').month
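A sketch of the same month extraction done on the whole column at once, without apply or a lambda. The sample strings are hypothetical. Since the strings carry no year, pandas places the parsed dates in a default year, so only the month and day parts should be relied on.

```python
import pandas as pd

# Hypothetical MM-DD strings, including one that cannot be parsed.
md_col = pd.Series(["06-23", "12-01", "not a date"])

# Parse the whole column in one call; bad entries become NaT instead of raising.
parsed = pd.to_datetime(md_col, format="%m-%d", errors="coerce")
months = parsed.dt.month  # float dtype, because NaT turns into NaN
```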
I did not fully understand the question, but the df['Date'] column could be used to plot the graph directly, since it is already in datetime format.
pd.to_datetime() can also be used to parse a string and read its month,
so let's say:
import pandas as pd
date = '2019-05'
date = pd.to_datetime(date)
date.month
EDIT:
Matplotlib plots data at numeric positions on the x-axis. Calling plt.xticks() with string values cannot plot the data itself; it can only change the labels shown at given positions. So here is an example; adjust it to your labels:
import matplotlib.pyplot as plt
import pandas as pd
fig, ax = plt.subplots()
df = pd.DataFrame()
months = ['june', 'july', 'august', 'september']
dates = ['2019-06', '2019-07', '2019-08', '2019-09']
df['dates'] = dates
df['values'] = [1, 4, 7, 10]
df['dates'] = pd.to_datetime(df['dates'])
df['values'].plot(ax=ax)  # plotted against the row index 0, 1, 2, 3
ax.set_xticks([0, 1, 2, 3])  # numerical positions that get plotted (one per label)
ax.set_xticklabels(months)  # actual labels for those numerical positions
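A sketch of a variant that plots against the real dates instead of row positions and lets matplotlib.dates produce the Jan ... Dec labels: MonthLocator puts one tick at the start of each month and DateFormatter("%b") prints the abbreviated month name. The twelve monthly values are hypothetical, and %b follows the current locale, so the names may differ outside an English locale.

```python
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data: one value at the start of each month of 2019.
df = pd.DataFrame({
    "dates": pd.date_range("2019-01-01", periods=12, freq="MS"),
    "values": range(12),
})

fig, ax = plt.subplots()
ax.plot(df["dates"], df["values"])                        # x = real dates
ax.xaxis.set_major_locator(mdates.MonthLocator())         # tick on the 1st of each month
ax.xaxis.set_major_formatter(mdates.DateFormatter("%b"))  # Jan, Feb, ... Dec
fig.canvas.draw()  # render once so the tick label texts are filled in
```

With this approach the tick positions and labels cannot drift out of sync, because both come from the same date axis.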

Why does the plot look different when the x-axis uses dates instead of a list of numbers in matplotlib?

I have stock data with OHLC columns, and I want to plot an RSI indicator calculated from the close values. Because the stock data is indexed by date, I convert the dates to numbers using date2num. But when I plot the resulting RSI values against those dates, the line appears to overlap itself.
I suspected that the RSI result did not have the same length as the dates, but testing len(rsi) == len(df['date']) shows the lengths are equal. When I instead use a list of numbers made by range(0, len(df['date'])) as the x-axis instead of the dates, the plot looks as I expected.
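import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd
import talib as ta
fig, (ax1, ax2) = plt.subplots(2, 1)
# get data (df is the OHLC DataFrame loaded earlier)
df = df.tail(1000)
# convert date
df['date'] = pd.to_datetime(df['date'])
df['date'] = df['date'].apply(mdates.date2num)
# make indicator with TA-Lib
rsi = ta.RSI(df['close'], timeperiod=14)
# plot rsi indicator
ax1.plot(df['date'], rsi)
ax2.plot(range(0, len(df['date'])), rsi)
# show chart
plt.show()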
I expect the plot with the date x-axis to look the same as the one with the numeric x-axis.
[Image showing the difference not included]
It seems that matplotlib chooses the x-ticks to display (when chosen automatically) to show "round" numbers. So in your case of integers, a tick every 200; in your case of dates, every two months.
You seem to expect the dates to follow the same tick steps as the integers, but this will cause the graph to show arbitrary dates in the middle of the month, which isn't a good default behavior.
If that's the behavior you want, try something of this sort:
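rng = range(len(df['date']))
ax2.plot(rng, rsi)  # Same as in your example
ax2.set_xlim((rng[0], rng[-1]))  # Make sure no ticks outside of range
ticks = ax2.get_xticks().astype(int)  # integer positions chosen by matplotlib
ticks = ticks[(ticks >= 0) & (ticks < len(df))]  # drop any tick outside the data
ax2.set_xticks(ticks)  # pin the positions so the labels below stay aligned
# df['date'] holds date2num floats, so convert back to dates before formatting
ax2.set_xticklabels([mdates.num2date(d).strftime('%Y-%m-%d') for d in df['date'].iloc[ticks]])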
This behavior can of course be reversed if you wish to show numbers instead of dates, using the same ticks as the dates, but I'll leave that to you.
After several attempts, I found the core of the problem: no data is recorded on weekends, so there are gaps in the dates. On a date x-axis, matplotlib still leaves space for the weekends even though there is no data on those days, so the line plot appears to overlap.
I haven't found a proper solution yet; for the time being I use the list of numbers.
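One common workaround, sketched here rather than taken from the thread, keeps the evenly spaced integer x positions (so weekends leave no gaps) and uses matplotlib.ticker.FuncFormatter to print the date of each row at the tick positions. The business-day dates, the linear stand-in for the RSI values and the helper name index_to_date are all hypothetical.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from matplotlib.ticker import FuncFormatter

# Synthetic trading days: business days only, so weekends are missing.
dates = pd.bdate_range("2023-01-02", periods=60)
values = np.linspace(30, 70, len(dates))  # stand-in for the RSI series


def index_to_date(x, pos=None):
    """Map an x position (row number) to that row's date; blank outside the data."""
    i = int(round(x))
    if 0 <= i < len(dates):
        return dates[i].strftime("%Y-%m-%d")
    return ""


fig, ax = plt.subplots()
ax.plot(np.arange(len(dates)), values)  # one unit per trading day, no weekend gaps
ax.xaxis.set_major_formatter(FuncFormatter(index_to_date))
fig.autofmt_xdate()
```

The formatter only changes what the labels say; the line itself is drawn on the integer axis, which is why it looks like the numeric plot from the question.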
