I am trying to convert HH:MM into datetime format. It converts, but it adds an unwanted year 1900, and I don't know why.
My data:
df['HH:MM'] =
datetime
2019-10-01 08:19:40 08:19:40
2019-10-01 08:20:15 08:20:15
2019-10-01 08:21:29 08:21:29
2019-10-01 08:22:39 08:22:39
2019-10-01 08:29:07 08:29:07
Name: HH:MM, Length: 5, dtype: object
My code:
df['HH:MM'] = pd.to_datetime(df['HH:MM'], format='%H:%M:%S', errors='ignore')
Current output:
df['HH:MM'] =
datetime
2019-10-01 08:19:40 1900-01-01 08:19:40
2019-10-01 08:20:15 1900-01-01 08:20:15
2019-10-01 08:21:29 1900-01-01 08:21:29
2019-10-01 08:22:39 1900-01-01 08:22:39
2019-10-01 08:29:07 1900-01-01 08:29:07
Name: HH:MM, Length: 5, dtype: datetime64[ns]
Why do I need this?
I am plotting HH:MM on the x-axis and the value on the y-axis. The x-axis tick labels are cluttered and unreadable, even after I used plt.gcf().autofmt_xdate().
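On the "why": a datetime64 value always carries a date, so when a time-only string is parsed with an explicit format such as '%H:%M:%S', the missing date fields fall back to 1900-01-01 (the same default Python's datetime.strptime uses). A minimal sketch with three of the question's values; keeping only the clock time (.dt.time) or storing a duration since midnight (pd.to_timedelta) are shown as two common alternatives, not as the answer's own method. For plotting, the dummy date is usually harmless as long as the tick formatter displays only %H:%M.

```python
import pandas as pd

# Sample values in the same HH:MM:SS form as the question's column
s = pd.Series(["08:19:40", "08:20:15", "08:21:29"])

# A datetime64 value always has a date; with a time-only format the
# missing date fields fall back to 1900-01-01
as_datetime = pd.to_datetime(s, format="%H:%M:%S")

# Option 1: keep only the clock time (object dtype holding datetime.time)
as_time = as_datetime.dt.time

# Option 2: store the value as a duration since midnight (timedelta64)
as_timedelta = pd.to_timedelta(s)

print(as_datetime.iloc[0])   # 1900-01-01 08:19:40
print(as_time.iloc[0])       # 08:19:40
print(as_timedelta.iloc[0])  # 0 days 08:19:40
```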
autofmt_xdate() will not work if the values are of type str. Instead, change the type to datetime and control the x-axis through a Locator and a Formatter:
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt, dates as mdates
np.random.seed(15)
dr = pd.date_range('2019-10-01 00:00:00', '2019-10-01 23:59', freq='min')
df = pd.DataFrame({'HH:MM': dr.strftime('%H:%M'),
                   'y': np.random.random(len(dr)) * 10},
                  index=dr)
df['HH:MM'] = pd.to_datetime(df['HH:MM'], format='%H:%M')
ax = df.plot(kind='scatter', x='HH:MM', y='y', rot=45)
ax.xaxis.set_major_locator(mdates.HourLocator(interval=2))
ax.xaxis.set_major_formatter(mdates.DateFormatter('%H:%M'))
plt.tight_layout()
plt.show()
Sample Data and imports:
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt, dates as mdates
np.random.seed(15)
dr = pd.date_range('2019-10-01 00:00:00', '2019-10-01 23:59', freq='min')
df = pd.DataFrame({'HH:MM': dr.strftime('%H:%M'),
                   'y': np.random.random(len(dr)) * 10},
                  index=dr)
df:
HH:MM y
2019-10-01 00:00:00 00:00 8.488177
2019-10-01 00:01:00 00:01 1.788959
2019-10-01 00:02:00 00:02 0.543632
2019-10-01 00:03:00 00:03 3.615384
2019-10-01 00:04:00 00:04 2.754009
Convert 'HH:MM' with to_datetime, then plot:
df['HH:MM'] = pd.to_datetime(df['HH:MM'], format='%H:%M')
ax = df.plot(kind='scatter', x='HH:MM', y='y', rot=45)
To control the number of ticks and display them in 'HH:MM' format, set a date Locator and a date Formatter:
ax.xaxis.set_major_locator(mdates.HourLocator(interval=2))
ax.xaxis.set_major_formatter(mdates.DateFormatter('%H:%M'))
Adjust the type of locator or interval to increase or decrease the number of ticks.
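A standalone sketch of the Locator/Formatter idea with plain matplotlib, assuming the times already sit on one dummy day (here 1900-01-01). The Agg backend, the random values, and HourLocator(byhour=...) instead of interval=2 are illustrative choices; byhour pins the ticks to fixed clock hours. The sketch reads the tick labels back to show that only HH:MM is displayed.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display

import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# One day of minute-level timestamps on the 1900-01-01 dummy date that
# pd.to_datetime(..., format='%H:%M') produces for time-only strings
times = pd.date_range("1900-01-01 00:00", "1900-01-01 23:59", freq="min")
rng = np.random.default_rng(15)
values = rng.random(len(times)) * 10

fig, ax = plt.subplots()
ax.scatter(times, values, s=2)
ax.set_xlim(times[0], times[-1])

# A tick every 4 hours, labelled as HH:MM only, so the dummy date never shows
ax.xaxis.set_major_locator(mdates.HourLocator(byhour=range(0, 24, 4)))
ax.xaxis.set_major_formatter(mdates.DateFormatter("%H:%M"))
fig.autofmt_xdate()

# Format each major tick location with the formatter that the axis uses
formatter = ax.xaxis.get_major_formatter()
labels = [formatter(loc) for loc in ax.xaxis.get_majorticklocs()]
print(labels)
plt.close(fig)
```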
Related
I'm working with a dataset that only contains datetime objects and I have retrieved the day of the week and reformatted the time in a separate column like this (conversion functions included below):
datetime day_of_week time_of_day
0 2021-06-13 12:56:16 Sunday 20:00:00
5 2021-06-13 12:56:54 Sunday 20:00:00
6 2021-06-13 12:57:27 Sunday 20:00:00
7 2021-07-16 18:55:42 Friday 20:00:00
8 2021-07-16 18:56:03 Friday 20:00:00
9 2021-06-04 18:42:06 Friday 20:00:00
10 2021-06-04 18:49:05 Friday 20:00:00
11 2021-06-04 18:58:22 Friday 20:00:00
What I would like to do is create a KDE plot with x-axis = time_of_day (spanning 00:00:00 to 23:59:59), y-axis = the count of each day_of_week at each hour of the day, and hue = day_of_week. In essence, I'd have seven different distributions representing occurrences on each day of the week.
Here's a sample of the data and my code. Any help would be appreciated:
import calendar
import pandas as pd
df = pd.DataFrame([
'2021-06-13 12:56:16',
'2021-06-13 12:56:16',
'2021-06-13 12:56:16',
'2021-06-13 12:56:16',
'2021-06-13 12:56:54',
'2021-06-13 12:56:54',
'2021-06-13 12:57:27',
'2021-07-16 18:55:42',
'2021-07-16 18:56:03',
'2021-06-04 18:42:06',
'2021-06-04 18:49:05',
'2021-06-04 18:58:22',
'2021-06-08 21:31:44',
'2021-06-09 02:14:30',
'2021-06-09 02:20:19',
'2021-06-12 18:05:47',
'2021-06-15 23:46:41',
'2021-06-15 23:47:18',
'2021-06-16 14:19:08',
'2021-06-17 19:08:17',
'2021-06-17 22:37:27',
'2021-06-21 23:31:32',
'2021-06-23 20:32:09',
'2021-06-24 16:04:21',
'2020-05-22 18:29:02',
'2020-05-22 18:29:02',
'2020-05-22 18:29:02',
'2020-05-22 18:29:02',
'2020-08-31 21:38:07',
'2020-08-31 21:38:22',
'2020-08-31 21:38:42',
'2020-08-31 21:39:03',
], columns=['datetime'])
def convert_date(date):
    return calendar.day_name[date.weekday()]
def convert_hour(time):
    return time[:2] + ':00:00'
df['day_of_week'] = pd.to_datetime(df['datetime']).apply(convert_date)
df['time_of_day'] = df['datetime'].astype(str).apply(convert_hour)
Let's try:
1. Convert the datetime column with to_datetime.
2. Create a Categorical column from the day_of_week codes (so that categorical ordering works correctly).
3. Normalize time_of_day to a single day (so that comparisons work correctly). This makes it look as if all events occurred on the same day, which makes the plotting logic much simpler.
4. Plot the kdeplot.
5. Set the x-axis formatter to display only HH:MM:SS.
import calendar
import pandas as pd
import seaborn as sns
from matplotlib import pyplot as plt, dates as mdates
# df = pd.DataFrame({...})
# Convert to datetime
df['datetime'] = pd.to_datetime(df['datetime'])
# Create Categorical Column
cat_type = pd.CategoricalDtype(list(calendar.day_name), ordered=True)
df['day_of_week'] = pd.Categorical.from_codes(
df['datetime'].dt.day_of_week, dtype=cat_type
)
# Create Normalized Date Column
df['time_of_day'] = pd.to_datetime('2000-01-01 ' +
df['datetime'].dt.time.astype(str))
# Plot
ax = sns.kdeplot(data=df, x='time_of_day', hue='day_of_week')
# X axis format
ax.set_xlim([pd.to_datetime('2000-01-01 00:00:00'),
pd.to_datetime('2000-01-01 23:59:59')])
ax.xaxis.set_major_formatter(mdates.DateFormatter('%H:%M:%S'))
plt.tight_layout()
plt.show()
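Why from_codes works here: Series.dt.day_of_week numbers Monday as 0 through Sunday as 6, which matches the order of calendar.day_name, so the weekday numbers can be used directly as category codes, and ordered=True makes sorting follow the week rather than the alphabet. A small check on three of the sample timestamps (it assumes an English locale, since calendar.day_name is locale-dependent):

```python
import calendar

import pandas as pd

# Three timestamps from the question's sample data
dt = pd.to_datetime(pd.Series([
    "2021-06-13 12:56:16",  # Sunday
    "2021-06-04 18:42:06",  # Friday
    "2021-06-08 21:31:44",  # Tuesday
]))

# day_of_week: Monday=0 ... Sunday=6, the same order as calendar.day_name
cat_type = pd.CategoricalDtype(list(calendar.day_name), ordered=True)
day_of_week = pd.Categorical.from_codes(dt.dt.day_of_week, dtype=cat_type)

print(list(day_of_week))                # ['Sunday', 'Friday', 'Tuesday']
print(list(day_of_week.sort_values()))  # ['Tuesday', 'Friday', 'Sunday']
```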
Note that the sample size here is small:
If you want counts on the y-axis, histplot may be a better fit:
ax = sns.histplot(data=df, x='time_of_day', hue='day_of_week')
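If you want to see the numbers behind a count-based plot, a pandas-only sketch (not part of the original answer) tabulates events per hour of day and weekday with pd.crosstab, using a few of the question's timestamps:

```python
import pandas as pd

# A subset of the question's timestamps
dt = pd.to_datetime(pd.Series([
    "2021-06-13 12:56:16", "2021-06-13 12:56:54", "2021-06-13 12:57:27",
    "2021-07-16 18:55:42", "2021-06-04 18:42:06", "2021-06-09 02:14:30",
]))

# Rows: hour of day; columns: weekday name; cells: number of events
counts = pd.crosstab(dt.dt.hour, dt.dt.day_name())
print(counts)
```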
I would use pandas Timestamp directly. By the way, your convert_hour function seems to be wrong: it gives time_of_day as 20:00:00 for all rows.
import calendar
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
sns.set_context("paper", font_scale=2)
sns.set_style('whitegrid')
df['day_of_week'] = df['datetime'].apply(lambda x: pd.Timestamp(x).day_name())
df['time_of_day'] = df['datetime'].apply(lambda x: pd.Timestamp(x).hour)
days = list(calendar.day_name)  # Monday ... Sunday
plt.figure(figsize=(8, 4))
for day in days:
    # One KDE curve per weekday, labelled for the legend
    sns.kdeplot(df[df.day_of_week == day]['time_of_day'], label=day)
plt.legend()
plt.show()
The KDE for Wednesday looks a bit strange because its times range from 2 to 20, hence the long tail from -20 to 40 in the plot.
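To illustrate the convert_hour issue: slicing the first two characters of the full datetime string picks up the "20" of the year, not the hour. A minimal sketch contrasting that with formatting the parsed timestamps (the "%H:00:00" format string is an illustrative choice that matches the question's time_of_day style):

```python
import pandas as pd

stamp = "2021-06-13 12:56:16"

# The question's convert_hour keeps the first two characters of the full
# datetime string, i.e. the "20" of the year 2021
print(stamp[:2] + ":00:00")  # 20:00:00

# Parsing first and formatting only the hour keeps the actual hour of day
s = pd.to_datetime(pd.Series([stamp, "2021-06-09 02:14:30"]))
hour_label = s.dt.strftime("%H:00:00")
print(list(hour_label))  # ['12:00:00', '02:00:00']
```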
Here is a simpler approach using df.plot.kde.
I added more data so that each day_of_week has multiple values for the KDE to plot, and simplified the code to remove the helper functions.
df1 = pd.DataFrame([
'2020-09-01 16:39:03',
'2020-09-02 16:39:03',
'2020-09-03 16:39:03',
'2020-09-04 16:39:03',
'2020-09-05 16:39:03',
'2020-09-06 16:39:03',
'2020-09-07 16:39:03',
'2020-09-08 16:39:03',
], columns=['datetime'])
df = pd.concat([df,df1]).reset_index(drop=True)
df['day_of_week'] = pd.to_datetime(df['datetime']).dt.day_name()
df['time_of_day'] = df['datetime'].str.split(expand=True)[1].str.split(':',expand=True)[0].astype(int)
df.pivot(columns='day_of_week').time_of_day.plot.kde()
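For readers unfamiliar with the reshaping step: pivot(columns='day_of_week') spreads time_of_day into one column per weekday, leaving NaN wherever a row belongs to another day, and plot.kde then draws one curve per column. A small sketch with made-up rows; values= is passed explicitly here, which selects the same time_of_day block as the .time_of_day attribute access:

```python
import pandas as pd

# Made-up rows in the same shape as the answer's df
df = pd.DataFrame({
    "day_of_week": ["Sunday", "Sunday", "Friday", "Friday", "Friday"],
    "time_of_day": [12, 12, 18, 18, 18],
})

# One column per weekday; rows belonging to another weekday become NaN
wide = df.pivot(columns="day_of_week", values="time_of_day")
print(wide)
print(wide.count())  # non-missing values per weekday
```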
Plots:
Can anyone help me solve this? I am trying to convert a Date column of object dtype to a datetime string in Python, from 'YY-mm-dd' to 'YY/mm/dd 00:00' format. The dataset is shown below. I have tried options such as energy_df['Date'] = pd.to_datetime(energy_df['Date']):
energy_df['Date'] = pd.to_datetime(energy_df['Date'])
energy_df['month'] = energy_df['Date'].dt.month.astype(int)
energy_df['day_of_month'] = energy_df['Date'].dt.day.astype(int)
energy_df['day_of_week'] = energy_df['Date'].dt.dayofweek.astype(int)
energy_df['hour_of_day'] = energy_df['Hours']
selected_columns = ['Date', 'day_of_week', 'hour_of_day', 'Avg Specific Humidity[g/Kg]']
energy_df = energy_df[selected_columns]
Dataset image:
Convert the 'date' column to dtype datetime, the 'hour' column to dtype timedelta, add them together, and format to string.
Example:
import pandas as pd
# some dummy input...
df = pd.DataFrame({'date': ['2015-01-01', '2015-01-01', '2015-01-01'],
'hour': [1, 2, 3]})
# to datetime / timedelta...
df['datetime'] = pd.to_datetime(df['date']) + pd.to_timedelta(df['hour'], unit='h')
# and format to string...
df['timestamp'] = df['datetime'].dt.strftime('%Y/%m/%d %H:%M')
# will give you:
df
date hour datetime timestamp
0 2015-01-01 1 2015-01-01 01:00:00 2015/01/01 01:00
1 2015-01-01 2 2015-01-01 02:00:00 2015/01/01 02:00
2 2015-01-01 3 2015-01-01 03:00:00 2015/01/01 03:00
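Applied to the question's own column names 'Date' and 'Hours' (the rows below are made up), the same approach can be sketched as follows. One thing worth checking in data like this: if hours are numbered 1 to 24, an hour of 24 rolls over to 00:00 on the next day, as the last row shows.

```python
import pandas as pd

# Hypothetical rows using the question's column names 'Date' and 'Hours'
energy_df = pd.DataFrame({"Date": ["2015-01-01", "2015-01-01", "2015-01-01"],
                          "Hours": [1, 23, 24]})

# Date at midnight plus the hour offset as a timedelta
stamp = pd.to_datetime(energy_df["Date"]) + pd.to_timedelta(energy_df["Hours"], unit="h")
energy_df["timestamp"] = stamp.dt.strftime("%Y/%m/%d %H:%M")

print(energy_df["timestamp"].tolist())
# ['2015/01/01 01:00', '2015/01/01 23:00', '2015/01/02 00:00']
```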
I have a DataFrame covering a long time range, with a datetime64[ns] column and an int column.
The data looks like this:
MIN_DEP DELAY
0 2018-01-01 05:09:00 0
1 2018-01-01 05:13:00 0
2 2018-01-01 05:39:00 0
3 2018-01-01 05:43:00 0
4 2018-01-01 06:12:00 34
... ... ...
77005 2020-09-30 23:42:00 0
77006 2020-09-30 23:43:00 0
77007 2020-09-30 23:43:00 43
77008 2020-10-01 00:18:00 0
77009 2020-10-01 00:59:00 0
[77010 rows x 2 columns]
MIN_DEP datetime64[ns]
DELAY int64
dtype: object
The goal is to plot all the data within a single 00:00 to 24:00 range on the x-axis, without dates.
When I plot it, every point on the timeline is labelled 00:00. How can I fix this?
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
ax.plot(pd_to_stat['MIN_DEP'], pd_to_stat['DELAY'])
xfmt = mdates.DateFormatter('%H:%M')
ax.xaxis.set_major_formatter(xfmt)
plt.show()
I also tried converting the timestamps to dt.time first and then plotting:
pd_to_stat['time'] = pd.to_datetime(pd_to_stat['MIN_DEP'], format='%H:%M').dt.time
fig, ax = plt.subplots()
ax.plot(pd_to_stat['time'],pd_to_stat['DELAY'])
plt.show()
but the plot call fails with:
TypeError: float() argument must be a string or a number, not 'datetime.time'
Given your requirement, I guess you need neither the dates nor the seconds in your timestamps, so some preprocessing is needed first.
Remove the seconds field (truncating to the minute) with the code below:
dataset['MIN_DEP'] = dataset['MIN_DEP'].dt.floor('min')
Then you can remove the date from your timestamps like this:
dataset['MIN_DEP'] = dataset['MIN_DEP'].dt.time
Then you can plot your data in the usual manner.
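If plotting datetime.time values still raises the TypeError shown in the question, another option (not from this answer) is to convert each timestamp to fractional hours since midnight. These are plain floats, so matplotlib plots them on a 0 to 24 axis without any date handling. A minimal sketch with a few of the sample MIN_DEP values:

```python
import pandas as pd

# A few MIN_DEP values from the question's sample
min_dep = pd.Series(pd.to_datetime([
    "2018-01-01 05:09:00", "2018-01-01 06:12:00", "2020-09-30 23:42:00",
]))

# Hours since midnight as floats, e.g. 05:09 -> 5.15; the date is dropped
hours = min_dep.dt.hour + min_dep.dt.minute / 60
print(hours.round(2).tolist())  # [5.15, 6.2, 23.7]
```

The result can then be passed to ax.plot or ax.scatter together with the DELAY column, for example with ax.set_xlim(0, 24).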
This seems to work now. I had not noticed that the plot was still being split up by date. As a workaround, I replaced all the dates with the same date and plotted it, hiding the date with DateFormatter:
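import matplotlib.dates as mdates
import matplotlib.pyplot as plt
# Move every timestamp onto the same dummy date so only the time of day varies
pd_to_stat['MIN_DEP'] = pd_to_stat['MIN_DEP'].map(lambda t: t.replace(year=2020, month=1, day=1))
fig, ax = plt.subplots()
ax.plot(pd_to_stat['MIN_DEP'], pd_to_stat['DELAY'])
# Hide the dummy date: label the ticks with the time only
xfmt = mdates.DateFormatter('%H:%M')
ax.xaxis.set_major_formatter(xfmt)
plt.show()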
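The element-wise replace() can also be written as a vectorized expression. This sketch, an alternative rather than the poster's code, subtracts each timestamp's midnight (dt.normalize()) to get the time of day as a timedelta and adds it to a fixed dummy date; the comparison checks that it matches the map/replace version on a few sample values.

```python
import pandas as pd

min_dep = pd.Series(pd.to_datetime([
    "2018-01-01 05:09:00", "2019-06-15 23:42:00", "2020-10-01 00:18:00",
]))

# Time since midnight for each row, re-anchored on one fixed dummy date
dummy_day = pd.Timestamp("2020-01-01")
same_day = dummy_day + (min_dep - min_dep.dt.normalize())

# Element-wise version from the post, for comparison
mapped = min_dep.map(lambda t: t.replace(year=2020, month=1, day=1))
print((same_day == mapped).all())  # True
```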
I have a DataFrame like this:
id timestamp energy
0 a 2012-03-18 10:00:00 0.034
1 b 2012-03-20 10:30:00 0.052
2 c 2013-05-29 11:00:00 0.055
3 d 2014-06-20 01:00:00 0.028
4 a 2015-02-10 12:00:00 0.069
I want to plot this data like the image below, with just the time on the x-axis (neither the date nor the full datetime), because I want to see the values for each hour.
https://i.stack.imgur.com/u73eJ.png
But this code plots it like this:
plt.plot(df['timestamp'], df['energy'])
https://i.stack.imgur.com/yd6NL.png
I tried some code, but it only formats the x data to hide the date part and still plots like the second graph.
Note: df['timestamp'] is of datetime type.
What should I do? Thanks.
You can convert your datetime into a time. If df["timestamp"] is already in datetime format:
df["time"] = df["timestamp"].map(lambda x: x.time())
plt.plot(df['time'], df['energy'])
If df["timestamp"] is a string column, add one more line first: df["timestamp"] = pd.to_datetime(df["timestamp"])
Update: it looks like matplotlib does not accept time types, so convert them to strings instead:
df["time"] = df["timestamp"].map(lambda x: x.strftime("%H:%M"))
plt.scatter(df['time'], df['energy'])
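Since the goal is to see values per hour, another option, an addition rather than part of this answer, is to aggregate by hour of day before plotting. String labels from strftime are placed on a categorical axis in order of first appearance, whereas integer hours sort numerically. A sketch using the question's sample rows:

```python
import pandas as pd

# The sample rows from the question
df = pd.DataFrame({
    "id": ["a", "b", "c", "d", "a"],
    "timestamp": pd.to_datetime([
        "2012-03-18 10:00:00", "2012-03-20 10:30:00", "2013-05-29 11:00:00",
        "2014-06-20 01:00:00", "2015-02-10 12:00:00",
    ]),
    "energy": [0.034, 0.052, 0.055, 0.028, 0.069],
})

# Average energy per hour of day; groupby sorts the hours numerically
hourly = df.groupby(df["timestamp"].dt.hour)["energy"].mean()
print(hourly)
```

hourly can then be drawn with, for example, plt.bar(hourly.index, hourly.values).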
First check whether df["timestamp"] is in datetime format.
If not, convert it:
import pandas as pd
time = pd.to_datetime(df["timestamp"])
print(time.dtype)  # should be a datetime64 dtype
Then:
import matplotlib.pyplot as plt
values = df['energy']
# plt.plot_date is deprecated in recent matplotlib; plt.plot with a marker is the equivalent
plt.plot(time, values, 'o')
plt.xticks(rotation=45)
plt.show()
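To make the "first check" step explicit, a sketch using pandas.api.types.is_datetime64_any_dtype converts the column only when it is not already a datetime type (the two rows are taken from the question's sample):

```python
import pandas as pd
from pandas.api.types import is_datetime64_any_dtype

df = pd.DataFrame({"timestamp": ["2012-03-18 10:00:00", "2012-03-20 10:30:00"],
                   "energy": [0.034, 0.052]})

# Strings read from a file are not a datetime dtype yet
print(is_datetime64_any_dtype(df["timestamp"]))  # False

# Convert only when needed, so already-parsed columns are left untouched
if not is_datetime64_any_dtype(df["timestamp"]):
    df["timestamp"] = pd.to_datetime(df["timestamp"])

print(df["timestamp"].dtype)
```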
I have a pandas dataframe with datetime values including microseconds:
column1 column2
time
1900-01-01 10:39:52.887916 19363.876 19362.7575
1900-01-01 10:39:53.257916 19363.876 19362.7575
1900-01-01 10:39:53.808007 19363.876 19362.7575
1900-01-01 10:39:53.827894 19363.876 19362.7575
1900-01-01 10:39:54.277931 19363.876 19362.7575
I plot the dataframe as follows:
import matplotlib as mpl
import matplotlib.pyplot  # makes mpl.pyplot available
def plot(df):
    ax = df.plot(y='column1', figsize=(20, 8))
    df.plot(y='column2', ax=ax)
    ax.get_yaxis().get_major_formatter().set_useOffset(False)
    mpl.pyplot.show()
Notice on the image below that the microseconds are displayed as %f rather than their actual value.
That is, instead of 10:39:52.887916 it displays 10:39:52.%f
How can I display the actual microseconds in the tick labels (even if it's only a few significant digits)?
You should be able to set the major ticks to the format you want, using set_major_formatter:
import matplotlib as mpl
import matplotlib.dates
import pandas as pd
df = pd.DataFrame({'column1': [1, 2, 3, 4],
                   'column2': [2, 3, 4, 5]},
                  index=pd.to_datetime([1e8, 2e8, 3e8, 4e8]))
def plot(df):
    ax = df.plot(y='column1', figsize=(20, 8))
    df.plot(y='column2', ax=ax)
    ax.get_yaxis().get_major_formatter().set_useOffset(False)
    # Show the x tick labels with microseconds
    ax.get_xaxis().set_major_formatter(matplotlib.dates.DateFormatter('%H:%M:%S.%f'))
    #mpl.pyplot.show()
    return ax
print(df)
column1 column2
1970-01-01 00:00:00.100000 1 2
1970-01-01 00:00:00.200000 2 3
1970-01-01 00:00:00.300000 3 4
1970-01-01 00:00:00.400000 4 5
If the problem does go away, then somewhere in your code the formatter's format string is probably specified incorrectly, e.g. as %%f instead of %f: %% produces a literal '%' character, so %%f is rendered as the text '%f'.
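Two quick checks, using only the standard library and matplotlib: '%%' is an escaped percent sign, so '%%f' renders as the literal text '%f', which matches the symptom in the question. If six digits are too many, a FuncFormatter can trim the label; to_millis is a hypothetical helper name, and cutting the last three characters relies on strftime's %f always producing six zero-padded digits.

```python
import datetime

import matplotlib.dates as mdates
from matplotlib.ticker import FuncFormatter

t = datetime.datetime(1900, 1, 1, 10, 39, 52, 887916)

# '%%' is an escaped percent sign, so '%%f' renders as the literal text '%f'
print(t.strftime("%H:%M:%S.%%f"))  # 10:39:52.%f
print(t.strftime("%H:%M:%S.%f"))   # 10:39:52.887916

# Hypothetical tick formatter that keeps only milliseconds (3 digits)
def to_millis(x, pos=None):
    return mdates.num2date(x).strftime("%H:%M:%S.%f")[:-3]

millis_formatter = FuncFormatter(to_millis)
print(millis_formatter(mdates.date2num(t)))  # 10:39:52.887
```

On an existing axes it would be attached with ax.xaxis.set_major_formatter(millis_formatter).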