Extracting the hour from a time column in pandas

Suppose I have the following dataset:
How would I create a new column, to be the hour of the time?
For example, the code below works for individual times, but I haven't been able to generalise it for a column in pandas.
t = datetime.strptime('9:33:07','%H:%M:%S')
print(t.hour)

Use to_datetime to convert the strings to datetimes, then read the hour with dt.hour:
df = pd.DataFrame({'TIME':['9:33:07','9:41:09']})
# slower alternative: omit format and let pandas infer it
#df['hour'] = pd.to_datetime(df['TIME']).dt.hour
df['hour'] = pd.to_datetime(df['TIME'], format='%H:%M:%S').dt.hour
print (df)
TIME hour
0 9:33:07 9
1 9:41:09 9
If you want to keep working with datetimes in the TIME column, assign the converted values back:
df['TIME'] = pd.to_datetime(df['TIME'], format='%H:%M:%S')
df['hour'] = df['TIME'].dt.hour
print (df)
TIME hour
0 1900-01-01 09:33:07 9
1 1900-01-01 09:41:09 9
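As a rough sketch of how the same approach might cope with messy input, the snippet below adds errors='coerce' so that unparseable strings become NaT instead of raising. The sample values, including the deliberately bad 'n/a' entry, are invented for illustration, and the nullable 'Int64' dtype is one possible choice for holding missing hours.

```python
import pandas as pd

# Hypothetical sample data; 'n/a' stands in for a malformed entry.
df = pd.DataFrame({'TIME': ['9:33:07', '9:41:09', 'n/a']})

# Unparseable values become NaT rather than raising an error.
parsed = pd.to_datetime(df['TIME'], format='%H:%M:%S', errors='coerce')

# The nullable integer dtype keeps valid hours as integers and
# represents the missing one as <NA>.
df['hour'] = parsed.dt.hour.astype('Int64')
print(df)
```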

My suggestion:
df = pd.DataFrame({'TIME':['9:33:07','9:41:09']})
df['hour'] = df.TIME.str.extract(r"(^\d+):", expand=False)
str.extract(...) is a vectorized method that extracts the first capture group of a regular expression (here r"(^\d+):", the digits before the first colon, i.e. the hour of TIME). Passing expand=False makes it return a pandas Series instead of a DataFrame.
The result is stored in the "hour" column as strings, not integers.

You can also apply extract() twice to pull out the 'hour' column:
df['hour'] = df.TIME.str.extract(r"(\d+:)", expand=False)
df['hour'] = df.hour.str.extract(r"(\d+)", expand=False)
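Both regex approaches yield text, so a numeric hour needs one more step. A minimal sketch, assuming every TIME value starts with digits followed by a colon (the name hour_text is only illustrative):

```python
import pandas as pd

df = pd.DataFrame({'TIME': ['9:33:07', '9:41:09']})

# str.extract returns the captured digits as strings.
hour_text = df['TIME'].str.extract(r'^(\d+):', expand=False)

# Convert to integers so the column can be compared and sorted numerically.
df['hour'] = hour_text.astype(int)
print(df.dtypes)
```

If some rows might not match the pattern, the extract step yields missing values and astype(int) would fail, so the to_datetime route may be the safer choice there.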

Related

How to change date datatype from object to int64 without changing its values

I have a column in my dataframe that holds dates such as 1/6/2023 (m/d/yyyy format). The column's dtype is object, but I want to convert it to int64. I tried the following code, but it drastically changes the date values:
df = df.astype({'date':'int'})
Is there an alternative that keeps the values intact?

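Convert the values to datetimes, then to strings (here in YYYYMMDD format), and finally to integers. Note that dayfirst=True reads 1/6/2023 as 1 June 2023; if the dates are month-first, as the question states, drop dayfirst=True and the result becomes 20230106: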
print (df)
date
0 1/6/2023
df['date'] = pd.to_datetime(df['date'], dayfirst=True).dt.strftime('%Y%m%d').astype(int)
print (df)
date
0 20230601
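Because 1/6/2023 is ambiguous, it may help to spell the format out explicitly rather than rely on dayfirst. The sketch below parses the same string both ways; the variable names are made up for the comparison.

```python
import pandas as pd

df = pd.DataFrame({'date': ['1/6/2023']})

# Same text, two explicit interpretations.
day_first = pd.to_datetime(df['date'], format='%d/%m/%Y')    # 1 June 2023
month_first = pd.to_datetime(df['date'], format='%m/%d/%Y')  # 6 January 2023

# Format as YYYYMMDD text, then cast to integers.
as_int_day_first = day_first.dt.strftime('%Y%m%d').astype(int)
as_int_month_first = month_first.dt.strftime('%Y%m%d').astype(int)

print(as_int_day_first.iloc[0], as_int_month_first.iloc[0])
```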

Separating Date and Time in Pandas

I have a data file with timestamps that look like this:
It gets loaded into pandas with a column name of "Time". I am trying to create two new datetime64 type columns, one with the date and one with the time (hour). I have explored a few solutions to this problem on StackOverflow but am still having issues. Quick note, I need the final columns to not be objects so I can use pandas and numpy functionality.
I load the dataframe and create two new columns like so:
df = pd.read_csv('C:\\Users\\...\\xyz.csv')
df['Date'] = pd.to_datetime(df['Time']).dt.date
df['Hour'] = pd.to_datetime(df['Time']).dt.time
This works but the Date and Hour columns are now objects.
I run this to convert the date to my desired datetime64 data type and it works:
df['Date'] = pd.to_datetime(df['Date'])
However, when I try to use this same code on the Hour column, I get an error:
TypeError: <class 'datetime.time'> is not convertible to datetime
I did some digging and found the following code which runs:
df['Hour'] = pd.to_datetime(df['Hour'], format='%H:%M:%S')
However the actual output includes a generic date like so:
When I try to run code referencing the Hour column like so:
HourVarb = '15:00:00'
df['Test'] = np.where(df['Hour']==HourVarb,1,np.nan)
It runs but doesn't produce the result I want.
Perhaps my HourVarb variable is in the wrong format for the numpy code? Or is the 1/1/1900 date causing problems, so that the format %H:%M:%S needs to change? My end goal is to be able to reference the hour and the date to filter out specific date/hour combinations. Please help.
One note, when I change the HourVarb to '1/1/1900 15:00:00' the code above works as intended, but I'd still like to understand if there is a cleaner way that removes the date. Thanks

I'm not sure I understand the problem with the 'object' datatypes of these columns.
I loaded the data you provided this way:
df = pd.read_csv('xyz.csv')
df['Time'] = pd.to_datetime(df['Time'])
df['Date'] = df['Time'].dt.date
df['Hour'] = df['Time'].dt.time
print(df.dtypes)
And I get these data types:
Time datetime64[ns]
Date object
Hour object
The fact that Date and Hour are object types should not be a problem. The underlying data is a datetime type:
print(type(df.Date.iloc[0]))
print(type(df.Hour.iloc[0]))
<class 'datetime.date'>
<class 'datetime.time'>
This means you can use these columns as such. For example:
print(df['Date'] + pd.Timedelta('1D'))
What are you trying to do that is requiring the column dtype to be a Pandas dtype?
UPDATE
Here is how you achieve the last part of your question:
from datetime import datetime, time
hourVarb = datetime.strptime("15:00:00", '%H:%M:%S').time()
# or hourVarb = time(15, 0)
df['Test'] = df['Hour'] == hourVarb
print(df['Test'])
0 True
1 False
2 False
3 False
Name: Test, dtype: bool
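If the columns really do need pandas dtypes, one possible alternative is to keep the date as a midnight-normalized datetime64 column and the hour as an integer. This is only a sketch: the timestamps are invented, since the original data file is not shown.

```python
import pandas as pd

# Hypothetical timestamps standing in for the 'Time' column of the CSV.
df = pd.DataFrame({'Time': ['2020-03-02 15:00:00',
                            '2020-03-02 16:00:00',
                            '2020-03-03 15:00:00']})
df['Time'] = pd.to_datetime(df['Time'])

# normalize() sets the time part to midnight but keeps the datetime64 dtype.
df['Date'] = df['Time'].dt.normalize()
# dt.hour gives an integer column instead of datetime.time objects.
df['Hour'] = df['Time'].dt.hour

# Flag one specific date/hour combination.
df['Test'] = (df['Date'] == pd.Timestamp('2020-03-02')) & (df['Hour'] == 15)
print(df)
```

With this layout no placeholder 1900-01-01 date appears, and both columns work with vectorized comparisons.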

Error converting string to date field in Pandas

As you can infer from the above, converting the string to a date raises an error.
I tried the code below, but got the same error: day is not defined.
df['day'] = pd.to_datetime(df['day'],format='%d %b %Y %H:%M:%S:%f')
As an SO member suggested, I edited the code, but the index is still a string; it was not converted to a date.

If you don't want to create another column, then just this will do:
df.index = pd.to_datetime(df.index)

In your example, df['day'] actually appears to be your index. To fix this, you'd want to call pd.to_datetime on your index:
df.index = pd.to_datetime(df.index)
I could tell it was your index because pandas prints the index name on its own row, one line below the other column headers. Take this example:
df = pd.DataFrame({'a':[1,2,3], 'b':['a','b','c']})
df.set_index('a', inplace=True)
outputs:
b
a
1 a
2 b
3 c
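To make the index case concrete, here is a small sketch that uses the format string from the question. The index values are invented to match that format, because the original data is not shown.

```python
import pandas as pd

# Hypothetical frame whose index is named 'day' and holds date strings.
df = pd.DataFrame(
    {'value': [1, 2]},
    index=['01 Jan 2017 10:00:00:000', '02 Jan 2017 11:30:00:500'],
)
df.index.name = 'day'

# 'day' is the index name, not a column, so df['day'] cannot find it.
print('day' in df.columns)

# Convert the index in place, using the format from the question.
df.index = pd.to_datetime(df.index, format='%d %b %Y %H:%M:%S:%f')
print(df.index)
```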

Date concatenating in new column in dataframe

I have dataframe with column date with type datetime64[ns].
When I try to create a new column day in MM-DD format from the date column, only the first method below works. Why doesn't the second method work in pandas?
df['day'] = df['date'].dt.strftime('%m-%d')
df['day2'] = str(df['date'].dt.month) + '-' + str(df['date'].dt.day)
Result for one row:
day 01-04
day2 0 1\n1 1\n2 1\n3 1\n4 ...
Types of columns
day object
day2 object

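The problem is that calling str() on df['date'].dt.month converts the whole Series into one string (its printed representation) rather than converting each element. The correct way is Series.astype(str). Note that, unlike strftime('%m-%d'), this does not zero-pad, so the result is 1-4 rather than 01-04: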
df['day2'] = df['date'].dt.month.astype(str) + '-' + df['date'].dt.day.astype(str)
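If the concatenated version should match the strftime output exactly, one option is to pad each part with str.zfill. A minimal sketch with invented dates:

```python
import pandas as pd

df = pd.DataFrame({'date': pd.to_datetime(['2021-01-04', '2021-11-23'])})

# Convert element-wise to strings, then left-pad with zeros to width 2.
month = df['date'].dt.month.astype(str).str.zfill(2)
day = df['date'].dt.day.astype(str).str.zfill(2)
df['day2'] = month + '-' + day

# Reference result from strftime for comparison.
df['day'] = df['date'].dt.strftime('%m-%d')
print(df)
```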

Pandas: number of days elapsed since a certain date

I have a dataframe with a 'date' column with ~200 elements in the format yyyy-mm-dd.
I want to compute the number of days elapsed since 2001-11-25 for each of those elements and add a column of those numbers of elapsed days to the dataframe.
I know of the to_datetime() function but can't figure out how to make this happen.

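Assuming your time values are in your index, you can just do this:
import pandas
# date_range replaces the DatetimeIndex(start=..., end=...) constructor, which current pandas no longer accepts
x = pandas.date_range(start='2014-01-01', end='2014-01-06', freq='30min')
df = pandas.DataFrame(index=x, columns=['time since'])
basedate = pandas.Timestamp('2001-11-25')  # base date from the question
# x.name is the row's index label, which is already a Timestamp
df['time since'] = df.apply(lambda x: (x.name - basedate).days, axis=1)
If they're in a column, do:
# each element is a Timestamp, so subtract the base date directly
df['time since'] = df['datetime_column'].apply(lambda x: (x - basedate).days)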

In accordance with Jeff's comment, here's a correction to the second (and most relevant) part of the accepted answer:
df['time since'] = (df['datetime_column'] - basedate).dt.days
The subtraction yields a series of type Timedelta, which can then be represented as days.
In some cases you may need to pass the original column through pd.to_datetime(...) first.
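Putting the vectorized version together with the base date from the question, a small end-to-end sketch might look like this. The three dates and the column name days_since are made up for illustration.

```python
import pandas as pd

# Hypothetical 'date' column holding yyyy-mm-dd strings.
df = pd.DataFrame({'date': ['2001-11-25', '2001-11-26', '2002-11-25']})
basedate = pd.Timestamp('2001-11-25')

# Parse the strings, subtract the base date to get Timedeltas,
# then read off the whole-day component.
df['days_since'] = (pd.to_datetime(df['date']) - basedate).dt.days
print(df)
```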
