Suppose I have the following dataset:
How would I create a new column, to be the hour of the time?
For example, the code below works for individual times, but I haven't been able to generalise it for a column in pandas.
t = datetime.strptime('9:33:07','%H:%M:%S')
print(t.hour)
Use to_datetime to datetimes with dt.hour:
df = pd.DataFrame({'TIME':['9:33:07','9:41:09']})
#should be slowier
#df['hour'] = pd.to_datetime(df['TIME']).dt.hour
df['hour'] = pd.to_datetime(df['TIME'], format='%H:%M:%S').dt.hour
print (df)
TIME hour
0 9:33:07 9
1 9:41:09 9
If want working with datetimes in column TIME is possible assign back:
df['TIME'] = pd.to_datetime(df['TIME'], format='%H:%M:%S')
df['hour'] = df['TIME'].dt.hour
print (df)
TIME hour
0 1900-01-01 09:33:07 9
1 1900-01-01 09:41:09 9
My suggestion:
df = pd.DataFrame({'TIME':['9:33:07','9:41:09']})
df['hour']= df.TIME.str.extract("(^\d+):", expand=False)
"str.extract(...)" is a vectorized function that extract a regular expression pattern ( in our case "(^\d+):" which is the hour of the TIME) and return a Pandas Series object by specifying the parameter "expand= False"
The result is stored in the "hour" column
You can use extract() twice to feature out the 'hour' column
df['hour'] = df. TIME. str. extract("(\d+:)")
df['hour'] = df. hour. str. extract("(\d+)")
Related
I have a column in my dataframe which consists of date 1/6/2023 (m/d/yyy) format. The date datatype is object but I want to convert it from object to int64 data type. I have tried the following code but it is drastically changing date values:
df = df.astype({'date':'int'})
is changing my values drastically is there any other alternative for the same ?
df = df.astype({'date':'int'})
Convert values to datetimes, then to strings - e.g. here YYYYMMDD format and last to integers:
print (df)
date
0 1/6/2023
df['date'] = pd.to_datetime(df['date'], dayfirst=True).dt.strftime('%Y%m%d').astype(int)
print (df)
date
0 20230601
I have a data file with timestamps that look like this:
It gets loaded into pandas with a column name of "Time". I am trying to create two new datetime64 type columns, one with the date and one with the time (hour). I have explored a few solutions to this problem on StackOverflow but am still having issues. Quick note, I need the final columns to not be objects so I can use pandas and numpy functionality.
I load the dataframe and create two new columns like so:
df = pd.read_csv('C:\\Users\\...\\xyz.csv')
df['Date'] = pd.to_datetime(df['Time']).dt.date
df['Hour'] = pd.to_datetime(df['Time']).dt.time
This works but the Date and Hour columns are now objects.
I run this to convert the date to my desired datetime64 data type and it works:
df['Date'] = pd.to_datetime(df['Date'])
However, when I try to use this same code on the Hour column, I get an error:
TypeError: <class 'datetime.time'> is not convertible to datetime
I did some digging and found the following code which runs:
df['Hour'] = pd.to_datetime(df['Hour'], format='%H:%M:%S')
However the actual output includes a generic date like so:
When I try to run code referencing the Hour column like so:
HourVarb = '15:00:00'
df['Test'] = np.where(df['Hour']==HourVarb,1,np.nan)
It runs but doesn't produce the result I want.
Perhaps my HourVarb variable is the wrong format for the numpy code? Alternatively, the 1/1/1900 is causing problems and the format %H: %M: %S needs to change? My end goal is to be able to reference the hour and the date to filter out specific date/hour combinations. Please help.
One note, when I change the HourVarb to '1/1/1900 15:00:00' the code above works as intended, but I'd still like to understand if there is a cleaner way that removes the date. Thanks
I'm not sure I understand the problem with the 'object' datatypes of these columns.
I loaded the data you provided this way:
df = pd.read_csv('xyz.csv')
df['Time'] = pd.to_datetime(df['Time'])
df['Date'] = df['Time'].dt.date
df['Hour'] = df['Time'].dt.time
print(df.dtypes)
And I get these data types:
Time datetime64[ns]
Date object
Hour object
The fact that Date and Hour are object types should not be a problem. The underlying data is a datetime type:
print(type(df.Date.iloc[0]))
print(type(df.Hour.iloc[0]))
<class 'datetime.date'>
<class 'datetime.time'>
This means you can use these columns as such. For example:
print(df['Date'] + pd.Timedelta('1D'))
What are you trying to do that is requiring the column dtype to be a Pandas dtype?
UPDATE
Here is how you achieve the last part of your question:
from datetime import datetime, time
hourVarb = datetime.strptime("15:00:00", '%H:%M:%S').time()
# or hourVarb = time(15, 0)
df['Test'] = df['Hour'] == hourVarb
print(df['Test'])
0 True
1 False
2 False
3 False
Name: Test, dtype: bool
As you can infer from the above , When I try to convert the string , it gives error.
Tried below codes but got same error as,day is not defined,
df['day'] = pd.to_datetime(df['day'],format='%d %b %Y %H:%M:%S:%f')
As SO memeber suggested,I edited code but index stills the string, did not convert to day
If you don't want to create another column, then just this will do:
df.index = pd.to_datetime(df.index)
In your example, df['day'] actually appears to be your index. To fix this, you'd want to call pd.to_datetime on your index:
df.index = pd.to_datetime(df.index)
I could tell it was your index because pandas offsets the row height of the columns for the index column and the other columns. Take this example:
df = pd.DataFrame({'a':[1,2,3], 'b':['a','b','c']})
df.set_index('a', inplace=True)
outputs:
b
a
1 a
2 b
3 c
I have dataframe with column date with type datetime64[ns].
When I try to create new column day with format MM-DD based on date column only first method works from below. Why second method doesn't work in pandas?
df['day'] = df['date'].dt.strftime('%m-%d')
df['day2'] = str(df['date'].dt.month) + '-' + str(df['date'].dt.day)
Result for one row:
day 01-04
day2 0 1\n1 1\n2 1\n3 1\n4 ...
Types of columns
day object
day2 object
Problem of solution is if use str with df['date'].dt.month it return Series, correct way is use Series.astype:
df['day2'] = df['date'].dt.month.astype(str) + '-' + df['date'].dt.day.astype(str)
I have a dataframe with a 'date' column with ~200 elements in the format yyyy-mm-dd.
I want to compute the number of days elapsed since 2001-11-25 for each of those elements and add a column of those numbers of elapsed days to the dataframe.
I know of the to_datetime() function but can't figure out how to make this happen.
Assuming your time values are in your index, you can just do this:
import pandas
x = pandas.DatetimeIndex(start='2014-01-01', end='2014-01-06', freq='30T')
df = pandas.DataFrame(index=x, columns=['time since'])
basedate = pandas.Timestamp('2011-11-25')
df['time since'] = df.apply(lambda x: (x.name.to_datetime() - basedate).days, axis=1)
If they're in a column, do:
df['time since'] = df['datetime_column'].apply(lambda x: (x.name.to_datetime() - basedate).days)
In accordance with Jeff's comment, here's a correction to the second (and most relevant) part of the accepted answer:
df['time since'] = (df['datetime_column'] - basedate).dt.days
The subtraction yields a series of type Timedelta, which can then be represented as days.
In some case you might need to pass the original column through pd.to_datetime(...) first.