I have a pandas dataframe column that contains times stored as str, as follows:
Time
17:24:00
17:25:00
17:26:00
and I have already cast the column to datetime as follows:
new_data['Time'] = pd.to_datetime(new_data['Time'], format='%H:%M:%S').apply(pd.Timestamp)
Now the column is in datetime64 as needed and shows:
0 1900-01-01 17:24:00
1 1900-01-01 17:25:00
2 1900-01-01 17:26:00
My question is as follows: I need to use the different time components of time as selection criteria for filtering. For example:
filtered_df = data.loc[(new_data['Time'].Hour == 17 )]
or
filtered_df = data.loc[(new_data['Time'].Minutes == 24 )]
But I cannot figure out how to select the individual components (Hour, Minutes, Seconds) now that I have it converted to datetime.
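A minimal sketch of the usual approach, using sample data matching the question: once the column is datetime64, the components are exposed through the .dt accessor, with lowercase names (.dt.hour, .dt.minute, .dt.second) rather than the Hour/Minutes attributes tried above.

```python
import pandas as pd

# sample data matching the question
new_data = pd.DataFrame({'Time': ['17:24:00', '17:25:00', '17:26:00']})
new_data['Time'] = pd.to_datetime(new_data['Time'], format='%H:%M:%S')

# component access goes through the .dt accessor, with lowercase names
filtered_df = new_data.loc[new_data['Time'].dt.hour == 17]
by_minute = new_data.loc[new_data['Time'].dt.minute == 24]
print(by_minute)
```

All three rows match hour 17, and only the first row matches minute 24.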
This is a follow-up question to the accepted solution here.
I have a pandas dataframe:
In one column 'time' the time is stored in the format 'HHMMSS' (e.g. 203412 means 20:34:12).
In another column 'date' the date is stored in the format 'YYmmdd' (e.g. 200712 means 2020-07-12), where YY is the offset from the year 2000.
Example:
import pandas as pd
data = {'time': ['123455', '000010', '100000'],
'date': ['200712', '210601', '190610']}
df = pd.DataFrame(data)
print(df)
# time date
#0 123455 200712
#1 000010 210601
#2 100000 190610
I need a third column which contains the combined datetime format (e.g. 2020-07-12 12:34:55) of the two other columns. So far, I can only modify the time but I do not know how to add the date.
df['datetime'] = pd.to_datetime(df['time'], format='%H%M%S')
print(df)
# time date datetime
#0 123455 200712 1900-01-01 12:34:55
#1 000010 210601 1900-01-01 00:00:10
#2 100000 190610 1900-01-01 10:00:00
How can I add in column df['datetime'] the date from column df['date'], so that the dataframe is:
time date datetime
0 123455 200712 2020-07-12 12:34:55
1 000010 210601 2021-06-01 00:00:10
2 100000 190610 2019-06-10 10:00:00
I found this question, but I am not exactly sure how to use it for my purpose.
You can join the columns first and then specify the format:
df['datetime'] = pd.to_datetime(df['date'] + df['time'], format='%y%m%d%H%M%S')
print(df)
time date datetime
0 123455 200712 2020-07-12 12:34:55
1 000010 210601 2021-06-01 00:00:10
2 100000 190610 2019-06-10 10:00:00
If the columns may be integers:
df['datetime'] = pd.to_datetime(df['date'].astype(str) + df['time'].astype(str), format='%y%m%d%H%M%S')
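One caveat worth noting (my addition, not part of the answer above): if the columns really are integers, the leading zeros are already gone, so astype(str) alone produces strings of varying length ('000010' becomes '10') and the parse fails or misparses. A sketch that pads them back with str.zfill:

```python
import pandas as pd

# integer columns: the leading zeros of '000010' have been lost
df = pd.DataFrame({'time': [123455, 10, 100000],
                   'date': [200712, 210601, 190610]})

# pad each field back to a fixed width of 6 digits before parsing
df['datetime'] = pd.to_datetime(
    df['date'].astype(str).str.zfill(6) + df['time'].astype(str).str.zfill(6),
    format='%y%m%d%H%M%S')
print(df['datetime'])
```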
I have a data frame with a time column holding minutes from 0-1439, i.e. the 1440 minutes of a day. I want to add a datetime column representing the day 2021-3-21 with the hh and mm filled in, e.g. 2021-03-21 11:00. I tried the following code:
from datetime import datetime, timedelta
date = datetime.date(2021, 3, 21)
days = date - datetime.date(1900, 1, 1)
df['datetime'] = pd.to_datetime(df['time'],format='%H:%M:%S:%f') + pd.to_timedelta(days, unit='d')
But I get the error: descriptor 'date' requires a 'datetime.datetime' object but received a 'int'
Is there another way to solve this problem, or a fix for this code? Please help me figure this out.
>>df
time
0
1
2
3
..
1439
I want to convert these minutes into datetimes formatted like 2021-03-21 00:00, using the date 2021-3-21 and turning the minutes into the hh:mm part. The dataframe will look like:
>df
datetime time
2021-3-21 00:00 0
2021-3-21 00:01 1
2021-3-21 00:02 2
...
How can I format my data in this way?
Let's try pd.to_timedelta instead: get the duration in minutes from time, then add a Timestamp:
df['datetime'] = (
pd.Timestamp('2021-3-21') + pd.to_timedelta(df['time'], unit='m')
)
df.head():
time datetime
0 0 2021-03-21 00:00:00
1 1 2021-03-21 00:01:00
2 2 2021-03-21 00:02:00
3 3 2021-03-21 00:03:00
4 4 2021-03-21 00:04:00
Complete Working Example with Sample Data:
import numpy as np
import pandas as pd
df = pd.DataFrame({'time': np.arange(0, 1440)})
df['datetime'] = (
pd.Timestamp('2021-3-21') + pd.to_timedelta(df['time'], unit='m')
)
print(df)
I have a dataframe with a datetime column. I want to group by the time component only and aggregate, e.g. by taking the mean.
I know that I can use pd.Grouper to group by date AND time, but it doesn't work on time only.
Say we have the following dataframe:
import numpy as np
import pandas as pd
drange = pd.date_range('2019-08-01 00:00', '2019-08-12 12:00', freq='1T')
time = drange.time
c0 = np.random.rand(len(drange))
c1 = np.random.rand(len(drange))
df = pd.DataFrame(dict(drange=drange, time=time, c0=c0, c1=c1))
print(df.head())
drange time c0 c1
0 2019-08-01 00:00:00 00:00:00 0.031946 0.159739
1 2019-08-01 00:01:00 00:01:00 0.809171 0.681942
2 2019-08-01 00:02:00 00:02:00 0.036720 0.133443
3 2019-08-01 00:03:00 00:03:00 0.650522 0.409797
4 2019-08-01 00:04:00 00:04:00 0.239262 0.814565
In this case, the following throws a TypeError:
grouper = pd.Grouper(key='time', freq='5T')
grouped = df.groupby(grouper).mean()
I could set key=drange to group by date and time and then:
Reset the index
Transform the new column to float
Bin with pd.cut
Cast back to time
Finally group-by and then aggregate
... But I wonder whether there is a cleaner way to achieve the same results.
Series.dt.time/DatetimeIndex.time returns the time as datetime.time objects. This isn't great because pandas works best with timedelta64, so your 'time' column is cast to object, losing all datetime functionality.
You can subtract off the normalized date to obtain the time as a timedelta so you can continue to use the datetime tools of pandas. You can floor this to group.
s = (df.drange - df.drange.dt.normalize()).dt.floor('5T')
df.groupby(s).mean()
c0 c1
drange
00:00:00 0.436971 0.530201
00:05:00 0.441387 0.518831
00:10:00 0.465008 0.478130
... ... ...
23:45:00 0.523233 0.515991
23:50:00 0.468695 0.434240
23:55:00 0.569989 0.510291
Alternatively, if you feel unsure of floor, this gets identical output up to the index name:
df['time'] = (df.drange - df.drange.dt.normalize()) # timedelta64[ns]
df.groupby(pd.Grouper(key='time', freq='5T')).mean()
When you use DataFrame.groupby you can pass a Series as an argument. Moreover, if your Series is a datetime, you can use series.dt to access the properties of the date. In your case df['drange'].dt.hour or df['drange'].dt.time should do it.
# df['drange']=pd.to_datetime(df['drange'])
df.groupby(df['drange'].dt.hour).agg(...)
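A minimal runnable sketch of that approach, using made-up sample data in the same shape as the question:

```python
import numpy as np
import pandas as pd

# two full days of minute-frequency timestamps with random values
drange = pd.date_range('2019-08-01 00:00', '2019-08-02 23:59', freq='1T')
df = pd.DataFrame({'drange': drange, 'c0': np.random.rand(len(drange))})

# group rows that share the same wall-clock hour, regardless of date
by_hour = df.groupby(df['drange'].dt.hour)['c0'].mean()
print(by_hour.head())
```

The result is a Series indexed by the 24 hours of the day.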
I am trying to alter the text on every second row after interpolating the numeric values between rows.
stamp value
0 00:00:00 2
1 00:00:00 3
2 01:00:00 5
I am trying to apply this change to every second stamp row (i.e. 30 instead of 00 between the colons); the column is a str:
stamp value
0 00:00:00 2
1 00:30:00 3
2 01:00:00 5
Function to change the string:
def time_vals(row):
    # run only on odd rows (1/2 hr)
    if int(row.name) % 2 != 0:
        l, m, r = row.split(':')
        return l + ":30:" + r
I have tried the following:
hh_weather['time'] =hh_weather[hh_weather.rows[::2]['time']].apply(time_vals(2))
but I get an error: AttributeError: 'DataFrame' object has no attribute 'rows'
and when I try:
hh_weather['time'] = hh_weather['time'].apply(time_vals)
AttributeError: 'str' object has no attribute 'name'
Any ideas?
Use timedelta instead of str
The strength of Pandas lies in vectorised functionality. Here you can use timedelta to represent times numerically. If data is as in your example, i.e. seconds are always zero, you can floor by hour and add 30 minutes. Then assign this series conditionally to df['stamp'].
import numpy as np
import pandas as pd

# convert to timedelta
df['stamp'] = pd.to_timedelta(df['stamp'])
# create series by flooring by hour, then adding 30 minutes
s = df['stamp'].dt.floor('h') + pd.Timedelta(minutes=30)
# assign new series conditional on index
df['stamp'] = np.where(df.index % 2, s, df['stamp'])
print(df)
stamp value
0 00:00:00 2
1 00:30:00 3
2 01:00:00 5
# convert string values to timedelta (easier to work with than str)
df['stamp'] = pd.to_timedelta(df['stamp'])
# slice only the odd rows from the 'stamp' column and add 30 minutes to them
odd_df = df.loc[1::2, 'stamp'] + pd.to_timedelta('30 min')
# update the existing df with the new series (odd_df), matched on index
df['stamp'].update(odd_df)
print(df)
stamp value
0 00:00:00 2
1 00:30:00 3
2 01:00:00 5
I have a dataset with measurements acquired roughly every 2 hours over a week. I would like to calculate the mean of measurements taken at the same time on different days. For example, I want the mean of every measurement taken between 12:00 and 13:59.
import pandas as pd
import numpy as np
from datetime import datetime, timedelta
#generating a test dataframe
date_today = datetime.now()
time_of_taken_measurment = pd.date_range(date_today, date_today + timedelta(72), freq='2H20MIN')
np.random.seed(seed=1111)
data = np.random.randint(1, high=100, size=len(time_of_taken_measurment))
df = pd.DataFrame({'measurementTimestamp': time_of_taken_measurment, 'measurment': data})
df = df.set_index('measurementTimestamp')
#calculating the mean for measurements taken in the same hour
hourly_average = df.groupby([df.index.hour]).mean()
hourly_average
The code above gives me this output:
0 47.967742
1 43.354839
2 46.935484
.....
22 42.833333
23 52.741935
I would like to have a result like this:
0 mean0
2 mean1
4 mean2
.....
20 mean10
22 mean11
I was trying to solve my problem using rolling_mean function, but I could not find a way to apply it to my static case.
Use the built-in floor functionality of DatetimeIndex, which allows you to easily create 2-hour time bins.
df.groupby(df.index.floor('2H').time).mean()
Output:
measurment
00:00:00 51.516129
02:00:00 54.868852
04:00:00 52.935484
06:00:00 43.177419
08:00:00 43.903226
10:00:00 55.048387
12:00:00 50.639344
14:00:00 48.870968
16:00:00 43.967742
18:00:00 49.225806
20:00:00 43.774194
22:00:00 50.590164
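Pulling the question's setup and the one-liner together, a complete sketch (the data is random, so the exact means will differ from the output above):

```python
import numpy as np
import pandas as pd
from datetime import datetime, timedelta

# test data as in the question: a reading roughly every 2h20min over 72 days
date_today = datetime.now()
index = pd.date_range(date_today, date_today + timedelta(72), freq='2H20MIN')
np.random.seed(seed=1111)
df = pd.DataFrame({'measurment': np.random.randint(1, high=100, size=len(index))},
                  index=index)

# floor each timestamp to its 2-hour bin, keep only the time-of-day, then average
result = df.groupby(df.index.floor('2H').time).mean()
print(result)
```

Flooring to '2H' snaps every timestamp to 00:00, 02:00, ..., 22:00, so the groupby yields exactly 12 rows regardless of the start time.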