I have a datetime column in a pandas DataFrame, with times that include minutes and seconds.
index date
0 2021-03-01 07:55:00
1 2021-03-01 07:56:13
2 2021-03-01 07:56:43
3 2021-03-01 07:57:19
4 2021-03-01 07:57:57
5 2021-03-01 11:39:25
6 2021-03-01 11:39:59
7 2021-03-01 11:40:53
8 2021-03-01 11:41:44
9 2021-03-01 11:43:31
How can I create a column like this (date and hour only)?
index date
0 2021-03-01 07:00:00
1 2021-03-01 07:00:00
2 2021-03-01 07:00:00
3 2021-03-01 07:00:00
4 2021-03-01 07:00:00
5 2021-03-01 11:00:00
6 2021-03-01 11:00:00
7 2021-03-01 11:00:00
8 2021-03-01 11:00:00
9 2021-03-01 11:00:00
Update
The data is in fact a pandas DataFrame with the column containing datetime objects. Given that the column is named "date", here's one way to effect the change:
df['date'] = df['date'].apply(lambda dt: dt.replace(minute=0, second=0))
print(df['date'])
0 2021-03-01 07:00:00
1 2021-03-01 07:00:00
2 2021-03-01 07:00:00
3 2021-03-01 07:00:00
4 2021-03-01 07:00:00
5 2021-03-01 11:00:00
6 2021-03-01 11:00:00
7 2021-03-01 11:00:00
8 2021-03-01 11:00:00
9 2021-03-01 11:00:00
Name: date, dtype: datetime64[ns]
Original answer follows...
Use datetime.replace() to reset the minute and second in the datetime object:
from datetime import datetime
dt = datetime.strptime('2021-03-01 07:55:00', '%Y-%m-%d %H:%M:%S')
dt = dt.replace(minute=0, second=0)
print(dt)
# 2021-03-01 07:00:00
Your example does not appear to have resolution finer than one second; if it does, you can also reset the microseconds to 0:
dt = dt.replace(minute=0, second=0, microsecond=0)
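For a whole pandas datetime column, the same truncation can be done in one vectorized step with Series.dt.floor — a minimal sketch:

```python
import pandas as pd

# sample column mirroring the question's data
df = pd.DataFrame({"date": pd.to_datetime([
    "2021-03-01 07:55:00", "2021-03-01 07:56:13", "2021-03-01 11:43:31"])})

# floor to the hour: minutes and seconds become zero
df["date_hour"] = df["date"].dt.floor("h")
```

This avoids the per-element Python call that apply() makes, which matters on large frames.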
Related
I'm stuck on a problem; it would be great if you could help me :)
I created a dataframe with pandas that looks like this:
HostName  Date
A         2021-01-01 12:30
B         2021-01-01 12:42
B         2021-02-01 12:30
A         2021-02-01 12:40
A         2021-02-25 12:40
A         2021-03-01 12:41
A         2021-03-01 12:42
I tried to aggregate based on the previous month, but it's not working.
The end result should look like this:
HostName  Date              previous month
A         2021-01-01 12:30  NaN
B         2021-01-01 12:42  NaN
B         2021-02-01 12:30  1
A         2021-02-01 12:40  NaN
A         2021-02-25 12:40  1
A         2021-03-01 12:41  2
A         2021-03-01 12:42  3
For every row, Date should look one month back and aggregate the number of HostName occurrences found.
For example, row number 6 counts HostName A from 2021-02-01 12:41 to 2021-03-01 12:41.
What I tried (and failed):
Extract the previous month:
df['Date Before'] = df['Date'] - pd.DateOffset(months=1)
and aggregate within that month:
df.resample('M', on='Date').HostName.count()
df.groupby('HostName').resample('M', on='Date Before').HostName.count()
Please help me, many thanks!
Use shift to look back n rows in a DataFrame column; below, the shift is applied per HostName group.
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "HostName": ["A", "B", "B", "A", "A", "A", "A"],
    "Date": pd.to_datetime([
        "2021-01-01 12:30", "2021-01-01 12:42", "2021-02-01 12:30",
        "2021-02-01 12:40", "2021-02-25 12:40", "2021-03-01 12:41",
        "2021-03-01 12:42"]),
})
# previous event date per host (rows ordered by date within each host)
grouped = df.sort_values('Date').groupby('HostName')['Date']
df['Previous Date'] = grouped.shift(1)
# gap in whole days between consecutive events of the same host
df['Previous Count'] = (df['Date'] - df['Previous Date']).dt.days
print(df.sort_values(by=["HostName","Date"]))
df['Con'] = np.where(df['Previous Date'].notnull() & (df['Previous Count'] > 0), 1, 0)
print(df.sort_values(by=["HostName","Date"]))
output:
HostName Date Previous Date Previous Count Con
0 A 2021-01-01 12:30:00 NaT NaN 0
3 A 2021-02-01 12:40:00 2021-01-01 12:30:00 31.0 1
4 A 2021-02-25 12:40:00 2021-02-01 12:40:00 24.0 1
5 A 2021-03-01 12:41:00 2021-02-25 12:40:00 4.0 1
6 A 2021-03-01 12:42:00 2021-03-01 12:41:00 0.0 0
1 B 2021-01-01 12:42:00 NaT NaN 0
2 B 2021-02-01 12:30:00 2021-01-01 12:42:00 30.0 1
Use cumsum to create a running total by hostname.
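That running total could look something like this minimal sketch (the Con flag and column names are illustrative, following the answer above):

```python
import pandas as pd

# Con flags whether the previous event of the same host was recent
df = pd.DataFrame({
    "HostName": ["A", "A", "A", "B", "B"],
    "Con":      [0,   1,   1,   0,   1],
})

# running total of flagged events, restarting for each host
df["Running Total"] = df.groupby("HostName")["Con"].cumsum()
```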
I found a solution.
Original:
HostName Date
0 A 2021-01-01 12:30:00
1 B 2021-01-01 12:42:00
2 B 2021-02-01 12:30:00
3 A 2021-02-01 12:40:00
4 A 2021-02-25 12:40:00
5 A 2021-03-01 12:41:00
6 A 2021-03-01 12:42:00
Get the month before:
df['Month Before'] = df['Date'] - pd.DateOffset(months=1)
Order the dataframe:
df = df.sort_values(['HostName','Date'])
Shift by host:
df['prev_value'] = df.groupby('HostName')['Date'].shift()
Check the condition:
df['Con'] = np.where((df['Month Before'] <= df['prev_value']) & (df['prev_value'].notnull()), 1, 0)
and group
gpc = df.groupby(['HostName','Con'])['HostName']
df['Count Per Host'] = gpc.cumcount()
The result looks like this:
HostName Date Month Before prev_value Con Count Per Host
0 A 2021-01-01 12:30:00 2020-12-01 12:30:00 NaT 0 0
3 A 2021-02-01 12:40:00 2021-01-01 12:40:00 2021-01-01 12:30:00 0 0
4 A 2021-02-25 12:40:00 2021-01-25 12:40:00 2021-02-01 12:40:00 1 1
5 A 2021-03-01 12:41:00 2021-02-01 12:41:00 2021-02-25 12:40:00 1 2
6 A 2021-03-01 12:42:00 2021-02-01 12:42:00 2021-03-01 12:41:00 1 3
1 B 2021-01-01 12:42:00 2020-12-01 12:42:00 NaT 0 0
2 B 2021-02-01 12:30:00 2021-01-01 12:30:00 2021-01-01 12:42:00 1 0
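For reference, the per-row count of same-host events in the preceding month can also be computed directly with an explicit window comparison. This is a sketch, not the accepted approach; the window boundaries (strict vs inclusive) and the NaN-vs-0 convention may need adjusting to match the desired output above:

```python
import pandas as pd

df = pd.DataFrame({
    "HostName": ["A", "B", "B", "A", "A", "A", "A"],
    "Date": pd.to_datetime([
        "2021-01-01 12:30", "2021-01-01 12:42", "2021-02-01 12:30",
        "2021-02-01 12:40", "2021-02-25 12:40", "2021-03-01 12:41",
        "2021-03-01 12:42"]),
})

# for each row, count earlier events of the same host that fall inside
# the one-month window ending at that row's timestamp
counts = pd.Series(0, index=df.index)
for _, grp in df.groupby("HostName"):
    dates = grp["Date"]
    for idx, t in dates.items():
        window_start = t - pd.DateOffset(months=1)
        counts[idx] = ((dates < t) & (dates >= window_start)).sum()
df["previous month"] = counts
```

The double loop is O(n²) per host, which is fine for small frames; for large data a sorted merge or searchsorted-based approach would scale better.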
I was trying to find the difference between a series of dates and a single date. For example, the series runs from May 1 to June 1:
date = pd.DataFrame()
In [0]: date['test'] = pd.date_range("2021-05-01", "2021-06-01", freq = "D")
Out[0]: date
test
0 2021-05-01 00:00:00
1 2021-05-02 00:00:00
2 2021-05-03 00:00:00
3 2021-05-04 00:00:00
4 2021-05-05 00:00:00
5 2021-05-06 00:00:00
6 2021-05-07 00:00:00
7 2021-05-08 00:00:00
8 2021-05-09 00:00:00
9 2021-05-10 00:00:00
In[1]
date['test'] = date['test'].dt.date
Out[1]:
test
0 2021-05-01
1 2021-05-02
2 2021-05-03
3 2021-05-04
4 2021-05-05
5 2021-05-06
6 2021-05-07
7 2021-05-08
8 2021-05-09
9 2021-05-10
In[2]:date['base'] = dt.strptime("2021-05-01",'%Y-%m-%d')
Out[2]:
0 2021-05-01 00:00:00
1 2021-05-01 00:00:00
2 2021-05-01 00:00:00
3 2021-05-01 00:00:00
4 2021-05-01 00:00:00
5 2021-05-01 00:00:00
6 2021-05-01 00:00:00
7 2021-05-01 00:00:00
8 2021-05-01 00:00:00
9 2021-05-01 00:00:00
In[3]:date['base'] = date['base'].dt.date
Out[3]:
base
0 2021-05-01
1 2021-05-01
2 2021-05-01
3 2021-05-01
4 2021-05-01
5 2021-05-01
6 2021-05-01
7 2021-05-01
8 2021-05-01
9 2021-05-01
In[4]:date['test']-date['base']
Out[4]:
diff
0 0 days 00:00:00.000000000
1 1 days 00:00:00.000000000
2 2 days 00:00:00.000000000
3 3 days 00:00:00.000000000
4 4 days 00:00:00.000000000
5 5 days 00:00:00.000000000
6 6 days 00:00:00.000000000
7 7 days 00:00:00.000000000
8 8 days 00:00:00.000000000
9 9 days 00:00:00.000000000
10 10 days 00:00:00.000000000
This is the only thing I could get. I don't want anything other than the numbers 1 to 10, since I need them for further numerical calculation, but I can't get rid of the rest. Also, how can I construct a time series that outputs just the date, without the h:m:s after it? I don't want to manually call .dt.date on all of those, and it sometimes messes things up.
You don't need to create a column base for this, simply do:
>>> (date['test'] - pd.to_datetime("2021-05-01", format='%Y-%m-%d')).dt.days
0 0
1 1
2 2
3 3
4 4
...
27 27
28 28
29 29
30 30
31 31
Name: test, dtype: int64
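As for the second part of the question, displaying the dates without the h:m:s part: one option, as a sketch, is to keep the column as datetime64 for arithmetic and only format it for display:

```python
import pandas as pd

s = pd.Series(pd.date_range("2021-05-01", periods=3, freq="D"))

# formatted strings for display; s itself stays datetime64 for arithmetic
labels = s.dt.strftime("%Y-%m-%d")

# day offsets from the first date, as plain integers
days = (s - s.iloc[0]).dt.days
```

Unlike .dt.date, which produces Python objects and loses the datetime dtype, this keeps the column usable for further calculations.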
You can convert the timestamps to epoch seconds first (internally they are stored as integer nanoseconds since the Unix epoch):
Using pandas datetime to unix timestamp seconds
import pandas as pd
# start df with date column
df = pd.DataFrame({"date": pd.date_range("2021-05-01", "2021-06-01", freq = "D")})
# epoch seconds for each datetime
df["ts"] = (df["date"] - pd.Timestamp("1970-01-01")) // pd.Timedelta("1s")
>>> df
date ts
0 2021-05-01 1619827200
1 2021-05-02 1619913600
2 2021-05-03 1620000000
3 2021-05-04 1620086400
...
31 2021-06-01 1622505600
This will allow you to do integer math before converting back
>>> df["days"] = (df["ts"] - min(df["ts"])) // (60*60*24) # 1 day in seconds
>>> df
date ts days
0 2021-05-01 1619827200 0
1 2021-05-02 1619913600 1
2 2021-05-03 1620000000 2
3 2021-05-04 1620086400 3
...
31 2021-06-01 1622505600 31
Alternatively, with a naive day-based series, you can use the index as the day offset (as that's how the DataFrame was generated)!
>>> import pandas as pd
>>> df = pd.DataFrame({"date": pd.date_range("2021-05-01", "2021-06-01", freq = "D")})
>>> df["days"] = df.index
>>> df
date days
0 2021-05-01 0
1 2021-05-02 1
2 2021-05-03 2
3 2021-05-04 3
...
31 2021-06-01 31
In the example dataframe below, how can I convert t_relative into hours? For example, the relative time in the first row would be 49 hours.
tstart tend t_relative
0 2131-05-16 23:00:00 2131-05-19 00:00:00 2 days 01:00:00
1 2131-05-16 23:00:00 2131-05-19 00:15:00 2 days 01:15:00
2 2131-05-16 23:00:00 2131-05-19 00:45:00 2 days 01:45:00
3 2131-05-16 23:00:00 2131-05-19 01:00:00 2 days 02:00:00
4 2131-05-16 23:00:00 2131-05-19 01:15:00 2 days 02:15:00
t_relative was calculated with the operation, df['t_relative'] = df['tend']-df['tstart'].
You can divide by a Timedelta:
df['t_relative']/pd.Timedelta('1H')
Output:
0 49.00
1 49.25
2 49.75
3 50.00
4 50.25
Name: t_relative, dtype: float64
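An equivalent route goes through Series.dt.total_seconds(); a small sketch:

```python
import pandas as pd

# timedeltas like those in the question's t_relative column
t_relative = pd.Series(pd.to_timedelta(["2 days 01:00:00", "2 days 01:15:00"]))

# seconds -> hours as floats
hours = t_relative.dt.total_seconds() / 3600
```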
I have a dataframe that has a datetime column called start time, set to a default of 12:00:00 AM. I would like to reset this column so that the first row is 00:01:00 and the second row is 00:02:00, that is, one-minute intervals.
This is the original table.
ID State Time End Time
A001 12:00:00 12:00:00
A002 12:00:00 12:00:00
A003 12:00:00 12:00:00
A004 12:00:00 12:00:00
A005 12:00:00 12:00:00
A006 12:00:00 12:00:00
A007 12:00:00 12:00:00
I want to reset the start time column so that my output is this:
ID State Time End Time
A001 0:00:00 12:00:00
A002 0:00:01 12:00:00
A003 0:00:02 12:00:00
A004 0:00:03 12:00:00
A005 0:00:04 12:00:00
A006 0:00:05 12:00:00
A007 0:00:06 12:00:00
How do I go about this?
You could use pd.date_range:
df['Start Time'] = pd.date_range('00:00', periods=df['Start Time'].shape[0], freq='1min')
gives you
df
Out[23]:
Start Time
0 2019-09-30 00:00:00
1 2019-09-30 00:01:00
2 2019-09-30 00:02:00
3 2019-09-30 00:03:00
4 2019-09-30 00:04:00
5 2019-09-30 00:05:00
6 2019-09-30 00:06:00
7 2019-09-30 00:07:00
8 2019-09-30 00:08:00
9 2019-09-30 00:09:00
Supply a full date/time string to get a different starting date.
First convert your State Time column to datetime type. Then use pd.date_range with the first time as the starting point and a frequency of 1 minute.
df['State Time'] = pd.to_datetime(df['State Time'])
df['State Time'] = pd.date_range(start=df['State Time'].min(),
periods=len(df),
freq='min').time
Output
ID State Time End Time
0 A001 12:00:00 12:00:00
1 A002 12:01:00 12:00:00
2 A003 12:02:00 12:00:00
3 A004 12:03:00 12:00:00
4 A005 12:04:00 12:00:00
5 A006 12:05:00 12:00:00
6 A007 12:06:00 12:00:00
I have the above dataframe (snippet) and want to create a new dataframe which is a conditional selection keeping only the rows that are timestamped with a time before 15:00:00.
I'm still somewhat new to Pandas / python and have been stuck on this for a while :(
You can use DataFrame.between_time:
start = pd.to_datetime('2015-02-24 11:00')
rng = pd.date_range(start, periods=10, freq='14h')
df = pd.DataFrame({'Date': rng, 'a': range(10)})
print (df)
Date a
0 2015-02-24 11:00:00 0
1 2015-02-25 01:00:00 1
2 2015-02-25 15:00:00 2
3 2015-02-26 05:00:00 3
4 2015-02-26 19:00:00 4
5 2015-02-27 09:00:00 5
6 2015-02-27 23:00:00 6
7 2015-02-28 13:00:00 7
8 2015-03-01 03:00:00 8
9 2015-03-01 17:00:00 9
df = df.set_index('Date').between_time('00:00:00', '15:00:00')
print (df)
a
Date
2015-02-24 11:00:00 0
2015-02-25 01:00:00 1
2015-02-25 15:00:00 2
2015-02-26 05:00:00 3
2015-02-27 09:00:00 5
2015-02-28 13:00:00 7
2015-03-01 03:00:00 8
If you need to exclude 15:00:00, add the parameter include_end=False:
df = df.set_index('Date').between_time('00:00:00', '15:00:00', include_end=False)
print (df)
a
Date
2015-02-24 11:00:00 0
2015-02-25 01:00:00 1
2015-02-26 05:00:00 3
2015-02-27 09:00:00 5
2015-02-28 13:00:00 7
2015-03-01 03:00:00 8
You can check the hour of the date column and use it for subsetting:
df['date'] = pd.to_datetime(df['date']) # optional if the date column is of datetime type
df[df.date.dt.hour < 15]
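If the cutoff ever involves minutes (say 15:30), comparing against .dt.time instead of just the hour keeps the filter exact; a sketch:

```python
import pandas as pd
from datetime import time

df = pd.DataFrame({"date": pd.to_datetime([
    "2015-02-24 11:00", "2015-02-25 15:00", "2015-02-26 14:59"])})

# keep rows whose time of day is strictly before 15:00
before_cutoff = df[df["date"].dt.time < time(15, 0)]
```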