I have the following DataFrame:
import pandas as pd
data = {'ID': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
'Time_order': ['2019-01-01 07:00:00', '2019-01-01 07:25:00', '2019-01-02 07:02:00', '2019-01-02 07:27:00', '2019-01-02 06:58:00', '2019-01-03 07:24:00', '2019-01-04 07:03:00', '2019-01-04 07:24:00', '2019-01-05 07:05:00', '2019-01-05 07:30:00', '2019-01-06 07:00:00', '2019-01-06 07:25:00', '2019-01-07 07:02:00', '2019-01-07 07:27:00', '2019-01-08 06:58:00', '2019-01-08 07:24:00', '2019-01-09 07:03:00', '2019-01-09 07:24:00', '2019-01-10 07:05:00', '2019-01-10 07:30:00',
'2019-01-11 17:00:00', '2019-01-11 17:25:00', '2019-01-12 07:02:00', '2019-01-12 07:27:00', '2019-01-13 06:58:00', '2019-01-13 07:24:00', '2019-01-14 07:03:00', '2019-01-14 07:24:00', '2019-01-15 07:05:00', '2019-01-15 07:30:00']}
df = pd.DataFrame(data)
df['Time_order'] = pd.to_datetime(df['Time_order'])
df['hour'] = df['Time_order'].dt.strftime('%H:%M:%S')
I wanted to build a sliding time period of 25 minutes, so that I can check whether there are orders in that period. For example: I start checking every day from midnight, e.g. from 00:00:00 to 00:25:00, count how many orders fall in that window, then move on by 5 minutes, e.g. from 00:05:00 to 00:30:00, and so on, scanning the whole day until 23:59:00. I want to count how many orders were made in each window and pick the maximum, so that it returns at which time there is a peak of orders.
I tried the following:
x = 12 * 24  # twelve 5-minute steps per hour times 24 hours (a day)
for i in range(x):
    df[f'each{i}_minutes_start'] = df['Time_order'].dt.floor(f'{i}min')
    df[f'each{i}_minutes_end'] = df[f'each{i}_minutes_start'] + pd.Timedelta(minutes=5)
    df['time_period'] = df[f'each{i}_minutes_start'].dt.strftime('%H:%M:%S') + '-' + df[f'each{i}_minutes_end'].dt.strftime('%H:%M:%S')
At this point I got stuck and could not move forward. Thank you in advance.
I think this works:
df.set_index('Time_order').resample("5min").count().rolling(5)['ID'].sum()
Five consecutive 5-minute bins make up the 25-minute window; the largest value marks the peak.
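Building on that idea, here is a self-contained sketch that also picks out the peak window with idxmax. The tiny DataFrame and the choice of rolling(5) (five 5-minute bins = 25 minutes) are illustrative assumptions, not the asker's real data:

```python
import pandas as pd

# illustrative orders; the real df comes from the question
df = pd.DataFrame({
    'ID': [1] * 6,
    'Time_order': pd.to_datetime([
        '2019-01-01 07:00:00', '2019-01-01 07:05:00', '2019-01-01 07:10:00',
        '2019-01-01 07:25:00', '2019-01-01 08:00:00', '2019-01-01 08:30:00',
    ]),
})

# count orders per 5-minute bin, then sum 5 consecutive bins (= 25 minutes)
counts = df.set_index('Time_order').resample('5min')['ID'].count()
windows = counts.rolling(5).sum()

# idxmax labels the window by its *last* bin; the window spans
# [peak_start, peak_start + 25 minutes)
peak_end_bin = windows.idxmax()
peak_start = peak_end_bin - pd.Timedelta(minutes=20)
```

With the sample data above, the busiest 25-minute window starts at 07:00.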
How would I be able to subtract 1 second and 1 minute and 1 month from data['date'] column?
import pandas as pd
d = {'col1': [4, 5, 2, 2, 3, 5, 1, 1, 6], 'col2': [6, 2, 1, 7, 3, 5, 3, 3, 9],
'label':['Old','Old','Old','Old','Old','Old','Old','Old','Old'],
'date': ['2022-01-24 10:07:02', '2022-01-27 01:55:03', '2022-01-30 19:09:03', '2022-02-02 14:34:06',
'2022-02-08 12:37:03', '2022-02-10 03:07:02', '2022-02-10 14:02:03', '2022-02-11 00:32:25',
'2022-02-12 21:42:03']}
data = pd.DataFrame(d)
# subtract the dates by 1 second
date_mod_s = pd.to_datetime(data['date'])
# subtract the dates by 1 minute
date_mod_m = pd.to_datetime(data['date'])
# subtract the dates by 1 month
date_mod_M = pd.to_datetime(data['date'])
Your date column is of type string. Convert it to pd.Timestamp and you can use pd.DateOffset:
pd.to_datetime(data["date"]) - pd.DateOffset(months=1, minutes=1, seconds=1)
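A minimal runnable sketch of that subtraction (the column values here are illustrative):

```python
import pandas as pd

data = pd.DataFrame({'date': ['2022-01-24 10:07:02', '2022-02-02 14:34:06']})

# DateOffset does calendar-aware arithmetic, so "1 month" respects month lengths
shifted = pd.to_datetime(data['date']) - pd.DateOffset(months=1, minutes=1, seconds=1)
```

For the first row this yields 2021-12-24 10:06:01, i.e. one month, one minute and one second earlier.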
I have a time range like this:
from datetime import datetime

start_date_time = datetime(2021, 1, 1, 5, 0, 0)
end_date_time = datetime(2021, 1, 3, 0, 0, 0)
I want output like this: every single second between the given time range.
Thank you in advance.
Output:
2021, 1, 1, 0, 00, 01
2021, 1, 1, 0, 01, 02
2021, 1, 1, 2, 30, 03
...
You can get the time difference in seconds, then loop and print a new date for each second:
from datetime import datetime, timedelta

start_date_time = datetime(2021, 1, 1, 5, 0, 0)
end_date_time = datetime(2021, 1, 3, 0, 0, 0)

time_diff = int((end_date_time - start_date_time).total_seconds())
for i in range(time_diff):
    print(start_date_time + timedelta(seconds=i))
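If pandas is available, pd.date_range can generate the same sequence without an explicit loop. Note that the inclusive='left' parameter needs pandas ≥ 1.4 (older versions spelled it closed='left'):

```python
import pandas as pd
from datetime import datetime

start_date_time = datetime(2021, 1, 1, 5, 0, 0)
end_date_time = datetime(2021, 1, 3, 0, 0, 0)

# one timestamp per second, end-exclusive to match the loop above
seconds = pd.date_range(start_date_time, end_date_time, freq='1s', inclusive='left')
```

The range covers 43 hours, so this produces 154 800 timestamps.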
I am fairly new to Python and I am looking for an efficient way to organize times into designated bins. I have a table with [start_time] and [duration] columns, and I want to compute the time spent in each hourly interval based on this table.
Example: if I have this table,
start_time duration
12:25 1:00
13:35 0:15
14:03 0:20
15:40 0:10
16:15 1:05
17:30 0:40
then the expected output is
bins time
12:00 - 13:00 0:35
13:00 - 14:00 0:40
14:00 - 15:00 0:20
15:00 - 16:00 0:10
16:00 - 17:00 0:45
17:00 - 18:00 0:50
18:00 - 19:00 0:10
19:00 - 20:00 0:00
I would appreciate any help on this task! :)
The python-ranges library I wrote a while ago could be useful for this:
from ranges import Range, RangeSet
from datetime import datetime, timedelta
from functools import reduce
# times, transcribed from above
# and converted to datetimes (so that we can use timedelta math)
times = [
    (datetime(1, 1, 1, 12, 25), timedelta(hours=1)),     # (12:25, 1:00)
    (datetime(1, 1, 1, 13, 35), timedelta(minutes=15)),  # (13:35, 0:15)
    (datetime(1, 1, 1, 14, 3), timedelta(minutes=20)),   # (14:03, 0:20)
    (datetime(1, 1, 1, 15, 40), timedelta(minutes=10)),  # (15:40, 0:10)
    (datetime(1, 1, 1, 16, 15), timedelta(minutes=65)),  # (16:15, 1:05)
    (datetime(1, 1, 1, 17, 30), timedelta(minutes=40)),  # (17:30, 0:40)
]
# make a RangeSet that encompasses the entire day
wholeDay = RangeSet(Range(datetime(1, 1, 1, 12, 00), datetime(1, 1, 1, 20, 00)))
# remove our times from the whole day
wholeDay -= [Range(start, start + duration) for (start, duration) in times]
# get a list of correspondences with timedeltas
bins = {}
for h in range(12, 20):
    # create the 1-hour-long range
    period = Range(datetime(1, 1, 1, h), datetime(1, 1, 1, h + 1))
    # compute how much of this hour was consumed,
    # i.e. the part of this period that is *not* contained in wholeDay
    # (in other words, the length of the set difference).
    # We count seconds because timedelta() doesn't work with sum() natively
    time_seconds = sum(rng.length().seconds for rng in period.difference(wholeDay))
    # finally, add to the dict
    bins[period] = timedelta(seconds=time_seconds)
This produces the following bins:
{Range[datetime.datetime(1, 1, 1, 12, 0), datetime.datetime(1, 1, 1, 13, 0)): datetime.timedelta(seconds=2100),
Range[datetime.datetime(1, 1, 1, 13, 0), datetime.datetime(1, 1, 1, 14, 0)): datetime.timedelta(seconds=2400),
Range[datetime.datetime(1, 1, 1, 14, 0), datetime.datetime(1, 1, 1, 15, 0)): datetime.timedelta(seconds=1200),
Range[datetime.datetime(1, 1, 1, 15, 0), datetime.datetime(1, 1, 1, 16, 0)): datetime.timedelta(seconds=600),
Range[datetime.datetime(1, 1, 1, 16, 0), datetime.datetime(1, 1, 1, 17, 0)): datetime.timedelta(seconds=2700),
Range[datetime.datetime(1, 1, 1, 17, 0), datetime.datetime(1, 1, 1, 18, 0)): datetime.timedelta(seconds=3000),
Range[datetime.datetime(1, 1, 1, 18, 0), datetime.datetime(1, 1, 1, 19, 0)): datetime.timedelta(seconds=600),
Range[datetime.datetime(1, 1, 1, 19, 0), datetime.datetime(1, 1, 1, 20, 0)): datetime.timedelta(0)}
which is your intended output, represented by datetimes.
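Without the library, the same per-hour accounting can be sketched with plain datetime arithmetic. split_into_hour_bins is a hypothetical helper written for this answer, and the year-1 dates are placeholders just as above:

```python
from datetime import datetime, timedelta

# (start, duration) pairs transcribed from the question
times = [
    (datetime(1, 1, 1, 12, 25), timedelta(hours=1)),     # 12:25, 1:00
    (datetime(1, 1, 1, 13, 35), timedelta(minutes=15)),  # 13:35, 0:15
    (datetime(1, 1, 1, 14, 3), timedelta(minutes=20)),   # 14:03, 0:20
    (datetime(1, 1, 1, 15, 40), timedelta(minutes=10)),  # 15:40, 0:10
    (datetime(1, 1, 1, 16, 15), timedelta(minutes=65)),  # 16:15, 1:05
    (datetime(1, 1, 1, 17, 30), timedelta(minutes=40)),  # 17:30, 0:40
]

def split_into_hour_bins(times, first_hour=12, last_hour=20):
    # accumulate the overlap of each (start, duration) interval with every hourly bin
    bins = {h: timedelta(0) for h in range(first_hour, last_hour)}
    for start, duration in times:
        end = start + duration
        for h in bins:
            bin_start = start.replace(hour=h, minute=0, second=0)
            bin_end = bin_start + timedelta(hours=1)
            # intersection of [start, end) with [bin_start, bin_end)
            overlap = min(end, bin_end) - max(start, bin_start)
            if overlap > timedelta(0):
                bins[h] += overlap
    return bins
```

Calling split_into_hour_bins(times) reproduces the expected output: 0:35 for the 12:00 hour, 0:50 for the 17:00 hour, 0:00 for the 19:00 hour, and so on.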
I have a data set:
import pandas as pd
from datetime import datetime

df = pd.DataFrame({
    'service': ['a', 'a', 'a', 'b', 'c', 'a', 'a'],
    'status': ['problem', 'problem', 'ok', 'problem', 'ok', 'problem', 'ok'],
    'created': [
        datetime(2019, 1, 1, 1, 1, 0),
        datetime(2019, 1, 1, 1, 1, 10),
        datetime(2019, 1, 1, 1, 2, 0),
        datetime(2019, 1, 1, 1, 3, 0),
        datetime(2019, 1, 1, 1, 5, 0),
        datetime(2019, 1, 1, 1, 10, 0),
        datetime(2019, 1, 1, 1, 20, 0),
    ],
})
print(df.head(10))
service status created
0 a problem 2019-01-01 01:01:00 # -\
1 a problem 2019-01-01 01:01:10 # --> one group
2 a ok 2019-01-01 01:02:00 # -/
3 b problem 2019-01-01 01:03:00
4 c ok 2019-01-01 01:05:00
5 a problem 2019-01-01 01:10:00 # -\
6 a ok 2019-01-01 01:20:00 # - --> one group
As you can see, service a changed status from problem -> ok twice (rows 0-2 and rows 5-6). Rows 3 and 4 have no status change (only one record each, so no group/chunk). I need to create the next data set:
service downtime_seconds
0 a 60 # `created` difference between 2 and 0
1 a 600 # `created` difference between 6 and 5
I can do it through iteration:
for i in range(len(df.index)):
    # if df.loc[i]['status'] blablabla...
Is it possible to do it using pandas without iteration? Maybe there is a more elegant method?
Thank you.
In your case we need to create the groupby key by reversing the order and taking a cumsum; then we just filter the df before the groupby, using nunique with transform:
s=df.status.eq('ok').iloc[::-1].cumsum()
con=df.service.groupby(s).transform('nunique')==1
df_g=df[con].groupby(s).agg({'service':'first','created':lambda x : (x.iloc[-1]-x.iloc[0]).seconds})
Out[124]:
service created
status
1 a 600
3 a 60
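To see why the reversed cumulative sum works as a grouping key, here is a small trace on the question's columns (only the key is shown, so created is omitted):

```python
import pandas as pd

df = pd.DataFrame({
    'service': ['a', 'a', 'a', 'b', 'c', 'a', 'a'],
    'status': ['problem', 'problem', 'ok', 'problem', 'ok', 'problem', 'ok'],
})

# walking the column backwards, every 'ok' starts a new count, so all rows
# up to (and including) that 'ok' end up sharing one key
s = df.status.eq('ok').iloc[::-1].cumsum()
```

Read back in original row order the key is [3, 3, 3, 2, 2, 1, 1]: rows 0-2 form one group (ending in an 'ok'), rows 3-4 another, rows 5-6 a third. The nunique filter then drops the middle group because it mixes services b and c.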
Rookie question:
the following works:
import time
# create time
dztupel = 1971, 1, 1, 0, 0, 1, 0, 0, 0
print(time.strftime("%d.%m.%Y %H:%M:%S", dztupel))
damals = time.mktime(dztupel)
# output
lt = time.localtime(damals)
wtage = ["Montag", "Dienstag", "Mittwoch","Donnerstag","Freitag","Samstag", "Sonntag"]
wtagnr = lt[6]
print("Das ist ein", wtage[wtagnr])
tag_des_jahres = lt[7]
print("Der {0:d}. Tag des Jahres".format(tag_des_jahres))
but:
dztupel = 1970, 1, 1, 0, 0, 1, 0, 0, 0
does not work, at least not on Windows 10. Edit: I get an out-of-range error.
But time should start at January 1st 1970 at 0 hours, 0 minutes and 0 seconds, shouldn't it?
In your second snippet, check what the time.mktime() function returns: dztupel represents 00:00:01 local time on 1/1/1970, which on my system is BST (i.e., UTC+0100), so it corresponds to 23:00:01 UTC on 31/12/1969:
>>> import time
>>> dztupel = 1970, 1, 1, 0, 0, 1, 0, 0, 0 # In BST locally for me, remember, so one hour less seconds than printed EPOCH seconds
>>> time.mktime(dztupel) # This command
-3599.0 # seconds after (i.e., before as is negative) 1/1/1970 UTC0000
It's negative because epoch time (which time.mktime prints, in seconds) starts at UTC midnight on 1/1/1970:
>>> dztupel = 1970, 1, 1, 1, 0, 0, 0, 0, 0 # 1/1/1970 BST0100 == 1/1/1970 UTC0000
>>> time.mktime(dztupel)
0.0 # seconds after 1/1/1970 UTC0000
Hence 0.0: dztupel = 1970, 1, 1, 1, 0, 0, 0, 0, 0 is BST 0100 on 1/1/1970, i.e. exactly UTC midnight on 1/1/1970, so zero seconds since the epoch.
Really, we want to print as UTC, so instead of time.localtime(), use time.gmtime():
>>> dztupel = 1970, 1, 1, 0, 0, 1, 0, 0, 0
>>> time.gmtime(time.mktime(dztupel))
time.struct_time(tm_year=1969, tm_mon=12, tm_mday=31, tm_hour=23, tm_min=0, tm_sec=1, tm_wday=2, tm_yday=365, tm_isdst=0)
Then use strftime() to format it:
>>> gmt = time.gmtime(time.mktime(dztupel))
>>> time.strftime('%Y-%m-%d %H:%M:%S', gmt)
'1969-12-31 23:00:01'
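If the goal is to interpret the tuple as UTC from the start, the standard library's calendar.timegm() is the UTC counterpart of time.mktime(). A sketch using the snippet's tuple:

```python
import calendar
import time

dztupel = (1970, 1, 1, 0, 0, 1, 0, 0, 0)

# timegm() interprets the struct_time as UTC, so no local-offset surprises
epoch_seconds = calendar.timegm(dztupel)  # 1

print(time.strftime('%Y-%m-%d %H:%M:%S', time.gmtime(epoch_seconds)))
# 1970-01-01 00:00:01
```

Unlike mktime(), this never goes negative for times at or after the epoch, regardless of the local timezone.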