Split list of datetimes into days - python

I've got a sorted list of datetimes: (with day gaps)
list_of_dts = [
datetime.datetime(2012,1,1,0,0,0),
datetime.datetime(2012,1,1,1,0,0),
datetime.datetime(2012,1,2,0,0,0),
datetime.datetime(2012,1,3,0,0,0),
datetime.datetime(2012,1,5,0,0,0),
]
And I'd like to split them in to a list for each day:
result = [
[datetime.datetime(2012,1,1,0,0,0), datetime.datetime(2012,1,1,1,0,0)],
[datetime.datetime(2012,1,2,0,0,0)],
[datetime.datetime(2012,1,3,0,0,0)],
[], # Empty list for no datetimes on day
[datetime.datetime(2012,1,5,0,0,0)]
]
Algorithmically, it should be possible to achieve at least O(n).
Perhaps something like the following:
(This obviously doesn't handle missed days, and drops the last dt, but it's a start)
def dt_to_d(list_of_dts):
    result = []
    start_dt = list_of_dts[0]
    day = [start_dt]
    for i, dt in enumerate(list_of_dts[1:]):
        previous = start_dt if i == 0 else list_of_dts[i-1]
        if dt.day > previous.day or dt.month > previous.month or dt.year > previous.year:
            # split to new sub-list
            result.append(day)
            day = []
            # Loop for each day gap?
        day.append(dt)
    return result
Thoughts?

The easiest way to go is to use dict.setdefault to group entries falling on the same day and then loop over the lowest day to the highest:
>>> import datetime
>>> list_of_dts = [
datetime.datetime(2012,1,1,0,0,0),
datetime.datetime(2012,1,1,1,0,0),
datetime.datetime(2012,1,2,0,0,0),
datetime.datetime(2012,1,3,0,0,0),
datetime.datetime(2012,1,5,0,0,0),
]
>>> days = {}
>>> for dt in list_of_dts:
...     days.setdefault(dt.toordinal(), []).append(dt)
...
>>> [days.get(day, []) for day in range(min(days), max(days)+1)]
[[datetime.datetime(2012, 1, 1, 0, 0), datetime.datetime(2012, 1, 1, 1, 0)],
[datetime.datetime(2012, 1, 2, 0, 0)],
[datetime.datetime(2012, 1, 3, 0, 0)],
[],
[datetime.datetime(2012, 1, 5, 0, 0)]]
Another approach for making such groupings is itertools.groupby. It is designed for this kind of work, but it doesn't provide a way to fill-in an empty list for missing days:
>>> import itertools
>>> [list(group) for k, group in itertools.groupby(list_of_dts,
key=datetime.datetime.toordinal)]
[[datetime.datetime(2012, 1, 1, 0, 0), datetime.datetime(2012, 1, 1, 1, 0)],
[datetime.datetime(2012, 1, 2, 0, 0)],
[datetime.datetime(2012, 1, 3, 0, 0)],
[datetime.datetime(2012, 1, 5, 0, 0)]]
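If you want to stay with groupby and still get the empty lists, one option is to pad between consecutive groups yourself. This is only a sketch: the function name split_days_with_gaps is made up for illustration, and it assumes the input is already sorted, as in the question.

```python
import datetime
import itertools

def split_days_with_gaps(sorted_dts):
    """Group sorted datetimes by day, inserting [] for each skipped day."""
    result = []
    previous_day = None
    for day, group in itertools.groupby(sorted_dts, key=datetime.datetime.toordinal):
        if previous_day is not None:
            # One empty list per calendar day missing between two groups.
            result.extend([] for _ in range(day - previous_day - 1))
        result.append(list(group))
        previous_day = day
    return result
```

This walks the list once, so it stays O(n) in the number of datetimes plus the number of missing days.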

You can use itertools.groupby to easily handle this kind of problem:
import datetime
import itertools
list_of_dts = [
datetime.datetime(2012,1,1,0,0,0),
datetime.datetime(2012,1,1,1,0,0),
datetime.datetime(2012,1,2,0,0,0),
datetime.datetime(2012,1,3,0,0,0),
datetime.datetime(2012,1,5,0,0,0),
]
print([list(g) for k, g in itertools.groupby(list_of_dts, key=lambda d: d.date())])

Filling the gaps:
date_dict = {}
for date_value in list_of_dts:
    if date_value.date() in date_dict:
        date_dict[date_value.date()].append(date_value)
    else:
        date_dict[date_value.date()] = [date_value]
sorted_dates = sorted(date_dict.keys())
date = sorted_dates[0]
while date <= sorted_dates[-1]:
    print(date_dict.get(date, []))
    date += datetime.timedelta(1)
Results:
[datetime.datetime(2012, 1, 1, 0, 0), datetime.datetime(2012, 1, 1, 1, 0)]
[datetime.datetime(2012, 1, 2, 0, 0)]
[datetime.datetime(2012, 1, 3, 0, 0)]
[]
[datetime.datetime(2012, 1, 5, 0, 0)]
This solution does not require the original datetime list to be sorted.

list_of_dts = [
datetime.datetime(2012,1,1,0,0,0),
datetime.datetime(2012,1,1,1,0,0),
datetime.datetime(2012,1,2,0,0,0),
datetime.datetime(2012,1,3,0,0,0),
datetime.datetime(2012,1,5,0,0,0),
]
groupedByDay = {}
for date in list_of_dts:
    if date.date() in groupedByDay:
        groupedByDay[date.date()].append(date)
    else:
        groupedByDay[date.date()] = [date]
Now you have a dictionary where each key is a date and each value is the list of datetimes falling on that date.
If you are set on having a list instead:
result = sorted(groupedByDay.values())
Now result is a list of lists, with all the datetimes from the same day grouped together. Days with no datetimes get no entry.

Related

How to create a nested list conditioned on a parameter in python

I have generated a day-wise nested list and want to calculate total duration between login and logout sessions and store that value individually in a duration nested list, organized by the day in which the login happened.
My python script is:
import datetime
import itertools
Logintime = [
datetime.datetime(2021,1,1,8,10,10),
datetime.datetime(2021,1,1,10,25,19),
datetime.datetime(2021,1,2,8,15,10),
datetime.datetime(2021,1,2,9,35,10)
]
Logouttime = [
datetime.datetime(2021,1,1,10,10,11),
datetime.datetime(2021,1,1,17,0,10),
datetime.datetime(2021,1,2,9,30,10),
datetime.datetime(2021,1,2,17,30,12)
]
Logintimedaywise = [list(group) for k, group in itertools.groupby(Logintime,
key=datetime.datetime.toordinal)]
Logouttimedaywise = [list(group) for j, group in itertools.groupby(Logouttime,
key=datetime.datetime.toordinal)]
print(Logintimedaywise)
print(Logouttimedaywise)
# calculate total duration
temp = []
l = []
for p,q in zip(Logintimedaywise,Logouttimedaywise):
    for a,b in zip(p, q):
        tdelta = (b-a)
        diff = int(tdelta.total_seconds()) / 3600
        if diff not in temp:
            temp.append(diff)
l.append(temp)
print(l)
This script generates the following output (the durations in variable l come out as a flat list inside a singleton list):
[[datetime.datetime(2021, 1, 1, 8, 10, 10), datetime.datetime(2021, 1, 1, 10, 25, 19)], [datetime.datetime(2021, 1, 2, 8, 15, 10), datetime.datetime(2021, 1, 2, 9, 35, 10)]]
[[datetime.datetime(2021, 1, 1, 10, 10, 11), datetime.datetime(2021, 1, 1, 17, 0, 10)], [datetime.datetime(2021, 1, 2, 9, 30, 10), datetime.datetime(2021, 1, 2, 17, 30, 12)]]
[[2.000277777777778, 6.5808333333333335, 1.25, 7.917222222222223]]
But my desired output format is the following nested list of durations (each item in the list should be the list of durations for a given login day):
[[2.000277777777778, 6.5808333333333335] , [1.25, 7.917222222222223]]
Can anyone help me store the total durations as a nested list grouped by login day?
Thanks in advance.
Try changing this piece of code:
# calculate total duration
temp = []
l = []
for p,q in zip(Logintimedaywise,Logouttimedaywise):
    for a,b in zip(p, q):
        tdelta = (b-a)
        diff = int(tdelta.total_seconds()) / 3600
        if diff not in temp:
            temp.append(diff)
l.append(temp)
print(l)
To:
# calculate total duration
l = []
for p,q in zip(Logintimedaywise,Logouttimedaywise):
    l.append([])
    for a,b in zip(p, q):
        tdelta = (b-a)
        diff = int(tdelta.total_seconds()) / 3600
        if diff not in l[-1]:
            l[-1].append(diff)
print(l)
Then the output would be:
[[datetime.datetime(2021, 1, 1, 8, 10, 10), datetime.datetime(2021, 1, 1, 10, 25, 19)], [datetime.datetime(2021, 1, 2, 8, 15, 10), datetime.datetime(2021, 1, 2, 9, 35, 10)]]
[[datetime.datetime(2021, 1, 1, 10, 10, 11), datetime.datetime(2021, 1, 1, 17, 0, 10)], [datetime.datetime(2021, 1, 2, 9, 30, 10), datetime.datetime(2021, 1, 2, 17, 30, 12)]]
[[2.000277777777778, 6.5808333333333335], [1.25, 7.917222222222223]]
I add a new sublist on every iteration of the outer loop.
Your solution and the answer by @U11-Forward will break if the login and logout of one session fall on different days, because the inner lists in Logintimedaywise and Logouttimedaywise will then have different numbers of elements.
To avoid that, a much simpler approach is to first calculate the duration for every (login, logout) pair and then build the nested lists from the login day alone (or the logout day, if you prefer), like this:
import datetime
import itertools
import numpy
# define the login and logout times
Logintime = [datetime.datetime(2021,1,1,8,10,10),datetime.datetime(2021,1,1,10,25,19),datetime.datetime(2021,1,2,8,15,10),datetime.datetime(2021,1,2,9,35,10)]
Logouttime = [datetime.datetime(2021,1,1,10,10,11),datetime.datetime(2021,1,1,17,0,10), datetime.datetime(2021,1,2,9,30,10),datetime.datetime(2021,1,2,17,30,12) ]
# calculate the duration and the unique days in the set
duration = [ int((logout - login).total_seconds())/3600 for login,logout in zip(Logintime,Logouttime) ]
login_days = numpy.unique([login.day for login in Logintime])
# create the nested list of durations
# each inner list correspond to a unique login day
Logintimedaywise = [[ login for login in Logintime if login.day == day ] for day in login_days ]
Logouttimedaywise = [[ logout for login,logout in zip(Logintime,Logouttime) if login.day == day ] for day in login_days ]
duration_daywise = [[ d for d,login in zip(duration,Logintime) if login.day == day ] for day in login_days ]
# check
print(Logintimedaywise)
print(Logouttimedaywise)
print(duration_daywise)
Outputs
[[datetime.datetime(2021, 1, 1, 8, 10, 10), datetime.datetime(2021, 1, 1, 10, 25, 19)], [datetime.datetime(2021, 1, 2, 8, 15, 10), datetime.datetime(2021, 1, 2, 9, 35, 10)]]
[[datetime.datetime(2021, 1, 1, 10, 10, 11), datetime.datetime(2021, 1, 1, 17, 0, 10)], [datetime.datetime(2021, 1, 2, 9, 30, 10), datetime.datetime(2021, 1, 2, 17, 30, 12)]]
[[2.000277777777778, 6.5808333333333335], [1.25, 7.917222222222223]]
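A variant of the same idea that avoids NumPy and keys on the full calendar date rather than login.day (which is only the day of the month, so 1 January and 1 February would end up in the same group). This is a sketch, not a drop-in replacement: the name durations_by_login_day is made up here, and it assumes the logins are sorted and that logouts[i] belongs to logins[i].

```python
import datetime
import itertools

def durations_by_login_day(logins, logouts):
    """Return one list of session durations (in hours) per login date."""
    # Pair each login with its logout; the pairing is an assumption about the data.
    pairs = zip(logins, logouts)
    return [
        [(logout - login).total_seconds() / 3600 for login, logout in group]
        # Consecutive pairs whose login falls on the same date form one group.
        for _, group in itertools.groupby(pairs, key=lambda pair: pair[0].date())
    ]
```

Because only the login date is used as the key, a session that ends after midnight still counts toward the day it started on.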

Finding consecutive duplicates and listing their indexes of where they occur in python

I have a list in python for example:
mylist = [1,1,1,1,1,1,1,1,1,1,1,
0,0,1,1,1,1,0,0,0,0,0,
1,1,1,1,1,1,1,1,0,0,0,0,0,0]
My goal is to find every run of five or more consecutive zeros and list the start and end index of each run. For this list the output would be:
[17,21][30,35]
Here is what I have tried, based on other questions asked here:
import numpy as np
def zero_runs(a):
    # Create an array that is 1 where a is 0, and pad each end with an extra 0.
    iszero = np.concatenate(([0], np.equal(a, 0).view(np.int8), [0]))
    absdiff = np.abs(np.diff(iszero))
    # Runs start and end where absdiff is 1.
    ranges = np.where(absdiff == 1)[0].reshape(-1, 2)
    return ranges
runs = zero_runs(mylist)
This gives the output:
[0,10]
[11,12]
...
This basically just lists the indexes of all the duplicates. How would I go about reducing it to what I need?
You could use itertools.groupby, it will identify the contiguous groups in the list:
from itertools import groupby
lst = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
groups = [(k, sum(1 for _ in g)) for k, g in groupby(lst)]
cursor = 0
result = []
for k, l in groups:
    if not k and l >= 5:
        result.append([cursor, cursor + l - 1])
    cursor += l
print(result)
Output
[[17, 21], [30, 35]]
Your current attempt is very close. It returns every run of consecutive zeros in the array, so all you need to add is a check that filters out runs shorter than 5.
def threshold_zero_runs(a, threshold):
    iszero = np.concatenate(([0], np.equal(a, 0).view(np.int8), [0]))
    absdiff = np.abs(np.diff(iszero))
    ranges = np.where(absdiff == 1)[0].reshape(-1, 2)
    m = (np.diff(ranges, 1) >= threshold).ravel()
    return ranges[m]
Calling threshold_zero_runs(mylist, 5) returns the following. Each end index is exclusive, so subtract 1 from it to get the inclusive form asked for in the question:
array([[17, 22],
       [30, 36]], dtype=int64)
Use the shift operator on the array. Compare the shifted version with the original. Where they do not match, you have a transition. You then need only to identify adjacent transitions that are at least 5 positions apart.
Can you take it from there?
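In case it helps, here is one way the shift-and-compare idea could look in plain Python. It is a sketch under my own assumptions: the name long_zero_runs and the min_length parameter are invented for illustration, and the "shift" is done by zipping the flag list against itself offset by one.

```python
def long_zero_runs(values, min_length=5):
    """Return [start, end] (inclusive) index pairs for runs of zeros
    that are at least min_length long."""
    # 1 where the value is zero, padded with 0 at both ends so every run
    # has both a start transition and an end transition.
    flags = [0] + [1 if v == 0 else 0 for v in values] + [0]
    # Compare the list with its shifted copy; a mismatch is a transition.
    transitions = [i for i, (a, b) in enumerate(zip(flags, flags[1:])) if a != b]
    # Transitions alternate between run start and (exclusive) run end.
    starts, ends = transitions[::2], transitions[1::2]
    return [[s, e - 1] for s, e in zip(starts, ends) if e - s >= min_length]
```

For the list in the question, long_zero_runs(mylist) gives [[17, 21], [30, 35]].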
Another way using itertools.groupby and enumerate.
First find the zeros and the indices:
from operator import itemgetter
from itertools import groupby
zerosList = [
list(map(itemgetter(0), g))
for i, g in groupby(enumerate(mylist), key=itemgetter(1))
if not i
]
print(zerosList)
#[[11, 12], [17, 18, 19, 20, 21], [30, 31, 32, 33, 34, 35]]
Now just filter zerosList:
runs = [[x[0], x[-1]] for x in zerosList if len(x) >= 5]
print(runs)
#[[17, 21], [30, 35]]

Value difference comparison within a list in python

I have a nested list that contains different variables. I am trying to check the time difference between consecutive items and, when it satisfies a condition, group those items together.
For example:
Item 1 happened on 1-6-2012 1 pm
Item 2 happened on 1-6-2012 4 pm
Item 3 happened on 1-6-2012 6 pm
Item 4 happened on 3-6-2012 5 pm
Item 5 happened on 5-6-2012 5 pm
I want to group the items that are separated by gaps of less than 24 hours. In this case, Items 1, 2 and 3 belong to one group, Item 4 forms its own group, and Item 5 forms another. I tried the following code:
import csv
from collections import defaultdict
from datetime import datetime
Time = []
All_Traps = []
Traps = []
Dic_Traps = defaultdict(list)
Traps_CSV = csv.reader(open("D:/Users/d774911/Desktop/Telstra Internship/Working files/Traps_Generic_Features.csv"))
for rows in Traps_CSV:
    All_Traps.append(rows)
All_Traps.sort(key=lambda x: x[9])
for length in range(len(All_Traps)):
    if length == (len(All_Traps) - 1):
        break
    Node_Name_1 = All_Traps[length][2]
    Node_Name_2 = All_Traps[length + 1][2]
    Event_Type_1 = All_Traps[length][5]
    Event_Type_2 = All_Traps[length + 1][5]
    Time_1 = All_Traps[length][9]
    Time_2 = All_Traps[length + 1][9]
    Difference = datetime.strptime(Time_2[0:19], '%Y-%m-%dT%H:%M:%S') - datetime.strptime(Time_1[0:19], '%Y-%m-%dT%H:%M:%S')
    if Node_Name_1 == Node_Name_2 and \
       Event_Type_1 == Event_Type_2 and \
       float(Difference.seconds) / (60*60) < 24:
        Dic_Traps[length].append(All_Traps[length])
But I am missing some items. Ideas?
For a sorted list you can use groupby. Here is a simplified example (you should convert your date strings to datetime objects first) that shows the main idea:
from itertools import groupby
import datetime
SRC_DATA = [
    (1, datetime.datetime(2015, 6, 20, 1)),
    (2, datetime.datetime(2015, 6, 20, 4)),
    (3, datetime.datetime(2015, 6, 20, 5)),
    (4, datetime.datetime(2015, 6, 21, 1)),
    (5, datetime.datetime(2015, 6, 22, 1)),
    (6, datetime.datetime(2015, 6, 22, 4)),
]
for group_date, group in groupby(SRC_DATA, key=lambda X: X[1].date()):
    print("Group {}: {}".format(group_date, list(group)))
Output:
$ python python_groupby.py
Group 2015-06-20: [(1, datetime.datetime(2015, 6, 20, 1, 0)), (2, datetime.datetime(2015, 6, 20, 4, 0)), (3, datetime.datetime(2015, 6, 20, 5, 0))]
Group 2015-06-21: [(4, datetime.datetime(2015, 6, 21, 1, 0))]
Group 2015-06-22: [(5, datetime.datetime(2015, 6, 22, 1, 0)), (6, datetime.datetime(2015, 6, 22, 4, 0))]
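Note that grouping by calendar date is not quite the same as grouping by a gap of less than 24 hours: items at 23:00 and at 01:00 the next day land in different date groups even though they are two hours apart. If the gap is what matters, a sketch along these lines may be closer to the question. It assumes (label, timestamp) pairs like SRC_DATA above; the name group_by_gap and the max_gap parameter are made up for illustration.

```python
import datetime

def group_by_gap(items, max_gap=datetime.timedelta(hours=24)):
    """Group (label, timestamp) pairs, starting a new group whenever the
    gap to the previous item is max_gap or more."""
    groups = []
    previous_time = None
    # Sort by timestamp so that "previous" always means the item just before.
    for label, timestamp in sorted(items, key=lambda item: item[1]):
        if previous_time is None or timestamp - previous_time >= max_gap:
            groups.append([])
        groups[-1].append(label)
        previous_time = timestamp
    return groups
```

With the five items from the question (1 pm, 4 pm and 6 pm on 1-6-2012, then 5 pm on 3-6-2012 and 5 pm on 5-6-2012) this returns [[1, 2, 3], [4], [5]]. Each item is compared with the one immediately before it, so a long chain of closely spaced items stays in one group even if the first and last are more than 24 hours apart.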
First of all, change those horrible cased variable names. Python has its own convention of naming variables, classes, methods and so on. It's called snake case.
Now, on to what you need to do:
import datetime as dt
item_timestamp_dict = {}
with open('timex.dat', 'r') as f:
    for line in f.read().splitlines():
        if line:
            # "Item 1 happened on 1-6-2012 1 pm" -> item "1", timestamp "1-6-2012 1 pm"
            item = line.split('happened')[0].strip().split(' ')[1]
            timestamp_string = line.split(' on ')[-1]
            # %I with %p reads the 12-hour clock, so "1 pm" becomes hour 13
            datetime_stamp = dt.datetime.strptime(timestamp_string.strip(), "%d-%m-%Y %I %p")
            item_timestamp_dict[item] = datetime_stamp
This is a hackish way of giving you this:
item_timestamp_dict = {
    '1': datetime.datetime(2012, 6, 1, 13, 0),
    '2': datetime.datetime(2012, 6, 1, 16, 0),
    '3': datetime.datetime(2012, 6, 1, 18, 0),
    '4': datetime.datetime(2012, 6, 3, 17, 0),
    '5': datetime.datetime(2012, 6, 5, 17, 0)}
A dictionary with the item number as key and its datetime timestamp as value.
You can use the datetime values (for example item_timestamp_dict['1'].hour) to do your calculation.
EDIT: It can be optimized a lot.

Splice list of date objects in Python

Is there a simple way to splice a list of date objects:
spliced = sortedDates[startDate:endDate]
print spliced
or does this require an enumeration?
Example:
sortedDates = [July 1 2012, July 2 2012, July 3 2012, July 4, 2012]
spliced = sortedDates[July 2 2012:July 4 2012]
Assuming you have a list sortedDates that contains datetime objects, and two datetime objects minD and maxD that define your boundaries:
filtered = [d for d in sortedDates if minD < d < maxD]
Or, more efficient since it takes advantage of the sorted nature of the list to use a binary search:
from bisect import bisect_left, bisect_right
filtered = sortedDates[bisect_right(sortedDates, minD):bisect_left(sortedDates, maxD)]
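Both snippets above leave out the boundary dates themselves. If you want minD and maxD included when they are in the list, swapping the two bisect functions should do it. A sketch, with slice_dates as a made-up helper name and the list assumed to be sorted ascending:

```python
import datetime
from bisect import bisect_left, bisect_right

def slice_dates(sorted_dates, start, end):
    """Return the dates d with start <= d <= end from an ascending list."""
    lo = bisect_left(sorted_dates, start)   # first index with date >= start
    hi = bisect_right(sorted_dates, end)    # first index with date > end
    return sorted_dates[lo:hi]
```

With the four July 2012 dates from the question, slice_dates(sortedDates, July 2, July 4) would return July 2, 3 and 4.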
You can create a range of dates by using a list comprehension:
start_date = datetime.datetime(2014, 1, 1)
end_date = datetime.datetime(2014, 1, 5)
nb_days = (end_date - start_date).days + 1 # + 1 because range is exclusive
splice = [start_date + datetime.timedelta(days=x) for x in range(nb_days)]
With the example given:
>>> [start_date + datetime.timedelta(days=x) for x in range(0, nb_days)]
[datetime.datetime(2014, 1, 1, 0, 0), datetime.datetime(2014, 1, 2, 0, 0),
datetime.datetime(2014, 1, 3, 0, 0), datetime.datetime(2014, 1, 4, 0, 0),
datetime.datetime(2014, 1, 5, 0, 0)]
If you want a sublist of an existing list of dates, you can again build it with a list comprehension:
splice = [x for x in original_list if start_date < x < end_date]
pandas is terrific for date handling.
import pandas as pd
from datetime import datetime
dr = pd.date_range('2012-07-01', '2012-08-01')
dr[(dr >= datetime(2012,7,4)) & (dr <= datetime(2012,7,8))]

Sorting datetime objects while ignoring the year?

I have a list of birthdays stored in datetime objects. How would one go about sorting these in Python using only the month and day arguments?
For example,
[
    datetime.datetime(1983, 1, 1, 0, 0),
    datetime.datetime(1996, 1, 13, 0, 0),
    datetime.datetime(1976, 2, 6, 0, 0),
    ...
]
Thanks! :)
You can use month and day to create a value that can be used for sorting:
birthdays.sort(key = lambda d: (d.month, d.day))
Alternatively, sort on the month and day fields of timetuple():
birthdays.sort(key = lambda x: x.timetuple()[1:3])
If the dates are stored as strings—you say they aren't, although it looks like they are—you might use dateutil's parser:
>>> from dateutil.parser import parse
>>> from pprint import pprint
>>> bd = ['February 6, 1976','January 13, 1996','January 1, 1983']
>>> bd = [parse(i) for i in bd]
>>> pprint(bd)
[datetime.datetime(1976, 2, 6, 0, 0),
datetime.datetime(1996, 1, 13, 0, 0),
datetime.datetime(1983, 1, 1, 0, 0)]
>>> bd.sort(key = lambda d: (d.month, d.day)) # from sth's answer
>>> pprint(bd)
[datetime.datetime(1983, 1, 1, 0, 0),
datetime.datetime(1996, 1, 13, 0, 0),
datetime.datetime(1976, 2, 6, 0, 0)]
If your dates are in different formats, you might give fuzzy parsing a shot:
>>> bd = [parse(i,fuzzy=True) for i in bd] # replace line 4 above with this line
