Format These Dates And Get Time Passed - python

I have a Python list of dates, and I'm using min and max to find the most recent and the oldest (first of all, is that the best method?). I also need to convert the dates into a form where I can take the current time and subtract the oldest date in the list, so I can say something like "In the last 27 minutes..." and state how many days, hours, or minutes have passed since the oldest entry. Here is my list (the dates obviously change depending on what I'm pulling) so you can see the current format. How do I get the info I need?
[u'Sun Oct 06 18:00:55 +0000 2013', u'Sun Oct 06 17:57:41 +0000 2013', u'Sun Oct 06 17:55:44 +0000 2013', u'Sun Oct 06 17:54:10 +0000 2013', u'Sun Oct 06 17:35:58 +0000 2013', u'Sun Oct 06 17:35:58 +0000 2013', u'Sun Oct 06 17:35:25 +0000 2013', u'Sun Oct 06 17:34:39 +0000 2013', u'Sun Oct 06 17:34:39 +0000 2013', u'Sun Oct 06 17:34:39 +0000 2013', u'Sun Oct 06 17:30:35 +0000 2013', u'Sun Oct 06 17:25:28 +0000 2013', u'Sun Oct 06 17:24:04 +0000 2013', u'Sun Oct 06 17:24:04 +0000 2013', u'Sun Oct 06 17:22:08 +0000 2013', u'Sun Oct 06 17:22:08 +0000 2013', u'Sun Oct 06 17:21:00 +0000 2013', u'Sun Oct 06 17:18:49 +0000 2013', u'Sun Oct 06 17:18:49 +0000 2013', u'Sun Oct 06 17:15:29 +0000 2013', u'Sun Oct 06 17:15:29 +0000 2013', u'Sun Oct 06 17:13:35 +0000 2013', u'Sun Oct 06 17:12:18 +0000 2013', u'Sun Oct 06 17:12:00 +0000 2013', u'Sun Oct 06 17:07:34 +0000 2013', u'Sun Oct 06 17:03:59 +0000 2013']

You won't reliably get the oldest and newest entries by calling min and max on the list as it stands: the items are strings, so they are compared alphabetically, and "Fri" comes before "Mon", for example. You'll want to convert them into a data structure that represents date/time stamps properly.
Fortunately, Python's datetime module comes with a method that converts many date/time stamp strings into a proper representation: datetime.datetime.strptime. See the datetime documentation for the format codes it accepts.
Once that's done, you can use min and max, and subtracting one datetime from another gives a timedelta that holds the difference.
from datetime import datetime
# Start with the initial list
A = [u'Sun Oct 06 18:00:55 +0000 2013', u'Sun Oct 06 17:57:41 +0000 2013', u'Sun Oct 06 17:55:44 +0000 2013', u'Sun Oct 06 17:54:10 +0000 2013', u'Sun Oct 06 17:35:58 +0000 2013', u'Sun Oct 06 17:35:58 +0000 2013', u'Sun Oct 06 17:35:25 +0000 2013', u'Sun Oct 06 17:34:39 +0000 2013', u'Sun Oct 06 17:34:39 +0000 2013', u'Sun Oct 06 17:34:39 +0000 2013', u'Sun Oct 06 17:30:35 +0000 2013', u'Sun Oct 06 17:25:28 +0000 2013', u'Sun Oct 06 17:24:04 +0000 2013', u'Sun Oct 06 17:24:04 +0000 2013', u'Sun Oct 06 17:22:08 +0000 2013', u'Sun Oct 06 17:22:08 +0000 2013', u'Sun Oct 06 17:21:00 +0000 2013', u'Sun Oct 06 17:18:49 +0000 2013', u'Sun Oct 06 17:18:49 +0000 2013', u'Sun Oct 06 17:15:29 +0000 2013', u'Sun Oct 06 17:15:29 +0000 2013', u'Sun Oct 06 17:13:35 +0000 2013', u'Sun Oct 06 17:12:18 +0000 2013', u'Sun Oct 06 17:12:00 +0000 2013', u'Sun Oct 06 17:07:34 +0000 2013', u'Sun Oct 06 17:03:59 +0000 2013']
# This is the format string the date/time stamps are using
# On Python 3.3 on Windows you can use this format
# s_format = "%a %b %d %H:%M:%S %z %Y"
# However, Python 2.7 on Windows doesn't work with that. If all of your date/time stamps use the same timezone you can do:
s_format = "%a %b %d %H:%M:%S +0000 %Y"
# Convert the text list into datetime objects
A = [datetime.strptime(d, s_format) for d in A]
# Get the extremes
oldest = min(A)
newest = max(A)
# If you subtract oldest from newest you get a timedelta object, which can give you
# the total number of seconds between them. You can use this to calculate days, hours, and minutes.
delta = int((newest - oldest).total_seconds())
delta_days, rem = divmod(delta, 86400)
delta_hours, rem = divmod(rem, 3600)
delta_minutes, delta_seconds = divmod(rem, 60)
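The question also asked for the time elapsed between now and the oldest entry. Here is a minimal sketch of that, assuming Python 3 (where strptime understands %z) and an English locale for the day and month names. The names STAMP_FORMAT, elapsed_since_oldest and describe are made up for illustration:

```python
from datetime import datetime, timezone

# Format of the stamps in the question; %z reads the "+0000" offset.
STAMP_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def elapsed_since_oldest(stamps, now=None):
    """Return (days, hours, minutes) between the oldest stamp and `now`."""
    parsed = [datetime.strptime(s, STAMP_FORMAT) for s in stamps]
    if now is None:
        # Timezone-aware "now", so it can be subtracted from aware stamps.
        now = datetime.now(timezone.utc)
    seconds = int((now - min(parsed)).total_seconds())
    days, rem = divmod(seconds, 86400)
    hours, rem = divmod(rem, 3600)
    return days, hours, rem // 60


def describe(days, hours, minutes):
    """Build a phrase from the non-zero units, largest first."""
    parts = [(days, "days"), (hours, "hours"), (minutes, "minutes")]
    text = ", ".join("%d %s" % (n, unit) for n, unit in parts if n)
    return "In the last %s..." % (text or "0 minutes")


stamps = ["Sun Oct 06 18:00:55 +0000 2013", "Sun Oct 06 17:34:39 +0000 2013"]
# A fixed "now" keeps the example reproducible; omit it to use the real clock.
fixed_now = datetime(2013, 10, 6, 18, 1, 39, tzinfo=timezone.utc)
print(describe(*elapsed_since_oldest(stamps, now=fixed_now)))
# prints: In the last 27 minutes...
```

The sketch does not handle singular forms ("1 days"); that is left out to keep it short.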

Your question can be divided into three pieces:
A) how to read string-formatted dates
B) how to sort a list of dates in Python
C) how to calculate the difference between two dates

Related

Group python dataframe and display all correspond values for each unique key in a dictionary

I have the following dataset
id    date
7510  15 Jun 2020
7510  16 Jun 2020
7512  15 Jun 2020
7512  07 Jul 2020
7520  15 Jun 2020
7520  16 Aug 2020
I need to convert this to a dictionary, which is quite straightforward, but I need each unique id as a key and all of its corresponding dates as that key's values.
For example:
dictionary = {7510: ["15 Jun 2020", "16 Jun 2020"], 7512: ["15 Jun 2020", "07 Jul 2020"],
              7520: ["15 Jun 2020", "16 Aug 2020"]}
Try this:
df.groupby('id')['date'].agg(list).to_dict()
Output:
{7510: ['15 Jun 2020', '16 Jun 2020'],
7512: ['15 Jun 2020', '07 Jul 2020'],
7520: ['15 Jun 2020', '16 Aug 2020']}
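To make it clearer what the groupby call produces, here is a rough plain-Python equivalent. It assumes the rows are available as (id, date) pairs; the names rows and grouped are made up for this sketch, and it is not how pandas does it internally:

```python
from collections import defaultdict

# The table from the question as (id, date) pairs.
rows = [
    (7510, "15 Jun 2020"),
    (7510, "16 Jun 2020"),
    (7512, "15 Jun 2020"),
    (7512, "07 Jul 2020"),
    (7520, "15 Jun 2020"),
    (7520, "16 Aug 2020"),
]

# Each new id starts with an empty list; dates are appended in row order.
grouped = defaultdict(list)
for row_id, date in rows:
    grouped[row_id].append(date)

result = dict(grouped)
print(result)
```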

Sort a list of dictionaries of dates by value

I'm trying to sort these values by year: entries from the current year should come first.
mdlist = [{0:'31 Jan 2022', 1:'', 2:'10 Feb 2022'},
          {0:'10 Feb 2021', 1:'20 Feb 2021', 2:''},
          {0:'10 Feb 2022', 1:'10 Feb 2022', 2:'10 Feb 2022'}]
mdlist = sorted(mdlist, key=lambda d:d[0])
But it is not working as expected.
Expected output:
mdlist = [{0:'31 Jan 2022', 1:'', 2:'10 Feb 2022'},
{0:'10 Feb 2022', 1:'10 Feb 2022', 2:'10 Feb 2022'},
{0:'10 Feb 2021', 1:'20 Feb 2021', 2:''}]
Maybe you could leverage the fact that these strings are dates: parse them with the datetime module and sort by year in descending order, then by month and day in ascending order:
from datetime import datetime
def sorting_key(dct):
    ymd = datetime.strptime(dct[0], "%d %b %Y")
    return -ymd.year, ymd.month, ymd.day
mdlist.sort(key=sorting_key)
Output:
[{0: '31 Jan 2022', 1: '', 2: '10 Feb 2022'},
{0: '10 Feb 2022', 1: '10 Feb 2022', 2: '10 Feb 2022'},
{0: '10 Feb 2021', 1: '20 Feb 2021', 2: ''}]
Use a key function that returns 0 if the year is 2022, 1 otherwise. This will sort all the 2022 dates first.
firstyear = '2022'
mdlist = sorted(mdlist, key=lambda d: 0 if d[0].split()[-1] == firstyear else 1)
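If the year should not be hard-coded, the two ideas can be combined. This is only a sketch: current_year_first is a made-up helper, and it assumes key 0 always holds a date in "%d %b %Y" form. Entries from the current year come first, and within each group the entries are ordered by date:

```python
from datetime import datetime


def current_year_first(dicts, current_year=None):
    """Sort dicts so that entries dated in `current_year` come first."""
    if current_year is None:
        current_year = datetime.now().year

    def key(d):
        parsed = datetime.strptime(d[0], "%d %b %Y")
        # False sorts before True, so current-year entries lead;
        # ties are broken by the parsed date itself.
        return (parsed.year != current_year, parsed)

    return sorted(dicts, key=key)


mdlist = [
    {0: "31 Jan 2022", 1: "", 2: "10 Feb 2022"},
    {0: "10 Feb 2021", 1: "20 Feb 2021", 2: ""},
    {0: "10 Feb 2022", 1: "10 Feb 2022", 2: "10 Feb 2022"},
]
# The year is passed explicitly here so the result does not depend on today's date.
ordered = current_year_first(mdlist, current_year=2022)
print([d[0] for d in ordered])
# prints: ['31 Jan 2022', '10 Feb 2022', '10 Feb 2021']
```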

How do I get an output of all the lines when analysing logfile?

The script should be able to run and analyze a logfile when typing the following in the terminal:
python loganalyzer.py [filepath_to_logfile] [action]
The action specified determines what the script outputs.
When typing error or notice as action, the output is:
date : message
date : message
date : message
date : message
But the script doesn't output all lines... why?
For example, when asking for notices the output I get is:
[Mon Dec 05 14:01:48 2005] : workerEnv.init() ok /etc/httpd/conf/workers2.properties
[Mon Dec 05 14:11:40 2005] : jk2_init() Found child 6115 in scoreboard slot 10
[Mon Dec 05 14:11:45 2005] : workerEnv.init() ok /etc/httpd/conf/workers2.properties
[Mon Dec 05 15:31:06 2005] : jk2_init() Found child 6260 in scoreboard slot 7
[Mon Dec 05 15:31:09 2005] : jk2_init() Found child 6261 in scoreboard slot 8
[Mon Dec 05 15:31:10 2005] : workerEnv.init() ok /etc/httpd/conf/workers2.properties
[Mon Dec 05 15:40:59 2005] : jk2_init() Found child 6276 in scoreboard slot 6
[Mon Dec 05 15:41:32 2005] : workerEnv.init() ok /etc/httpd/conf/workers2.properties
[Mon Dec 05 15:45:42 2005] : jk2_init() Found child 6285 in scoreboard slot 8
[Mon Dec 05 15:45:44 2005] : workerEnv.init() ok /etc/httpd/conf/workers2.properties
[Mon Dec 05 15:50:53 2005] : jk2_init() Found child 6294 in scoreboard slot 7
[Mon Dec 05 15:51:18 2005] : jk2_init() Found child 6296 in scoreboard slot 6
[Mon Dec 05 15:51:20 2005] : workerEnv.init() ok /etc/httpd/conf/workers2.properties
[Mon Dec 05 15:55:31 2005] : jk2_init() Found child 6302 in scoreboard slot 8
[Mon Dec 05 15:55:32 2005] : workerEnv.init() ok /etc/httpd/conf/workers2.properties
[Mon Dec 05 16:01:17 2005] : jk2_init() Found child 6310 in scoreboard slot 6
[Mon Dec 05 16:02:00 2005] : jk2_init() Found child 6316 in scoreboard slot 7
It's only 17 of 34 lines.
Here is all of my code:
import argparse
error = {}
notice = {}
log_file = 'test.log'
# Functions
def load():
    with open('test.log') as logfile:
        for line in logfile:
            parts = line.split('[error]')
            if len(parts) == 2:
                error[parts[0]] = parts[1]
            parts = line.split('[notice]')
            if len(parts) == 2:
                notice[parts[0]] = parts[1]
def errors():
    for date, info in error.items():
        print(date + ' : ' + info)
def notices():
    for date, info in notice.items():
        print(date + ' : ' + info)
def statistics():
    file = open('test.log', 'r')
    error_counter = 0-1
    content = file.read()
    errors = content.split('[error]')
    for error in errors:
        if error:
            error_counter += 1
    print('Errors: ' + str(error_counter))
    notice_counter = 0-1
    notices = content.split('[notice]')
    for notice in notices:
        if notice:
            notice_counter += 1
    print('Notices: ' + str(notice_counter))
# Run
if __name__ == '__main__':
    load()
    notices()
    # parser = argparse.ArgumentParser()
    # parser.add_argument('logfile', help = 'Run program by typing python.exe logglooker.py [path_to_logfile] [action]')
    # parser.add_argument('notice', help = 'Use this action to output all the date and message of all notice entries.')
    # parser.add_argument('error', help = 'Use this action to output all the data and message of all error entries.')
    # parser.add_argument('statistics', help = 'Use this action to output how many errors and notices it is in the logfile.')
    # args = parser.parse_args()
    # log_file = args.logfile
I haven't figured out how to get argparse to work properly yet; that's why the code looks a little weird.
And the test.log to analyze is this.
[Mon Dec 05 14:01:48 2005] [notice] workerEnv.init() ok /etc/httpd/conf/workers2.properties
[Mon Dec 05 14:01:48 2005] [error] mod_jk child workerEnv in error state 7
[Mon Dec 05 14:01:48 2005] [notice] workerEnv.init() ok /etc/httpd/conf/workers2.properties
[Mon Dec 05 14:01:48 2005] [error] mod_jk child workerEnv in error state 9
[Mon Dec 05 14:01:48 2005] [notice] workerEnv.init() ok /etc/httpd/conf/workers2.properties
[Mon Dec 05 14:01:48 2005] [error] mod_jk child workerEnv in error state 8
[Mon Dec 05 14:11:40 2005] [notice] jk2_init() Found child 6115 in scoreboard slot 10
[Mon Dec 05 14:11:43 2005] [error] [client 141.154.18.244] Directory index forbidden by rule: /var/www/html/
[Mon Dec 05 14:11:45 2005] [notice] workerEnv.init() ok /etc/httpd/conf/workers2.properties
[Mon Dec 05 14:11:45 2005] [error] mod_jk child workerEnv in error state 7
[Mon Dec 05 15:31:06 2005] [notice] jk2_init() Found child 6259 in scoreboard slot 6
[Mon Dec 05 15:31:06 2005] [notice] jk2_init() Found child 6260 in scoreboard slot 7
[Mon Dec 05 15:31:09 2005] [notice] jk2_init() Found child 6261 in scoreboard slot 8
[Mon Dec 05 15:31:10 2005] [notice] workerEnv.init() ok /etc/httpd/conf/workers2.properties
[Mon Dec 05 15:31:10 2005] [error] mod_jk child workerEnv in error state 6
[Mon Dec 05 15:31:10 2005] [notice] workerEnv.init() ok /etc/httpd/conf/workers2.properties
[Mon Dec 05 15:31:10 2005] [error] mod_jk child workerEnv in error state 6
[Mon Dec 05 15:31:10 2005] [notice] workerEnv.init() ok /etc/httpd/conf/workers2.properties
[Mon Dec 05 15:31:10 2005] [error] mod_jk child workerEnv in error state 6
[Mon Dec 05 15:40:59 2005] [notice] jk2_init() Found child 6277 in scoreboard slot 7
[Mon Dec 05 15:40:59 2005] [notice] jk2_init() Found child 6276 in scoreboard slot 6
[Mon Dec 05 15:41:32 2005] [notice] jk2_init() Found child 6280 in scoreboard slot 7
[Mon Dec 05 15:41:32 2005] [notice] jk2_init() Found child 6278 in scoreboard slot 8
[Mon Dec 05 15:41:32 2005] [notice] jk2_init() Found child 6279 in scoreboard slot 6
[Mon Dec 05 15:41:32 2005] [notice] workerEnv.init() ok /etc/httpd/conf/workers2.properties
[Mon Dec 05 15:41:32 2005] [error] mod_jk child workerEnv in error state 7
[Mon Dec 05 15:41:32 2005] [notice] workerEnv.init() ok /etc/httpd/conf/workers2.properties
[Mon Dec 05 15:41:32 2005] [error] mod_jk child workerEnv in error state 6
[Mon Dec 05 15:41:32 2005] [notice] workerEnv.init() ok /etc/httpd/conf/workers2.properties
[Mon Dec 05 15:41:32 2005] [error] mod_jk child workerEnv in error state 7
[Mon Dec 05 15:45:42 2005] [notice] jk2_init() Found child 6285 in scoreboard slot 8
[Mon Dec 05 15:45:44 2005] [notice] workerEnv.init() ok /etc/httpd/conf/workers2.properties
[Mon Dec 05 15:45:44 2005] [error] mod_jk child workerEnv in error state 6
[Mon Dec 05 15:50:53 2005] [notice] jk2_init() Found child 6293 in scoreboard slot 6
[Mon Dec 05 15:50:53 2005] [notice] jk2_init() Found child 6294 in scoreboard slot 7
[Mon Dec 05 15:51:18 2005] [notice] jk2_init() Found child 6297 in scoreboard slot 7
[Mon Dec 05 15:51:18 2005] [notice] jk2_init() Found child 6295 in scoreboard slot 8
[Mon Dec 05 15:51:18 2005] [notice] jk2_init() Found child 6296 in scoreboard slot 6
[Mon Dec 05 15:51:20 2005] [notice] workerEnv.init() ok /etc/httpd/conf/workers2.properties
[Mon Dec 05 15:51:20 2005] [error] mod_jk child workerEnv in error state 7
[Mon Dec 05 15:51:20 2005] [notice] workerEnv.init() ok /etc/httpd/conf/workers2.properties
[Mon Dec 05 15:51:20 2005] [error] mod_jk child workerEnv in error state 7
[Mon Dec 05 15:51:20 2005] [notice] workerEnv.init() ok /etc/httpd/conf/workers2.properties
[Mon Dec 05 15:51:20 2005] [error] mod_jk child workerEnv in error state 6
[Mon Dec 05 15:55:31 2005] [notice] jk2_init() Found child 6302 in scoreboard slot 8
[Mon Dec 05 15:55:32 2005] [notice] workerEnv.init() ok /etc/httpd/conf/workers2.properties
[Mon Dec 05 15:55:32 2005] [error] mod_jk child workerEnv in error state 6
[Mon Dec 05 16:01:17 2005] [notice] jk2_init() Found child 6310 in scoreboard slot 6
[Mon Dec 05 16:02:00 2005] [notice] jk2_init() Found child 6315 in scoreboard slot 6
[Mon Dec 05 16:02:00 2005] [notice] jk2_init() Found child 6316 in scoreboard slot 7
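A likely explanation, judging from the code above: load() stores each entry in a dict keyed by the text before [notice], which is the timestamp. Several notices share a timestamp, and each new one overwrites the previous value, so only one line per distinct timestamp survives (the log has 17 distinct notice timestamps). Below is a minimal sketch that keeps every line by appending to lists instead. parse_log is a hypothetical helper, not part of the original script:

```python
def parse_log(lines):
    """Collect (date, message) pairs per level, keeping duplicate timestamps."""
    entries = {"error": [], "notice": []}
    for line in lines:
        for level in entries:
            # partition() splits on the first marker; sep is "" when the marker is absent.
            date, sep, message = line.partition("[" + level + "]")
            if sep:
                entries[level].append((date.strip(), message.strip()))
                break
    return entries


sample = [
    "[Mon Dec 05 15:31:06 2005] [notice] jk2_init() Found child 6259 in scoreboard slot 6",
    "[Mon Dec 05 15:31:06 2005] [notice] jk2_init() Found child 6260 in scoreboard slot 7",
    "[Mon Dec 05 15:31:10 2005] [error] mod_jk child workerEnv in error state 6",
]
parsed = parse_log(sample)
for date, message in parsed["notice"]:
    print(date + " : " + message)
```

With lists, both 15:31:06 notices are printed, whereas a dict keyed by timestamp would keep only the second one.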

'list' object has no attribute 'date'

I am new to Python and was trying to sort the dates in a list. Below is the code that I wrote; I am getting
the following error on this line:
date_object = datetime_object.date()  # 'list' object has no attribute 'date'
from datetime import datetime,date
lst_dates = ['01 Apr 2017', '01 Apr 2018', '01 Aug 2017', '01 Aug 2018', '01 Dec 2017', '01 Dec 2018', '01 Feb 2017', '01 Feb 2018', '01 Jan 2017', '01 Jan 2018']
datetime_object = sorted(lst_dates, key=lambda x: datetime.strptime(x, '%d %b %Y'))
date_object = datetime_object.date()
print(date_object)
Please help me understand what the issue is. Thanks.
Python lists don't have a date() method. With the code below, the list of dates can be sorted.
from datetime import datetime
lst_dates = ['01 Apr 2017', '01 Apr 2018', '01 Aug 2017', '01 Aug 2018', '01 Dec 2017', '01 Dec 2018', '01 Feb 2017', '01 Feb 2018', '01 Jan 2017', '01 Jan 2018']
lst_dates.sort(key=lambda date: datetime.strptime(date, "%d %b %Y"))
print(lst_dates)
The problem with your code is on line 3, where you write
datetime_object = sorted(lst_dates, key=lambda x: datetime.strptime(x, '%d %b %Y'))
The sorted function in Python returns a new list object. If you want to check, run
type(datetime_object)
So, to achieve what you want here, you need to iterate over that list. Your final code would be something like this:
from datetime import datetime,date
lst_dates = ['01 Apr 2017', '01 Apr 2018', '01 Aug 2017', '01 Aug 2018', '01 Dec 2017', '01 Dec 2018', '01 Feb 2017', '01 Feb 2018', '01 Jan 2017', '01 Jan 2018']
datetime_obj_list = sorted(lst_dates, key=lambda x: datetime.strptime(x, '%d %b %Y'))
for datetime_object in datetime_obj_list:
    datetime_object = datetime.strptime(datetime_object, "%d %b %Y")
    print(datetime_object.date())
UPDATE:
Here's a working sample of the code https://ideone.com/YRDQR7
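The problem is on the 4th line: datetime_object is a list at that point, so it has no date() method.
date() has to be called on a single datetime instead, for example datetime.strptime(lst_dates[0], '%d %b %Y').date()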
This works just fine:
from datetime import datetime,date
lst_dates = ['01 Apr 2017', '01 Apr 2018', '01 Aug 2017', '01 Aug 2018', '01 Dec 2017', '01 Dec 2018', '01 Feb 2017', '01 Feb 2018', '01 Jan 2017', '01 Jan 2018']
datetime_object = sorted(lst_dates, key=lambda x: datetime.strptime(x, '%d %b %Y'))
#date_object = datetime_object.date() # <<-- remove this line
print(datetime_object)
testing:
>>> from datetime import datetime,date
>>> lst_dates = ['01 Apr 2017', '01 Apr 2018', '01 Aug 2017', '01 Aug 2018', '01 Dec 2017', '01 Dec 2018', '01 Feb 2017', '01 Feb 2018', '01 Jan 2017', '01 Jan 2018']
>>> datetime_object = sorted(lst_dates, key=lambda x: datetime.strptime(x, '%d %b %Y'))
>>> print(datetime_object)
['01 Jan 2017', '01 Feb 2017', '01 Apr 2017', '01 Aug 2017', '01 Dec 2017', '01 Jan 2018', '01 Feb 2018', '01 Apr 2018', '01 Aug 2018', '01 Dec 2018']
>>>
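If actual date objects are wanted rather than sorted strings, one option is to parse first and sort the parsed values, so each string is parsed only once. This is a small sketch using a shortened version of the list from the question; date_objects is a name chosen here:

```python
from datetime import datetime

lst_dates = ['01 Apr 2017', '01 Apr 2018', '01 Aug 2017', '01 Jan 2017']

# Parse each string to a datetime, keep only the date part, then sort the dates.
date_objects = sorted(datetime.strptime(s, '%d %b %Y').date() for s in lst_dates)

for d in date_objects:
    # date objects do have strftime(), so the original text form can be rebuilt.
    print(d.strftime('%d %b %Y'))
```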

How to have Counter count the correct strings

I am trying to get Counter to count which date appears most often, using the code below.
from collections import Counter
with open('dates.json', 'rb') as f:
    data = f.readlines()
c = Counter(data)
print (c.most_common()[:10])
The JSON data is stored as a list like
["Sun Aug 07 01:50:13 +0000 2016", "Sun Aug 07 01:50:13 +0000 2016", "Sun Aug 07 01:50:13 +0000 2016", "Sun Aug 07 01:50:13 +0000 2016", "Sun Aug 07 01:50:13 +0000 2016", "Sun Aug 07 01:50:13 +0000 2016", "Sun Aug 07 01:50:13 +0000 2016", "Sun Aug 07 01:50:13 +0000 2016", "Sun Aug 07 01:50:13 +0000 2016"]
I would expect the output to be something similar to this (grabbed from another program)
[('Sun Aug 07 02:29:45 +0000 2016', 4), ('Sun Aug 07 02:31:05 +0000 2016', 4), ('Sun Aug 07 02:31:04 +0000 2016', 3), ('Sun Aug 07 02:31:08 +0000 2016', 3), ('Sun Aug 07 02:31:22 +0000 2016', 3)]
But this is my output
[(48, 72), (32, 53), (49, 27), (34, 18), (117, 18), (58, 18), (65, 9), (51, 9), (103, 9), (43, 9)]
I don't really understand what it's counting there.
Instead of readlines(), you should use json.load() to load the JSON data into a Python list:
import json
with open('dates.json', 'r') as f:
    data = json.load(f)
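The integer pairs in the question's output are consistent with Counter counting individual bytes of the raw file contents (48 is the byte value of "0", 32 is a space) rather than whole date strings. Here is a self-contained sketch of the difference, using json.loads on an in-memory string in place of the dates.json file; the variable names are made up for illustration:

```python
import json
from collections import Counter

# Stand-in for the contents of dates.json.
raw = (
    '["Sun Aug 07 02:29:45 +0000 2016", '
    '"Sun Aug 07 02:29:45 +0000 2016", '
    '"Sun Aug 07 02:31:05 +0000 2016"]'
)

# Counting the undecoded bytes yields integer keys, one per byte value.
byte_counts = Counter(raw.encode("utf-8"))

# Decoding the JSON first yields a list of strings, so whole dates are counted.
dates = json.loads(raw)
date_counts = Counter(dates)
top = date_counts.most_common(10)
print(top)
```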
