Python - Convert Month Name to Integer - python

How can I convert 'Jan' to an integer using Datetime? When I try strptime, I get an error time data 'Jan' does not match format '%m'

You have an abbreviated month name, so use %b:
>>> from datetime import datetime
>>> datetime.strptime('Jan', '%b')
datetime.datetime(1900, 1, 1, 0, 0)
>>> datetime.strptime('Aug', '%b')
datetime.datetime(1900, 8, 1, 0, 0)
>>> datetime.strptime('Jan 15 2015', '%b %d %Y')
datetime.datetime(2015, 1, 15, 0, 0)
%m is for a numeric month.
However, if all you wanted to do was map an abbreviated month to a number, just use a dictionary. You can build one from calendar.month_abbr:
import calendar
abbr_to_num = {name: num for num, name in enumerate(calendar.month_abbr) if num}
Demo:
>>> import calendar
>>> abbr_to_num = {name: num for num, name in enumerate(calendar.month_abbr) if num}
>>> abbr_to_num['Jan']
1
>>> abbr_to_num['Aug']
8
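If the input may be an abbreviated or a full month name, in any letter case, the dictionary idea above can be extended. This is only a minimal sketch and not part of the original answers: the helper name month_to_int is made up for illustration, and it assumes an English (or C) locale, since calendar.month_abbr and calendar.month_name are locale-dependent.

```python
import calendar

# Map lower-cased abbreviated and full month names to their numbers.
# Index 0 of month_abbr / month_name is an empty string, so start at 1.
_MONTHS = {}
for num in range(1, 13):
    _MONTHS[calendar.month_abbr[num].lower()] = num
    _MONTHS[calendar.month_name[num].lower()] = num


def month_to_int(name):
    """Return 1-12 for a name such as 'Jan' or 'august' (hypothetical helper)."""
    try:
        return _MONTHS[name.strip().lower()]
    except KeyError:
        raise ValueError("unknown month name: {!r}".format(name))


print(month_to_int("Jan"))     # 1
print(month_to_int("august"))  # 8
```

With the strptime approach, the integer is available as the .month attribute of the result, for example datetime.strptime('Jan', '%b').month.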

This is straightforward enough that you could just write the dictionary by hand; then you have fewer dependencies anyway.
months = dict(Jan=1, Feb=2, Mar=3, ...)
print(months['Jan'])
1

Off the cuff-
Did you try %b?

from calendar import month_abbr
month = "Jun"
for k, v in enumerate(month_abbr):
    if v == month:
        month = k
        break
print(month)
6
This prints the month number, 6.

Related

dateutil: default to the last occurrence of the recognized part, not the next

I am using dateutil.parser.parse to parse date strings which might contain partial information. If some information is not present, parse can take a default keyword argument from which it will fill any missing fields. This argument defaults to today's date (at midnight).
For a case like dateutil.parser.parse("Thursday"), this means it will return the date of the next Thursday. However, I need it to return the date of the last Thursday (including today, if today happens to be a Thursday).
So, assuming today == datetime.datetime(2018, 2, 20) (a Tuesday), I would like to get all of these asserts to be true:
from dateutil import parser
from datetime import datetime
def parse(date_str, default=None):
    # this needs to be modified
    return parser.parse(date_str, default=default)
today = datetime(2018, 2, 20)
assert parse("Tuesday", default=today) == today # True
assert parse("Thursday", default=today) == datetime(2018, 2, 15) # False
assert parse("Jan 31", default=today) == datetime(2018, 1, 31) # True
assert parse("December 10", default=today) == datetime(2017, 12, 10) # False
Is there an easy way to achieve this? With the current parse function only the first and third assert would pass.
Here's your modified code (code.py):
#!/usr/bin/env python3
import sys
from dateutil import parser
from datetime import datetime, timedelta
today = datetime(2018, 2, 20)
data = [
    ("Tuesday", today, today),
    ("Thursday", datetime(2018, 2, 15), today),
    ("Jan 31", datetime(2018, 1, 31), today),
    ("December 10", datetime(2017, 12, 10), today),
]
def parse(date_str, default=None):
    # this needs to be modified
    return parser.parse(date_str, default=default)
def _days_in_year(year):
    try:
        datetime(year, 2, 29)
    except ValueError:
        return 365
    return 366
def parse2(date_str, default=None):
    dt = parser.parse(date_str, default=default)
    if default is not None:
        weekday_strs = [day_str.lower() for day_tuple in parser.parserinfo.WEEKDAYS for day_str in day_tuple]
        if date_str.lower() in weekday_strs:
            if dt.weekday() > default.weekday():
                dt -= timedelta(days=7)
        else:
            # compare against the default argument, not the global "today"
            if (dt.month > default.month) or ((dt.month == default.month) and (dt.day > default.day)):
                dt -= timedelta(days=_days_in_year(dt.year))
    return dt
def print_stats(parse_func):
    print("\nPrinting stats for \"{:s}\"".format(parse_func.__name__))
    for triple in data:
        d = parse_func(triple[0], default=triple[2])
        print(" [{:s}] [{:s}] [{:s}] [{:s}]".format(triple[0], str(d), str(triple[1]), "True" if d == triple[1] else "False"))
if __name__ == "__main__":
    print("Python {:s} on {:s}\n".format(sys.version, sys.platform))
    print_stats(parse)
    print_stats(parse2)
Notes:
I changed the structure of the code "a bit", to parametrize it, so if a change is needed (e.g. a new example to be added) the changes should be minimal
Instead of asserts, I added a function (print_stats) that prints the results (instead of raising AssertionError and exiting the program if things don't match)
Takes an argument (parse_func) which is a function that does the parsing (e.g. parse)
Uses some globally declared data (data) together with the (above) function
data - is a list of triples, where each triple contains:
Text to be converted
Expected datetime ([Python 3.Docs]: datetime Objects) to be yielded by the conversion
default argument to be passed to the parsing function (parse_func)
parse2 function (an improved version of parse):
Accepts 2 types of date strings:
Weekday name
Month / Day (unordered)
Does the regular parsing, and if the converted object comes after the one passed as the default argument (that is determined by comparing the appropriate attributes of the 2 objects), it subtracts a period (take a look at [Python 3.Docs]: timedelta Objects):
"Thursday" comes after "Tuesday", so it subtracts the number of days in a week (7)
"December 10" comes after "February 20", so it subtracts the number of days in the year*
weekday_strs: I'd better explain it by example:
>>> parser.parserinfo.WEEKDAYS
[('Mon', 'Monday'), ('Tue', 'Tuesday'), ('Wed', 'Wednesday'), ('Thu', 'Thursday'), ('Fri', 'Friday'), ('Sat', 'Saturday'), ('Sun', 'Sunday')]
>>> [day_str.lower() for day_tuple in parser.parserinfo.WEEKDAYS for day_str in day_tuple]
['mon', 'monday', 'tue', 'tuesday', 'wed', 'wednesday', 'thu', 'thursday', 'fri', 'friday', 'sat', 'saturday', 'sun', 'sunday']
Flattens parser.parserinfo.WEEKDAYS
Converts strings to lowercase (for simplifying comparisons)
_days_in_year* - as you probably guessed, returns the number of days in a year (couldn't simply subtract 365 because leap years might mess things up):
>>> dt = datetime(2018, 3, 1)
>>> dt
datetime.datetime(2018, 3, 1, 0, 0)
>>> dt - timedelta(365)
datetime.datetime(2017, 3, 1, 0, 0)
>>> dt = datetime(2016, 3, 1)
>>> dt
datetime.datetime(2016, 3, 1, 0, 0)
>>> dt - timedelta(365)
datetime.datetime(2015, 3, 2, 0, 0)
Output:
(py35x64_test) E:\Work\Dev\StackOverflow\q048884480>"e:\Work\Dev\VEnvs\py35x64_test\Scripts\python.exe" code.py
Python 3.5.4 (v3.5.4:3f56838, Aug 8 2017, 02:17:05) [MSC v.1900 64 bit (AMD64)] on win32
Printing stats for "parse"
[Tuesday] [2018-02-20 00:00:00] [2018-02-20 00:00:00] [True]
[Thursday] [2018-02-22 00:00:00] [2018-02-15 00:00:00] [False]
[Jan 31] [2018-01-31 00:00:00] [2018-01-31 00:00:00] [True]
[December 10] [2018-12-10 00:00:00] [2017-12-10 00:00:00] [False]
Printing stats for "parse2"
[Tuesday] [2018-02-20 00:00:00] [2018-02-20 00:00:00] [True]
[Thursday] [2018-02-15 00:00:00] [2018-02-15 00:00:00] [True]
[Jan 31] [2018-01-31 00:00:00] [2018-01-31 00:00:00] [True]
[December 10] [2017-12-10 00:00:00] [2017-12-10 00:00:00] [True]
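The rollback step can also be written with calendar arithmetic instead of day counts. Subtracting a number of days can land one day off for January or February dates when the preceding year is a leap year (2017-01-31 minus 365 days is 2016-02-01), and comparing weekday numbers directly may miss cases that wrap past Sunday. The following is a hedged sketch using only the standard library; the function names last_weekday and previous_year_if_future are hypothetical, it works on already parsed datetime objects rather than on dateutil itself, and mapping 29 February to 28 February is an assumption.

```python
from datetime import datetime, timedelta


def last_weekday(reference, weekday):
    """Latest date with the given weekday (Monday=0) on or before reference."""
    days_back = (reference.weekday() - weekday) % 7
    return reference - timedelta(days=days_back)


def previous_year_if_future(dt, reference):
    """Move dt one calendar year back when it falls after reference."""
    if dt <= reference:
        return dt
    try:
        return dt.replace(year=dt.year - 1)
    except ValueError:
        # dt was Feb 29 and the previous year has no such day
        # (assumption: fall back to Feb 28).
        return dt.replace(year=dt.year - 1, day=28)


today = datetime(2018, 2, 20)  # a Tuesday
print(last_weekday(today, 3))                                   # 2018-02-15 00:00:00
print(previous_year_if_future(datetime(2018, 12, 10), today))   # 2017-12-10 00:00:00
```

The modulo expression yields 0 when the reference day already has the requested weekday, so "today" is returned unchanged in that case.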

Convert integer date format to human readable format

I have a date format where dates are represented as integers counted from 1/1/1900.
For example: 1 is 1/1/1900 and 42998 is 20/9/2017.
How can I convert this format to a human readable format like dd/mm/yyyy? I checked the datetime documentation but I did not find any way to do this. I want to do this on either Python 3 or 2.7.
Thanks for any suggestion.
You can define your dates as offsets from your basetime and construct a datetime:
In[22]:
import datetime as dt
dt.datetime(1900,1,1) + dt.timedelta(42998)
Out[22]: datetime.datetime(2017, 9, 22, 0, 0)
Once it's a datetime object you can convert it to a str via strftime using whatever format you desire. Because 1 corresponds to 1/1/1900 itself, subtract one from the value first:
In[24]:
(dt.datetime(1900,1,1) + dt.timedelta(42998-1)).strftime('%d/%m/%Y')
Out[24]: '21/09/2017'
So you can define a user func to do this:
In[27]:
def dtToStr(val):
    base = dt.datetime(1900,1,1)
    return (base + dt.timedelta(val-1)).strftime('%d/%m/%Y')
dtToStr(42998)
Out[27]: '21/09/2017'
import datetime
base_date = datetime.datetime(1900, 1, 1)
convert = lambda x: base_date + datetime.timedelta(days=x-1)
>>> convert(42998)
datetime.datetime(2017, 9, 21, 0, 0)
You can use a datetime object to do that. Since 1 stands for 1/1/1900, the offset is the value minus one:
import datetime
d = datetime.datetime(1900, 1, 1, 0, 0)
d + datetime.timedelta(days=42998 - 1)
>> datetime.datetime(2017, 9, 21, 0, 0)

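The question expects 42998 to map to 20/9/2017, while counting from 1/1/1900 gives 21/09/2017. One possible explanation, and this is an assumption because the question does not say where the numbers come from, is that they are spreadsheet serial dates of the kind Excel uses. That scheme counts a nonexistent 29 February 1900, so for values from March 1900 onward a base date of 30 December 1899 is commonly used. A minimal sketch under that assumption (the names SERIAL_EPOCH and serial_to_str are illustrative):

```python
from datetime import date, timedelta

# Assumption: values are Excel-style serial dates (1900 date system).
SERIAL_EPOCH = date(1899, 12, 30)


def serial_to_str(serial):
    """Format a serial day number (intended for serial > 60) as dd/mm/yyyy."""
    return (SERIAL_EPOCH + timedelta(days=serial)).strftime('%d/%m/%Y')


print(serial_to_str(42998))  # 20/09/2017
```

For serial values up to 59 this base date is one day early relative to the spreadsheet convention, so the sketch is only suitable for dates after February 1900.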
Time tuple to a datetime

I am trying to convert a timestamp tuple from dpkt to a datetime instance.
The timestamp looks like (seconds, microseconds). This is what I am currently doing, but it seems overkill:
from datetime import datetime as dt
ts = (1296770576, 247792)
ts_list = [str(item) for item in ts]
ts_list[1] = ts_list[1].zfill(6) #make sure we have 6 digits
ts_str = ".".join(ts_list)
ts_float = float(ts_str)
ts_dt = dt.fromtimestamp(ts_float)
Is there a simpler way?
Just use the seconds part, then update the datetime object with the microseconds part, using the .replace() method:
dt.fromtimestamp(ts[0]).replace(microsecond=ts[1])
Demo:
>>> from datetime import datetime as dt
>>> ts = (1296770576, 247792)
>>> dt.fromtimestamp(ts[0]).replace(microsecond=ts[1])
datetime.datetime(2011, 2, 3, 23, 2, 56, 247792)
If you did ever have to convert your (seconds, microseconds) tuple to a float timestamp, just use floating-point division instead:
>>> ts_float = float(ts[0]) + float(ts[1]) / 1000000
>>> dt.fromtimestamp(ts_float)
datetime.datetime(2011, 2, 3, 23, 2, 56, 247792)
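fromtimestamp without a timezone returns local time, so the demo output above depends on the machine's timezone. As a hedged alternative for Python 3, the sketch below builds a timezone-aware value and adds the microseconds with a timedelta; the helper name tuple_to_datetime and the choice of UTC as default are assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone


def tuple_to_datetime(ts, tz=timezone.utc):
    """Convert a (seconds, microseconds) tuple to an aware datetime."""
    seconds, microseconds = ts
    # Adding a timedelta also copes with microsecond values >= 1000000,
    # which .replace(microsecond=...) would reject.
    return datetime.fromtimestamp(seconds, tz=tz) + timedelta(microseconds=microseconds)


ts = (1296770576, 247792)
print(tuple_to_datetime(ts))  # 2011-02-03 22:02:56.247792+00:00
```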

Extracting time from a line in Python [duplicate]

Possible Duplicate:
Converting string into datetime
I got log entries like:
2013-01-09 06:13:51,464 DEBUG module 159 Djang...
What is the shortest (best) way to extract the date from this string?
Do you need to keep the microseconds?
>>> import re
>>> log = "2013-01-09 06:13:51,464 DEBUG module"
>>> p = re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d*")
>>> time_str = p.findall(log)[0]
>>> time_str
'2013-01-09 06:13:51,464'
>>> from datetime import datetime
>>> date_time = datetime.strptime(time_str, '%Y-%m-%d %H:%M:%S,%f')
>>> date_time
datetime.datetime(2013, 1, 9, 6, 13, 51, 464000)
from datetime import datetime
val = '2013-01-09 06:13:51,464'.split(',')[0] # Remove milliseconds
date_object = datetime.strptime(val, '%Y-%m-%d %H:%M:%S')
>>> a = "2013-01-09 06:13:51,464 DEBUG module"
>>> a = a.split(" ")
>>> date,time = a[0], a[1]
>>> date = date.split("-")
>>> time = time.split(",")[0].split(":")
>>> date
['2013', '01', '09']
>>> time
['06', '13', '51']
>>> args_list = [int(i) for i in date]
>>> args_list.extend( [int(i) for i in time])
>>> args_list
[2013, 1, 9, 6, 13, 51]
>>> import datetime
>>> datetime.datetime(*args_list)
datetime.datetime(2013, 1, 9, 6, 13, 51)
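The regex and strptime steps from the answers above can be wrapped in one function that also copes with lines lacking a timestamp. This is a sketch, not a definitive implementation: the name parse_log_timestamp is hypothetical, the sample line is shortened, and it assumes the timestamp sits at the start of the line in the format shown in the question.

```python
import re
from datetime import datetime

# Timestamp at the start of the line, e.g. "2013-01-09 06:13:51,464".
_TIMESTAMP_RE = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{1,6}")


def parse_log_timestamp(line):
    """Return the leading timestamp as a datetime, or None if absent."""
    match = _TIMESTAMP_RE.match(line)
    if match is None:
        return None
    # %f accepts one to six digits, so ",464" becomes 464000 microseconds.
    return datetime.strptime(match.group(0), "%Y-%m-%d %H:%M:%S,%f")


line = "2013-01-09 06:13:51,464 DEBUG module 159"
print(parse_log_timestamp(line))  # 2013-01-09 06:13:51.464000
```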

How do I find missing dates in a list of sorted dates?

In Python how do I find all the missing days in a sorted list of dates?
Using sets:
>>> from datetime import date, timedelta
>>> d = [date(2010, 2, 23), date(2010, 2, 24), date(2010, 2, 25),
...      date(2010, 2, 26), date(2010, 3, 1), date(2010, 3, 2)]
>>> date_set = set(d[0] + timedelta(x) for x in range((d[-1] - d[0]).days))
>>> missing = sorted(date_set - set(d))
>>> missing
[datetime.date(2010, 2, 27), datetime.date(2010, 2, 28)]
Sort the list of dates and iterate over it, remembering the previous entry. If the difference between the previous and current entry is more than one day, you have missing days.
Here's one way to implement it:
from datetime import date, timedelta
from itertools import tee
def pairwise(iterable):
    "s -> (s0,s1), (s1,s2), (s2, s3), ..."
    a, b = tee(iterable)
    next(b, None)  # advance the second iterator by one item
    return zip(a, b)
def missing_dates(dates):
    for prev, curr in pairwise(sorted(dates)):
        i = prev
        while i + timedelta(1) < curr:
            i += timedelta(1)
            yield i
dates = [ date(2010, 1, 8),
          date(2010, 1, 2),
          date(2010, 1, 5),
          date(2010, 1, 1),
          date(2010, 1, 7) ]
for missing in missing_dates(dates):
    print(missing)
Output:
2010-01-03
2010-01-04
2010-01-06
Sorting takes O(n*log(n)) in the number of dates when the input is unsorted; the loop itself is linear in the number of days in the span. As your list is already sorted, the sort is cheap and the whole thing runs in roughly linear time.
>>> from datetime import datetime, timedelta
>>> date_list = [datetime(2010, 2, 23),datetime(2010, 2, 24),datetime(2010, 2, 25),datetime(2010, 2, 26),datetime(2010, 3, 1),datetime(2010, 3, 2)]
>>>
>>> date_set=set(date_list) # for faster membership tests than list
>>> one_day = timedelta(days=1)
>>>
>>> test_date = date_list[0]
>>> missing_dates=[]
>>> while test_date < date_list[-1]:
...     if test_date not in date_set:
...         missing_dates.append(test_date)
...     test_date += one_day
...
>>> print(missing_dates)
[datetime.datetime(2010, 2, 27, 0, 0), datetime.datetime(2010, 2, 28, 0, 0)]
This also works for datetime.date objects, but the OP says the list is datetime.datetime objects
USING A FOR LOOP
The imports you'll need:
import datetime
from datetime import date, timedelta
Let's say you have a sorted list called dates with several missing dates in it.
First select the first and last date:
start_date = dates[0]
end_date = dates[-1]
Then count the number of days between these two dates:
numdays = (end_date - start_date).days
Then create a new list with all dates between start_date and end_date:
all_dates = []
for x in range (0, (numdays+1)):
    all_dates.append(start_date + datetime.timedelta(days = x))
Then check which dates are in all_dates but not in dates by using a for loop with range, and add these dates to dates_missing:
dates_missing = []
for i in range (0, len(all_dates)):
    if (all_dates[i] not in dates):
        dates_missing.append(all_dates[i])
    else:
        pass
Now you'll have a list called dates_missing with all the missing dates.
Put the dates in a set and then iterate from the first date to the last using datetime.timedelta(), checking for containment in the set each time.
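The set-based walk just described could look like the sketch below. The function name find_missing_days is made up for illustration, and the code assumes the input holds datetime.date objects (or datetimes that all share the same time of day).

```python
from datetime import date, timedelta


def find_missing_days(dates):
    """Return the days between min(dates) and max(dates) that are absent."""
    present = set(dates)  # set membership tests are O(1) on average
    if not present:
        return []
    current, last = min(present), max(present)
    one_day = timedelta(days=1)
    missing = []
    while current < last:
        if current not in present:
            missing.append(current)
        current += one_day
    return missing


d = [date(2010, 2, 23), date(2010, 2, 24), date(2010, 2, 25),
     date(2010, 2, 26), date(2010, 3, 1), date(2010, 3, 2)]
print(find_missing_days(d))
# [datetime.date(2010, 2, 27), datetime.date(2010, 2, 28)]
```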
Here's an example for a pandas dataframe with a date column. If it's an index, then change df.Date to df.index.
import pandas as pd
df.Date = pd.to_datetime(df.Date) # ensure datetime format of date column
min_dt = df.Date.min() # get lowest date
max_dt = df.Date.max() # get highest date
dt_range = pd.date_range(min_dt, max_dt) # get all requisite dates in range
missing_dts = dt_range[~dt_range.isin(df.Date)] # list missing ("d in df.Date" would test the index labels, not the values)
print("There are {n} missing dates".format(n=len(missing_dts)))
import datetime
DAY = datetime.timedelta(days=1)
# missing dates: a list of [start_date, end)
missing = [(d1+DAY, d2) for d1, d2 in zip(dates, dates[1:]) if (d2 - d1) > DAY]
def date_range(start_date, end, step=DAY):
    d = start_date
    while d < end:
        yield d
        d += step
missing_dates = [d for d1, d2 in missing for d in date_range(d1, d2)]
Using a list comprehension
>>> from datetime import date, timedelta
>>> d = [date(2010, 2, 23),date(2010, 2, 24),date(2010, 2, 25),date(2010, 2, 26),date(2010, 3, 1),date(2010, 3, 2)]
>>> date_set=set(d)
>>> missing = [x for x in (d[0]+timedelta(x) for x in range((d[-1]-d[0]).days)) if x not in date_set]
>>> missing
[datetime.date(2010, 2, 27), datetime.date(2010, 2, 28)]
A good way of getting this done in Python is as follows. You need not worry about efficiency unless you have dates from multiple years in your list and this code always needs to run as per user interaction and yield output immediately.
Get missing dates from one list (sorted or not)
Create a function that gives you all the dates from start_date to end_date. And use it.
import datetime
def get_dates(start_date, end_date):
    span_between_dates = (end_date - start_date).days
    for index in range(span_between_dates + 1):
        # +1 is to make start and end dates inclusive.
        yield start_date + datetime.timedelta(index)
my_date_list = [datetime.date(2017, 3, 5), datetime.date(2017, 3, 7)]  # ...
# Edit my_date_list as per your requirement (date objects, not strings).
start_date = min(my_date_list)
end_date = max(my_date_list)
for current_date in get_dates(start_date, end_date):
    if current_date not in my_date_list:
        print(current_date)
Get missing or overlapping dates between two date ranges.
The get_dates function should be defined.
my_other_date_list = [] # your other date range
start_date = min(my_date_list)
end_date = max(my_date_list)
for current_date in get_dates(start_date, end_date):
    if (current_date in my_date_list) and (current_date in my_other_date_list):
        print('overlapping dates between 2 lists:')
        print(current_date)
    elif (current_date in my_date_list) and (current_date not in my_other_date_list):
        print('missing dates:')
        print(current_date)
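When only the overlap and the difference between two collections of dates are needed, set operations may be simpler than walking the whole range. The sketch below is one possible way to do it; the function name compare_date_lists and the sample dates are invented for illustration.

```python
from datetime import date


def compare_date_lists(first, second):
    """Return (dates in both, dates only in first), each sorted."""
    first_set, second_set = set(first), set(second)
    overlapping = sorted(first_set & second_set)    # set intersection
    only_in_first = sorted(first_set - second_set)  # set difference
    return overlapping, only_in_first


# Hypothetical sample data.
first = [date(2017, 3, 5), date(2017, 3, 6), date(2017, 3, 8)]
second = [date(2017, 3, 6), date(2017, 3, 7), date(2017, 3, 8)]
overlapping, only_in_first = compare_date_lists(first, second)
print(overlapping)    # [datetime.date(2017, 3, 6), datetime.date(2017, 3, 8)]
print(only_in_first)  # [datetime.date(2017, 3, 5)]
```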
