How to extract multiple times from the same string in Python?

I'm trying to extract times from strings that also contain text other than the times. An example is s = 'Dates : 12/Jul/2019 12/Aug/2019, Loc : MEISHAN BRIDGE, Time : 06:00 17:58'.
I've tried using the datefinder module like this:
from datetime import datetime as dt
import datefinder as dfn
for m in dfn.find_dates(s):
    print(dt.strftime(m, "%H:%M:%S"))
Which gives me this :
17:58:00
In this case the time "06:00" is missed out. Now if I try without datefinder with only datetime module like this :
dt.strftime(s, "%H:%M")
It notifies me that the input must be a datetime object already, not a string with the following error :
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: descriptor 'strftime' requires a 'datetime.date' object but received a 'str'
So I tried to use dateutil module to parse this string s to a datetime object with this :
from dateutil.parser import parse
parse(s)
but now it says that my string is not in a proper format (and in most cases it will not be in any fixed format), showing me this error:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/michael/anaconda3/envs/sec_img/lib/python3.7/site-packages/dateutil/parser/_parser.py", line 1358, in parse
return DEFAULTPARSER.parse(timestr, **kwargs)
File "/home/michael/anaconda3/envs/sec_img/lib/python3.7/site-packages/dateutil/parser/_parser.py", line 649, in parse
raise ValueError("Unknown string format:", timestr)
ValueError: ('Unknown string format:', '12/Jul/2019 12/Aug/2019 MEISHAN BRIDGE 06:00 17:58')
I have thought of getting the time with regex like
import re
p = r"\d{2}\:\d{2}"
times = [i.group() for i in re.finditer(p, s)]
# Gives me ['06:00', '17:58']
But doing it this way means I have to check again whether the regex-matched chunks are actually times, because even "99:99" would match the regex and be wrongly reported as a time. Is there any workaround without regex to get all the times from a single string?
Please note that the string might or might not contain a date, but it will always contain a time. Even if it contains a date, the date format could be anything, and the string might or might not contain other irrelevant text.

I don't see many options here, so I would go with a heuristic. I would run the following against the whole dataset and extend the config/regexes until it covers all/most of the cases:
import re
from datetime import datetime as dt

s = 'Dates : 12/Jul/2019 12/08/2019, Loc : MEISHAN BRIDGE, Time : 06:00 17:58:59'

# Keys are regexes for text that looks like a date; values are the
# strptime formats used to validate what the regex captured.
SUPPORTED_DATE_FMTS = {
    re.compile(r"(\d{2}/\w{3}/\d{4})"): "%d/%b/%Y",
    re.compile(r"(\d{2}/\d{2}/\d{4})"): "%d/%m/%Y",
    re.compile(r"(\d{2}/\w{3}\w+/\d{4})"): "%d/%B/%Y",
    # Capture more here
}

SUPPORTED_TIME_FMTS = {
    # The lookarounds keep HH:MM from matching inside HH:MM:SS
    # and still allow a match at the very end of the string.
    re.compile(r"(?<![\d:])((?:[0-1][0-9]|2[0-4]):[0-5][0-9])(?![\d:])"): "%H:%M",
    re.compile(r"((?:[0-1][0-9]|2[0-4]):[0-5][0-9]:[0-5][0-9])"): "%H:%M:%S",
    # Capture more here
}


def extract_supported_dt(config, s):
    """
    Loop thru the given config (keys are regexes, values are date/time format)
    and attempt to gather all valid data.
    """
    valid_data = []
    for regex, fmt in config.items():
        # Extract what you think looks like date
        valid_ish_data = regex.findall(s)
        if not valid_ish_data:
            continue
        print("Checking " + str(valid_ish_data))
        # validate it
        for d in valid_ish_data:
            try:
                valid_data.append(dt.strptime(d, fmt))
            except ValueError:
                # Looked like a date/time but is not a valid one
                pass
    return valid_data


# Handle dates
dates = extract_supported_dt(SUPPORTED_DATE_FMTS, s)
# Handle times
times = extract_supported_dt(SUPPORTED_TIME_FMTS, s)

print("Found dates: ")
for date in dates:
    print("\t" + str(date.date()))

print("Found times: ")
for t in times:
    print("\t" + str(t.time()))
Example output:
Checking ['12/Jul/2019']
Checking ['12/08/2019']
Checking ['06:00']
Checking ['17:58:59']
Found dates:
	2019-07-12
	2019-08-12
Found times:
	06:00:00
	17:58:59
This is a trial-and-error approach, but I do not think there is an alternative in your case. My goal here is therefore to make it as easy as possible to add support for more date/time formats, rather than to find a solution that covers 100% of the data on day one. This way, the more data you run it against, the more complete your config becomes.
One thing to note is that you will have to detect strings that appear to contain no dates and log them somewhere. Later you will need to review them manually and see whether something that was missed could be captured.
Now, assuming that your data is generated by another system, sooner or later you will be able to match 100% of it. If the input comes from humans, you will probably never get to 100%! (People tend to make spelling mistakes and sometimes enter random stuff... date=today :) )
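The logging step described above could look roughly like the following. This is only a sketch: the TIME_RE pattern, the extract_times and scan helpers, and the logger name are hypothetical names chosen for illustration, and it only handles HH:MM times.

```python
import logging
import re
from datetime import datetime

logging.basicConfig(level=logging.WARNING)
log = logging.getLogger("time_extract")  # hypothetical logger name

# Hypothetical pattern: HH:MM not glued to other digits or colons
TIME_RE = re.compile(r"(?<![\d:])(\d{2}:\d{2})(?![\d:])")


def extract_times(text):
    """Return all HH:MM candidates in text that strptime accepts."""
    found = []
    for candidate in TIME_RE.findall(text):
        try:
            found.append(datetime.strptime(candidate, "%H:%M").time())
        except ValueError:
            pass  # looked like a time but is not one, e.g. 99:99
    return found


def scan(dataset):
    """Extract times per string; collect and log strings that yielded none."""
    results, unmatched = {}, []
    for text in dataset:
        times = extract_times(text)
        if times:
            results[text] = times
        else:
            unmatched.append(text)
            log.warning("No time recognised in: %r", text)
    return results, unmatched


results, unmatched = scan([
    "Time : 06:00 17:58",
    "Time : 99:99",
    "Time : six o'clock",
])
print(unmatched)  # → ['Time : 99:99', "Time : six o'clock"]
```

The unmatched list is what you would review by hand before extending the patterns.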

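If you only need the times, this regex is a reasonable first pass (note that it still accepts hours 24 to 29, so validate the matches afterwards):
r"[0-2][0-9]\:[0-5][0-9]"
If there could be spaces inside the time, like 23 : 59, use this: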
r"[0-2][0-9]\s*\:\s*[0-5][0-9]"
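Because `[0-2][0-9]` also lets hours such as 29 through, it may help to pass each match through strptime before trusting it. A minimal sketch, assuming 24-hour HH:MM times; LOOSE_TIME and find_times are illustrative names, and the capture groups are an addition to the pattern above:

```python
import re
from datetime import datetime

# Same idea as the pattern above, with capture groups for hour and minute
LOOSE_TIME = re.compile(r"([0-2][0-9])\s*:\s*([0-5][0-9])")


def find_times(text):
    """Return valid times as 'HH:MM' strings, dropping impossible hours."""
    valid = []
    for hour, minute in LOOSE_TIME.findall(text):
        candidate = f"{hour}:{minute}"
        try:
            datetime.strptime(candidate, "%H:%M")
        except ValueError:
            continue  # e.g. hour 29 passes the regex but is not a real time
        valid.append(candidate)
    return valid


print(find_times("Open 06:00, close 23 : 59, bogus 29:30"))
# → ['06:00', '23:59']
```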

Use a regex, but something like this:
(?=[0-1])[0-1][0-9]\:[0-5][0-9]|(?=2)[2][0-3]\:[0-5][0-9]
This matches:
00:00, 00:59, 01:00, 01:59, 02:00, 02:59
09:00, 10:00, 11:59, 20:00, 21:59, 23:59
It does not match:
99:99, 23:99, 01:99
You can check on Repl.it whether it works for you.
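To try this locally, here is a small sketch. It uses a simplified form of the pattern (the lookaheads `(?=[0-1])` and `(?=2)` are redundant and can be dropped) and adds `(?<!\d)` and `(?!\d)` guards, which are my addition so that the pattern cannot match inside a longer run of digits. STRICT_TIME is a hypothetical name.

```python
import re

# Simplified equivalent of the pattern above, plus digit guards (an addition)
STRICT_TIME = re.compile(r"(?<!\d)(?:[01][0-9]|2[0-3]):[0-5][0-9](?!\d)")

sample = "00:00 09:00 23:59 99:99 23:99 01:99 24:00"
print(STRICT_TIME.findall(sample))
# → ['00:00', '09:00', '23:59']
```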

You could use dictionaries:
my_dict = {}
for i in s.split(', '):
    m = i.strip().split(' : ', 1)
    my_dict[m[0]] = m[1].split()
my_dict
Out:
{'Dates': ['12/Jul/2019', '12/Aug/2019'],
 'Loc': ['MEISHAN', 'BRIDGE'],
 'Time': ['06:00', '17:58']}
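This only works when every string follows the 'key : value, key : value' layout and has a 'Time' field, which the question says is not guaranteed. Under that assumption, the tokens can then be validated with strptime, as in this sketch (the fields and times names are illustrative):

```python
from datetime import datetime

s = "Dates : 12/Jul/2019 12/Aug/2019, Loc : MEISHAN BRIDGE, Time : 06:00 17:58"

# Split into 'key : value' fields, as in the answer above
fields = {}
for part in s.split(", "):
    key, _, value = part.strip().partition(" : ")
    fields[key] = value.split()

# Validate each token of the (assumed) 'Time' field
times = []
for token in fields.get("Time", []):
    try:
        times.append(datetime.strptime(token, "%H:%M").time())
    except ValueError:
        pass  # not a valid HH:MM value

print([t.strftime("%H:%M") for t in times])
# → ['06:00', '17:58']
```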

Related

Detecting date format and converting them to MM-DD-YYYY using Python3

I am trying to convert the date formats and make them uniform throughout the document using Python 3.6.
Here is the sample of the dates in my document:(There can be other formats as the document is large.)
9/21/1989
19640430
6/27/1980
5/11/1987
Mar 12 1951
2 aug 2015
I have checked the datetime library, but could not understand how to detect and change the format of the dates automatically. Here is what I have checked so far:
>>> from datetime import datetime
>>> oldformat = '20140716'
>>> datetimeobject = datetime.strptime(oldformat,'%Y%m%d')
>>> newformat = datetimeobject.strftime('%m-%d-%Y')
>>> print (newformat)
07-16-2014
But I don't understand how I can make the program detect the date patterns automatically and convert them all to one uniform pattern, mm/dd/yyyy.
Kindly suggest what I need to do to achieve this using Python 3.6.
There is no universal Python way of doing this, but I'd recommend using regex to identify the type and then converting it accordingly:
Example Python
import re
from datetime import datetime

with open("in.txt", "r") as fi, open("out.txt", "w") as fo:
    for line in fi:
        line = line.strip()
        dateObj = None
        # Pick the strptime format based on what the line looks like
        if re.match(r"^\d{8}$", line):
            dateObj = datetime.strptime(line, '%Y%m%d')
        elif re.match(r"^\d{1,2}/", line):
            dateObj = datetime.strptime(line, '%m/%d/%Y')
        elif re.match(r"^[a-z]{3}", line, re.IGNORECASE):
            dateObj = datetime.strptime(line, '%b %d %Y')
        elif re.match(r"^\d{1,2} [a-z]{3}", line, re.IGNORECASE):
            dateObj = datetime.strptime(line, '%d %b %Y')
        if dateObj is None:
            # Unrecognised format: keep the line as-is instead of crashing
            fo.write(line + "\n")
        else:
            fo.write(dateObj.strftime('%m-%d-%Y') + "\n")
Example Input
9/21/1989
19640430
6/27/1980
5/11/1987
Mar 12 1951
2 aug 2015
Example Output
09-21-1989
04-30-1964
06-27-1980
05-11-1987
03-12-1951
08-02-2015
I have tried using the dateutil library in my code to detect date strings in any format, and then used the datetime library to convert them into the appropriate format.
Here is the code:
>>> import dateutil.parser
>>> yourdate = dateutil.parser.parse("May 24 2016")
>>>
>>> print(yourdate)
2016-05-24 00:00:00
>>> from datetime import datetime
>>> oldformat = yourdate
>>> datetimeobject = datetime.strptime(str(oldformat),'%Y-%m-%d %H:%M:%S')
>>> newformat = datetimeobject.strftime('%m-%d-%Y')
>>> print (newformat)
05-24-2016
This works.
(There can be other formats as the document is large.)
Unfortunately, Python does not provide "guess what I mean" functionality (although you might be able to repurpose GNU date for that, as it is quite flexible). You will have to make a list of all of the formats you want to support, and then try each in turn (using datetime.strptime() as you've shown) until one of them works.
Python does not try to guess because, in an international context, it is not generally possible to divine what the user wants. In the US, 2/3/1994 means "February 3rd, 1994," but in Europe the same string means "The 2nd of March, 1994." Python deliberately abstains from this confusion.
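The "try each format in turn" idea can be sketched as follows. The KNOWN_FORMATS list and the normalise helper are illustrative assumptions, not an exhaustive solution: the order of the list decides how ambiguous strings such as 2/3/1994 are read (US style here), and %b assumes English month names in the current locale.

```python
from datetime import datetime

# Candidate formats, tried in order; extend as new patterns turn up
KNOWN_FORMATS = ["%m/%d/%Y", "%Y%m%d", "%b %d %Y", "%d %b %Y"]


def normalise(date_string, out_format="%m-%d-%Y"):
    """Return date_string reformatted, or None if no known format fits."""
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(date_string.strip(), fmt).strftime(out_format)
        except ValueError:
            continue  # this format did not fit, try the next one
    return None


for raw in ["9/21/1989", "19640430", "Mar 12 1951", "2 aug 2015", "not a date"]:
    print(raw, "->", normalise(raw))
```

Strings that come back as None are the ones whose format still needs to be added to the list.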

ValueError: Unknown string format in Python?

I'm trying to parse a basic iso formatted datetime string in Python, but I'm having a hard time doing that. Consider the following example:
>>> import json
>>> from datetime import datetime, date
>>> import dateutil.parser
>>> date_handler = lambda obj: obj.isoformat()
>>> the_date = json.dumps(datetime.now(), default=date_handler)
>>> print the_date
"2017-02-18T22:14:09.915727"
>>> print dateutil.parser.parse(the_date)
Traceback (most recent call last):
File "<input>", line 1, in <module>
print dateutil.parser.parse(the_date)
File "/usr/local/lib/python2.7/site-packages/dateutil/parser.py", line 1168, in parse
return DEFAULTPARSER.parse(timestr, **kwargs)
File "/usr/local/lib/python2.7/site-packages/dateutil/parser.py", line 559, in parse
raise ValueError("Unknown string format")
ValueError: Unknown string format
I've also tried parsing this using the regular strptime:
>>> print datetime.strptime(the_date, '%Y-%m-%dT%H:%M:%S')
# removed rest of the error output
ValueError: time data '"2017-02-18T22:11:58.125703"' does not match format '%Y-%m-%dT%H:%M:%S'
>>> print datetime.strptime(the_date, '%Y-%m-%dT%H:%M:%S.%f')
# removed rest of the error output
ValueError: time data '"2017-02-18T22:11:58.125703"' does not match format '%Y-%m-%dT%H:%M:%S.%f'
Does anybody know how on earth I can parse this fairly simple datetime format?
Note the error message:
ValueError: time data '"2017-02-18T22:11:58.125703"'
There are single quotes plus double quotes, which means that the string actually contains double quotes. That's because JSON serialization adds double quotes to strings.
You may want to strip the quotes around your string:
datetime.strptime(the_date.strip('"'), '%Y-%m-%dT%H:%M:%S.%f')
or, perhaps less "hacky", deserialize it using json.loads:
datetime.strptime(json.loads(the_date), '%Y-%m-%dT%H:%M:%S.%f')
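On Python 3.7 or later (the question itself uses Python 2.7, where this is not available), datetime.fromisoformat can replace the explicit format string once the JSON quotes are removed. A short sketch with a fixed timestamp so the result is reproducible:

```python
import json
from datetime import datetime

# Serialize a fixed datetime the same way the question does
the_date = json.dumps(datetime(2017, 2, 18, 22, 14, 9, 915727).isoformat())
print(the_date)  # the JSON text includes the surrounding double quotes

# Decode the JSON string first, then parse the ISO 8601 text
parsed = datetime.fromisoformat(json.loads(the_date))
print(parsed)  # → 2017-02-18 22:14:09.915727
```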

formatting date, time and string for filename

I want to create a csv file with a filename of the following format:
"day-month-year hour:minute-malware_scan.csv"
Example:" 6-8-2016 21:45-malware_scan.csv"
The first part of the filename is formed by the actual date and time at file creation time, instead "-malware_scan.csv" is a fixed string.
I know that in order to get the date and time I should use the time or datetime module and the strftime() function for formatting.
At first I tried with:
import datetime
t = datetime.datetime.now()
formatted_time = t.strftime("%d-%m-%y %H:%M")
filename = formatted_time + "-malware_scan.csv"
with open(filename, "a") as f:
    ...............
I didn't get the expected result, so I tried another way:
i = datetime.datetime.now()
file_to_open = "{day}-{month}-{year} {hour}:{minute}-malware_scan.csv".format(day = i.day, month = i.month, year = i.year, hour = i.hour, minute = i.minute)
with open(file_to_open, "a") as f:
    .......................
With the code above I don't get the expected result either.
I get a filename like "6-8-2016 21": the day, month, year and hour are displayed, but the minutes and the rest of the string (-malware_scan.csv) aren't.
I'm focusing only on the filename with this question, not on the csv writing itself, whose code is omitted.
The : character is not allowed in filenames on Windows. You could discard the : separator entirely:
>>> from datetime import datetime
>>> t = datetime.now()
>>> formatted_time = t.strftime('%d-%m-%y %H%M')
>>> formatted_time
'06-08-16 2226'
>>> datetime.strptime(formatted_time, '%d-%m-%y %H%M')
datetime.datetime(2016, 8, 6, 22, 26)
Or replace that character with an underscore or hyphen.
Thanks to Moses Koledoye for spotting the problem. I was thinking I made a mistake in the Python code, but actually the problem was the characters of the filename.
According to MSDN the following are reserved characters that cannot be used in a filename on Windows:
< (less than)
> (greater than)
: (colon)
" (double quote)
/ (forward slash)
\ (backslash)
| (vertical bar or pipe)
? (question mark)
* (asterisk)
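One way to apply this is to build the timestamp as usual and then replace any reserved character with a hyphen. A minimal sketch; scan_filename and RESERVED are hypothetical names, and the hyphen is just one possible replacement:

```python
import re
from datetime import datetime

# Characters reserved on Windows, as listed above
RESERVED = r'[<>:"/\\|?*]'


def scan_filename(moment, suffix="-malware_scan.csv"):
    """Build 'day-month-year hour-minute<suffix>' without reserved characters."""
    stamp = moment.strftime("%d-%m-%Y %H:%M")
    # Replace every reserved character (here the colon) with a hyphen
    return re.sub(RESERVED, "-", stamp) + suffix


print(scan_filename(datetime(2016, 8, 6, 21, 45)))
# → 06-08-2016 21-45-malware_scan.csv
```

In real use you would pass datetime.now() instead of the fixed value shown here.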

How to convert a string representing a time range into a struct_time object?

You can parse a string representing a time, in Python, by using the strptime method. There are numerous working examples on stackoverflow:
Converting string into datetime
However, what if your string represented a time range, as opposed to a specific time; how could you parse the string using the strptime method?
For example, let’s say you have a user input a start and finish time.
studyTime = input("Please enter your study period (start time – finish time)")
You could prompt, or even force, the user to enter the time in a specific format.
studyTime = input("Please enter your study period (hh:mm - hh:mm): ")
Let’s say the user enters 03:00 PM – 05:00 PM. How can we then parse this string using strptime?
formatTime = datetime.datetime.strptime(studyTime, "%I:%M %p")
The above formatTime would only work on a single time, i.e. 03:00 PM, not a start – finish time, 03:00 – 05:00. And the following would mean excess format data and a ValueError would be raised.
formatTime = datetime.datetime.strptime(studyTime, "%I:%M %p - %I:%M %p")
Of course there are alternatives, such as having the start and finish times as separate strings. However, my question is specifically, is there a means to parse one single string, that contains more than one time representation, using something akin to the below.
formatTime = datetime.datetime.strptime(studyTime, "%I:%M %p - %I:%M %p")
strptime() can only parse a single datetime string representation.
You have to split the input string by - and load each item with strptime():
>>> from datetime import datetime
>>>
>>> s = "03:00 PM - 05:00 PM"
>>> [datetime.strptime(item, "%I:%M %p") for item in s.split(" - ")]
[datetime.datetime(1900, 1, 1, 15, 0), datetime.datetime(1900, 1, 1, 17, 0)]
I also checked the popular third-party libraries (dateutil, delorean and arrow) and don't think they provide datetime range parsing. dateutil's fuzzy_with_tokens option looked promising, but it throws an error:
>>> from dateutil.parser import parse
>>> s = "03:00 PM - 05:00 PM"
>>> parse(s, fuzzy_with_tokens=True)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/user/.virtualenvs/so/lib/python2.7/site-packages/dateutil/parser.py", line 1008, in parse
return DEFAULTPARSER.parse(timestr, **kwargs)
File "/Users/user/.virtualenvs/so/lib/python2.7/site-packages/dateutil/parser.py", line 390, in parse
res, skipped_tokens = self._parse(timestr, **kwargs)
TypeError: 'NoneType' object is not iterable
which probably means it is not meant to parse multiple datetimes either.
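The split-then-parse approach can be wrapped in a small helper. This is a sketch under a few assumptions: parse_time_range is a hypothetical name, the separator may be a plain hyphen or the en dash used in the question's example input, and %p relies on an English (or C) locale.

```python
import re
from datetime import datetime


def parse_time_range(text, fmt="%I:%M %p"):
    """Split 'start - end' on a hyphen or en dash and parse both halves."""
    parts = re.split(r"\s*[-\u2013]\s*", text.strip())
    if len(parts) != 2:
        raise ValueError(f"expected 'start - end', got {text!r}")
    start, end = (datetime.strptime(p, fmt).time() for p in parts)
    return start, end


start, end = parse_time_range("03:00 PM \u2013 05:00 PM")
print(start, end)  # → 15:00:00 17:00:00
```

Returning time objects rather than datetime objects avoids the placeholder date 1900-01-01 that strptime fills in.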

Formatting Time as %d-%m-%y

I am trying to print orig_time as 6/9/2013 and am running into the following error. Can anyone explain what is wrong here?
Code:
orig_time="2013-06-09 00:00:00"
Time=(orig_time.strftime('%m-%d-%Y'))
print Time
Error:
Traceback (most recent call last):
File "date.py", line 2, in <module>
Time=(orig_time.strftime('%m-%d-%Y'))
AttributeError: 'str' object has no attribute 'strftime'
You cannot call strftime on a string, as it is not a string method. One way to do this is with the datetime module:
>>> from datetime import datetime
>>> orig_time="2013-06-09 00:00:00"
#d is a datetime object
>>> d = datetime.strptime(orig_time, '%Y-%m-%d %H:%M:%S')
Now you can use either string formatting:
>>> "{}/{}/{}".format(d.month,d.day,d.year)
'6/9/2013'
or datetime.datetime.strftime:
>>> d.strftime('%m-%d-%Y')
'06-09-2013'
>>> import time
>>> time.strftime("%m-%d-%y",time.strptime("2013-06-09 00:00:00","%Y-%m-%d %H:%M:%S"))
'06-09-13'
but it's more of a pain to remove leading zeros... If you need that, use the other answer.
Also, it's unclear what you want, since your title (and your format string) says one thing, but it does not match what you say is your expected output in the question.
import time
now1 = time.strftime("%d-%m-%y", time.localtime())
print(now1)
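If you prefer, you can change the "%d-%m-%y" format as you want, and you can use a fixed time instead of the current one. time.strftime takes the format first and a struct_time second, so the string has to be parsed with time.strptime before it can be formatted:
import time
orig_time = "2013-06-09 00:00:00"
# Parse the string into a struct_time first, then format it
parsed = time.strptime(orig_time, "%Y-%m-%d %H:%M:%S")
Time = time.strftime("%m-%d-%Y", parsed)
print(Time)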
