Extracting time with regex from a string

Extracting time with regex from a string - python

I have scraped some data in which opening hours are given in 12-hour format. The string looks like this: Mon - Fri:,10:00 am - 7:00 pm. I need to extract the times 10:00 am and 7:00 pm and then convert them to 24-hour format. The final string I want is:
Mon - Fri:,10:00 - 19:00
Any help would be appreciated in this regard. I have tried the following:
import re
txt = 'Mon - Fri:,10:00 am - 7:00 pm'
data = re.findall(r'\s(\d{2}\:\d{2}\s?(?:AM|PM|am|pm))', txt)
print(data)
But this regex and any other that I tried to use didn't do the task.

Your regex enforces a whitespace before the leading digit which prevents ,10:00 am from matching and requires two digits before the colon which fails to match 7:00 pm. r"(?i)(\d?\d:\d\d (?:a|p)m)" seems like the most precise option.
After that, parse the match using datetime.strptime and convert it to 24-hour (military) time using the "%H:%M" format string. Any invalid times like 10:67 will raise a ValueError (if you anticipate strings that should be ignored instead, adjust the regex to strictly match valid 12-hour times).
import re
from datetime import datetime
def to_military_time(x):
    return datetime.strptime(x.group(), "%I:%M %p").strftime("%H:%M")
txt = "Mon - Fri:,10:00 am - 7:00 pm"
data = re.sub(r"(?i)(\d?\d:\d\d (?:a|p)m)", to_military_time, txt)
print(data) # => Mon - Fri:,10:00 - 19:00
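If malformed times should be left untouched rather than raise, one option is a stricter pattern. The following is only a sketch: the names STRICT_12H, to_24h and convert are made up for illustration, and it assumes that only hours 1-12 with minutes 00-59 should be converted.

```python
import re
from datetime import datetime

# Hypothetical stricter pattern: hours 1-12, minutes 00-59, optional space before am/pm.
STRICT_12H = re.compile(r"\b(1[0-2]|0?[1-9]):([0-5]\d)\s?([ap]m)\b", re.IGNORECASE)

def to_24h(match):
    # Rebuild a normalized "H:MM am" string so strptime sees one consistent format.
    hour, minute, meridiem = match.groups()
    return datetime.strptime(f"{hour}:{minute} {meridiem}", "%I:%M %p").strftime("%H:%M")

def convert(text):
    # Only substrings that match the strict pattern are replaced; the rest is kept as-is.
    return STRICT_12H.sub(to_24h, text)

print(convert("Mon - Fri:,10:00 am - 7:00 pm"))  # Mon - Fri:,10:00 - 19:00
print(convert("Sat:,10:67 am - 12:00 AM"))       # Sat:,10:67 am - 00:00
```

With this variant, 10:67 am is simply not matched, so it stays in the output unchanged instead of triggering an exception.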

Your regex looks only for two digit hours (\d{2}) with white space before them (\s). The following captures also one digit hours, with a possible comma instead of the space.
data = re.findall(r'[\s,](\d{1,2}\:\d{2}\s?(?:AM|PM|am|pm))', txt)
However, you might want to consider all punctuation as valid:
data = re.findall(r'[\s!"#$%&\'\(\)*+,-./:;\<=\>?@\[\\\]^_`\{|\}~](\d{1,2}\:\d{2}\s?(?:AM|PM|am|pm))', txt)
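Instead of listing every character that may precede the time, a different technique is a negative lookbehind that only requires the previous character not to be a digit. This is a sketch of that alternative, not the approach above; the variable names are illustrative.

```python
import re

txt = 'Mon - Fri:,10:00 am - 7:00 pm'

# (?<!\d) asserts that the time is not preceded by another digit,
# so a comma, a space, or the start of the string are all accepted.
pattern = re.compile(r'(?<!\d)(\d{1,2}:\d{2}\s?(?:am|pm))', re.IGNORECASE)

times = pattern.findall(txt)
print(times)  # ['10:00 am', '7:00 pm']
```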

The regex needs to change, like this:
import re
text = 'Mon - Fri:,10:00 am - 7:00 pm'
result = re.match(r'\D* - \D*:,([\d\s\w:]+) - ([\d\s\w:]+)', text)
print(result.group(1))
# it will print 10:00 am
print(result.group(2))
# it will print 7:00 pm
You need a quantifier such as '+' or '*' to tell the regex to match multiple characters; if you only use \s, it will match exactly one whitespace character.
You can learn more about regex here:
https://regexr.com/
And you can try regexes online here:
https://regex101.com/

Why not use the time module?
import time
data = "Mon - Fri:,10:00 am - 7:00 pm"
parts = data.split(",")
days = parts[0]
hours = parts[1]
parts = hours.split("-")
t1 = time.strptime(parts[0].strip(), "%I:%M %p")
t2 = time.strptime(parts[1].strip(), "%I:%M %p")
result = days + "," + time.strftime("%H:%M", t1) + " - " + time.strftime("%H:%M", t2)
Output:
Mon - Fri:,10:00 - 19:00

Related

Python: Extract time frame from string including time format (am/pm)

I'm writing Python code. Below are some sample strings containing a day range and a time range:
'Mon-Thu, Sun 11:30 am - 9 pm'
'Sat-Sun 9:30 am - 9:30 pm'
'Sat 5 pm - 9 pm'
I want to extract the time range from the above, including am/pm, which would be:
11:30am-9pm
9:30am-9:30pm
5pm-9pm
which I'm later planning to convert into 24-hour format.
Things I tried that didn't work:
# try-1
re.findall(r"(?i)(\d?\d:\d\d (?:a|p)m)", st)
# try-2
re.findall(r"(?i)((\d?\d:\d\d|\d) (?:a|p)m)", st)
I'm not good at regex. Any help using regex or any other way will be appreciated.

You were on the right track. Consider this version:
import re
inp = ['Mon-Thu, Sun 11:30 am - 9 pm', 'Sat-Sun 9:30 am - 9:30 pm', 'Sat 5 pm - 9 pm']
output = [re.findall(r'\d{1,2}(?::\d{2})? [ap]m - \d{1,2}(?::\d{2})? [ap]m', x)[0] for x in inp]
print(output) # ['11:30 am - 9 pm', '9:30 am - 9:30 pm', '5 pm - 9 pm']

You can use
(?i)\b\d?\d(?::\d\d)?\s*[ap]m\s*-\s*\d?\d(?::\d\d)?\s*[ap]m\b
Details:
(?i) - case insensitive flag
\b - a word boundary
\d?\d - one or two digits
(?::\d\d)? - an optional occurrence of : and two digits
\s* - zero or more whitespaces
[ap]m - a or p and then m
\s*-\s* - - enclosed with zero or more whitespaces
\d?\d(?::\d\d)?\s*[ap]m - same pattern as above
\b - word boundary
See a Python demo:
import re
text = '''Mon-Thu, Sun 11:30 am - 9 pm
Sat-Sun 9:30 am - 9:30 pm
Sat 5 pm - 9 pm'''
time_rx = r'\d?\d(?::\d\d)?\s*[ap]m'
matches = re.findall(fr'\b{time_rx}\s*-\s*{time_rx}\b', text)
for match in matches:
    print("".join(match.split()))
Output:
11:30am-9pm
9:30am-9:30pm
5pm-9pm
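For the later conversion to 24-hour format mentioned in the question, here is one possible sketch. It assumes the same input shapes as the samples above (minutes are optional, am/pm is always present); TIME_RX, RANGE_RX, to_24h and extract_range_24h are hypothetical names chosen for this example.

```python
import re
from datetime import datetime

# Hour, optional minutes, and am/pm are captured separately.
TIME_RX = r'(\d{1,2})(?::(\d{2}))?\s*([ap]m)'
RANGE_RX = re.compile(rf'\b{TIME_RX}\s*-\s*{TIME_RX}\b', re.IGNORECASE)

def to_24h(hour, minute, meridiem):
    # A missing minutes part (e.g. "9 pm") is treated as ":00".
    minute = minute or "00"
    return datetime.strptime(f"{hour}:{minute} {meridiem}", "%I:%M %p").strftime("%H:%M")

def extract_range_24h(text):
    # Returns "HH:MM-HH:MM", or None when no time range is found.
    m = RANGE_RX.search(text)
    if m is None:
        return None
    h1, m1, p1, h2, m2, p2 = m.groups()
    return f"{to_24h(h1, m1, p1)}-{to_24h(h2, m2, p2)}"

for sample in ['Mon-Thu, Sun 11:30 am - 9 pm', 'Sat-Sun 9:30 am - 9:30 pm', 'Sat 5 pm - 9 pm']:
    print(extract_range_24h(sample))
# 11:30-21:00
# 09:30-21:30
# 17:00-21:00
```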

I don't know much about regex, but I can suggest this:
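s = 'Mon-Thu, Sun 11:30 am - 9 pm'
new_str = s  # fallback if the string contains no digit
for i in range(len(s)):
    if s[i].isdigit():
        new_str = s[i:]  # keep everything from the first digit onward
        break
print(new_str)
This works if you would rather write the code without regular expressions.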

How to convert strings with UTC-# at the end to a DateTimeField in Python?

I have a list of strings formatted as follows: '2/24/2021 3:37:04 PM UTC-6'
How would I convert this?
I have tried
datetime.strptime(my_date_object, '%m/%d/%Y %I:%M:%S %p %Z')
but I get an error saying "unconverted data remains: -6"
Is this because of the UTC-6 at the end?

The approach that @MrFuppes mentioned in their comment is the easiest way to do this:
"Ok seems like you need to split the string on 'UTC' and parse the offset separately. You can then set the tzinfo from a timedelta"
import datetime
input_string = '2/24/2021 3:37:04 PM UTC-6'
try:
    dtm_string, utc_offset = input_string.split("UTC", maxsplit=1)
except ValueError:
    # Couldn't split the string, so no "UTC" in the string
    print("Warning! No timezone!")
    dtm_string = input_string
    utc_offset = "0"
dtm_string = dtm_string.strip()  # Remove leading/trailing whitespace: '2/24/2021 3:37:04 PM'
utc_offset = int(utc_offset)  # Convert UTC offset to an integer: -6
tzinfo = datetime.timezone(datetime.timedelta(hours=utc_offset))
result_datetime = datetime.datetime.strptime(dtm_string, '%m/%d/%Y %I:%M:%S %p').replace(tzinfo=tzinfo)
print(result_datetime)
# prints 2021-02-24 15:37:04-06:00
Alternatively, you can avoid datetime.strptime altogether, since the relevant components are easy to extract with a regular expression:
import re
import datetime
rex = r"(\d{1,2})\/(\d{1,2})\/(\d{4}) (\d{1,2}):(\d{2}):(\d{2}) (AM|PM) UTC(\+|-)(\d{1,2})"
input_string = '2/24/2021 3:37:04 PM UTC-6'
r = re.findall(rex, input_string)
# gives: [('2', '24', '2021', '3', '37', '04', 'PM', '-', '6')]
mm = int(r[0][0])
dd = int(r[0][1])
yy = int(r[0][2])
hrs = int(r[0][3]) % 12  # 12 AM -> 0; 12 PM -> 0 here, then 12 is added back below
mins = int(r[0][4])
secs = int(r[0][5])
if r[0][6].upper() == "PM":
    hrs = hrs + 12
tzoffset = int(f"{r[0][7]}{r[0][8]}")
tzinfo = datetime.timezone(datetime.timedelta(hours=tzoffset))
result_datetime = datetime.datetime(yy, mm, dd, hrs, mins, secs, tzinfo=tzinfo)
print(result_datetime)
# prints 2021-02-24 15:37:04-06:00
Explanation of the regular expression (\d{1,2})\/(\d{1,2})\/(\d{4}) (\d{1,2}):(\d{2}):(\d{2}) (AM|PM) UTC(\+|-)(\d{1,2}):
(\d{1,2}): One or two digits. Surrounding parentheses indicate that this is a capturing group. A similar construct is used to get the month, date and hours, and UTC offset
\/: A forward slash
(\d{4}): Exactly four digits. Also a capturing group. A similar construct is used for minutes and seconds.
(AM|PM): Either "AM" or "PM"
UTC(\+|-)(\d{1,2}): "UTC", followed by a plus or minus sign, followed by one or two digits.
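Since the question mentions a list of strings, the two ideas above (split off the offset, parse the rest with strptime) could be combined into one helper. This is a sketch under the assumption that every string ends in "UTC" plus a whole-hour offset; OFFSET_RX and parse_with_utc_offset are hypothetical names, and the second sample string is invented for the example.

```python
import re
from datetime import datetime, timedelta, timezone

# Everything before "UTC" is the timestamp; the signed integer after it is the offset in hours.
OFFSET_RX = re.compile(r"^(?P<stamp>.+?)\s*UTC(?P<offset>[+-]\d{1,2})$")

def parse_with_utc_offset(value):
    m = OFFSET_RX.match(value.strip())
    if m is None:
        raise ValueError(f"no UTC offset found in {value!r}")
    naive = datetime.strptime(m.group("stamp"), "%m/%d/%Y %I:%M:%S %p")
    tz = timezone(timedelta(hours=int(m.group("offset"))))
    return naive.replace(tzinfo=tz)

raw = ['2/24/2021 3:37:04 PM UTC-6', '12/1/2021 12:05:00 AM UTC+1']
parsed = [parse_with_utc_offset(s) for s in raw]
for dt in parsed:
    print(dt.isoformat())
# 2021-02-24T15:37:04-06:00
# 2021-12-01T00:05:00+01:00
```

Letting strptime handle %I and %p means the 12 AM / 12 PM edge cases do not need manual arithmetic.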

Extract strings that follow changing time strings

So I've been trying to extract the strings that follow the "dot" character in the text file, but only for lines that follow the pattern as below, that is, after the date and time:
09 May 2018 10:37AM • 6PR, Perth (Mornings)
The problem is for each of those lines, the date and time would change so the only common pattern is that there would be AM or PM right before the "dot".
However, if I search for "AM" or "PM" it wouldn't recognize the lines because the "AM" and "PM" are attached to the time.
This is my current code:
for i, s in enumerate(open(file)):
    for words in ['PM', 'AM']:
        if re.findall(r'\b' + words + r'\b', s):
            source = s.split('•')[0]
Any idea how to get around this problem? Thank you.

I guess your regex is the problem here.
for i, s in enumerate(open(file)):
    if re.findall(r'\d{2}[AP]M', s):
        source = s.split('•')[0]
        # 09 May 2018 10:37AM

If you are trying to extract the datetime, try using a regex.
Example:
import re
s = "09 May 2018 10:37AM • 6PR, Perth (Mornings)"
m = re.search(r"(?P<datetime>\d{2}\s+(January|February|March|April|May|June|July|August|September|October|November|December)\s+\d{4}\s+\d{2}\:\d{2}(AM|PM))", s)
if m:
    print(m.group("datetime"))
Output:
09 May 2018 10:37AM
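If the goal is the text after the "•" rather than the datetime in front of it, a capture group placed after the bullet is one way to get it. This is a sketch that assumes the time always has the form H:MM or HH:MM followed directly (or after one space) by AM/PM; LINE_RX and text_after_dot are illustrative names, and the second sample line is made up.

```python
import re

# A time such as 10:37AM, then the bullet, then the part we want to keep.
LINE_RX = re.compile(r"\d{1,2}:\d{2}\s?[AP]M\s*•\s*(?P<rest>.+)")

def text_after_dot(lines):
    results = []
    for line in lines:
        m = LINE_RX.search(line)
        if m:  # lines without a timestamp before the bullet are skipped
            results.append(m.group("rest").strip())
    return results

sample = [
    "09 May 2018 10:37AM • 6PR, Perth (Mornings)",
    "Some other line • not preceded by a time",
]
print(text_after_dot(sample))  # ['6PR, Perth (Mornings)']
```

The same function should work on an open file object, since iterating over a file yields its lines.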

Split string using regular expression in python

I have a string like this:
Sale: \t\t\t5 Jan \u2013 10 Jan
I want to extract the start and the end of the sale. A very straightforward approach would be to make several splits, but I want to do that using regular expressions.
As a result, I want to get:
start = "5 Jan"
end = "10 Jan"
Is it possible to do that using regex?

This should help.
import re
s = "Sale: \t\t\t5 Jan \u2013 10 Jan"
f = re.findall(r"\d+ \w{3}", s)
print(f)
Output:
['5 Jan', '10 Jan']

This may not be an optimised one but works assuming the string pattern remains the same.
import re
s = 'Sale: \t\t\t5 Jan \u2013 10 Jan'
# Take everything after "Sale:", split on the en dash, and trim whitespace from each part
start, end = (part.strip() for part in re.search(r'Sale:(.*)', s).group(1).split('\u2013'))
# start <- 5 Jan
# end <- 10 Jan
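Another option is to capture both dates directly with named groups, which avoids splitting altogether. This is a sketch assuming the dates look like "day + three-letter month" and are separated by an en dash or a plain hyphen; the group names start and end are chosen to mirror the question.

```python
import re

s = "Sale: \t\t\t5 Jan \u2013 10 Jan"

# Two "day month" groups separated by an en dash (U+2013) or a hyphen.
m = re.search(r"(?P<start>\d{1,2} \w{3})\s*[\u2013-]\s*(?P<end>\d{1,2} \w{3})", s)
if m:
    start, end = m.group("start"), m.group("end")
    print(start)  # 5 Jan
    print(end)    # 10 Jan
```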

Convert certain numbers in a sentence such as date, time, phone number from numbers to words in Python

I am fairly new to Python, so I apologize for any gaps in my knowledge. I have Python code, refined with other users' help (thank you), that converts a date from numbers into words using dictionaries for days, months, and years, e.g. 3.6.2015 => march.third.two thousand fifteen, using:
date = raw_input("Give date: ")
I want to input a sentence such as: "today is 3.6.2015, it is 10:00 o'clock and it's rainy", but I do not know how to search the sentence for the date, time, or phone number so that I can apply the conversion to it.
If someone can please help, thank you.

You could use regular expressions:
import re
s = "today is 3.6.2015, it is 10:00 o'clock and it's rainy"
mat = re.search(r'(\d{1,2}\.\d{1,2}\.\d{4})', s)
date = mat.group(1)
print(date) # 3.6.2015
Note, if there's nothing matching this regular expression in the input text, an AttributeError will be raised, that you'll either have to prevent (e.g. if mat:) or handle.
EDIT
Assuming you can turn your conversion code into a function, you could use re.sub:
import re
def your_function(match):
    # re.sub passes a match object to the callback, not a string
    num_string = match.group(1)  # e.g. "3.6.2015"
    # Whatever your function does with num_string
    words_string = "march.third.two thousand fifteen"
    return words_string
s = "today is 3.6.2015, it is 10:00 o'clock and it's rainy"
date = re.sub(r'(\d{1,2}\.\d{1,2}\.\d{4})', your_function, s)
print(date)
# today is march.third.two thousand fifteen, it is 10:00 o'clock and it's rainy
Just modify your_function to change the 3.6.2015 into march.third.two thousand fifteen.
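To handle the time in the same pass as the date, one possible extension is a single pattern with named alternatives and a callback that dispatches on which alternative matched. This is only a sketch: date_to_words and time_to_words are hypothetical placeholders standing in for your own dictionary-based conversion code, and phone numbers are left out because their format is not given.

```python
import re

# One named group per kind of token; the date alternative is tried first.
TOKEN_RX = re.compile(
    r"(?P<date>\b\d{1,2}\.\d{1,2}\.\d{4}\b)|(?P<time>\b\d{1,2}:\d{2}\b)"
)

def date_to_words(text):
    # Placeholder: plug in the real date conversion here.
    return f"<date {text}>"

def time_to_words(text):
    # Placeholder: plug in the real time conversion here.
    return f"<time {text}>"

def replace_tokens(sentence):
    def dispatch(match):
        # match.lastgroup is the name of the alternative that matched.
        if match.lastgroup == "date":
            return date_to_words(match.group())
        return time_to_words(match.group())
    return TOKEN_RX.sub(dispatch, sentence)

print(replace_tokens("today is 3.6.2015, it is 10:00 o'clock and it's rainy"))
# today is <date 3.6.2015>, it is <time 10:00> o'clock and it's rainy
```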
