Python: Extract time frame from string including time format (am/pm) - python

I'm writing Python code. Below are some sample strings containing a day range and a time range:
'Mon-Thu, Sun 11:30 am - 9 pm'
'Sat-Sun 9:30 am - 9:30 pm'
'Sat 5 pm - 9 pm'
I want to extract the time range from above including (am/pm) which will be
11:30am-9pm
9:30am-9:30pm
5pm-9pm
which I'm later planning to convert into 24hr format.
Things I tried which didn't work
# try-1
re.findall(r"(?i)(\d?\d:\d\d (?:a|p)m)", st)
# try-2
re.findall(r"(?i)((\d?\d:\d\d|\d) (?:a|p)m)", st)
I'm not good at regex. Any help using regex or any other way will be appreciated.

You were on the right track. Consider this version:
import re
inp = ['Mon-Thu, Sun 11:30 am - 9 pm', 'Sat-Sun 9:30 am - 9:30 pm', 'Sat 5 pm - 9 pm']
output = [re.findall(r'\d{1,2}(?::\d{2})? [ap]m - \d{1,2}(?::\d{2})? [ap]m', x)[0] for x in inp]
print(output) # ['11:30 am - 9 pm', '9:30 am - 9:30 pm', '5 pm - 9 pm']

You can use
(?i)\b\d?\d(?::\d\d)?\s*[ap]m\s*-\s*\d?\d(?::\d\d)?\s*[ap]m\b
See the regex demo. Details:
(?i) - case insensitive flag
\b - a word boundary
\d?\d - one or two digits
(?::\d\d)? - an optional occurrence of : and two digits
\s* - zero or more whitespaces
[ap]m - a or p and then m
\s*-\s* - - enclosed with zero or more whitespaces
\d?\d(?::\d\d)?\s*[ap]m - same pattern as above
\b - word boundary
See a Python demo:
import re
text = '''Mon-Thu, Sun 11:30 am - 9 pm'
'Sat-Sun 9:30 am - 9:30 pm'
'Sat 5 pm - 9 pm'''
time_rx = r'\d?\d(?::\d\d)?\s*[ap]m'
matches = re.findall(fr'\b{time_rx}\s*-\s*{time_rx}\b', text)
for match in matches:
    print("".join(match.split()))
Output:
11:30am-9pm
9:30am-9:30pm
5pm-9pm
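Since the question mentions converting to 24-hour format afterwards, here is a minimal sketch of one way to do that step. The helper names `to_24h` and `extract_range_24h` are made up for illustration, and the sketch assumes an English/C locale so that `%p` recognises AM/PM.

```python
import re
from datetime import datetime

TIME_RX = r'\d?\d(?::\d\d)?\s*[ap]m'
# Same range pattern as above, with each endpoint captured in its own group.
RANGE_RX = re.compile(fr'\b({TIME_RX})\s*-\s*({TIME_RX})\b', re.IGNORECASE)

def to_24h(t):
    """Hypothetical helper: '9 pm' or '11:30 am' -> 'HH:MM'."""
    compact = "".join(t.split()).upper()          # '11:30 am' -> '11:30AM'
    fmt = "%I:%M%p" if ":" in compact else "%I%p"  # minutes are optional
    return datetime.strptime(compact, fmt).strftime("%H:%M")

def extract_range_24h(s):
    """Hypothetical helper: return 'HH:MM-HH:MM', or None if no range is found."""
    m = RANGE_RX.search(s)
    if m is None:
        return None
    return f"{to_24h(m.group(1))}-{to_24h(m.group(2))}"

samples = ['Mon-Thu, Sun 11:30 am - 9 pm',
           'Sat-Sun 9:30 am - 9:30 pm',
           'Sat 5 pm - 9 pm']
results = [extract_range_24h(s) for s in samples]
print(results)  # ['11:30-21:00', '09:30-21:30', '17:00-21:00']
```

An out-of-range value such as 13 pm would make `strptime` raise a `ValueError`, which may or may not be what you want for scraped data.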

I don't know much about regex, but I can suggest this:
s = 'Mon-Thu, Sun 11:30 am - 9 pm'
for i in range(len(s)):
    if s[i].isdigit():
        new_s = s[i:]  # keep everything from the first digit onward
        break
print(new_s)  # 11:30 am - 9 pm
This approach works if you want to write the code without regular expressions.

Related

How to split a string but keep multiple delimiters with the original chunk

Say my string is
st = 'Walking happened at 8 am breakfast happened at 9am baseball happened at 12 pm lunch happened at 1pm'
I would like to split on 'am' or 'pm', but I want those delimiters to stay part of the original chunk.
So the desired result is
splitlist = ['Walking happened at 8 am',
'breakfast happened at 9am',
'baseball happened at 12 pm',
'lunch happened at 1pm']
There are many solutions for keeping the delimiter, but they keep it as a separate item in the list, like this one:
In Python, how do I split a string and keep the separators?
You can use a lookbehind:
import re
splitlist = re.split(r'(?<=[ap]m)\s+', st)
Output:
['Walking happened at 8 am',
'breakfast happened at 9am',
'baseball happened at 12 pm',
'lunch happened at 1pm']
If you want to ensure there is a word boundary or a digit before am/pm (i.e., to avoid splitting after words such as "program"):
import re
splitlist = re.split(r'(?:(?<=\d[ap]m)|(?<=\b[ap]m))\s+', st)
Example output, with the last chunk of st changed to 'beginning of program happened at 1pm':
['Walking happened at 8 am',
'breakfast happened at 9am',
'baseball happened at 12 pm',
'beginning of program happened at 1pm']
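To make the difference between the two patterns concrete, here is a small side-by-side sketch. The input string is an assumed variant of `st` in which "lunch" is replaced by "beginning of program"; the variable names `simple` and `strict` are only for illustration.

```python
import re

st = ('Walking happened at 8 am breakfast happened at 9am '
      'baseball happened at 12 pm beginning of program happened at 1pm')

# Splits after any "am"/"pm", including the one at the end of "program".
simple = re.split(r'(?<=[ap]m)\s+', st)

# Splits only when am/pm follows a digit or starts a word.
strict = re.split(r'(?:(?<=\d[ap]m)|(?<=\b[ap]m))\s+', st)

print(len(simple))  # 5
print(simple[3:])   # ['beginning of program', 'happened at 1pm']
print(len(strict))  # 4
print(strict[3])    # beginning of program happened at 1pm
```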

Extracting time with regex from a string

I have scraped some data, and some of the hours are in 12-hour format. The string looks like this: Mon - Fri:,10:00 am - 7:00 pm. So I need to extract the times 10:00 am and 7:00 pm and then convert them to 24-hour format. The final string I want looks like this:
Mon - Fri:,10:00 - 19:00
Any help would be appreciated in this regard. I have tried the following:
import re
txt = 'Mon - Fri:,10:00 am - 7:00 pm'
data = re.findall(r'\s(\d{2}\:\d{2}\s?(?:AM|PM|am|pm))', txt)
print(data)
But this regex and any other that I tried to use didn't do the task.
Your regex enforces a whitespace before the leading digit which prevents ,10:00 am from matching and requires two digits before the colon which fails to match 7:00 pm. r"(?i)(\d?\d:\d\d (?:a|p)m)" seems like the most precise option.
After that, parse the match using datetime.strptime and convert it to military time using the "%H:%M" format string. Any invalid time like 10:67 will raise a ValueError (if you anticipate strings that should be ignored instead, adjust the regex to strictly match valid times).
import re
from datetime import datetime
def to_military_time(x):
    return datetime.strptime(x.group(), "%I:%M %p").strftime("%H:%M")
txt = "Mon - Fri:,10:00 am - 7:00 pm"
data = re.sub(r"(?i)(\d?\d:\d\d (?:a|p)m)", to_military_time, txt)
print(data) # => Mon - Fri:,10:00 - 19:00
Your regex looks only for two digit hours (\d{2}) with white space before them (\s). The following captures also one digit hours, with a possible comma instead of the space.
data = re.findall(r'[\s,](\d{1,2}\:\d{2}\s?(?:AM|PM|am|pm))', txt)
However, you might want to consider all punctuation as valid:
data = re.findall(r'[\s!"#$%&\'\(\)*+,-./:;\<=\>?@\[\\\]^_`\{|\}~](\d{1,2}\:\d{2}\s?(?:AM|PM|am|pm))', txt)
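Rather than typing the punctuation class by hand, one option is to build it from `string.punctuation` with `re.escape`. This is a sketch of that alternative, not part of the answer above; `punct_class` and `pattern` are illustrative names, and `re.IGNORECASE` stands in for the explicit AM|PM|am|pm alternation.

```python
import re
import string

txt = 'Mon - Fri:,10:00 am - 7:00 pm'

# Character class matching whitespace or any ASCII punctuation character.
punct_class = '[\\s' + re.escape(string.punctuation) + ']'

pattern = re.compile(punct_class + r'(\d{1,2}:\d{2}\s?[ap]m)', re.IGNORECASE)
data = pattern.findall(txt)
print(data)  # ['10:00 am', '7:00 pm']
```

Like the patterns above, this still requires some character before the time, so a time at the very start of the string would not match.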
The regex needs to be changed like this:
import re
text = 'Mon - Fri:,10:00 am - 7:00 pm'
result = re.match(r'\D* - \D*:,([\d\s\w:]+) - ([\d\s\w:]+)', text)
print(result.group(1))
# it will print 10:00 am
print(result.group(2))
# it will print 7:00 pm
You need a quantifier such as '+' or '*' to tell the regex to match multiple characters; a bare \s matches only one character.
You can learn more regex here.
https://regexr.com/
And here you can try regex online.
https://regex101.com/
Why not use the time module?
import time
data = "Mon - Fri:,10:00 am - 7:00 pm"
parts = data.split(",")
days = parts[0]
hours = parts[1]
parts = hours.split("-")
t1 = time.strptime(parts[0].strip(), "%I:%M %p")
t2 = time.strptime(parts[1].strip(), "%I:%M %p")
result = days + "," + time.strftime("%H:%M", t1) + " - " + time.strftime("%H:%M", t2)
Output:
Mon - Fri:,10:00 - 19:00

Find date from image/text

I have dates like this and I need regex to find these types of dates
12-23-2019
29 10 2019
1:2:2018
9/04/2019
22.07.2019
Here's what I did.
First I removed all spaces from the text, which leaves it looking like this:
12-23-2019291020191:02:2018
And this is my regex:
re.findall(r'((\d{1,2})([.\/-])(\d{2}|\w{3,9})([.\/-])(\d{4}))',new_text)
It can find 12-23-2019, 9/04/2019 and 22.07.2019, but it cannot find 29 10 2019 or 1:02:2018.
You may use
(?<!\d)\d{1,2}([.:/ -])(?:\d{1,2}|\w{3,})\1\d{4}(?!\d)
See the regex demo
Details
(?<!\d) - no digit right before
\d{1,2} - 1 or 2 digits
([.:/ -]) - a dot, colon, slash, space or hyphen (captured in Group 1)
(?:\d{1,2}|\w{3,}) - 1 or 2 digits or 3 or more word chars
\1 - same value as in Group 1
\d{4} - four digits
(?!\d) - no digit allowed right after
Python sample usage:
import re
text = 'Aaaa 12-23-2019, bddd 29 10 2019 <=== 1:2:2018'
pattern = r'(?<!\d)\d{1,2}([.:/ -])(?:\d{1,2}|\w{3,})\1\d{4}(?!\d)'
results = [x.group() for x in re.finditer(pattern, text)]
print(results) # => ['12-23-2019', '29 10 2019', '1:2:2018']
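The backreference `\1` and the digit lookarounds are what keep the pattern from being too permissive. The following sketch probes a few candidate strings; the candidates are made up for illustration and are not from the question.

```python
import re

pattern = re.compile(r'(?<!\d)\d{1,2}([.:/ -])(?:\d{1,2}|\w{3,})\1\d{4}(?!\d)')

candidates = [
    '9/04/2019',    # same separator twice -> match
    '22.07.2019',   # same separator twice -> match
    '5 Jan 2019',   # month name via \w{3,} -> match
    '12-23/2019',   # mixed separators, \1 fails -> no match
    '123-04-2019',  # three leading digits, (?<!\d) blocks it -> no match
]
matched = [c for c in candidates if pattern.search(c)]
print(matched)  # ['9/04/2019', '22.07.2019', '5 Jan 2019']
```

Note that the pattern only checks the shape of the string: something like 99-99-2019 would still match, so parse the result with datetime if you need real validation.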

Remove words until a specific character is reached

I'm new to Python and am having difficulty removing words from a string.
9 - Saturday, 19 May 2012
Above is my string. I would like to remove everything before the date, leaving
19 May 2012
so I can easily convert it to an SQL date.
Here is the code that I tried:
new_s = re.sub(',', '', '9 - Saturday, 19 May 2012')
But it only removes the "," in the string. Any help?
You can use string.split(',')
and you will get (note the leading space in the second item)
['9 - Saturday', ' 19 May 2012']
You are missing the .* (matching any number of chars) before the , (and a space after it, which you probably also want to remove):
>>> new_s = re.sub('.*, ', '', '9 - Saturday, 19 May 2012')
>>> new_s
'19 May 2012'
Your regex is matching a single comma only hence that is the only thing it removes.
You may use a negated character class i.e. [^,]* to match everything until you match a comma and then match comma and trailing whitespace to remove it like this:
>>> print(re.sub('[^,]*, *', '', '9 - Saturday, 19 May 2012'))
19 May 2012
Regex is great, but for this you could also use .split()
test_string = "9 - Saturday, 19 May 2012"
splt_string = test_string.split(",")
out_String = splt_string[1]
print(out_String)
Outputs:
19 May 2012
If the leading ' ' is a problem, you can remedy this with out_String.lstrip()
try this
a = "9 - Saturday, 19 May 2012"
f = a.find("19 May 2012")
b = a[f:]
print(b)
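Since the stated goal is an SQL date, here is a minimal sketch that goes one step further and parses the extracted text. It assumes the date always follows the first ", " and that month names are in English (`%B` is locale-dependent); the variable names are illustrative.

```python
from datetime import datetime

raw = '9 - Saturday, 19 May 2012'

# partition() splits on the first ', ' and returns (before, separator, after).
_, _, date_part = raw.partition(', ')

# ISO 8601 (YYYY-MM-DD) is the usual format for SQL DATE literals.
sql_date = datetime.strptime(date_part, '%d %B %Y').date().isoformat()
print(sql_date)  # 2012-05-19
```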

Split string using regular expression in python

I have such string
Sale: \t\t\t5 Jan \u2013 10 Jan
I want to extract the start and the end of the sale. A very straightforward approach would be to make several splits, but I want to do that using regular expressions.
As the result I want to get
start = "5 Jan"
end = "10 Jan"
Is it possible to do that using regex?
This should help.
import re
s = "Sale: \t\t\t5 Jan \u2013 10 Jan"
f = re.findall(r"\d+ \w{3}", s)
print(f)
Output:
['5 Jan', '10 Jan']
This may not be an optimised one but works assuming the string pattern remains the same.
import re
s = 'Sale: \t\t\t5 Jan \u2013 10 Jan'
start, end = re.search(r'Sale:(.*)', s).group(1).strip().replace(' \u2013 ', ', ').split(', ')
# start <- 5 Jan
# end <- 10 Jan
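Another option is a single regex with two capture groups, which gives `start` and `end` directly. This is a sketch that assumes the two dates are always separated by an en dash (U+2013) or a plain hyphen, with optional whitespace around it.

```python
import re

s = 'Sale: \t\t\t5 Jan \u2013 10 Jan'

# Group 1 is the start date, group 2 the end date.
m = re.search(r'(\d{1,2} \w{3})\s*[\u2013-]\s*(\d{1,2} \w{3})', s)
if m:
    start, end = m.groups()
    print(start)  # 5 Jan
    print(end)    # 10 Jan
```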
