How to search for regex pattern python - python

I am trying to create an error handler for an exercise where I check for correct input format. I looked at the docs and SO for examples but I am still here. I believe I am looking for: (there have been a few variations tried as well)
check_time = re.compile('^[0-1][0-9]:[0-5][0-9] ([A|a]|[P|p][M|m])')
but my test cases are failing.
Code calling for input from user:
import re
class CivilianTime:
    def __init__(self):
        # no error handling yet
        self.civ_time = input('Enter the time in (XX:XX A/PM) format.\n')
        check_time = re.compile('1[0-2]:[0-5][0-9] AM | 1[0-2]:[0-5][0-9] PM')
        if check_time != self.civ_time:
            self.civ_time = input('Enter the time in (XX:XX A/PM) format.\n')
    # if PM, strip time to numerical values and add 1200
    # if AM, strip time to numerical values
    def time_converter(self):
        if self.civ_time[-2] == 'P':
            strip_time = self.civ_time.strip(" PM")
            strip_time = strip_time.replace(':', '')
            strip_time = int(strip_time) + 1200
            print(strip_time)
        else:
            strip_time = self.civ_time.strip(' AM')
            strip_time = strip_time.replace(':', '')
            print(strip_time)
c = CivilianTime()
c.time_converter()
Result:
Enter the time in (XX:XX A/PM) format.
1212 am
Enter the time in (XX:XX A/PM) format.
1212pm
1212pm
I want to see it ask for the time again when the input is not in the desired format. It's running the function even when there's no space.
Unless there's a way for me to use in.

You are misreading the docs:
https://docs.python.org/3/library/re.html
You are on the right track. When you use or (|), you have to repeat the entire expression on each side. So first match one group of hours at a time and test the cases in several lines of code. Don't try to write it as a one-liner until you fully understand regex.
10:00 AM, 11:00 AM and 12:00 AM are all matched by 1[0-2]:[0-5][0-9] AM
Now to match that for PM you have to or (|) the entire expression.
So, matcher = '1[0-2]:[0-5][0-9] AM|1[0-2]:[0-5][0-9] PM' (no spaces around the |, because a space inside a pattern matches a literal space)
Now match the remaining time with what you have learned! Hint: the rest of the hours start with 0.
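Note also that a compiled pattern cannot be compared to a string with !=; it has to be applied with a method such as fullmatch. A minimal sketch of that idea follows, assuming upper-case AM/PM and two-digit hours are required. The names CIVILIAN_TIME, is_valid_civilian_time and prompt_for_time are illustrative and not from the original post.

```python
import re

# Hours 01-12, minutes 00-59, exactly one space, then AM or PM (upper case).
CIVILIAN_TIME = re.compile(r'(0[1-9]|1[0-2]):[0-5][0-9] [AP]M')


def is_valid_civilian_time(text):
    """Return True only if the whole string looks like 'HH:MM AM' or 'HH:MM PM'."""
    return CIVILIAN_TIME.fullmatch(text) is not None


def prompt_for_time(read=input):
    """Keep asking until the answer has the expected format.

    `read` defaults to input(); passing another callable makes this testable.
    """
    prompt = 'Enter the time in (XX:XX A/PM) format.\n'
    text = read(prompt)
    while not is_valid_civilian_time(text):
        text = read(prompt)
    return text
```

Using fullmatch rather than match means trailing characters such as '12:12 PMx' are rejected without adding ^ and $ to the pattern.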

Related

Regex for date of birth with maximum age

I am looking for a regular expression in Python which matches a date of birth in the given format: dd.mm.YYYY
For example:
31.12.1999 (31st of December)
02.07.2021
I have provided more examples in the demo.
Dates older than 01.01.1920 should not match!
Try:
^(?:0[1-9]|[12]\d|3[01])\.(?:0[1-9]|1[0-2])\.(?:19[2-9]\d|[2-9]\d{3})$
See Regex Demo
Be aware that this will not reject dates like 31.02.2021: the pattern does not know how many days each month has. Encoding that in a regex is impractical, mostly because of February, since a regex cannot compute which years are leap years.
This will also allow future dates such as 01.01.3099 (you do want this to work for the future, no?).
Update
You really need to be using the datetime class from the datetime module and, if you want to insist that the day and month fields contain two digits, a regex just to ensure the format:
import re
from datetime import datetime, date
validated = False # assume not validated
s = '31.03.2019'
m = re.fullmatch(r'\d{2}\.\d{2}\.\d{4}', s)
if m:
    # we have ensured the correct number of digits:
    try:
        d = datetime.strptime(s, '%d.%m.%Y').date()
        if d >= date(1920, 1, 1):
            validated = True
    except ValueError:
        pass
print(validated)
As I said, it can be done with a very convoluted regex. However, I do not actually recommend using this, I just had fun writing it as a challenge. You should in reality use a very permissive regex and validate the ranges in code.
Demo.
# Easy dates, those <= 28th, valid for all months/years.
(0[1-9]|1[0-9]|2[0-8])\.(0[1-9]|1[0-2])\.(19[2-9][0-9]|2[0-9][0-9][0-9])
|
# Validate the 29th of February for 1920-1999.
29\.02\.19([3579][26]|[2468][048])
|
# Validate the 29th of February for 2000-2999.
29\.02\.((2[0-9])(0[48]|[13579][26]|[2468][048])|2000|2400|2800)
|
# Validate 29th and 30th.
(29|30)\.(01|0[3-9]|1[0-2])\.(19[2-9][0-9]|2[0-9][0-9][0-9])
|
# Validate 31st.
31\.(01|03|05|07|08|10|12)\.(19[2-9][0-9]|2[0-9][0-9][0-9])
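Because the pattern above contains comments and line breaks, it only works when compiled in verbose mode. A hedged sketch of how it might be used from Python follows. DOB_PATTERN and is_valid_dob are made-up names, and fullmatch is used here in place of explicit anchors.

```python
import re

# re.VERBOSE makes the engine ignore whitespace and '#' comments in the pattern.
DOB_PATTERN = re.compile(r"""
    # Days 01-28 are valid in every month and year.
    (0[1-9]|1[0-9]|2[0-8])\.(0[1-9]|1[0-2])\.(19[2-9][0-9]|2[0-9]{3})
  | # 29 February in leap years 1920-1999.
    29\.02\.19([3579][26]|[2468][048])
  | # 29 February in leap years 2000-2999.
    29\.02\.(2[0-9](0[48]|[13579][26]|[2468][048])|2000|2400|2800)
  | # The 29th and 30th of every month except February.
    (29|30)\.(01|0[3-9]|1[0-2])\.(19[2-9][0-9]|2[0-9]{3})
  | # The 31st of the seven long months.
    31\.(01|03|05|07|08|10|12)\.(19[2-9][0-9]|2[0-9]{3})
""", re.VERBOSE)


def is_valid_dob(text):
    """Return True if text is a real calendar date from 01.01.1920 to 31.12.2999."""
    return DOB_PATTERN.fullmatch(text) is not None
```

For example, is_valid_dob('29.02.2020') is accepted while is_valid_dob('29.02.2021') and is_valid_dob('31.04.2021') are rejected.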
\d{2}\.\d{2}\.\d{4}
Validating the value of the dates should be done at the application level.

extract 'date-related' string from a sentence-python

I am new to python, and want to find all 'date-related' words in a sentence, such as date, Monday, Tuesday, last week, next week, tomorrow, yesterday, today, etc.
For example:
input: 'Yesterday I went shopping'
return: 'Yesterday'
input: 'I will start working on Tuesday'
return: 'Tuesday'
input: 'My birthday is 1998-12-12'
return: '1998-12-12'
I find that python package 'datefinder' can find these words, but it will automatically change these words to standard datetime. However, I only want to extract these words, is there any other method or package that can do this?
Thanks for your help!
This is how I would do the logic for it. As for getting numbers out of a string that also contains digits, I'm not sure; I would create an input that specifically asks for digits and then, just as I call firstSentence.lower() below, call firstSentence = int(firstSentence) to ensure only ints are passed.
firstSentence = input('Tell me something: ')
firstSentence = firstSentence.lower()
if 'yesterday' in firstSentence:
    # now call a function that returns date/time
    pass
elif 'tuesday' in firstSentence:
    # now call a function that returns date/time
    pass
else:
    print('No day found')
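If the goal is only to pull the matching words out unchanged, a regular expression with one alternative per date-like word may be enough. The following is a rough sketch under that assumption; the word list is deliberately small and the names DATE_WORDS and extract_date_words are illustrative, not part of any package.

```python
import re

# One alternative per kind of date-like token; extend the list as needed.
DATE_WORDS = re.compile(
    r'\b(?:'
    r'\d{4}-\d{2}-\d{2}'                              # ISO-style dates such as 1998-12-12
    r'|yesterday|today|tomorrow'
    r'|(?:last|next)\s+week'
    r'|(?:mon|tues|wednes|thurs|fri|satur|sun)day'    # weekday names
    r')\b',
    re.IGNORECASE,
)


def extract_date_words(sentence):
    """Return every date-related token exactly as it appears in the sentence."""
    return DATE_WORDS.findall(sentence)
```

Because the pattern has no capturing groups, findall returns the matched text itself, so the original capitalisation ('Yesterday', 'Tuesday') is preserved.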

In Python, how to parse a string representing a set of keyword arguments such that the order does not matter

I'm writing a class RecurringInterval which - based on the dateutil.rrule object - represents a recurring interval in time. I have defined a custom, human-readable __str__ method for it and would like to also define a parse method which (similar to the rrulestr() function) parses the string back into an object.
Here is the parse method and some test cases to go with it:
import re
from dateutil.rrule import FREQNAMES
import pytest
class RecurringInterval(object):
    freq_fmt = "{freq}"
    start_fmt = "from {start}"
    end_fmt = "till {end}"
    byweekday_fmt = "by weekday {byweekday}"
    bymonth_fmt = "by month {bymonth}"
    @classmethod
    def match_pattern(cls, string):
        SPACES = r'\s*'
        freq_names = [freq.lower() for freq in FREQNAMES] + [freq.title() for freq in FREQNAMES] # The frequencies may be either lowercase or start with a capital letter
        FREQ_PATTERN = '(?P<freq>{})?'.format("|".join(freq_names))
        # Start and end are required (their regular expressions match 1 repetition)
        START_PATTERN = cls.start_fmt.format(start=SPACES + r'(?P<start>.+?)')
        END_PATTERN = cls.end_fmt.format(end=SPACES + r'(?P<end>.+?)')
        # The remaining tokens are optional (their regular expressions match 0 or 1 repetitions)
        BYWEEKDAY_PATTERN = cls.optional(cls.byweekday_fmt.format(byweekday=SPACES + r'(?P<byweekday>.+?)'))
        BYMONTH_PATTERN = cls.optional(cls.bymonth_fmt.format(bymonth=SPACES + r'(?P<bymonth>.+?)'))
        PATTERN = SPACES + FREQ_PATTERN \
            + SPACES + START_PATTERN \
            + SPACES + END_PATTERN \
            + SPACES + BYWEEKDAY_PATTERN \
            + SPACES + BYMONTH_PATTERN \
            + SPACES + "$" # The character '$' is needed to make the non-greedy regular expressions parse till the end of the string
        return re.match(PATTERN, string).groupdict()
    @staticmethod
    def optional(pattern):
        '''Encloses the given regular expression in an optional group (i.e., one that matches 0 or 1 repetitions of the original regular expression).'''
        return '({})?'.format(pattern)
'''Tests'''
def test_match_pattern_with_byweekday_and_bymonth():
    string = "Weekly from 2017-11-03 15:00:00 till 2017-11-03 16:00:00 by weekday Monday, Tuesday by month January, February"
    groups = RecurringInterval.match_pattern(string)
    assert groups['freq'] == "Weekly"
    assert groups['start'].strip() == "2017-11-03 15:00:00"
    assert groups['end'].strip() == "2017-11-03 16:00:00"
    assert groups['byweekday'].strip() == "Monday, Tuesday"
    assert groups['bymonth'].strip() == "January, February"
def test_match_pattern_with_bymonth_and_byweekday():
    string = "Weekly from 2017-11-03 15:00:00 till 2017-11-03 16:00:00 by month January, February by weekday Monday, Tuesday "
    groups = RecurringInterval.match_pattern(string)
    assert groups['freq'] == "Weekly"
    assert groups['start'].strip() == "2017-11-03 15:00:00"
    assert groups['end'].strip() == "2017-11-03 16:00:00"
    assert groups['byweekday'].strip() == "Monday, Tuesday"
    assert groups['bymonth'].strip() == "January, February"
if __name__ == "__main__":
    # pytest.main([__file__])
    pytest.main([__file__+"::test_match_pattern_with_byweekday_and_bymonth"]) # This passes
    # pytest.main([__file__+"::test_match_pattern_with_bymonth_and_byweekday"]) # This fails
Although the parser works if you specify the arguments in the 'right' order, it is 'inflexible' in that it doesn't allow the optional arguments to be given in arbitrary order. This is why the second test fails.
What would be a way to make the parser parse the 'optional' fields in any order, such that both tests pass? (I was thinking of making an iterator with all permutations of the regular expressions and trying re.match on each one, but this does not seem like an elegant solution).
At this point, your language is getting complex enough that it's time to ditch regular expressions and learn how to use a proper parsing library. I threw this together using pyparsing, and I've annotated it heavily to try and explain what's going on, but if anything's unclear do ask and I'll try to explain.
from pyparsing import Regex, oneOf, OneOrMore
# Boring old constants, I'm sure you know how to fill these out...
months = ['January', 'February']
weekdays = ['Monday', 'Tuesday']
frequencies = ['Daily', 'Weekly']
# A datetime expression is anything matching this regex. We could split it down
# even further to get day, month, year attributes in our results object if we felt
# like it
datetime_expr = Regex(r'(\d{4})-(\d\d?)-(\d\d?) (\d{2}):(\d{2}):(\d{2})')
# A from or till expression is the word "from" or "till" followed by any valid datetime
from_expr = 'from' + datetime_expr.setResultsName('from_')
till_expr = 'till' + datetime_expr.setResultsName('till')
# A range expression is a from expression followed by a till expression
range_expr = from_expr + till_expr
# A weekday is any old weekday
weekday_expr = oneOf(weekdays)
month_expr = oneOf(months)
frequency_expr = oneOf(frequencies)
# A by weekday expression is the words "by weekday" followed by one or more weekdays
by_weekday_expr = 'by weekday' + OneOrMore(weekday_expr).setResultsName('weekdays')
by_month_expr = 'by month' + OneOrMore(month_expr).setResultsName('months')
# A recurring interval, then, is a frequency, followed by a range, followed by
# a weekday and a month, in any order
recurring_interval = frequency_expr + range_expr + (by_weekday_expr & by_month_expr)
# Let's parse!
if __name__ == '__main__':
    res = recurring_interval.parseString('Daily from 1111-11-11 11:00:00 till 1111-11-11 12:00:00 by weekday Monday by month January February')
    # Note that setResultsName causes everything to get packed neatly into
    # attributes for us, so we can pluck all the bits and pieces out with no
    # difficulty at all
    print(res)
    print(res.from_)
    print(res.till)
    print(res.weekdays)
    print(res.months)
You have many options here, each with different downsides.
One approach would be to use a repeated alternation, like (by weekday|by month)*:
(?P<freq>Weekly)?\s+from (?P<start>.+?)\s+till (?P<end>.+?)(?:\s+by weekday (?P<byweekday>.+?)|\s+by month (?P<bymonth>.+?))*$
This will match strings of the form week month and month week, but also week week or month week month etc.
Another option would be to use lookaheads, like (?=.*by weekday)?(?=.*by month)?:
(?P<freq>Weekly)?\s+from (?P<start>.+?)\s+till (?P<end>.+?(?=$| by))(?=.*\s+by weekday (?P<byweekday>.+?(?=$| by))|)(?=.*\s+by month (?P<month>.+?(?=$| by))|)
However, this requires a known delimiter (I used " by") to know how far to match. Also, it'll silently ignore any extra characters (meaning it'll match strings of the form by weekday [some garbage] by month).
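A third possibility, sketched here only as an illustration, is to split the work in two: match the fixed "from ... till ..." head once, then collect the "by ..." clauses with finditer in whatever order they appear. This is not taken from either answer; HEAD, CLAUSE and parse_interval are hypothetical names, and the sketch assumes that no value itself contains the word "by" surrounded by whitespace.

```python
import re

# Fixed part: optional frequency, then the required start and end.
HEAD = re.compile(
    r'\s*(?P<freq>\w+)?\s*from\s+(?P<start>.+?)\s+till\s+(?P<end>.+?)'
    r'(?=\s+by\s|\s*$)'
)
# Optional part: any number of "by weekday ..." / "by month ..." clauses.
CLAUSE = re.compile(
    r'\bby\s+(?P<key>weekday|month)\s+(?P<value>.+?)(?=\s+by\s|\s*$)'
)


def parse_interval(text):
    """Parse the interval string, accepting the 'by' clauses in any order."""
    head = HEAD.match(text)
    if head is None:
        raise ValueError('not a recurring interval: {!r}'.format(text))
    fields = head.groupdict()
    # Scan only the remainder of the string, after the end value.
    for clause in CLAUSE.finditer(text, head.end()):
        fields['by' + clause.group('key')] = clause.group('value')
    return fields
```

With this approach a repeated clause simply overwrites the earlier one, and a missing clause leaves its key out of the result; whether that is acceptable depends on how strict the parser needs to be.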

Convert certain numbers in a sentence such as date, time, phone number from numbers to words in Python

I am kind of new to Python so I apologize for my lacks. I have a code in python perfected with other users' help (thank you) that converts a date from numbers into words using dictionaries for days,months,years, like 3.6.2015 => march.third.two thousand fifteen using:
date = raw_input("Give date: ")
I want to input a sentence such as: "today is 3.6.2015, it is 10:00 o'clock and it's rainy" and from it I do not know how to search through the sentence for the date, or time, or phone number and to that date and time to apply the conversion.
If someone can please help, thank you.
You could use regular expressions:
import re
s = "today is 3.6.2015, it is 10:00 o'clock and it's rainy"
mat = re.search(r'(\d{1,2}\.\d{1,2}\.\d{4})', s)
date = mat.group(1)
print(date) # 3.6.2015
Note, if there's nothing matching this regular expression in the input text, an AttributeError will be raised, that you'll either have to prevent (e.g. if mat:) or handle.
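The guard mentioned above might look like the following minimal sketch. DATE_RE and find_date are illustrative names; the pattern is the same one used in the snippet above.

```python
import re

DATE_RE = re.compile(r'\d{1,2}\.\d{1,2}\.\d{4}')


def find_date(text):
    """Return the first d.m.yyyy-style date found in text, or None if there is none."""
    mat = DATE_RE.search(text)
    # search() returns None when nothing matches, so check before calling group().
    return mat.group(0) if mat else None
```

Returning None lets the caller decide what to do with a sentence that contains no date, instead of crashing with an AttributeError.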
EDIT
Assuming you can turn your conversion code into a function, you could use re.sub:
import re
def your_function(match):
    # re.sub calls this with a match object; match.group(1) is the date string
    # Whatever your function does
    words_string = "march.third.two thousand fifteen"
    return words_string
s = "today is 3.6.2015, it is 10:00 o'clock and it's rainy"
date = re.sub(r'(\d{1,2}\.\d{1,2}\.\d{4})', your_function, s)
print(date)
# today is march.third.two thousand fifteen, it is 10:00 o'clock and it's rainy
Just modify your_function to change the 3.6.2015 into march.third.two thousand fifteen.

Find and replace logic in Python

In Python I need logic for the scenario below; I am using the split function for this.
I have strings which contain input as shown below.
"ID674021384 25/01/1986 heloo hi thanks 5 minutes and 25-01-1988."
"ID909900000 25-01-1986 hello 10 minutes."
The output should be as shown below, with each date replaced by "date" and each time replaced by "time".
"ID674021384 date hello hi thanks time date."
"ID909900000 date hello time."
I also need a count of dates and times for each ID, as shown below:
ID674021384 DATE:2 TIME:1
ID909900000 DATE:1 TIME:1
>>> import re
>>> from collections import defaultdict
>>> lines = ["ID674021384 25/01/1986 heloo hi thanks 5 minutes and 25-01-1988.", "ID909900000 25-01-1986 hello 10 minutes."]
>>> pattern = r'(?P<date>\d{1,2}[/-]\d{1,2}[/-]\d{4})|(?P<time>\d+ minutes)'
>>> num_occurences = {line: defaultdict(int) for line in lines}
>>> def repl(matchobj):
...     # lastgroup is the name of the group that matched: 'date' or 'time'
...     num_occurences[matchobj.string][matchobj.lastgroup] += 1
...     return matchobj.lastgroup
...
>>> for line in lines:
...     text_id = line.split(' ')[0]
...     new_text = re.sub(pattern, repl, line)
...     print(new_text)
...     print('{0} DATE:{1[date]} Time:{1[time]}'.format(text_id, num_occurences[line]))
...
ID674021384 date heloo hi thanks time and date.
ID674021384 DATE:2 Time:1
ID909900000 date hello time.
ID909900000 DATE:1 Time:1
For parsing similar lines of text, like log files, I often use regular expressions using the re module. Though split() would work well also for separating fields which don't contain spaces and the parts of the date, using regular expressions allows you to also make sure the format matches what you expect, and if need be warn you of a weird looking input line.
Using regular expressions, you could get the individual fields of the date and time and construct date or datetime objects from them (both from the datetime module). Once you have those objects, you can compare them to other similar objects and write new entries, formatting the dates as you like. I would recommend parsing the whole input file (assuming you're reading a file) and writing a whole new output file instead of trying to alter it in place.
As for keeping track of the date and time counts, when your input isn't too large, using a dictionary is normally the easiest way to do it. When you encounter a line with a certain ID, find the entry corresponding to this ID in your dictionary, or add a new one if it is not there yet. This entry could itself be a dictionary using dates and times as keys, whose values are the counts of each one encountered.
I hope this answer will guide you on the way to a solution even though it contains no code.
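The dictionary-of-counts idea described in this answer could be sketched roughly as follows. This is only one possible reading of it: it counts by kind ('date' or 'time') per ID rather than by individual value, and the names TOKEN and count_by_id are made up for the example.

```python
import re
from collections import Counter

# A date like 25/01/1986 or 25-01-1986, or a duration like "5 minutes".
TOKEN = re.compile(r'(?P<date>\d{1,2}[/-]\d{1,2}[/-]\d{4})|(?P<time>\d+ minutes)')


def count_by_id(lines):
    """Map each line's leading ID to a Counter of 'date' and 'time' occurrences."""
    counts = {}
    for line in lines:
        text_id = line.split(' ', 1)[0]
        # Find the entry for this ID, or add a new one if it is not there yet.
        entry = counts.setdefault(text_id, Counter())
        for match in TOKEN.finditer(line):
            entry[match.lastgroup] += 1
    return counts
```

A Counter returns 0 for keys it has never seen, so an ID with no dates can still be printed with DATE:0 without extra checks.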
You could use a couple of regular expressions:
import re
txt = 'ID674021384 25/01/1986 heloo hi thanks 5 minutes and 25-01-1988.'
retime = re.compile(r'([0-9]+) *minutes')
redate = re.compile(r'([0-9]+[/-][0-9]+[/-][0-9]{4})')
# find all dates in 'txt'
dates = redate.findall(txt)
print(dates)
# find all times in 'txt' (only the digits are captured, not the word 'minutes')
times = retime.findall(txt)
print(times)
# replace dates and times in original string:
newtxt = txt
for adate in dates:
    newtxt = newtxt.replace(adate, 'date')
for atime in times:
    newtxt = newtxt.replace(atime, 'time')
print(newtxt)
The output looks like this:
Original string:
ID674021384 25/01/1986 heloo hi thanks 5 minutes and 25-01-1988.
Found dates:['25/01/1986', '25-01-1988']
Found times: ['5']
New string:
ID674021384 date heloo hi thanks time minutes and date.
Dates and times found:
ID674021384 DATE:2 TIME:1
Chris
