I am looking for a regular expression in Python that matches a date of birth in the given format: dd.mm.YYYY
For example:
31.12.1999 (31st of December)
02.07.2021
I have provided more examples in the demo.
Dates earlier than 01.01.1920 should not match!
Try:
^(?:0[1-9]|[12]\d|3[01])\.(?:0[1-9]|1[0-2])\.(?:19[2-9]\d|[2-9]\d{3})$
See Regex Demo
Be aware that this will not reject dates like 31.02.2021; the pattern does not know how many days each month has. Encoding that in a regex is impractical, mainly because of February: a regex cannot compute leap years, so every one of them would have to be spelled out by hand.
This will also allow future dates such as 01.01.3099 (you do want this to keep working in the future, no?).
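Both caveats are easy to see by running the pattern against a few strings. This is only a sketch: it writes the day part as 0[1-9]|[12]\d|3[01] and the month part as 0[1-9]|1[0-2], and the names DOB_RE, samples and results are made up for the illustration.

```python
import re

# Format-only check: day 01-31, month 01-12, year 1920-9999.
DOB_RE = re.compile(
    r'^(?:0[1-9]|[12]\d|3[01])\.(?:0[1-9]|1[0-2])\.(?:19[2-9]\d|[2-9]\d{3})$'
)

samples = ['31.12.1999', '02.07.2021', '31.12.1919', '31.02.2021', '01.01.3099']

# Map each sample to True/False depending on whether the pattern accepts it.
results = {s: bool(DOB_RE.match(s)) for s in samples}
for s, ok in results.items():
    print(s, ok)
```

31.12.1919 is rejected as intended, while the impossible 31.02.2021 and the far-future 01.01.3099 both pass.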
Update
You really need to be using the datetime class from the datetime module and, if you want to insist that the day and month fields contain two digits, a regex just to ensure the format:
import re
from datetime import datetime, date

validated = False  # assume not validated
s = '31.03.2019'
m = re.fullmatch(r'\d{2}\.\d{2}\.\d{4}', s)
if m:
    # we have ensured the correct number of digits:
    try:
        d = datetime.strptime(s, '%d.%m.%Y').date()
        if d >= date(1920, 1, 1):
            validated = True
    except ValueError:
        pass
print(validated)
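The same check can be wrapped in a function so it is easy to reuse and test. This is a minimal sketch; the names is_valid_dob and EARLIEST are not from the answer above, they are only chosen for the example.

```python
import re
from datetime import datetime, date

EARLIEST = date(1920, 1, 1)  # cutoff taken from the question

def is_valid_dob(s):
    """Return True if s is a real dd.mm.YYYY date on or after EARLIEST."""
    # strptime alone would accept '1.3.2019', so check the digit counts first.
    if not re.fullmatch(r'\d{2}\.\d{2}\.\d{4}', s):
        return False
    try:
        d = datetime.strptime(s, '%d.%m.%Y').date()
    except ValueError:
        # Right shape, but not on the calendar (e.g. 31.02.2021).
        return False
    return d >= EARLIEST

for candidate in ['31.03.2019', '31.02.2021', '31.12.1919', '1.3.2019']:
    print(candidate, is_valid_dob(candidate))
```

Only the first candidate should come back True; the others fail on the calendar check, the cutoff and the digit count respectively.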
As I said, it can be done with a very convoluted regex. However, I do not actually recommend using this; I just had fun writing it as a challenge. In reality you should use a very permissive regex and validate the ranges in code.
Demo.
# Easy dates, those <= 28th, valid for all months/years.
(0[1-9]|1[0-9]|2[0-8])\.(0[1-9]|1[0-2])\.(19[2-9][0-9]|2[0-9][0-9][0-9])
|
# Validate the 29th of February for 1920-1999.
29\.02\.19([3579][26]|[2468][048])
|
# Validate the 29th of February for 2000-2999.
29\.02\.((2[0-9])(0[48]|[13579][26]|[2468][048])|2000|2400|2800)
|
# Validate 29th and 30th.
(29|30)\.(01|0[3-9]|1[0-2])\.(19[2-9][0-9]|2[0-9][0-9][0-9])
|
# Validate 31st.
31\.(01|03|05|07|08|10|12)\.(19[2-9][0-9]|2[0-9][0-9][0-9])
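Because the pattern contains comments and line breaks, it has to be compiled with re.VERBOSE, and it should be applied with fullmatch so that nothing may precede or follow the date. The sketch below does that and then compares the regex with the calendar for a sample of years; STRICT_DOB, is_real_date and mismatches are names invented for this check, and the range of years is an arbitrary choice.

```python
import re
from datetime import date

STRICT_DOB = re.compile(r"""
    # Easy dates, those <= 28th, valid for all months/years.
    (0[1-9]|1[0-9]|2[0-8])\.(0[1-9]|1[0-2])\.(19[2-9][0-9]|2[0-9][0-9][0-9])
    |
    # 29 February, 1920-1999.
    29\.02\.19([3579][26]|[2468][048])
    |
    # 29 February, 2000-2999.
    29\.02\.((2[0-9])(0[48]|[13579][26]|[2468][048])|2000|2400|2800)
    |
    # 29th and 30th of every month except February.
    (29|30)\.(01|0[3-9]|1[0-2])\.(19[2-9][0-9]|2[0-9][0-9][0-9])
    |
    # 31st of the seven long months.
    31\.(01|03|05|07|08|10|12)\.(19[2-9][0-9]|2[0-9][0-9][0-9])
""", re.VERBOSE)

def is_real_date(day, month, year):
    """True if the calendar actually contains this day."""
    try:
        date(year, month, day)
        return True
    except ValueError:
        return False

# Try every day/month slot for a sample of years and collect disagreements
# between the regex and the calendar (with the 1920 cutoff applied).
mismatches = [
    (d, m, y)
    for y in range(1915, 2105)
    for m in range(1, 13)
    for d in range(1, 32)
    if bool(STRICT_DOB.fullmatch('%02d.%02d.%04d' % (d, m, y)))
    != (y >= 1920 and is_real_date(d, m, y))
]
print(mismatches)
```

An empty list means the regex and datetime agree on every slot in the sampled years, including the 2000 and 2100 century cases.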
\d{2}\.\d{2}\.\d{4}
Validating the value of the dates should be done at the application level.
Related
I am trying to create an error handler for an exercise where I check for correct input format. I looked at the docs and SO for examples but I am still here. I believe I am looking for: (there have been a few variations tried as well)
check_time = re.compile('^[0-1][0-9]:[0-5][0-9] ([A|a]|[P|p][M|m])')
but my test cases are failing.
Code calling for input from user:
import re
class CivilianTime:
    def __init__(self):
        # no error handling yet
        self.civ_time = input('Enter the time in (XX:XX A/PM) format.\n')
        check_time = re.compile('1[0-2]:[0-5][0-9] AM | 1[0-2]:[0-5][0-9] PM')
        if check_time != self.civ_time:
            self.civ_time = input('Enter the time in (XX:XX A/PM) format.\n')
    # if PM, strip time to numerical values and add 1200
    # if AM, strip time to numerical values
    def time_converter(self):
        if self.civ_time[-2] == 'P':
            strip_time = self.civ_time.strip(" PM")
            strip_time = strip_time.replace(':', '')
            strip_time = int(strip_time) + 1200
            print(strip_time)
        else:
            strip_time = self.civ_time.strip(' AM')
            strip_time = strip_time.replace(':', '')
            print(strip_time)
c = CivilianTime()
c.time_converter()
Result:
Enter the time in (XX:XX A/PM) format.
1212 am
Enter the time in (XX:XX A/PM) format.
1212pm
1212pm
I want to see it ask for the time again when the input is not in the desired format. It's running the function even when there's no space.
Unless there's a way for me to use in.
You are mis-reading the docs,
https://docs.python.org/3/library/re.html
You are on the right track. When you use or (|), you have to rewrite the entire expression. So first match one hour at a time and simply test all the cases in multiple lines of code. Don't try to one-line it until you completely understand regex.
12:00 AM and 11:00 AM and 10:00 AM = 1[0-2]:[0-5][0-9] AM
Now to match that for PM you have to or | the entire expression.
So, matcher = '1[0-2]:[0-5][0-9] AM|1[0-2]:[0-5][0-9] PM' (no spaces around the |, because a space inside a pattern is matched literally).
Now match the remaining time with what you have learned! Hint: the rest of the hours start with 0.
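Note also that the code in the question compares the compiled pattern object with the input string using !=, which is always true, so the regex is never actually applied. Below is a rough sketch of one way to put the pieces together: it covers all twelve hours in one pattern, uses fullmatch so the whole input must fit, and keeps asking until it does. The names CIVILIAN_TIME, is_civilian_time and ask_for_time, and the read parameter, are invented for this example.

```python
import re

# Hours 01-12, minutes 00-59, one space, then AM or PM.
CIVILIAN_TIME = re.compile(r'(0[1-9]|1[0-2]):[0-5][0-9] [AP]M')

def is_civilian_time(text):
    """True only if the whole string looks like 'HH:MM AM' or 'HH:MM PM'."""
    return CIVILIAN_TIME.fullmatch(text) is not None

def ask_for_time(read=input):
    """Keep prompting until the reply has the expected format."""
    while True:
        reply = read('Enter the time in (XX:XX A/PM) format.\n')
        if is_civilian_time(reply):
            return reply

# The two inputs from the question are rejected, a well-formed one is accepted.
for sample in ['1212 am', '1212pm', '12:12 PM']:
    print(repr(sample), is_civilian_time(sample))
```

Passing the reader in as a parameter is only there so the loop can be exercised without a keyboard; calling ask_for_time() with no argument uses the built-in input.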
I have a list of strings and wish to find exact phrases.
So far my code finds the month and year only, but the whole phrase including “- Recorded” is needed, like “March 2016 - Recorded”.
How can I add the “- Recorded” part to the regex?
import re
texts = [
    "Shawn Dookhit took annual leave in March 2016 - Recorded The report",
    "Soondren Armon took medical leave in February 2017 - Recorded It was in",
    "David Padachi took annual leave in May 2016 - Recorded It says",
    "Jack Jagoo",
    "Devendradutt Ramgolam took medical leave in August 2016 - Recorded Day back",
    "Kate Dudhee",
    "Vinaye Ramjuttun took annual leave in - Recorded Answering"
]
regex = re.compile(r'(?P<month>[a-zA-Z]+)\s+(?P<year>\d{4})\s')
for t in texts:
    try:
        m = regex.search(t)
        print(m.group())
    except AttributeError:
        print("keyword's not found")
You already have two named groups here, month and year, which capture the month and year from your strings. To capture - Recorded in a named group called recorded, you can do this:
regex = re.compile(r'(?P<month>[a-zA-Z]+)\s+(?P<year>\d{4})\s(?P<recorded>- Recorded)')
Or you can just add - Recorded to your regex without a named group:
regex = re.compile(r'(?P<month>[a-zA-Z]+)\s+(?P<year>\d{4})\s- Recorded')
Or you can add a named group other that matches a hyphen and one capitalized word:
regex = re.compile(r'(?P<month>[a-zA-Z]+)\s+(?P<year>\d{4})\s(?P<other>- [A-Z][a-z]+)')
I think the first or third option is preferable because you already have named groups. I also recommend this web site, http://pythex.org/, which really helps when constructing a regex :).
Use a list comprehension with the corrected regex:
regex = re.compile('(?P<month>[a-zA-Z]+)\s+(?P<year>\d{4})\s* - Recorded')
matches = [match.groups() for text in texts for match in [regex.search(text)] if match]
print(matches)
# [('March', '2016'), ('February', '2017'), ('May', '2016'), ('August', '2016')]
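If the whole phrase is wanted rather than the captured month and year, group(0) of each match returns it. A small sketch, using four of the strings from the question; the variable name phrases is only for illustration.

```python
import re

texts = [
    "Shawn Dookhit took annual leave in March 2016 - Recorded The report",
    "Jack Jagoo",
    "David Padachi took annual leave in May 2016 - Recorded It says",
    "Vinaye Ramjuttun took annual leave in - Recorded Answering",
]

regex = re.compile(r'(?P<month>[a-zA-Z]+)\s+(?P<year>\d{4})\s+- Recorded')

# group(0) is the full text matched by the pattern, not just the named groups.
phrases = [m.group(0) for m in map(regex.search, texts) if m]
print(phrases)
# ['March 2016 - Recorded', 'May 2016 - Recorded']
```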
I am kind of new to Python, so I apologize for my shortcomings. I have Python code, perfected with other users' help (thank you), that converts a date from numbers into words using dictionaries for days, months and years, like 3.6.2015 => march.third.two thousand fifteen, using:
date = raw_input("Give date: ")
I want to input a sentence such as "today is 3.6.2015, it is 10:00 o'clock and it's rainy". I do not know how to search through the sentence for the date, time, or phone number and then apply the conversion to just that part.
If someone can please help, thank you.
You could use regular expressions:
import re
s = "today is 3.6.2015, it is 10:00 o'clock and it's rainy"
mat = re.search(r'(\d{1,2}\.\d{1,2}\.\d{4})', s)
date = mat.group(1)
print(date)  # 3.6.2015
Note, if there's nothing matching this regular expression in the input text, an AttributeError will be raised, that you'll either have to prevent (e.g. if mat:) or handle.
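A guarded version might look like the sketch below; find_date and DATE_RE are names made up for the example, and returning None for "nothing found" is just one possible convention.

```python
import re

DATE_RE = re.compile(r'\d{1,2}\.\d{1,2}\.\d{4}')

def find_date(sentence):
    """Return the first d.m.yyyy-looking substring, or None if there is none."""
    mat = DATE_RE.search(sentence)
    return mat.group(0) if mat else None

print(find_date("today is 3.6.2015, it is 10:00 o'clock and it's rainy"))
print(find_date("no date in this sentence"))
```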
EDIT
Assuming you can turn your conversion code into a function, you could use re.sub:
import re
def your_function(match):
    # Whatever your function does; re.sub passes it a match object,
    # so the matched text is available as match.group(0)
    words_string = "march.third.two thousand fifteen"
    return words_string
s = "today is 3.6.2015, it is 10:00 o'clock and it's rainy"
date = re.sub(r'(\d{1,2}\.\d{1,2}\.\d{4})', your_function, s)
print(date)
# today is march.third.two thousand fifteen, it is 10:00 o'clock and it's rainy
Just modify your_function to change the 3.6.2015 into march.third.two thousand fifteen.
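One detail worth knowing is that re.sub calls the function with a match object, not with the matched string, so the pieces of the date are read from its groups. The sketch below only translates the month number into a name and leaves day and year as digits; the full number-to-words conversion is assumed to live in your existing dictionaries. MONTHS, DATE_RE and date_to_words are hypothetical names.

```python
import re

MONTHS = ["january", "february", "march", "april", "may", "june", "july",
          "august", "september", "october", "november", "december"]

# Month, day and year captured separately (month.day.year, as in the question).
DATE_RE = re.compile(r'\b(\d{1,2})\.(\d{1,2})\.(\d{4})\b')

def date_to_words(match):
    """Replacement callback: receives the match object for one date."""
    month, day, year = match.group(1, 2, 3)
    if not 1 <= int(month) <= 12:
        return match.group(0)  # not a valid month: leave the text untouched
    return "{}.{}.{}".format(MONTHS[int(month) - 1], day, year)

s = "today is 3.6.2015, it is 10:00 o'clock and it's rainy"
converted = DATE_RE.sub(date_to_words, s)
print(converted)
# today is march.6.2015, it is 10:00 o'clock and it's rainy
```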
I have a batch of raw text files. Each file begins with Date>>month.day year News garbage.
garbage is a whole lot of text I don't need, and varies in length. The words Date>> and News always appear in the same place and do not change.
I want to copy month day year and insert this data into a CSV file, with a new line for every file in the format day month year.
How do I copy month day year into separate variables?
I tried to split a string after a known word and before a known word. I'm familiar with string[x:y], but I basically want to change x and y from numbers into actual words (i.e. string[Date>>:News])
import re, os, sys, fnmatch, csv
folder = raw_input('Drag and drop the folder > ')
for filename in os.listdir(folder):
    # First, avoid system files
    if filename.startswith("."):
        pass
    else:
        # Tell the script the file is in this directory and can be written
        file = open(folder+'/'+filename, "r+")
        filecontents = file.read()
        thestring = str(filecontents)
        print(thestring[9:20])
An example text file:
Date>>January 2. 2012 News 122
5 different news agencies have reported the story of a man washing his dog.
Here's a solution using the re module:
import re
s = "Date>>January 2. 2012 News 122"
m = re.match(r"^Date>>(\S+)\s+(\d+)\.\s+(\d+)", s)
if m:
    month, day, year = m.groups()
    print("{} {} {}".format(month, day, year))
Outputs:
January 2 2012
Edit:
Actually, there's another nicer (imo) solution using re.split described in the link Robin posted. Using that approach you can just do:
month, day, year = re.split(r">>| |\. ", s)[1:4]
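To get from there to the CSV the question asks for (one line per file, in day month year order), the parsed fields can be handed to csv.writer. This is a sketch only: header_to_row is a made-up helper, the second sample line is invented, and io.StringIO stands in for a real output file.

```python
import csv
import io
import re

def header_to_row(first_line):
    """Return [day, month, year] from 'Date>>Month D. YYYY News ...', or None."""
    m = re.match(r'Date>>(\S+)\s+(\d+)\.\s+(\d+)', first_line)
    if m is None:
        return None
    month, day, year = m.groups()
    return [day, month, year]

# In real use this would be open('dates.csv', 'w', newline='') or similar.
out = io.StringIO()
writer = csv.writer(out)

# One header line per input file; the second line is invented sample data.
for line in ["Date>>January 2. 2012 News 122", "Date>>March 15. 2013 News 7"]:
    row = header_to_row(line)
    if row:
        writer.writerow(row)

print(out.getvalue())
```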
You can use the string method .split(" ") to break the text into a list at each space. Because month.day and year are always in the same place, you can access them by their position in that list. To separate month and day, use .split again, this time on the "." character, after dropping the fixed Date>> prefix.
Example:
parts = theString.split(" ")
year = parts[1]
month_day = parts[0].replace("Date>>", "")
month, day = month_day.split(".")
You could use string.split:
x = "A b c"
x.split(" ")
Or you could use regular expressions (which I see you import but don't use) with groups. I don't remember the exact syntax offhand, but the re is something like r'(.*)(Date>>)(.*)'. This re searches for the string "Date>>" between two runs of any other characters. The parentheses capture the three parts into numbered groups.
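A more targeted pattern can pull the three fields out directly. The sketch below assumes the header really has the Date>>month.day year News shape described in the question (the sample file shown there looks slightly different), and the names HEADER and fields are illustrative only.

```python
import re

# Assumed header shape: Date>>month.day year News ...
HEADER = re.compile(r'Date>>(?P<month>\w+)\.(?P<day>\d+)\s+(?P<year>\d{4})\s+News')

m = HEADER.match("Date>>January.2 2012 News 122")
fields = m.groupdict() if m else {}
print(fields)
# {'month': 'January', 'day': '2', 'year': '2012'}
```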
In Python I need logic for the scenario below; I am using the split function for this.
I have strings which contain input as shown below.
"ID674021384 25/01/1986 heloo hi thanks 5 minutes and 25-01-1988."
"ID909900000 25-01-1986 hello 10 minutes."
The output should be as shown below, with each date replaced by "date" and each time replaced by "time".
"ID674021384 date hello hi thanks time date."
"ID909900000 date hello time."
I also need a count of dates and times for each ID, as shown below:
ID674021384 DATE:2 TIME:1
ID909900000 DATE:1 TIME:1
>>> import re
>>> from collections import defaultdict
>>> lines = ["ID674021384 25/01/1986 heloo hi thanks 5 minutes and 25-01-1988.", "ID909900000 25-01-1986 hello 10 minutes."]
>>> pattern = r'(?P<date>\d{1,2}[/-]\d{1,2}[/-]\d{4})|(?P<time>\d+ minutes)'
>>> num_occurences = {line: defaultdict(int) for line in lines}
>>> def repl(matchobj):
...     num_occurences[matchobj.string][matchobj.lastgroup] += 1
...     return matchobj.lastgroup
...
>>> for line in lines:
...     text_id = line.split(' ')[0]
...     new_text = re.sub(pattern, repl, line)
...     print(new_text)
...     print('{0} DATE:{1[date]} Time:{1[time]}'.format(text_id, num_occurences[line]))
...     print('')
...
ID674021384 date heloo hi thanks time and date.
ID674021384 DATE:2 Time:1
ID909900000 date hello time.
ID909900000 DATE:1 Time:1
For parsing similar lines of text, like log files, I often use regular expressions using the re module. Though split() would work well also for separating fields which don't contain spaces and the parts of the date, using regular expressions allows you to also make sure the format matches what you expect, and if need be warn you of a weird looking input line.
Using regular expressions, you could get the individual fields of the date and time and construct date or datetime objects from them (both from the datetime module). Once you have those objects, you can compare them to other similar objects and write new entries, formatting the dates as you like. I would recommend parsing the whole input file (assuming you're reading a file) and writing a whole new output file instead of trying to alter it in place.
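As a rough illustration of that idea, the sketch below pulls each date out of a line and turns it into a datetime.date. It assumes day-first order with either / or - as the separator, as in the sample input; extract_dates and DATE_RE are names invented here.

```python
import re
from datetime import date

# Day, month and year captured separately; separator may be '/' or '-'.
DATE_RE = re.compile(r'(\d{1,2})[/-](\d{1,2})[/-](\d{4})')

def extract_dates(line):
    """Return every dd/mm/yyyy or dd-mm-yyyy in line as a datetime.date."""
    found = []
    for day, month, year in DATE_RE.findall(line):
        try:
            found.append(date(int(year), int(month), int(day)))
        except ValueError:
            pass  # looked like a date but is not on the calendar; skip it
    return found

dates = extract_dates(
    "ID674021384 25/01/1986 heloo hi thanks 5 minutes and 25-01-1988."
)
print(dates)
```

Once the values are date objects they can be compared, sorted, or written back out with strftime in whatever format is needed.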
As for keeping track of the date and time counts, when your input isn't too large, using a dictionary is normally the easiest way to do it. When you encounter a line with a certain ID, find the entry corresponding to this ID in your dictionary, or add a new one if there is none. This entry could itself be a dictionary using dates and times as keys, whose values are the counts of each one encountered.
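One possible shape for that dictionary is sketched below. It counts by kind ("date" or "time") per ID rather than by individual value, which is what the requested output needs; PATTERN and counts are hypothetical names, and the "N minutes" form of a time is taken from the sample input.

```python
import re
from collections import Counter, defaultdict

PATTERN = re.compile(
    r'(?P<date>\d{1,2}[/-]\d{1,2}[/-]\d{4})|(?P<time>\d+ minutes)'
)

lines = [
    "ID674021384 25/01/1986 heloo hi thanks 5 minutes and 25-01-1988.",
    "ID909900000 25-01-1986 hello 10 minutes.",
]

# Outer key: the ID at the start of the line. Inner Counter: 'date' / 'time'.
counts = defaultdict(Counter)
for line in lines:
    text_id = line.split(' ', 1)[0]
    for m in PATTERN.finditer(line):
        counts[text_id][m.lastgroup] += 1  # lastgroup names the branch that matched

for text_id, c in counts.items():
    print('{} DATE:{} TIME:{}'.format(text_id, c['date'], c['time']))
```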
I hope this answer will guide you on the way to a solution even though it contains no code.
You could use a couple of regular expressions:
import re
txt = 'ID674021384 25/01/1986 heloo hi thanks 5 minutes and 25-01-1988.'
retime = re.compile('([0-9]+) *minutes')
redate = re.compile('([0-9]+[/-][0-9]+[/-][0-9]{4})')
# find all dates in 'txt'
dates = redate.findall(txt)
print(dates)
# find all times in 'txt'
times = retime.findall(txt)
print(times)
# replace dates and times in original string:
newtxt = txt
for adate in dates:
    newtxt = newtxt.replace(adate, 'date')
for atime in times:
    newtxt = newtxt.replace(atime, 'time')
The output looks like this:
Original string:
ID674021384 25/01/1986 heloo hi thanks 5 minutes and 25-01-1988.
Found dates:['25/01/1986', '25-01-1988']
Found times: ['5']
New string:
ID674021384 date heloo hi thanks time minutes and date.
Dates and times found:
ID674021384 DATE:2 TIME:1
Chris