Python regex for date and numbers, find the date format - python

How to extract dates alone from text file using regex in Python 3?
Below is my current code:
import datetime
from datetime import date
import re
s = "birthday on 20/12/2018 and wedding aniversry on 04/01/1997 and dob is on 09/07/1897"
match = re.search(r'\d{2}/\d{2}/\d{4}', s)
date = datetime.datetime.strptime(match.group(), '%Y-%m-%d').date()
print (date)
Expected Output is
20/12/2018
04/01/1997
09/07/1897

Building on DirtyBit's answer: with a minor change, the pattern picks up multiple date formats. Replace the forward slash with a dot, which in a regex matches any single character, so it also accepts separators such as . and -.
import re
s = "birthday on 20.12.2018 and wedding anniversary on 04-01-1997 and dob is on 09/07/1897"
pattern = r'\d{2}.\d{2}.\d{4}'
print("\n".join(re.findall(pattern,s)))
Output
20.12.2018
04-01-1997
09/07/1897
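Because an unescaped dot matches any character, the pattern above would also accept strings such as 20a12b2018. A slightly stricter sketch, assuming the only separators of interest are ., - and /, captures the separator once and requires the second one to be the same via a backreference. The variable names are illustrative and not taken from the answer.

```python
import re

text = ("birthday on 20.12.2018 and wedding anniversary on 04-01-1997 "
        "and dob is on 09/07/1897")

# Group 2 captures the separator; \2 forces the second separator to match it.
pattern = re.compile(r"\b(\d{2})([./-])(\d{2})\2(\d{4})\b")

# finditer is used because findall would return the group tuples instead
# of the full match when the pattern contains capturing groups.
dates = [m.group(0) for m in pattern.finditer(text)]
print("\n".join(dates))
```

Mixed separators such as 20.12-2018 are rejected by this pattern, which may or may not be what you want.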

Your format string '%Y-%m-%d' does not match the input. It should be '%d/%m/%Y', since the date you provided, 20/12/2018, is in dd/mm/yyyy form.
Change this:
date = datetime.datetime.strptime(match.group(), '%Y-%m-%d').date()
To this:
date = datetime.datetime.strptime(match.group(), '%d/%m/%Y').date()
Your Fix:
import datetime
from datetime import date
import re
s = "birthday on 20/12/2018"
match = re.search(r'\d{2}/\d{2}/\d{4}', s)
date = datetime.datetime.strptime(match.group(), '%d/%m/%Y').date()
print (date)
But why go to all that trouble when there are easier, more elegant ways?
Using dparser:
import dateutil.parser as dparser
dt_1 = "birthday on 20/12/2018"
print("Date: {}".format(dparser.parse(dt_1,fuzzy=True).date()))
OUTPUT:
Date: 2018-12-20
EDIT:
With your edited question which now has multiple dates, you could extract them using regex:
import re
s = "birthday on 20/12/2018 and wedding aniversry on 04/01/1997 and dob is on 09/07/1897"
pattern = r'\d{2}/\d{2}/\d{4}'
print("\n".join(re.findall(pattern,s)))
OUTPUT:
20/12/2018
04/01/1997
09/07/1897
OR
Using dateutil:
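from dateutil.parser import parse
for token in s.split():
    try:
        # dayfirst=True reads 04/01/1997 as 4 January, matching the dd/mm/yyyy input
        print(parse(token, dayfirst=True))
    except ValueError:
        pass
OUTPUT:
2018-12-20 00:00:00
1997-01-04 00:00:00
1897-07-09 00:00:00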

You are doing everything right except for this line, which should be:
date = datetime.datetime.strptime(match.group(), '%d/%m/%Y').date()
The format string passed to datetime.strptime must match the format of the input:
'%Y-%m-%d' >> 2018-12-20
'%d/%m/%Y' >> 20/12/2018
Edit
If you do not need a datetime object, you can do this:
results = re.findall(r'\d{2}/\d{2}/\d{4}', s)
print('\n'.join(results))
Output
In [20]: results = re.findall(r'\d{2}/\d{2}/\d{4}', s)
In [21]: print('\n'.join(results))
20/12/2018
04/01/1997
09/07/1897
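The two ideas in the answers above, finding candidates with re.findall and validating them with strptime, can be combined. This is a minimal standard-library sketch, assuming every date is written as dd/mm/yyyy. The helper name extract_dates is hypothetical.

```python
import re
from datetime import datetime


def extract_dates(text, fmt="%d/%m/%Y"):
    """Return date objects for every dd/mm/yyyy candidate that is a real date."""
    found = []
    for candidate in re.findall(r"\b\d{2}/\d{2}/\d{4}\b", text):
        try:
            found.append(datetime.strptime(candidate, fmt).date())
        except ValueError:
            # Looks like a date but is not one, e.g. 31/02/2020.
            continue
    return found


s = ("birthday on 20/12/2018 and wedding aniversry on 04/01/1997 "
     "and dob is on 09/07/1897")
for d in extract_dates(s):
    print(d.isoformat())
```

Passing an explicit format to strptime avoids any day/month ambiguity for values such as 04/01/1997.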

Related

extract date, month and year from string in python

I have this column where the string has date, month, year and also time information. I need to take the date, month and year only.
There is no space in the string.
The string is on this format:
date
Tuesday,August22022-03:30PMWIB
Monday,July252022-09:33PMWIB
Friday,January82022-09:33PMWIB
and I expect to get:
date
2022-08-02
2022-07-25
2022-01-08
How can I get the date, month and year only and change the format into yyyy-mm-dd in python?
thanks in advance
Use strptime from the datetime library:
from datetime import datetime
var = "Tuesday,August22022-03:30PMWIB"
date = var.split('-')[0]  # drop the time and timezone suffix
formatted_date = datetime.strptime(date, "%A,%B%d%Y")
print(formatted_date.date())  # this will get your output
Output:
2022-08-02
You can use the standard datetime library
from datetime import datetime
dates = [
"Tuesday,August22022-03:30PMWIB",
"Monday,July252022-09:33PMWIB",
"Friday,January82022-09:33PMWIB"
]
for text in dates:
    text = text.split(",")[1].split("-")[0]
    dt = datetime.strptime(text, '%B%d%Y')
    print(dt.strftime("%Y-%m-%d"))
An alternative/shorter way would be like this (if you want the other date parts):
for text in dates:
    dt = datetime.strptime(text[:-3], '%A,%B%d%Y-%I:%M%p')
    print(dt.strftime("%Y-%m-%d"))
The timezone part is tricky: %Z works only for UTC, GMT and the local timezone names, so the trailing WIB is sliced off before parsing.
You can read more about the format codes in the datetime documentation.
strptime() only accepts certain values for %Z:
any value in time.tzname for your machine’s locale
the hard-coded values UTC and GMT
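If the timezone should be kept rather than discarded, one option is to strip the abbreviation yourself and attach a fixed offset. The sketch below assumes WIB means Western Indonesia Time (UTC+7). The lookup table TZ_OFFSETS and the function parse_with_abbrev are hypothetical names, not part of the datetime module.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical lookup table: extend it with the abbreviations in your data.
TZ_OFFSETS = {"WIB": timezone(timedelta(hours=7), "WIB")}


def parse_with_abbrev(text, fmt="%A,%B%d%Y-%I:%M%p"):
    """Parse text, turning a known trailing abbreviation into a tzinfo."""
    for abbrev, tz in TZ_OFFSETS.items():
        if text.endswith(abbrev):
            naive = datetime.strptime(text[:-len(abbrev)], fmt)
            return naive.replace(tzinfo=tz)
    # No known abbreviation: fall back to a naive datetime.
    return datetime.strptime(text, fmt)


dt = parse_with_abbrev("Tuesday,August22022-03:30PMWIB")
print(dt.isoformat())
print(dt.strftime("%Y-%m-%d"))
```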
You can convert the string to a datetime object and then format it back to a string.
from datetime import datetime
datetime_object = datetime.strptime('Tuesday,August22022-03:30PM', '%A,%B%d%Y-%I:%M%p')
s = datetime_object.strftime("%Y-%m-%d")
print(s)
You can use the datetime library to parse the date and print it in your format. In your examples the day might not be zero-padded, so I pad it first and then parse the date.
import datetime
date = 'Tuesday,August22022-03:30PMWIB'
date = date.split('-')[0]
# if the sixth character from the end is not a digit, the day has only one digit
if not date[-6].isnumeric():
    date = date[:-5] + "0" + date[-5:]
newdate = datetime.datetime.strptime(date, '%A,%B%d%Y').strftime('%Y-%m-%d')
print(newdate)
# prints 2022-08-02
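Since the day and the year run together without a separator, another option is to split the fields explicitly with a regex before handing them to strptime. This is only a sketch, assuming a one- or two-digit day followed by a four-digit year. PATTERN and to_iso are hypothetical names.

```python
import re
from datetime import datetime

# weekday , month name , day (1-2 digits) , year (4 digits) , then "-"
PATTERN = re.compile(r"^[A-Za-z]+,([A-Za-z]+)(\d{1,2})(\d{4})-")


def to_iso(text):
    """Convert e.g. 'Tuesday,August22022-03:30PMWIB' to '2022-08-02'."""
    m = PATTERN.match(text)
    if m is None:
        raise ValueError("unexpected format: {!r}".format(text))
    month_name, day, year = m.groups()
    parsed = datetime.strptime(
        "{} {} {}".format(month_name, day, year), "%B %d %Y"
    )
    return parsed.strftime("%Y-%m-%d")


for value in [
    "Tuesday,August22022-03:30PMWIB",
    "Monday,July252022-09:33PMWIB",
    "Friday,January82022-09:33PMWIB",
]:
    print(to_iso(value))
```

The regex engine first tries a two-digit day and backtracks to one digit when fewer than four digits remain for the year, which is how 22022 ends up as day 2 and year 2022.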

Extract 'WeekDay Month Date HH:MM:SS' from Filename?

I have files whose names contain the full date and time. I want to extract it in a particular format and write it as the first line of the CSV file I am creating.
My file name is like below:
VIN5_2019-04-03_10-21-26_38
I want the first line of my CSV to print as below:
date Wed Apr 3 10:21:26.000 am 2019
My code is below:
import can
import csv
import datetime
import re
filename = open('C:\\Users\\xyz\\files\\time_linear_Hexa.csv', "w")
log1 = can.BLFReader('C:\\Users\\xyz\\blf files\\VIN5_2019-04-03_10-33-59_39.blf')
filename.write(re.search("([0-9]{4}\-[0-9]{2}\-[0-9]{2})", filename))
filename.write('base hex timestamps absolute\ninternal events logged \n// version 11.0.0 \n')
How can I get the date and time in the format shown above?
Using regex and datetime module.
Ex:
import re
import datetime
s = "VIN5_2019-04-03_10-21-26_38"
m = re.search(r"[A-Z]+\d_(?P<date>\d{4}\-\d{2}\-\d{2})_(?P<time>\d{2}\-\d{2}\-\d{2})", s)
if m:
    print(datetime.datetime.strptime("{} {}".format(m.group('date'), m.group('time')), "%Y-%m-%d %H-%M-%S").strftime("%a %b %d %H:%M:%S.%f %p %Y"))
Output:
Wed Apr 03 10:21:26.000000 AM 2019
You can use a lookbehind and a lookahead to extract the datetime string, use strptime to convert it to a datetime object, and then use strftime to format it in the desired form.
import re
from datetime import datetime
s = "VIN5_2019-04-03_10-21-26_38"
res = re.search(r'(?<=\w{5})[\w-]*(?=_\d{2})', s)
dt_string = res[0]
dt = datetime.strptime(dt_string, '%Y-%m-%d_%H-%M-%S')
print('{} {} {}'.format(dt.strftime('%a %b %d %H:%M:%S.%f')[:-3], dt.strftime('%p').lower(), dt.strftime('%Y')))
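The requested header, date Wed Apr 3 10:21:26.000 am 2019, has a day without zero padding and a lowercase am/pm marker, neither of which strftime produces portably on its own. The sketch below builds the line piece by piece. It assumes a 12-hour clock (because of the am/pm marker) and that the file name always contains a YYYY-MM-DD_HH-MM-SS stamp. STAMP_RE and header_line are hypothetical names.

```python
import re
from datetime import datetime

# Matches the date and time parts of names such as VIN5_2019-04-03_10-21-26_38.
STAMP_RE = re.compile(r"(\d{4}-\d{2}-\d{2})_(\d{2}-\d{2}-\d{2})")


def header_line(filename):
    """Return a 'date ...' header built from the timestamp in filename."""
    m = STAMP_RE.search(filename)
    if m is None:
        raise ValueError("no timestamp found in {!r}".format(filename))
    dt = datetime.strptime("{} {}".format(*m.groups()), "%Y-%m-%d %H-%M-%S")
    # File names carry no sub-second part, so the milliseconds are always 000.
    clock = "{}.{:03d}".format(dt.strftime("%I:%M:%S"), dt.microsecond // 1000)
    # dt.day avoids the zero padding of %d; %p is lowercased to get am/pm.
    return "date {} {} {} {} {}".format(
        dt.strftime("%a %b"), dt.day, clock, dt.strftime("%p").lower(), dt.year
    )


print(header_line("VIN5_2019-04-03_10-21-26_38"))
```

The returned string can then be written to the CSV file followed by a newline, before the remaining header lines.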

Python: Extract two dates from string

I have a string s which contains two dates in it and I am trying to extract these two dates in order to subtract them from each other to count the number of days in between. In the end I am aiming to get a string like this: s = "o4_24d_20170708_20170801"
At the company where I work we can't install additional packages, so I am looking for a solution using native Python. Below is what I have so far using the datetime package, but it extracts only one date. How can I get both dates out of the string?
import re, datetime
s = "o4_20170708_20170801"
match = re.search('\d{4}\d{2}\d{2}', s)
date = datetime.datetime.strptime(match.group(), '%Y%m%d').date()
print date
from datetime import datetime
import re
s = "o4_20170708_20170801"
pattern = re.compile(r'(\d{8})_(\d{8})')
dates = pattern.search(s)
# dates[0] is full match, dates[1] and dates[2] are captured groups
start = datetime.strptime(dates[1], '%Y%m%d')
end = datetime.strptime(dates[2], '%Y%m%d')
difference = end - start
print(difference.days)
will print
24
then, you could do something like:
days = 'd{}_'.format(difference.days)
match_index = dates.start()
new_name = s[:match_index] + days + s[match_index:]
print(new_name)
to get
o4_d24_20170708_20170801
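The question asked for the day count in the form 24d rather than d24. Here is a small sketch that wraps the same steps in a function and emits that form, assuming the name contains one pair of eight-digit dates separated by an underscore. The function name add_day_count is hypothetical.

```python
import re
from datetime import datetime


def add_day_count(name):
    """Insert '<days>d_' in front of the first YYYYMMDD_YYYYMMDD pair."""
    m = re.search(r"(\d{8})_(\d{8})", name)
    if m is None:
        return name  # nothing to do if no date pair is present
    start = datetime.strptime(m.group(1), "%Y%m%d")
    end = datetime.strptime(m.group(2), "%Y%m%d")
    days = (end - start).days
    return "{}{}d_{}".format(name[:m.start()], days, name[m.start():])


print(add_day_count("o4_20170708_20170801"))
```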
import re, datetime
s = "o4_20170708_20170801"
match = re.findall(r'\d{4}\d{2}\d{2}', s)
for a_date in match:
    date = datetime.datetime.strptime(a_date, '%Y%m%d').date()
    print(date)
This will print:
2017-07-08
2017-08-01
Your regex was working correctly at regexpal

How to format date in python

I made a crawler using Python.
But my crawler gets the date in this format:
s = page_ad.findAll('script')[25].text.replace('\'', '"')
s = re.search(r'\{.+\}', s, re.DOTALL).group() # get json data
s = re.sub(r'//.+\n', '', s) # replace comment
s = re.sub(r'\s+', '', s) # strip whitspace
s = re.sub(r',}', '}', s) # get rid of last , in the dict
dataLayer = json.loads(s)
print dataLayer["page"]["adDetail"]["adDate"]
2017-01-1412:28:07
I want only the date without the time (2017-01-14). How can I get just the date when there is no whitespace between the date and the time?
use string subset:
>>> date ="2017-01-1412:28:07"
>>> datestr= date[:-8]
>>> datestr
'2017-01-14'
>>>
As this is not a standard date format, just slice the end.
st = "2017-01-1412:28:07"
res = st[:10]
print res
>>>2017-01-14
try this code:
In [2]: from datetime import datetime
In [3]: now = datetime.now()
In [4]: now.strftime('%Y-%m-%d')
Out[4]: '2017-01-24'
Update
I suggest you first parse the date into a datetime object and then extract the relevant information from it.
A better approach for this is to use a library.
I use dateparser for these tasks; example usage:
import dateparser
date = dateparser.parse('12/12/12')
date.strftime('%Y-%m-%d')
Use datetime as follows to first convert it into a datetime object, and then format the output as required using the strftime() function:
from datetime import datetime
ad_date = dataLayer["page"]["adDetail"]["adDate"]
print datetime.strptime(ad_date, "%Y-%m-%d%H:%M:%S").strftime("%Y-%m-%d")
This will print:
2017-01-14
By using this method, it would give you the flexibility to display other items, for example adding %A to the end would give you the day of the week:
print datetime.strptime(ad_date, "%Y-%m-%d%H:%M:%S").strftime("%Y-%m-%d %A")
e.g.
2017-01-14 Saturday
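A middle ground between plain slicing and a full strptime call is to slice off the first ten characters and let date.fromisoformat validate them. This is a sketch that assumes Python 3.7 or later and that the date part is always in YYYY-MM-DD form.

```python
from datetime import date

raw = "2017-01-1412:28:07"

# The first 10 characters hold the date; fromisoformat raises ValueError
# if they do not form a valid calendar date.
day = date.fromisoformat(raw[:10])
print(day.isoformat())
print(day.strftime("%Y-%m-%d %A"))
```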

find matching string in a file using Python

Using Python, I would like to find strings in a file that match the format YYYY-MM-DD.
Here is what my sample file looks like:
I want to find date 2016-01-01 ,2016-01-05
then I want to find 2016-01-17
then I want to find this date 2016-01-04
Output should be
2016-01-01
2016-01-05
2016-01-17
2016-01-04
Below is the code I am currently using, but it does not find any matching records. Any help would be appreciated.
#!/usr/bin/python
import sys
import csv
import re
pattern = re.compile("^([0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9])$")
for i, line in enumerate(open('C:\\Work\\scripts\\logs\\CSI.txt')):
    for match in re.finditer(pattern, line):
        print 'Found on line' % (i+1, match.groups())
I would remove the ^ and $ anchors (and the group), because your dates do not sit alone on their own lines:
re.compile("[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]")
You can use regex and datetime to get valid dates from string
import re
from datetime import datetime
string = "I want to find date 2016-01-01 ,2016-01-05"
pattern = re.compile(r"\d{4}-\d{2}-\d{2}")
raw_dates = pattern.findall(string)
parsed_dates = []
for date in raw_dates:
    try:
        # strptime rejects impossible dates such as 2016-13-45
        d = datetime.strptime(date, "%Y-%m-%d")
        parsed_dates.append(d)
    except ValueError:
        pass
print(parsed_dates)
output:
[datetime.datetime(2016, 1, 1, 0, 0), datetime.datetime(2016, 1, 5, 0, 0)]
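To apply this to a file and report line numbers, as the question's loop tried to do, the same pattern can be run over each line. In this sketch an io.StringIO object stands in for the opened log file, so the example is self-contained. DATE_RE and find_dates are hypothetical names.

```python
import io
import re
from datetime import datetime

DATE_RE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")


def find_dates(lines):
    """Return (line_number, date_string) pairs for every valid date found."""
    hits = []
    for lineno, line in enumerate(lines, start=1):
        for raw in DATE_RE.findall(line):
            try:
                datetime.strptime(raw, "%Y-%m-%d")  # validation only
            except ValueError:
                continue
            hits.append((lineno, raw))
    return hits


# Stand-in for open(path); any iterable of lines works.
sample = io.StringIO(
    "I want to find date 2016-01-01 ,2016-01-05\n"
    "then I want to find 2016-01-17\n"
    "then I want to find this date 2016-01-04\n"
)
for lineno, raw in find_dates(sample):
    print("Found on line {}: {}".format(lineno, raw))
```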
