Python regular expression to change date formatting - python

I have an array of strings representing dates like '2015-6-03' and I want to convert these to the format '2015-06-03'.
Instead of doing the replacement with an ugly loop, I'd like to use a regular expression. Something along the lines of:
str.replace('(-){1}(\d){1}(-){1}', '-0{my digit here}-')
Is something like this possible?

You don't have to retrieve the digit from the match. You can replace the hyphen before a single-digit month with -0.
Like this:
re.sub(r'-(?=\d-)', '-0', text)
Note that (?=\d-) is a lookahead assertion: it checks that a single digit and a hyphen follow, but it consumes no characters. That's why only the hyphen gets replaced.
Test:
import re
text = '2015-09-03 2015-6-03 2015-1-03 2015-10-03'
re.sub(r'-(?=\d-)', '-0', text)
Result:
'2015-09-03 2015-06-03 2015-01-03 2015-10-03'
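The substitution above only pads the month. Here is a minimal sketch that pads a single-digit month or day across a whole list, assuming every entry is a hyphen-separated year-month-day string. The name pad_date and the sample list are made up for illustration and do not come from the question.

```python
import re

# Hypothetical helper: pad any single-digit field with a leading zero.
# \b(\d)\b matches a digit that has no other digit or letter on either side.
def pad_date(date_string):
    return re.sub(r'\b(\d)\b', r'0\1', date_string)

# Sample data (assumed, not from the question)
dates = ['2015-6-03', '2015-11-3', '2015-1-2', '2015-10-03']
padded = [pad_date(d) for d in dates]
print(padded)
```

A list comprehension still loops under the hood, but it keeps the per-string logic in a single re.sub call.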

Yes, a regex will accomplish what you want:
(\d+)-(\d)-(\d+)
To do the replacement, capture all three parts and put a zero in front of the single-digit month:
import re
target = "2015-6-05"
out = re.sub(r'(\d+)-(\d)-(\d+)', r'\1-0\2-\3', target)  # '2015-06-05'

No need for regex, you can load it as datetime object and format the string as requested when you print it:
import datetime
s = '2015-6-03'
date_obj = datetime.datetime.strptime(s, '%Y-%m-%d')
print("%d-%02d-%02d" % (date_obj.year, date_obj.month, date_obj.day))
OUTPUT
2015-06-03

Something along the lines of...
import re
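def replaceRegex(what, pattern, filler):
    regex = re.compile(pattern)
    match = regex.search(what)  # search() finds the pattern anywhere, not only at the start
    if match is not None:
        # 'from' is a reserved word in Python, so use start/end instead
        start, end = match.span()
        return what.replace(what[start:end], filler)
    else:
        return None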
Might help you.

Related

How to find a date with regular expressions

I have to find a date inside a string with a regular expression in Python:
astring ='L2A_T21HUB_A023645_20210915T135520'
and I'm trying to get the part before the T, which has the shape xxxxxxxx, where every x is a digit.
desiredOutput = '20210915'
I'm new to regex, so I have no idea how to solve this.
If the astring's format is consistent, meaning it will always have the same shape with respect to the date, you can split the string by '_' and get the last substring and get the date from there as such:
astring ='L2A_T21HUB_A023645_20210915T135520'
date_split = astring.split("_") # --> ['L2A', 'T21HUB', 'A023645', '20210915T135520']
desiredOutput = date_split[3][:8] # --> [3] = '20210915T135520' [:8] gets first 8 chars
print(desiredOutput) # --> 20210915
If you wanted an actual datetime object
>>> from datetime import datetime
>>> astring = 'L2A_T21HUB_A023645_20210915T135520'
>>> date_str = astring.split('_')[-1]
>>> datetime.strptime(date_str, '%Y%m%dT%H%M%S')
datetime.datetime(2021, 9, 15, 13, 55, 20)
From that, you can use datetime.strftime to reformat to a new string, or you can use split('T')[0] to get the string you want.
The trouble with Regex is that there can be unexpected patterns that match your expected pattern and throw things off. However, if you know that only the date portion will ever have 8 sequential digits, you can do this:
import re
date_patt = re.compile(r'\d{8}')
date = date_patt.search(astring).group(0)
You can develop more robust patterns based on your knowledge of the formatting of the incoming strings. For instance, if you know that the date will always follow an underscore, you could use a look-behind assertion:
date_patt = re.compile(r'(?<=_)\d{8}') # require '_' before the date, but don't include it in the match
Hope this helps. Regex can be finicky and may take some tweaking, but hope this sets you in the right direction.
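To guard against eight-digit runs that are not real dates, the regex match can be passed through datetime.strptime, which rejects impossible values. This is only a sketch: it assumes the date always follows an underscore and is followed by 'T', and extract_date is a made-up name.

```python
import re
from datetime import datetime

# Assumption: the date sits between '_' and 'T', as in the sample string.
DATE_PATTERN = re.compile(r'(?<=_)\d{8}(?=T)')

def extract_date(text):
    """Return the 8-digit date as a string, or None if absent or invalid."""
    match = DATE_PATTERN.search(text)
    if match is None:
        return None
    try:
        # strptime raises ValueError for impossible dates such as 20210230
        datetime.strptime(match.group(0), '%Y%m%d')
    except ValueError:
        return None
    return match.group(0)

print(extract_date('L2A_T21HUB_A023645_20210915T135520'))
```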

How to extract string from a filename containing date using regex in python?

I am using python 2.7 version
I have a string as below:
flow_id="Livongo_Weekly_Enrollment_2019_03_19"
I am getting flow_id from a function, so now I want to extract Livongo_Weekly_Enrollment, I want something like below :
if flow_id contains "YYYY-MM-DD"
clean_flow_id=re.sub('\d{4}[_]\d{2}[_]\d{2}','',flow_id)
but the output of clean_flow_id is 'Livongo_Weekly_Enrollment_'
I don't want last '_'
So to conclude: first I need to check whether the string contains a date like "YYYY_MM_DD", and if it does, extract the name as given above. Can anyone please help?
I cannot find a function or library to match flow_id with date format ("YYYY_MM_DD")
to get clean_flow_id I tried something like below :
clean_flow_id=re.sub('\d{4}[_]\d{2}[_]\d{2}','',flow_id)
clean_flow_id='Livongo_Weekly_Enrollment_'
import re
flow_id="Livongo_Weekly_Enrollment_2019_03_19"
if flow_id
Expected :
clean_flow_id=flow_id=Livongo_Weekly_Enrollment
I expect the output of clean_flow_id to be Livongo_Weekly_Enrollment, but the actual output is Livongo_Weekly_Enrollment_
Could you please help me find a function to match the date format in flow_id?
Maybe this simple expression returns the desired output:
(.*?)_[0-9]{4}_[0-9]{2}_[0-9]{2}\b
With re.sub you can simply replace it with \1.
Test with re.sub
import re
expression = r"(.*?)_[0-9]{4}_[0-9]{2}_[0-9]{2}\b"
string = """
Livongo_Weekly_Enrollment_2019_03_19
Livongo_Weekly_Enrollment_2019_01_01
Livongo_Weekly_Enrollment_2019_3_1
Livongo_Weekly_Enrollment_2019_03_1
Livongo_Weekly_Enrollment_2019_03_111
"""
print(re.sub(expression, r"\1", string))
Output
Livongo_Weekly_Enrollment
Livongo_Weekly_Enrollment
Livongo_Weekly_Enrollment_2019_3_1
Livongo_Weekly_Enrollment_2019_03_1
Livongo_Weekly_Enrollment_2019_03_111
Test with re.findall
import re
expression = r"(.*?)_[0-9]{4}_[0-9]{2}_[0-9]{2}\b"
string = """
Livongo_Weekly_Enrollment_2019_03_19
Livongo_Weekly_Enrollment_2019_01_01
Livongo_Weekly_Enrollment_2019_3_1
Livongo_Weekly_Enrollment_2019_03_1
Livongo_Weekly_Enrollment_2019_03_111
"""
print(re.findall(expression, string))
Output
['Livongo_Weekly_Enrollment', 'Livongo_Weekly_Enrollment']
If you wish to explore, simplify, or modify the expression, regex101.com explains it in its top right panel and shows how it would match against some sample inputs.
Test if there is a date
import re
expression = r"^(?=.*\d{4}_\d{2}_\d{2}).*$"
string = "Livongo_Weekly_Enrollment_2019_03_19"
if re.search(expression, string):
    print(f"There is at least one date in the {string}")
else:
    print(f"Sorry! {string} has no date.")
Output
There is at least one date in the Livongo_Weekly_Enrollment_2019_03_19
You can use re.search in your if statement:
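test = re.search(r'(.*)_(\d{4}_\d{2}_\d{2})', flow_id)
if test:
    your_match = test[1]  # on Python versions before 3.6, use test.group(1)
test[0] is the whole match
test[1] is the first parenthesized group (the name)
test[2] is the second parenthesized group (the date, as a string)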
--
Edit
import re
def get_clean_flow_id(filename):
    # Group 1 is everything before the '_YYYY_MM_DD' date
    test = re.search(r'(.*)_\d{4}[_]\d{2}[_]\d{2}', filename)
    if test:
        return test[1]
    else:
        return None
Returns:
>>> get_clean_flow_id("Livongo_Weekly_Enrollment_2019_03_19.csv")
'Livongo_Weekly_Enrollment'
>>> get_clean_flow_id("Omada_weekly_fle_20190319120301.csv")
>>> get_clean_flow_id("tivity_weekly_fle_20190319120301.json")
>>>

Python Regex: Remove the parts of the string that do not match the regex pattern

I want to remove the parts of the string that do not match the format that I want. Example:
import re
string = 'remove2017abcdremove'
pattern = re.compile("((20[0-9]{2})([a-zA-Z]{4}))")
result = pattern.search(string)
if result:
    print('1')
else:
    print('0')
It returns "1", so I can find the matching format inside the string; however, I also want to remove the parts that say "remove".
I want it to return:
desired_output = '2017abcd'
You need to pull the matched text out of the search result, which is done by calling group() on it:
import re
string = 'remove2017abcdremove'
pattern = re.compile("(20[0-9]{2}[a-zA-Z]{4})")
string = pattern.search(string).group()
# 2017abcd
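Calling .group() directly on the search result fails with an AttributeError when the pattern is not found, because search then returns None. A small sketch that guards against that case (the function name keep_match is hypothetical):

```python
import re

PATTERN = re.compile(r'20[0-9]{2}[a-zA-Z]{4}')

def keep_match(text):
    """Return only the part of text that matches PATTERN, or None."""
    match = PATTERN.search(text)
    return match.group() if match else None

print(keep_match('remove2017abcdremove'))  # 2017abcd
print(keep_match('nothing here'))          # None
```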

How to use regex to find the middle of a string

I'm trying to get certain results out of the response from Blogger. I wanna get my blog names. How would I go about something like that with Regex? I've tried Googling my issue but none of the answers helped me in my case unfortunately.
So my response looks something like this:
\\x22http://emyblog.blogspot.com/
So it's always starting with the \\x22http:// and ending with .blogspot.com/
I've tried the following re:
regEx = re.findall(b"""\x22http://(.*)\.blogspot\.com""", r)
But unfortunately it returned an empty list. Any ideas on how to solve this problem?
Thanks,
Use a raw string; otherwise \x22 is interpreted as the character " instead of as literal text. I'm not sure re.findall is the right method here; re.search should suffice.
Assuming your byte-string is:
>>> r = rb'\\x22http://emyblog.blogspot.com/'
With byte-strings:
>>> res = re.search(rb'\\x22http://(.*)\.blogspot\.com/', r)
>>> res.group(1)
b'emyblog'
With normal strings:
>>> res = re.search(r'\\\\x22http://(.*)\.blogspot\.com/', r.decode('utf-8'))
>>> res.group(1)
'emyblog'
Use r'' (the string is taken as a raw string literal) instead of b'':
import re
pattern = re.compile(r'\x22http://(.*)\.blogspot\.com')
match = pattern.match('\x22http://emyblog.blogspot.com/')
match.group(1)
# 'emyblog'
This seems to be working!
import re
text = "\x22http://emyblog.blogspot.com/"
regex = re.compile(r'\x22http://(.*)\.blogspot\.com')
print(regex.findall(text))

Check python string format?

I have a bunch of strings but I only want to keep the ones with this format:
x/x/xxxx xx:xx
What is the easiest way to check if a string meets this format? (Assuming I want to check by if it has 2 /'s and a ':' )
Try a regular expression:
import re
r = re.compile('.*/.*/.*:.*')
if r.match('x/x/xxxx xx:xx') is not None:
    print('matches')
You can tweak the expression to match your needs.
Use time.strptime to parse from string to time struct. If the string doesn't match the format it raises ValueError.
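A minimal sketch of the strptime approach. It assumes the format means month/day/year hour:minute, which is a guess, since the question only gives x placeholders. The name has_expected_format and the sample strings are invented for illustration.

```python
import time

# Assumed layout: month/day/year hour:minute (24-hour clock)
FORMAT = '%m/%d/%Y %H:%M'

def has_expected_format(text):
    """Return True if text parses with FORMAT, False otherwise."""
    try:
        time.strptime(text, FORMAT)
    except ValueError:
        return False
    return True

candidates = ['3/7/2014 09:30', '2014-03-07 09:30', 'hello']
kept = [c for c in candidates if has_expected_format(c)]
print(kept)
```

Note that strptime also accepts two-digit months and days, such as '12/25/2014 09:30', so this check is looser than the literal x/x/xxxx shape.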
If you use regular expressions with match, you must also account for the string being too long: match only anchors at the start, so without testing the length, extra characters at the end slip through. Here is code modified from the other answers.
import re
r = re.compile('././.{4} .{2}:.{2}')
s = 'x/x/xxxx xx:xx'
if len(s) == 14:
    if r.match(s):
        print('matches')
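In Python 3.4 and later, re.fullmatch makes the separate length check unnecessary, because the whole string has to match the pattern. The sketch below assumes that each x in the format stands for a digit, which the question does not state. matches_format is a made-up name.

```python
import re

# Assumption: every 'x' in 'x/x/xxxx xx:xx' is a single digit.
PATTERN = re.compile(r'\d/\d/\d{4} \d{2}:\d{2}')

def matches_format(text):
    """Return True only if the entire string has the expected shape."""
    return PATTERN.fullmatch(text) is not None

print(matches_format('3/7/2014 09:30'))   # True
print(matches_format('3/7/2014 09:30!'))  # False
```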
