Regex operation to modify a string

At the moment I have a string similar to:
mytime = '143456.45674'
That string gives a time in the format HHMMSS.xxxxx (hours, minutes, seconds, then something else after the dot).
I am only interested in the HH:MM:SS part, so I could do:
mynewTime = mytime[0:2]+":"+mytime[2:4]+":"+mytime[4:6]
'14:34:56'
It is a bit ugly and I was wondering if there was a more elegant/efficient way of doing it. Regex perhaps?

Looks like you're looking for a combination of strptime and strftime:
import datetime
mytime = '143456.45674'
ts = datetime.datetime.strptime(mytime, '%H%M%S.%f')
print(ts.strftime('%H:%M:%S'))
# 14:34:56
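A minimal sketch wrapping the same strptime/strftime calls in a helper (the name hms_from_raw is hypothetical). A side benefit over slicing is validation: strptime raises ValueError for an out-of-range hour, and %f accepts at most six fractional digits, so longer fractions are rejected as well.

```python
import datetime

def hms_from_raw(raw):
    """Parse 'HHMMSS.fraction' and return 'HH:MM:SS'; raise ValueError on bad input."""
    ts = datetime.datetime.strptime(raw, '%H%M%S.%f')
    return ts.strftime('%H:%M:%S')

print(hms_from_raw('143456.45674'))  # 14:34:56

# Hour 25 is out of range; '%f' takes at most six digits.
for bad in ('256000.1', '143456.1234567'):
    try:
        hms_from_raw(bad)
    except ValueError as exc:
        print(f'{bad!r} rejected: {exc}')
```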

If you want to do it with a regex:
>>> import re
>>> val=re.sub(r"\..*$", "", "143456.45674")
>>> re.sub(r"(?<=\d)(?=(\d{2})+$)", ":", val )
'14:34:56'
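The second substitution works because its pattern is zero-width: it matches positions, not characters. Listing those positions with re.finditer (a sketch; the variable names are illustrative) shows where the colons go: each position has a digit before it and an even, non-zero number of digits after it.

```python
import re

val = "143456"
# Lookbehind: a digit precedes the position.
# Lookahead: an even, non-zero number of digits follows, up to the end.
pattern = r"(?<=\d)(?=(\d{2})+$)"

positions = [m.start() for m in re.finditer(pattern, val)]
print(positions)                  # [2, 4]
print(re.sub(pattern, ":", val))  # 14:34:56
```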

A regex version for fun :)
re.sub(r"(\d{2})(\d{2})(\d{2})\.\d+", r"\1:\2:\3", mytime)
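The one-liner above requires a fractional part, and re.sub silently returns the input unchanged when nothing matches. A slightly more defensive sketch, assuming the fraction is optional, uses named groups with re.fullmatch and returns None for input that does not look like HHMMSS[.fraction]; the helper name regex_hms is hypothetical.

```python
import re

# HHMMSS followed by an optional '.fraction'
TIME_RE = re.compile(r"(?P<h>\d{2})(?P<m>\d{2})(?P<s>\d{2})(?:\.\d+)?")

def regex_hms(raw):
    match = TIME_RE.fullmatch(raw)
    if match is None:
        return None  # not in HHMMSS[.fraction] form
    return f"{match['h']}:{match['m']}:{match['s']}"

print(regex_hms('143456.45674'))  # 14:34:56
print(regex_hms('143456'))        # 14:34:56
print(regex_hms('1434'))          # None
```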

Related

How to create a datetime object from a string?

I tried to be smart and create a one-liner that extracts the date and time from my_string and turns them into a datetime object. However, it did not work.
my_string = 'London_XX65TR_20211116_112413.txt'
This is my code:
datetime= datetime.datetime.strptime(my_string .split('_')[2],'%Y%m%d_%H%M%S')
This is my output:
ValueError: time data '20211116' does not match format '%Y%m%d_%H%M%S'

You could use the maxsplit argument in str.split:
>>> from datetime import datetime
>>> region, code, date_time = my_string[:-4].split('_', maxsplit=2)
>>> datetime.strptime(date_time, "%Y%m%d_%H%M%S")
datetime.datetime(2021, 11, 16, 11, 24, 13)
That is, split at no more than maxsplit occurrences of the _ character, counting from the left, and leave the rest as is.
For this particular case, instead of my_string[:-4] you could use my_string.rstrip('.txt'), but that is not advised in general: rstrip removes any trailing run of the characters '.', 't' and 'x' rather than the literal suffix, so it may strip useful information as well. From Python 3.9 onward, you could use str.removesuffix instead:
>>> my_string = 'London_XX65TR_20211116_112413.txt'
>>> region, code, date_time = my_string.removesuffix('.txt').split('_', maxsplit=2)
>>> datetime.strptime(date_time, "%Y%m%d_%H%M%S")
datetime.datetime(2021, 11, 16, 11, 24, 13)
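To make the rstrip caveat concrete, and to cover Python versions before 3.9, here is a sketch of a small suffix-stripping helper that behaves like str.removesuffix (strip_suffix is a hypothetical name, and 'report_text.txt' is a made-up filename):

```python
from datetime import datetime

def strip_suffix(s, suffix):
    """Remove suffix from the end of s if present (like str.removesuffix)."""
    if suffix and s.endswith(suffix):
        return s[:-len(suffix)]
    return s

# rstrip treats its argument as a set of characters, not as a suffix.
print('report_text.txt'.rstrip('.txt'))         # report_te
print(strip_suffix('report_text.txt', '.txt'))  # report_text

my_string = 'London_XX65TR_20211116_112413.txt'
region, code, date_time = strip_suffix(my_string, '.txt').split('_', maxsplit=2)
print(datetime.strptime(date_time, '%Y%m%d_%H%M%S'))  # 2021-11-16 11:24:13
```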

You could use re.findall here:
import re
from datetime import datetime
my_string = 'London_XX65TR_20211116_112413.txt'
ts = re.findall(r'_(\d{8}_\d{6})\.', my_string)[0]
dt = datetime.strptime(ts, '%Y%m%d_%H%M%S')
print(dt) # 2021-11-16 11:24:13
This approach uses a regex to extract the timestamp from the input string. The rest of your logic was already correct.
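Note that re.findall(...)[0] raises IndexError when a name contains no timestamp. A minimal sketch that uses re.search and returns None in that case instead; the helper name timestamp_from_name and the second filename are made up for illustration:

```python
import re
from datetime import datetime

TS_RE = re.compile(r'_(\d{8}_\d{6})\.')

def timestamp_from_name(name):
    """Return the datetime embedded in name, or None if there is none."""
    match = TS_RE.search(name)
    if match is None:
        return None
    return datetime.strptime(match.group(1), '%Y%m%d_%H%M%S')

for name in ['London_XX65TR_20211116_112413.txt', 'London_notes.txt']:
    print(name, timestamp_from_name(name))
# London_XX65TR_20211116_112413.txt 2021-11-16 11:24:13
# London_notes.txt None
```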

The method you are following is correct. You just aren't taking the HH:MM:SS part into account; you need to append it before parsing:
import datetime
my_string = 'London_XX65TR_20211116_112413.txt'
parts = my_string.split('_')
my_date = (parts[2] + parts[3]).replace(".txt", "")
dt = datetime.datetime.strptime(my_date, '%Y%m%d%H%M%S')  # avoid shadowing the datetime module
print(dt)  # 2021-11-16 11:24:13

Your code does not work because my_string.split('_') gives ['London', 'XX65TR', '20211116', '112413.txt'], so strptime('20211116', '%Y%m%d_%H%M%S') raises a ValueError.
You should either:
limit the format to '%Y%m%d', losing the hours, minutes and seconds, or
find another way to get the whole substring matching the format.
The first option is trivial, so let's go for the second one using a regex.
import datetime
import re
dt = datetime.datetime.strptime(re.search(r'\d{8}_\d{6}', my_string)[0], '%Y%m%d_%H%M%S')

from datetime import datetime
date_time_str = '18/09/19 01:55:19'
date_time_obj = datetime.strptime(date_time_str, '%d/%m/%y %H:%M:%S')
print("The type of the date is now", type(date_time_obj))
print("The date is", date_time_obj)

Python regular expression to change date formatting

I have an array of strings representing dates like '2015-6-03' and I want to convert these to the format '2015-06-03'.
Instead of doing the replacement with an ugly loop, I'd like to use a regular expression. Something along the lines of:
str.replace('(-){1}(\d){1}(-){1}', '-0{my digit here}-')
Is something like this possible?

You don't have to retrieve the digit from the match. You can replace the hyphen before a single-digit month with -0.
Like this:
re.sub(r'-(?=\d-)', '-0', text)
Note that (?=\d-) is a lookahead assertion, not a capturing group: because the opening parenthesis is followed by the special sequence ?=, it only checks that a digit and a hyphen come next, without consuming them. That's why only the hyphen gets replaced.
Test:
import re
text = '2015-09-03 2015-6-03 2015-1-03 2015-10-03'
re.sub(r'-(?=\d-)', '-0', text)
Result:
'2015-09-03 2015-06-03 2015-01-03 2015-10-03'
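The lookahead pads only the month. If single-digit days can occur too, a sketch that passes a function as the replacement (re.sub calls it with each match object) pads both; the sample dates are made up:

```python
import re

def pad_date(match):
    """Rebuild a Y-M-D match with zero-padded month and day."""
    year, month, day = match.groups()
    return f"{year}-{int(month):02d}-{int(day):02d}"

text = '2015-6-3 2015-10-03 2015-1-13'
print(re.sub(r'\b(\d{4})-(\d{1,2})-(\d{1,2})\b', pad_date, text))
# 2015-06-03 2015-10-03 2015-01-13
```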

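Yes, a regex will accomplish what you want. Capture the year, the single-digit month and the day:
(\d+)-(\d)-(\d+)
and then put them back, with a 0 in front of the month, using backreferences:
import re
target = "2015-6-05"
out = re.sub(r'(\d+)-(\d)-(\d+)', r'\1-0\2-\3', target)  # '2015-06-05'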
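Since the question starts from an array of strings, here is a sketch applying a backreference-based substitution to each element with a precompiled pattern; dates whose month already has two digits do not match and pass through unchanged (the list contents are made up):

```python
import re

# Year, single-digit month, day
DATE_RE = re.compile(r'(\d+)-(\d)-(\d+)')

dates = ['2015-6-03', '2015-10-03', '2015-1-15']
padded = [DATE_RE.sub(r'\1-0\2-\3', d) for d in dates]
print(padded)  # ['2015-06-03', '2015-10-03', '2015-01-15']
```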

No need for regex: you can load it as a datetime object and format the string as requested when you print it:
import datetime
s = '2015-6-03'
date_obj = datetime.datetime.strptime(s, '%Y-%m-%d')
print("%d-%02d-%02d" % (date_obj.year, date_obj.month, date_obj.day))
OUTPUT
2015-06-03
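Since the question starts from an array of strings, here is a sketch applying the same datetime approach to a list, using strftime instead of manual % formatting (the sample dates are made up):

```python
from datetime import datetime

dates = ['2015-6-03', '2015-10-3', '2015-1-15']
# strptime's %m and %d accept one- or two-digit values; strftime always pads.
iso = [datetime.strptime(s, '%Y-%m-%d').strftime('%Y-%m-%d') for s in dates]
print(iso)  # ['2015-06-03', '2015-10-03', '2015-01-15']
```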

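Something along the lines of...
import re

def replaceRegex(what, pattern, filler):
    regex = re.compile(pattern)
    match = regex.match(what)  # match() only matches at the start of the string
    if match is not None:
        start, end = match.span()  # 'from' is a reserved word, so it can't be a variable name
        # Replace only the matched span, not every occurrence of that text
        return what[:start] + filler + what[end:]
    else:
        return None
Might help you.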

Using regular expression to extract string

I need to extract the IP address from the following string.
>>> mydns='ec2-54-196-170-182.compute-1.amazonaws.com'
The text to the left of the first dot needs to be returned. The following works as expected.
>>> mydns[:18]
'ec2-54-196-170-182'
But it does not work in all cases. For example:
mydns='ec2-666-777-888-999.compute-1.amazonaws.com'
>>> mydns[:18]
'ec2-666-777-888-99'
How do I use regular expressions for this in Python?

No need for regex... Just use str.split
mydns.split('.', 1)[0]
Demo:
>>> mydns='ec2-666-777-888-999.compute-1.amazonaws.com'
>>> mydns.split('.', 1)[0]
'ec2-666-777-888-999'
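If what you actually need is the IP address in dotted form, a sketch building on the same idea (str.partition splits at the first dot only; note the dotted result is not validated as a real IPv4 address):

```python
mydns = 'ec2-666-777-888-999.compute-1.amazonaws.com'

host = mydns.partition('.')[0]      # text before the first dot
ip = '.'.join(host.split('-')[1:])  # drop the 'ec2' prefix, join the octets
print(host)  # ec2-666-777-888-999
print(ip)    # 666.777.888.999
```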

If you wanted to use regex for this:
Regex String
ec2-([0-9]{1,3})-([0-9]{1,3})-([0-9]{1,3})-([0-9]{1,3}).*
Alternative (EC2 Agnostic):
.*\b([0-9]{1,3})-([0-9]{1,3})-([0-9]{1,3})-([0-9]{1,3}).*
Replacement String
Regular: \1.\2.\3.\4
Reverse: \4.\3.\2.\1
Python code
import re
subject = 'ec2-54-196-170-182.compute-1.amazonaws.com'
result = re.sub("ec2-([0-9]{1,3})-([0-9]{1,3})-([0-9]{1,3})-([0-9]{1,3}).*", r"\1.\2.\3.\4", subject)
print(result)  # 54.196.170.182
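[0-9]{1,3} accepts any one- to three-digit number, so the asker's second example (ec2-666-777-888-999) would also be turned into an "IP". A sketch that captures the groups with re.match and then validates them with the standard ipaddress module, which raises ValueError for octets above 255 (the helper name ec2_ip is hypothetical):

```python
import ipaddress
import re

EC2_RE = re.compile(r'ec2-(\d{1,3})-(\d{1,3})-(\d{1,3})-(\d{1,3})\.')

def ec2_ip(hostname):
    """Return the IPv4 address encoded in an ec2-a-b-c-d hostname, or None."""
    match = EC2_RE.match(hostname)
    if match is None:
        return None
    try:
        return ipaddress.ip_address('.'.join(match.groups()))
    except ValueError:
        return None  # e.g. an octet above 255

print(ec2_ip('ec2-54-196-170-182.compute-1.amazonaws.com'))   # 54.196.170.182
print(ec2_ip('ec2-666-777-888-999.compute-1.amazonaws.com'))  # None
```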

This regex will match everything before the first dot: ^[^.]+
So try this:
import re
string = "ec2-54-196-170-182.compute-1.amazonaws.com"
ip = re.findall('^[^.]+', string)[0]
print(ip)
Output:
ec2-54-196-170-182
The nice thing is that this will match even if the name starts with ec2, ec3 and so on, so this regex does essentially the same thing as @mgilson's split-based code.

How can I format date in ISO using Python?

I have some dates whose format is m/d/yyyy (e.g. 2/28/1987).
I would like to have it in the ISO format : 1987-02-28
I think we can do it that way, but it seems a little heavy:
import re
str_date = '2/28/1987'
arr_str = re.split('/', str_date)
iso_date = arr_str[2]+'-'+arr_str[0][:2]+'-'+arr_str[1]
Is there another way to do it with Python?

You could use the datetime module:
datetime.datetime.strptime(str_date, '%m/%d/%Y').date().isoformat()
or, as running code:
>>> import datetime
>>> str_date = '2/28/1987'
>>> datetime.datetime.strptime(str_date, '%m/%d/%Y').date().isoformat()
'1987-02-28'
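A sketch wrapping the same parsing in a helper (us_to_iso is a hypothetical name). Because strptime checks the calendar, an impossible date such as 2/30/1987 raises ValueError instead of producing a bogus ISO string:

```python
from datetime import datetime

def us_to_iso(s):
    """Convert an m/d/yyyy string such as '2/28/1987' to 'YYYY-MM-DD'."""
    return datetime.strptime(s, '%m/%d/%Y').date().isoformat()

print(us_to_iso('2/28/1987'))  # 1987-02-28
try:
    us_to_iso('2/30/1987')
except ValueError as exc:
    print('rejected:', exc)
```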

I would use the datetime module to parse it:
>>> from datetime import datetime
>>> date = datetime.strptime('2/28/1987', '%m/%d/%Y')
>>> date.strftime('%Y-%m-%d')
'1987-02-28'

What you have is very nearly how I would write it, the only major improvement I can suggest is using plain old split instead of re.split, since you do not need a regular expression:
arr_str = str_date.split('/')
If you needed to do anything more complicated than that, I would recommend time.strptime (and time.strftime), but those are noticeably slower than plain string manipulation.
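With the plain-split approach, the month and day still need zero-padding to be valid ISO 8601 (the question's slicing yields '1987-2-28'). A sketch using str.zfill, assuming the input is always m/d/yyyy:

```python
str_date = '2/28/1987'
month, day, year = str_date.split('/')
# zfill pads with leading zeros up to the given width
iso_date = f'{year}-{month.zfill(2)}-{day.zfill(2)}'
print(iso_date)  # 1987-02-28
```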

Python Regex replace

Hey I'm trying to figure out a regular expression to do the following.
Here is my string
Place,08/09/2010,"15,531","2,909",650
I need to split this string by the commas. However, because of the commas used in the numerical data fields, the split doesn't work correctly. So I want to remove the commas in the numbers before splitting the string.
Thanks.

new_string = re.sub(r'"(\d+),(\d+)"', r'\1.\2', original_string)
This will substitute the , inside the quotes with a . and you can then just use the string's split method.
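If the goal is to drop the thousands separators rather than turn them into dots (the question asks to remove them), a sketch using a function as the replacement also handles numbers containing several commas; the helper name unquote_number is hypothetical:

```python
import re

def unquote_number(match):
    """Strip the quotes and thousands separators from a quoted number."""
    return match.group(1).replace(',', '')

line = 'Place,08/09/2010,"15,531","2,909",650'
cleaned = re.sub(r'"(\d[\d,]*)"', unquote_number, line)
print(cleaned.split(','))  # ['Place', '08/09/2010', '15531', '2909', '650']
```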

>>> from io import StringIO
>>> import csv
>>> r = csv.reader(StringIO('Place,08/09/2010,"15,531","2,909",650'))
>>> next(r)
['Place', '08/09/2010', '15,531', '2,909', '650']
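Building on the csv approach, a sketch that also turns the numeric fields into ints, assuming the first two fields are always the place and the date. csv.reader accepts any iterable of lines, so a one-element list works as well as a file:

```python
import csv

line = 'Place,08/09/2010,"15,531","2,909",650'
row = next(csv.reader([line]))
place, date = row[0], row[1]
# Remove thousands separators before converting
numbers = [int(field.replace(',', '')) for field in row[2:]]
print(place, date, numbers)  # Place 08/09/2010 [15531, 2909, 650]
```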

Another way of doing it using regex directly:
>>> import re
>>> data = "Place,08/09/2010,\"15,531\",\"2,909\",650"
>>> res = re.findall(r"(\w+),(\d{2}/\d{2}/\d{4}),\"([\d,]+)\",\"([\d,]+)\",(\d+)", data)
>>> res
[('Place', '08/09/2010', '15,531', '2,909', '650')]

You could parse a string of that format using pyparsing:
import pyparsing as pp
import datetime as dt

st = 'Place,08/09/2010,"15,531","2,909",650'

def line_grammar():
    integer = pp.Word(pp.nums).setParseAction(lambda s, l, t: [int(t[0])])
    sep = pp.Suppress('/')
    date = (integer + sep + integer + sep + integer).setParseAction(
        lambda s, l, t: dt.date(t[2], t[1], t[0]))
    comma = pp.Suppress(',')
    quoted = pp.Regex(r'("|\').*?\1').setParseAction(
        lambda s, l, t: [int(e) for e in t[0].strip('\'"').split(',')])
    line = pp.Word(pp.alphas) + comma + date + comma + quoted + comma + quoted + comma + integer
    return line

line = line_grammar()
print(line.parseString(st))
# ['Place', datetime.date(2010, 9, 8), 15, 531, 2, 909, 650]
The advantage is that you parse, convert, and validate in a few lines. Note that the numbers are all converted to int and the date to a datetime.date object.

a = """Place,08/09/2010,"15,531","2,909",650""".split(',')
result = []
i = 0
while i < len(a):
    if '"' not in a[i]:
        result.append(a[i])
    else:
        # Re-join pieces until the one with the closing quote
        string = a[i]
        i += 1
        while True:
            string += "," + a[i]
            if '"' in a[i]:
                break
            i += 1
        result.append(string)
    i += 1
print(result)
Result:
['Place', '08/09/2010', '"15,531"', '"2,909"', '650']
Not a big fan of regular expressions unless you absolutely need them.

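If you need a regex solution, this should do:
r"(\d+),(?=\d\d\d)"
then replace with:
r"\1"
Use a raw string for the replacement; in a plain string literal "\1" is the control character \x01, not a backreference. For example, with s as your input string:
re.sub(r"(\d+),(?=\d\d\d)", r"\1", s)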
It will replace any comma-delimited numbers anywhere in your string with their number-only equivalent, thus turning this:
Place,08/09/2010,"15,531","548,122,909",650
into this:
Place,08/09/2010,"15531","548122909",650
I'm sure there are a few holes to be found and places you don't want this done, and that's why you should use a parser!
Good luck!
