Using 'Replace' Function in Python

I have an Access table that has a bunch of coordinate values in degrees, minutes, and seconds, and they are formatted like this:
90-12-28.15
I want to reformat it like this:
90° 12' 28.15"
essentially replacing the dashes with the degrees minutes and seconds characters and a space between the degrees and minutes and another one between the minutes and seconds.
I'm thinking about using the 'Replace' function, but I'm not sure how to replace the first instance of the dash with a degree character (°) and space and then detect the second instance of the dash and place the minute characters and a space and then finally adding the seconds character at the end.
Any help is appreciated.
Mike

While regular expressions and split() are fine solutions, doing this with replace() is rather easy.
lat = "90-12-28.15"
lat = lat.replace("-", "° ", 1)
lat = lat.replace("-", "' ", 1)
lat = lat + '"'
Or you can do it all on one line:
lat = lat.replace("-", "° ", 1).replace("-", "' ", 1) + '"'
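The chained replace() calls can be wrapped in a small function so they can be applied to every value from the table. This is only a sketch: the name dms_with_symbols is made up for illustration, and it assumes each input contains exactly two dashes.

```python
def dms_with_symbols(value):
    """Turn '90-12-28.15' into degrees, minutes, seconds with symbols."""
    # count=1 replaces only the first remaining dash on each call,
    # so the first dash becomes the degree sign and the second the minute mark.
    return value.replace("-", "° ", 1).replace("-", "' ", 1) + '"'


print(dms_with_symbols("90-12-28.15"))
```

If a value might contain more or fewer than two dashes, it is probably worth checking value.count("-") before converting.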

I would just split your first string:
# -*- coding: utf-8 -*-
coord = '90-12-28.15'  # avoid naming a variable str, which shadows the built-in
arr = coord.split('-')
coord2 = arr[0] + '° ' + arr[1] + '\' ' + arr[2] + '"'
print(coord2)

You might want to use Python's regular expressions module re, particularly re.sub(). Check the Python docs here for more information.
If you're not familiar with regular expressions, check out this tutorial here, also from the Python documentation.
import re
text = 'replace "-" in 90-12-28.15'
print(re.sub(r'(\d\d)-(\d\d)-(\d\d)\.(\d\d)', r'''\1° \2' \3.\4"''', text))
# use \d{1,2} instead of \d\d if single digits are allowed
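Following the comment about variable digit counts, here is a hedged variant of the re.sub() call. The pattern is an assumption about the data: 1 to 3 digits of degrees, 1 to 2 digits of minutes and seconds, and an optional fractional part on the seconds. The names DMS_PATTERN and mark_dms are made up for this sketch.

```python
import re

# Assumed shape: degrees (1-3 digits), minutes (1-2), seconds (1-2, optional fraction).
DMS_PATTERN = re.compile(r"\b(\d{1,3})-(\d{1,2})-(\d{1,2}(?:\.\d+)?)\b")


def mark_dms(text):
    # \1, \2 and \3 refer to the degrees, minutes and seconds groups.
    return DMS_PATTERN.sub(r'''\1° \2' \3"''', text)


print(mark_dms('replace "-" in 90-12-28.15'))
print(mark_dms('216-56-12.02'))
```

Only dashes inside a coordinate-shaped run of digits are rewritten; the quoted dash in the surrounding text is left alone.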

The python "replace" string method should be easy to use. You can find the documentation here.
In your case, you can do something like this:
my_str = "90-12-28.15"
my_str = my_str.replace("-", "° ", 1)  # 1 means only the first occurrence is replaced
my_str = my_str.replace("-", "' ", 1)
my_str = my_str + "\""

Related

How to add text within two string delimiters

I want to add some text within two delimiters in a string.
Previous string:
'ABC [123]'
New string needs to be like this:
'ABC [123 sometext]'
How do I do that?
slightly more versatile I'd say, without using replace:
s = 'ABC [123]'
insert = 'sometext'
insert_after = '123'
delimiter = ' '
ix = s.find(insert_after)  # find() returns -1 when absent; index() would raise ValueError
if ix != -1:
    s = s[:ix+len(insert_after)] + delimiter + insert + s[ix+len(insert_after):]
# or with an f-string:
# s = f"{s[:ix+len(insert_after)]}{delimiter}{insert}{s[ix+len(insert_after):]}"
print(s)
# ABC [123 sometext]
If the insert patterns get more complex, I'd also suggest to take a look at regex. If the pattern is simple however, not using regex should be the more efficient solution.
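For the more complex case, a regex version might look like the sketch below. It assumes the delimiters are square brackets and that only the first bracketed group should be changed; the names BRACKETED and append_inside_brackets are made up for illustration.

```python
import re

# Matches '[', then anything that is not ']', then ']'.
BRACKETED = re.compile(r"\[([^\]]*)\]")


def append_inside_brackets(s, text):
    # A function as the replacement keeps any backslashes in `text` literal.
    # count=1 limits the change to the first bracketed group.
    return BRACKETED.sub(lambda m: "[" + m.group(1) + " " + text + "]", s, count=1)


print(append_inside_brackets('ABC [123]', 'sometext'))
```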
Most of these changes depend on knowing the string's pattern in advance.
In your case a simple str.replace would do the trick. Strings are immutable, so assign the result back:
varstr = 'ABC [123]'
varstr = varstr.replace(']', ' sometext]')
You can get a lot out of the str documentation and from diving into regex.
All the above answers are correct, but if you are trying to add the contents of a variable:
variable_string = 'ABC [123]'
sometext = "The text you want to add"
variable_string = variable_string.replace("]", " " + sometext + "]")
print(variable_string)

RegEx for matching a datetime followed by spaces and any chars

I need to profile some data in a bucket, and have come across a bit of a dilemma.
This is the type of line in each file:
"2018-09-08 10:34:49 10.0 MiB path/of/a/directory"
What's required is to capture everything in bold while keeping in mind that some of the separators are tabs and other times they are spaces.
To rephrase, I need everything from the moment the date and time end (excluding the tab or space preceding it)
I tried something like this:
p = re.compile(r'^[\d\d\d\d.\d\d.\d\d\s\d\d:\d\d:\d\d].*')
for line in lines:
    print(re.findall(line))
How do I solve this problem?
EDIT:
What if I also wanted to create new groups from the newly matched string? Say I wanted to rearrange it into:
10MiB engagementName/folder/file/something.xlsx engagementName extensionType something.xlsx
RE-EDIT:
The path/to/directory generally points to a file (and all files have extensions). From the reformatted string you have been helping me with, is there a way to keep building on the regex pattern so that I can "create" a new group by filtering on the file extension type (I suppose by searching the end of the string for something along the lines of .anything) and add that result to the formatted regex string?
Don't bother with a regular expression. You know the format of the line. Just split it:
from datetime import datetime
for l in lines:
    line_date, line_time, rest_of_line = l.split(maxsplit=2)
    print([line_date, line_time, rest_of_line])
# ['2018-09-08', '10:34:49', '10.0 MiB path/of/a/directory']
Take special note of the use of the maxsplit argument. This prevents it from splitting the size or the path. We can do this because we know the date has one space in the middle and one space after it.
If the size will always have one space in the middle and one space following it, we can increase it to 4 splits to separate the size, too:
for l in lines:
    line_date, line_time, size_quantity, size_units, line_path = l.split(maxsplit=4)
    print([line_date, line_time, size_quantity, size_units, line_path])
# ['2018-09-08', '10:34:49', '10.0', 'MiB', 'path/of/a/directory']
Note that extra contiguous spaces and spaces in the path don't screw it up:
l = "2018-09-08 10:34:49 10.0 MiB path/of/a/direct ory"
line_date, line_time, size_quantity, size_units, line_path = l.split(maxsplit=4)
print([line_date, line_time, size_quantity, size_units, line_path])
# ['2018-09-08', '10:34:49', '10.0', 'MiB', 'path/of/a/direct ory']
You can concatenate parts back together if needed:
line_size = size_quantity + ' ' + size_units
If you want the timestamp for something, you can parse it:
# 'T' could be anything, but 'T' is standard for the ISO 8601 format
timestamp = datetime.strptime(line_date + 'T' + line_time, '%Y-%m-%dT%H:%M:%S')
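For the follow-up about the top-level folder, extension, and file name, the same split can be extended without a regex. This is a sketch under assumptions: the path uses forward slashes (as bucket keys usually do), and the function name describe_listing_line and the dictionary keys are made up for illustration.

```python
import posixpath


def describe_listing_line(line):
    # strip() drops the trailing newline that maxsplit would otherwise leave on the path.
    line_date, line_time, size_quantity, size_units, line_path = line.strip().split(maxsplit=4)
    file_name = posixpath.basename(line_path)
    # splitext returns e.g. ('something', '.xlsx'); the extension is '' when there is none.
    extension = posixpath.splitext(file_name)[1]
    # Everything before the first '/' is treated as the top-level folder.
    top_folder = line_path.split('/', 1)[0]
    return {
        'size': size_quantity + ' ' + size_units,
        'path': line_path,
        'top_folder': top_folder,
        'extension': extension,
        'file_name': file_name,
    }


info = describe_listing_line(
    "2018-09-08 10:34:49 10.0 MiB engagementName/folder/file/something.xlsx\n"
)
print(info['size'], info['path'], info['top_folder'], info['extension'], info['file_name'])
```

posixpath is used rather than os.path so that the result does not depend on the operating system running the script.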
You might not need an expression to do so, a string split would suffice. However, if you wish to do so, you might not want to bound your expression from very beginning. You can simply use this expression:
(:[0-9]+\s+)(.*)$
You can even slightly modify it to this expression which is just a bit faster:
:([0-9]+\s+)(.*)$
Example Test:
# -*- coding: UTF-8 -*-
import re
string = "2018-09-08 10:34:49 10.0 MiB path/of/a/directory"
expression = r'(:[0-9]+\s+)(.*)$'
match = re.search(expression, string)
if match:
    print("YAAAY! \"" + match.group(2) + "\" is a match 💚 ")
else:
    print('🙀 Sorry! No matches! Something is not right! Call 911 👮')
Output
YAAAY! "10.0 MiB path/of/a/directory" is a match 💚
JavaScript Performance Benchmark
This snippet is a JavaScript performance test that runs the expression 10 million times against your input string:
repeat = 10000000;
start = Date.now();
for (var i = repeat; i >= 0; i--) {
    var string = "2018-09-08 10:34:49 10.0 MiB path/of/a/directory";
    var regex = /(.*)(:[0-9]+\s+)(.*)/g;
    var match = string.replace(regex, "$3");
}
end = Date.now() - start;
console.log("YAAAY! \"" + match + "\" is a match 💚 ");
console.log(end / 1000 + " is the runtime of " + repeat + " times benchmark test. 😳 ");
Edit:
You might capture only the end of the timestamp. Because the expression has fewer boundaries, it is simpler and faster, and it still works if the date part appears in an unexpected format:
2019/12/15 10:00:00 **desired output**
2019-12-15 10:00:00 **desired output**
2019-12-15, 10:00:00 **desired output**
2019-12 15 10:00:00 **desired output**
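A quick way to check that claim in Python is to run the expression over a few differently formatted lines. The sample lines below are made up for illustration (the listed date variants with the size and path from the question appended), and text_after_timestamp is a hypothetical helper name.

```python
import re

# Only the seconds of the time and the whitespace after them are anchored.
AFTER_TIME = re.compile(r":[0-9]+\s+(.*)$")


def text_after_timestamp(line):
    match = AFTER_TIME.search(line)
    # Returns None when the line has no ':digits' followed by whitespace.
    return match.group(1) if match else None


samples = [
    "2019/12/15 10:00:00 10.0 MiB path/of/a/directory",
    "2019-12-15 10:00:00\t10.0 MiB path/of/a/directory",  # tab separator
    "2019-12-15, 10:00:00 10.0 MiB path/of/a/directory",
    "2019-12 15 10:00:00 10.0 MiB path/of/a/directory",
]
for sample in samples:
    print(text_after_timestamp(sample))
```

The search stops at the first place where digits after a colon are followed by whitespace, which is the seconds field in all four samples.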

Python inserting spaces in string

Alright, I'm working on a little project for school, a 6-frame translator. I won't go into too much detail, I'll just describe what I wanted to add.
The normal output would be something like:
TTCPTISPALGLAWS_DLGTLGFMSYSANTASGETLVSLYQLGLFEM_VVSYGRTKYYLICP_LFHLSVGFVPSD
The important part of this string are the M and the _ (the start and stop codons, biology stuff). What I wanted to do was highlight these like so:
TTCPTISPALGLAWS_DLGTLGF 'MSYSANTASGETLVSLYQLGLFEM_' VVSYGRTKYYLICP_LFHLSVGFVPSD
Now here is where (for me) it gets tricky, I got my output to look like this (adding a space and a ' to highlight the start and stop). But it only does this once, for the first start and stop it finds. If there are any other M....._ combinations it won't highlight them.
Here is my current code, attempting to make it highlight more than once:
def start_stop(translation):
    index_2 = 0
    while True:
        if 'M' in translation[index_2::1]:
            index_1 = translation[index_2::1].find('M')
            index_2 = translation[index_1::1].find('_') + index_1
            new_translation = translation[:index_1] + " '" + \
                              translation[index_1:index_2 + 1] + "' " +\
                              translation[index_2 + 1:]
        else:
            break
    return new_translation
I really thought this would do it, guess not. So now I find myself being stuck.
If any of you are willing to try and help, here is a randomly generated string with more than one M....._ set:
'TTCPTISPALGLAWS_DLGTLGFMSYSANTASGETLVSLYQLGLFEM_VVSYGRTKYYLICP_LFHLSVGFVPSDGRRLTLYMPPARRLATKSRFLTPVISSG_DKPRHNPVARSQFLNPLVRPNYSISASKSGLRLVLSYTRLSLGINSLPIERLQYSVPAPAQITP_IPEHGNARNFLPEWPRLLISEPAPSVNVPCSVFVVDPEHPKAHSKPDGIANRLTFRWRLIG_VFFHNAL_VITHGYSRVDILLPVSRALHVHLSKSLLLRSAWFTLRNTRVTGKPQTSKT_FDPKATRVHAIDACAE_QQH_PDSGLRFPAPGSCSEAIRQLMI'
Thank you to anyone willing to help :)
Regular expressions are pretty handy here:
import re
sequence = "TTCP...."
highlighted = re.sub(r"(M\w*?_)", r" '\1' ", sequence)
# Output:
"TTCPTISPALGLAWS_DLGTLGF 'MSYSANTASGETLVSLYQLGLFEM_' VVSYGRTKYYLICP_LFHLSVGFVPSDGRRLTLY 'MPPARRLATKSRFLTPVISSG_' DKPRHNPVARSQFLNPLVRPNYSISASKSGLRLVLSYTRLSLGINSLPIERLQYSVPAPAQITP_IPEHGNARNFLPEWPRLLISEPAPSVNVPCSVFVVDPEHPKAHSKPDGIANRLTFRWRLIG_VFFHNAL_VITHGYSRVDILLPVSRALHVHLSKSLLLRSAWFTLRNTRVTGKPQTSKT_FDPKATRVHAIDACAE_QQH_PDSGLRFPAPGSCSEAIRQLMI"
Regex explanation:
We look for an M followed by any number of "word characters" \w* then an _, using the ? to make it a non-greedy match (otherwise it would just make one group from the first M to the last _).
The replacement is the matched group (\1 indicates "first group", there's only one), but surrounded by spaces and quotes.
You only need a little string slicing; no external module is required.
Python strings have a method called 'index'; just use it.
string_1='TTCPTISPALGLAWS_DLGTLGFMSYSANTASGETLVSLYQLGLFEM_VVSYGRTKYYLICP_LFHLSVGFVPSD'
before=string_1.index('M')
after=string_1[before:].index('_')
print('{} {} {}'.format(string_1[:before],string_1[before:before+after+1],string_1[before+after+1:]))
output:
TTCPTISPALGLAWS_DLGTLGF MSYSANTASGETLVSLYQLGLFEM_ VVSYGRTKYYLICP_LFHLSVGFVPSD
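The index-based answer above handles only the first M..._ pair. A minimal sketch of how the same idea might be extended to every pair without a regex is shown below; it uses str.find() with a start position, and the name highlight_orfs is made up for illustration.

```python
def highlight_orfs(translation):
    pieces = []
    position = 0
    while True:
        start = translation.find('M', position)
        if start == -1:
            break  # no further start codon
        stop = translation.find('_', start)
        if stop == -1:
            break  # a start codon with no stop after it is left as-is
        pieces.append(translation[position:start])
        pieces.append(" '" + translation[start:stop + 1] + "' ")
        position = stop + 1  # continue searching after the stop codon
    pieces.append(translation[position:])
    return ''.join(pieces)


seq = 'TTCPTISPALGLAWS_DLGTLGFMSYSANTASGETLVSLYQLGLFEM_VVSYGRTKYYLICP_LFHLSVGFVPSD'
print(highlight_orfs(seq))
```

Searching from a moving position, rather than slicing and calling find() on the slice, keeps every index relative to the original string, which is where the code in the question goes wrong.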

Parsing a MAC address with python

How can I convert a hex value "0000.0012.13a4" into "00:00:00:12:13:A4"?
text = '0000.0012.13a4'
text = text.replace('.', '').upper() # a little pre-processing
# chunk into groups of 2 and re-join
out = ':'.join([text[i : i + 2] for i in range(0, len(text), 2)])
print(out)
00:00:00:12:13:A4
import re
old_string = "0000.0012.13a4"
new_string = ':'.join(s for s in re.split(r"(\w{2})", old_string.upper()) if s.isalnum())
print(new_string)
OUTPUT
> python3 test.py
00:00:00:12:13:A4
>
Without modification, this approach can handle some other MAC formats that you might run into like, "00-00-00-12-13-a4"
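To make the "other formats" case explicit, here is a hedged sketch that strips every non-hex character and then checks the length before regrouping. The name normalize_mac is made up for illustration, and treating anything other than 12 hex digits as an error is an assumption about what counts as valid input.

```python
import re


def normalize_mac(raw):
    # Drop separators such as '.', '-' and ':' (and anything else that is not a hex digit).
    hex_digits = re.sub(r"[^0-9A-Fa-f]", "", raw).upper()
    if len(hex_digits) != 12:
        raise ValueError("expected 12 hex digits, got %d" % len(hex_digits))
    # Regroup into six pairs joined by colons.
    return ":".join(hex_digits[i:i + 2] for i in range(0, 12, 2))


for mac in ("0000.0012.13a4", "00-00-00-12-13-a4", "00:00:00:12:13:a4"):
    print(normalize_mac(mac))
```

Note that stray non-hex characters are discarded silently, so a stricter check may be needed if malformed input has to be rejected.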
Try following code
import re
hx = '0000.0012.13a4'.replace('.', '').upper()
print(':'.join(re.findall('..', hx)))
Output: 00:00:00:12:13:A4
There is a pretty simple three-step solution:
First we strip those pesky periods.
step1 = hexStrBad.replace('.','')  # hexStrBad holds the original string, e.g. '0000.0012.13a4'
Then, if the formatting is consistent:
step2 = step1[0:2] + ':' + step1[2:4] + ':' + step1[4:6] + ':' + step1[6:8] + ':' + step1[8:10] + ':' + step1[10:12]
step3 = step2.upper()
It's not the prettiest, but it will do what you need!
It's unclear what you're asking exactly, but if all you want is to make a string all uppercase, use .upper()
Try to clarify your question somewhat, because if you're asking about converting some weirdly formatted string into what looks like a MAC address, we need to know that to answer your question.

Replacing items in string, Python

I'm trying to define a function in python to replace some items in a string. My string is a string that contains degrees minutes seconds (i.e. 216-56-12.02)
I want to replace the dashes so I can get the proper symbols, so my string will look like 216° 56' 12.02"
I tried this:
def FindLabel ([Direction]):
s = [Direction]
s = s.replace("-","° ",1) #replace first instancwe of the dash in the original string
s = s.replace("-","' ") # replace the remaining dash from the last string
s = s + """ #add in the minute sign at the end
return s
This doesn't seem to work. I'm not sure what's going wrong. Any suggestions are welcome.
Cheers,
Mike
Honestly, I wouldn't bother with replacement. Just .split() it:
def find_label(direction):
    degrees, minutes, seconds = direction.split('-')
    return u'{}° {}\' {}"'.format(degrees, minutes, seconds)
You could condense it even more if you want:
def find_label(direction):
    return u'{}° {}\' {}"'.format(*direction.split('-'))
If you want to fix your current code, see my comments:
def FindLabel(Direction): # Not sure why you put square brackets here
    s = Direction # Or here
    s = s.replace("-",u"° ",1)
    s = s.replace("-","' ")
    s += '"' # You have to use single quotes or escape the double quote: `"\""`
    return s
You might have to specify the utf-8 encoding at the top of your Python file as well using a comment:
# This Python file uses the following encoding: utf-8
This is how I would do it, by splitting into a list and then formatting the parts back together:
s = "{}° {}' {}\"".format(*s.split("-"))
