I have a location string that uses '#' as a placeholder, and another string whose characters are the replacements for the placeholders. I want to substitute them sequentially (like format specifiers). How can I do this in Python?
location = '/tmp/#/dir1/#/some_dirx/dir/var/2/#/dir3'
replacements = 'xyz'
result = '/tmp/x/dir1/y/some_dirx/dir/var/2/z/dir3'
You should use the replace method of a string as follows:
for replacement in replacements:
    location = location.replace('#', replacement, 1)
It is important to pass the third argument, count, so that each call replaces only one placeholder. Otherwise, a single call replaces every occurrence of the placeholder.
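The loop above can be wrapped in a small helper. This is only a sketch: the name fill_placeholders and its placeholder parameter are made up for illustration, and it assumes that no replacement itself contains the placeholder character (otherwise a later iteration would replace it again).

```python
def fill_placeholders(template, replacements, placeholder="#"):
    """Replace each placeholder in turn with the next replacement (hypothetical helper)."""
    for replacement in replacements:
        # count=1 limits each call to the first remaining placeholder
        template = template.replace(placeholder, replacement, 1)
    return template


location = "/tmp/#/dir1/#/some_dirx/dir/var/2/#/dir3"
print(fill_placeholders(location, "xyz"))
# /tmp/x/dir1/y/some_dirx/dir/var/2/z/dir3

# Without count, the first replacement consumes every placeholder:
print(location.replace("#", "x"))
# /tmp/x/dir1/x/some_dirx/dir/var/2/x/dir3
```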
If your location string does not contain format specifiers ({}), you could do:
location = '/tmp/#/dir1/#/some_dirx/dir/var/2/#/dir3'
replacements='xyz'
print(location.replace("#", "{}").format(*replacements))
Output
/tmp/x/dir1/y/some_dirx/dir/var/2/z/dir3
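If the string might already contain braces, one possible workaround is to escape them before converting the placeholders. A minimal sketch; format_placeholders is a hypothetical name, and the sample path containing braces is invented for illustration:

```python
def format_placeholders(template, replacements, placeholder="#"):
    """Fill placeholders via str.format, escaping literal braces first (hypothetical helper)."""
    # Double any literal braces so that str.format leaves them alone
    escaped = template.replace("{", "{{").replace("}", "}}")
    return escaped.replace(placeholder, "{}").format(*replacements)


print(format_placeholders("/tmp/#/{keep}/#", "xy"))
# /tmp/x/{keep}/y
```

Note that str.format raises IndexError when there are fewer replacements than placeholders, whereas the replace loop simply leaves the remaining placeholders untouched.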
As an alternative you could use the fact that repl in re.sub can be a function:
import re
from itertools import count
location = '/tmp/#/dir1/#/some_dirx/dir/var/2/#/dir3'
def repl(match, replacements='xyz', index=count()):
    return replacements[next(index)]
print(re.sub('#', repl, location))
Output
/tmp/x/dir1/y/some_dirx/dir/var/2/z/dir3
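One thing to keep in mind with the version above: the default argument index=count() is created once, when the function is defined, so a second re.sub call with the same repl keeps counting from where the first one stopped. A sketch that creates a fresh iterator per call instead (sub_sequential is a hypothetical name):

```python
import re


def sub_sequential(text, replacements, placeholder="#"):
    """Replace each placeholder with the next item from replacements (hypothetical helper)."""
    items = iter(replacements)  # fresh iterator for every call
    # re.escape guards against placeholders that are regex metacharacters
    return re.sub(re.escape(placeholder), lambda match: next(items), text)


location = "/tmp/#/dir1/#/some_dirx/dir/var/2/#/dir3"
print(sub_sequential(location, "xyz"))
# /tmp/x/dir1/y/some_dirx/dir/var/2/z/dir3
print(sub_sequential(location, "abc"))  # a second call starts from the beginning again
# /tmp/a/dir1/b/some_dirx/dir/var/2/c/dir3
```

If there are more placeholders than replacements, next(items) raises StopIteration, so the caller may want to check the counts first.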
I have a string as follows where I tried to remove similar consecutive characters.
import re
input = "abccbcbbb"
for i in input:
    input = re.sub("(.)\\1+", "", input)
print(input)
Now I need to let the user specify the value of k.
I am using the following python code to do it, but I got the error message TypeError: can only concatenate str (not "int") to str
import re
input = "abccbcbbb"
k = 3
for i in input:
    input = re.sub("(.)\\1+{" + (k-1) + "}", "", input)
print(input)
The for i in input : does not do what you need. i is each character in the input string, and your re.sub is supposed to take the whole input as a char sequence.
If you plan to match a specific amount of chars you should get rid of the + quantifier after \1. The limiting {min,} / {min,max} quantifier should be placed right after the pattern it modifies.
Also, it is more convenient to use raw string literals when defining regexps.
You can use
import re
input_text = "abccbcbbb"
k = 3
input_text = re.sub(fr"(.)\1{{{k-1}}}", "", input_text)
print(input_text)
# => abccbc
The fr"(.)\1{{{k-1}}}" raw f-string literal will translate into (.)\1{2} pattern. In f-strings, you need to double curly braces to denote a literal curly brace and you needn't escape \1 again since it is a raw string literal.
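To let the user choose k, the f-string pattern can be built inside a function. A sketch only: remove_k_repeats is a hypothetical name, and it assumes k is an integer of at least 2.

```python
import re


def remove_k_repeats(text, k):
    """Remove each group of k identical consecutive characters in one left-to-right pass.

    Hypothetical helper; assumes k >= 2.
    """
    pattern = fr"(.)\1{{{k - 1}}}"  # k=3 gives (.)\1{2}
    return re.sub(pattern, "", text)


print(remove_k_repeats("abccbcbbb", 3))  # abccbc
print(remove_k_repeats("abccbcbbb", 2))  # abbcb
```

This is a single pass: characters that become adjacent after a removal are not re-examined.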
If I were you, I would prefer to do it as suggested before. But since I've already spent time answering this question, here is my handmade solution.
The pattern below defines a named group called "letter" that matches a single lowercase letter (first a, then b, and so on as the scan advances). The lookahead then requires that the same letter appears again immediately after it.
So every letter that is followed by a copy of itself is replaced with an empty string, which leaves one letter per run of repeated letters.
import re
input = 'abccbcbbb'
result = 'abcbcb'
pattern = r'(?P<letter>[a-z])(?=(?P=letter)+)'
substituted = re.sub(pattern, '', input)
assert substituted == result
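For comparison, the same collapsing can be written with a backreference in the replacement instead of a lookahead. This is a sketch for illustration, not part of the answer above:

```python
import re

text = "abccbcbbb"
# (.)\1+ matches a whole run of a repeated character; \1 puts one copy back
collapsed = re.sub(r"(.)\1+", r"\1", text)
print(collapsed)  # abcbcb
```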
Just to make sure I have the question right: you mean to turn "abccbcbbb" into "abcbcb", removing only sequential duplicate characters? Is there a reason you need to use regex? You could likely do this with a simple loop. This is a really quick and dirty way to do it, but you could just write:
input = "abccbcbbb"
input = list(input)
previous = input.pop(0)
result = [previous]
for letter in input:
    # keep a letter only when it differs from the one just before it
    if letter != previous:
        result.append(letter)
    previous = letter
result = "".join(result)
With a method like this, you could make it easier to read and faster with a bit of modification, I'd assume.
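One possible modification along those lines is itertools.groupby from the standard library, which groups consecutive equal items. A minimal sketch:

```python
from itertools import groupby

text = "abccbcbbb"
# groupby yields one (key, group) pair per run of equal consecutive characters
result = "".join(key for key, _group in groupby(text))
print(result)  # abcbcb
```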
I have filenames with the particular format as given
II.NIL.10.BHZ.M.2058.190.160877
II.NIL.10.BHA.M.2008.190.168857
II.NIL.10.BHB.M.2078.198.160857
.
.
.
I want to replace the BH?.M part using the corresponding value from the list name.
name=['T','D','FG'.....]
expected output
II.NIL.10.BHT.2058.190.160877
II.NIL.10.BHD.2008.190.168857
II.NIL.10.BHFG.2078.198.160857
.
.
.
Is it possible with str.replace()?
You could use the built-in regex module (re) alongside the following pattern to effectively replace the content in your strings.
Pattern
'(?<=BH)[A-Z]+\.M'
This pattern uses a lookbehind, (?<=BH), to require that the match is preceded by the substring 'BH' without including it in the match. It then matches one or more uppercase characters, [A-Z]+, followed by the substring '.M'.
Solution
The below solution uses re.sub() alongside the pattern outlined above to return a string with the substring matched by the pattern replaced with that defined here as replacement.
import re
original = 'II.NIL.10.BHB.M.2078.198.160857'
replacement = 'FG'
output = re.sub(r'(?<=BH)[A-Z]+\.M', replacement, original)
print(output)
Output
II.NIL.10.BHFG.2078.198.160857
Processing multiple files
To repeat this process for multiple files you could apply the above logic within a loop/comprehension, running the re.sub() function on each original/replacement pairing and storing/processing appropriately.
The example below applies this to the data from your question: it builds a dictionary that maps each original filename to the substring to insert, then runs re.sub() on each pair and collects the results in a list.
import re
originals = [
    'II.NIL.10.BHZ.M.2058.190.160877',
    'II.NIL.10.BHA.M.2008.190.168857',
    'II.NIL.10.BHB.M.2078.198.160857'
]
replacements = ['T','D','FG']
mapping = dict(zip(originals, replacements))
results = [re.sub(r'(?<=BH)[A-Z]+\.M', v, k) for k,v in mapping.items()]
for r in results:
    print(r)
Output
II.NIL.10.BHT.2058.190.160877
II.NIL.10.BHD.2008.190.168857
II.NIL.10.BHFG.2078.198.160857
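A dictionary keyed by filename silently merges duplicate filenames, so pairing the two lists directly with zip may be a safer variation. A sketch using the same data as above:

```python
import re

originals = [
    'II.NIL.10.BHZ.M.2058.190.160877',
    'II.NIL.10.BHA.M.2008.190.168857',
    'II.NIL.10.BHB.M.2078.198.160857',
]
replacements = ['T', 'D', 'FG']

# Compile once, then pair each filename with its replacement
pattern = re.compile(r'(?<=BH)[A-Z]+\.M')
results = [pattern.sub(new, old) for old, new in zip(originals, replacements)]
for r in results:
    print(r)
```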
Nope, you cannot use str.replace with a wildcard. You will have to use a regex, with something such as the following:
import re
filenames = ['II.NIL.10.BHA.M.2008.190.168857', 'II.NIL.10.BHB.M.2078.198.160857',
             'II.NIL.10.BHC.M.2078.198.160857']
name = ['T', 'D', 'FG']
newfilenames = []
for i in range(len(filenames)):
    newfilenames.append(re.sub(r'BH.?\.M', 'BH' + name[i], filenames[i]))
print(' '.join(newfilenames)) # outputs II.NIL.10.BHT.2008.190.168857 II.NIL.10.BHD.2078.198.160857 II.NIL.10.BHFG.2078.198.160857
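If the names always have the same dot-separated layout, plain string methods can also do the job without a wildcard. This sketch assumes that the BH? code is always the fourth field and the M flag the fifth; replace_channel is a hypothetical name:

```python
def replace_channel(filename, new_suffix):
    """Swap the BH? field for 'BH' + new_suffix and drop the following 'M' field."""
    parts = filename.split('.')
    # e.g. ['II', 'NIL', '10', 'BHZ', 'M', '2058', '190', '160877']
    parts[3:5] = ['BH' + new_suffix]
    return '.'.join(parts)


print(replace_channel('II.NIL.10.BHZ.M.2058.190.160877', 'T'))
# II.NIL.10.BHT.2058.190.160877
```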
You can use iter with next in the replacement lambda of re.sub:
import re
name = iter(['T','D','FG'])
s = """
II.NIL.10.BHZ.M.2058.190.160877
II.NIL.10.BHA.M.2008.190.168857
II.NIL.10.BHB.M.2078.198.160857
"""
result = re.sub(r'(?<=BH)\w\.\w', lambda m: next(name), s)
print(result)
Output:
II.NIL.10.BHT.2058.190.160877
II.NIL.10.BHD.2008.190.168857
II.NIL.10.BHFG.2078.198.160857
Here is the code I have so far:
dex = tree.xpath('//div[@class="cd-timeline-topic"]/text()')
names = filter(lambda n: n.strip(), dex)
table = str.maketrans(dict.fromkeys('?:,'))
for index, name in enumerate(dex, start = 0):
    print('{}.{}'.format(index, name.strip().translate(table)))
The problem is that the output also includes strings containing a special character, such as "My name is/Richard". What I need is to replace that special character with a space, so that the printed output is "My name is Richard". Can anyone help me?
Thanks!
Your call to dict.fromkeys() does not include the character / in its argument.
If you want to map all the special characters to None, just passing your special characters to dict.fromkeys() should be enough. If you want to replace them with a space, you could then iterate over the dict and set the value to " " for each key.
For example:
special_chars = "?:/"
special_char_dict = dict.fromkeys(special_chars)
for k in special_char_dict:
    special_char_dict[k] = " "
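The loop can also be avoided, since dict.fromkeys() accepts a second argument that is used as the value for every key. A short sketch, reusing the sample string from the question:

```python
special_chars = "?:/"
# The second argument of dict.fromkeys sets the same value for every key
special_char_dict = dict.fromkeys(special_chars, " ")
table = str.maketrans(special_char_dict)
print("My name is/Richard".translate(table))  # My name is Richard
```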
You can do this by extending your translation table:
dex = ["My Name is/Richard????::,"]
table = str.maketrans({'?':None,':':None,',':None,'/':' '})
for index, name in enumerate(dex, start = 0):
    print('{}.{}'.format(index, name.strip().translate(table)))
OUTPUT
0.My Name is Richard
You want to replace most special characters with None BUT forward slash with a space. You could use a different method to replace forward slashes as the other answers here do, or you could extend your translation table as above, mapping all the other special characters to None and forward slash to space. With this you could have a whole bunch of different replacements happen for different characters.
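The same table can be written with the three-argument form of str.maketrans, which separates the characters to map from the characters to delete. A minimal sketch:

```python
# str.maketrans(x, y, z): characters in x map to those in y, characters in z are deleted
table = str.maketrans("/", " ", "?:,")
print("My Name is/Richard????::,".translate(table))  # My Name is Richard
```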
Alternatively, you could use the re.sub function in the following way:
import re
s = 'Te/st st?ri:ng,'
out = re.sub(r'\?|:|,|/',lambda x:' ' if x.group(0)=='/' else '',s)
print(out) #Te st string
The arguments of re.sub are as follows. The first is the pattern, which tells re.sub which substrings to replace; ? must be escaped because it otherwise has a special meaning in a pattern, and | means "or", so re.sub looks for ? or : or , or /. The second argument is a function that returns the text to use in place of each matched substring: a space for / and an empty string for anything else. The third argument is the string to be changed.
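A variation on the same idea uses a character class and a dictionary lookup, which may scale better when more characters need different replacements. This is a sketch; the dictionary name replacements is chosen here for illustration:

```python
import re

# Map each special character to its replacement
replacements = {'/': ' ', '?': '', ':': '', ',': ''}
s = 'Te/st st?ri:ng,'
# Inside a character class, ? needs no escaping
out = re.sub(r'[?:,/]', lambda m: replacements[m.group(0)], s)
print(out)  # Te st string
```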
>>> a = "My name is/Richard"
>>> a.replace('/', ' ')
'My name is Richard'
To replace a character or sequence of characters in a string, use the .replace() method. So the answer to your question is:
name.replace("/", " ")
How would I replace the groups found using the Python regex findall method without having to change the rest of the string too?
For example:
import re
repl1='k1'
repl2='k2'
pattern=re.compile('CN=Root,Model=.*,Vector=Reactions\[(.*)\],ParameterGroup=Parameters,Parameter=(.*),Reference=Value')
I want to use re.sub to replace ONLY the elements within the (.*) groups with repl1 and repl2, rather than having to change the rest of the string too.
-------edit -----
The output I want should look like this:
output = 'CN=Root,Model=.*,Vector=Reactions[k1],ParameterGroup=Parameters,Parameter=k2,Reference=Value'
But note I have left the '.*' in after model because this will change every time. I.e. this can be anything.
----------edit 2----------
The input is a simple one-line string that is almost exactly the same as the pattern. For example:
input = 'CN=Root,Model=Model1,Vector=Reactions[k10],ParameterGroup=Parameters,Parameter=k12,Reference=Value'
re.sub's argument repl can be a one-argument function, and in that case it is called with the match object as an argument. So, if you ensure that all parts of the pattern are in a group you should have all the information you need to replace the old string with the new one.
import re
repl1='k1'
repl2='k2'
pattern = re.compile(r'(CN=Root,Model=.*,Vector=Reactions\[)(.*)(\],ParameterGroup=Parameters,Parameter=)(.*)(,Reference=Value)')
target = 'CN=Root,Model=something,Vector=Reactions[somethingelse],ParameterGroup=Parameters,Parameter=1234,Reference=Value'
Now define a function that rebuilds the matched string with the second and fourth groups (list indices 1 and 3) replaced by your desired values:
def repl(m):
    g = list(m.groups())
    g[1] = repl1
    g[3] = repl2
    return "".join(g)
Passing this function as the first argument to pattern.sub then achieves the desired transformation:
pattern.sub(repl, target)
gives the result
'CN=Root,Model=something,Vector=Reactions[k1],ParameterGroup=Parameters,Parameter=k2,Reference=Value'
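With the same five-group pattern, a replacement template that refers back to the unchanged groups can stand in for the function. This is a sketch under the assumption that repl1 and repl2 contain no backslashes, since those would be interpreted as escapes in the template:

```python
import re

repl1 = 'k1'
repl2 = 'k2'
pattern = re.compile(r'(CN=Root,Model=.*,Vector=Reactions\[)(.*)(\],ParameterGroup=Parameters,Parameter=)(.*)(,Reference=Value)')
target = 'CN=Root,Model=something,Vector=Reactions[somethingelse],ParameterGroup=Parameters,Parameter=1234,Reference=Value'

# \g<1>, \g<3> and \g<5> re-insert the groups that should stay unchanged
template = r'\g<1>' + repl1 + r'\g<3>' + repl2 + r'\g<5>'
result = pattern.sub(template, target)
print(result)
```

The \g<1> form is used instead of \1 so that a replacement starting with a digit cannot be misread as part of the group number.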
I've been using the following python code to format an integer part ID as a formatted part number string:
pn = 'PN-{:0>9}'.format(id)
I would like to know if there is a way to use that same format string ('PN-{:0>9}') in reverse to extract the integer ID from the formatted part number. If that can't be done, is there a way to use a single format string (or regex?) to create and parse?
The parse module "is the opposite of format()".
Example usage:
>>> import parse
>>> format_string = 'PN-{:0>9}'
>>> id = 123
>>> pn = format_string.format(id)
>>> pn
'PN-000000123'
>>> parsed = parse.parse(format_string, pn)
>>> parsed
<Result ('123',) {}>
>>> parsed[0]
'123'
You might find simulating scanf interesting.
Here's a solution in case you don't want to use the parse module. It converts format strings into regular expressions with named groups. It makes a few assumptions (described in the docstring) that were okay in my case, but may not be okay in yours.
import re
def match_format_string(format_str, s):
    """Match s against the given format string, return dict of matches.
    We assume all of the arguments in format string are named keyword arguments (i.e. no {} or
    {:0.2f}). We also assume that all chars are allowed in each keyword argument, so separators
    need to be present which aren't present in the keyword arguments (i.e. '{one}{two}' won't work
    reliably as a format string but '{one}-{two}' will if the hyphen isn't used in {one} or {two}).
    We raise if the format string does not match s.
    Example:
    fs = '{test}-{flight}-{go}'
    s = fs.format(test='first', flight='second', go='third')
    match_format_string(fs, s) -> {'test': 'first', 'flight': 'second', 'go': 'third'}
    """
    # First split on any keyword arguments, note that the names of keyword arguments will be in the
    # 1st, 3rd, ... positions in this list
    tokens = re.split(r'\{(.*?)\}', format_str)
    keywords = tokens[1::2]
    # Now replace keyword arguments with named groups matching them. We also escape between keyword
    # arguments so we support meta-characters there. Re-join tokens to form our regexp pattern
    tokens[1::2] = map('(?P<{}>.*)'.format, keywords)
    tokens[0::2] = map(re.escape, tokens[0::2])
    pattern = ''.join(tokens)
    # Use our pattern to match the given string, raise if it doesn't match
    matches = re.match(pattern, s)
    if not matches:
        raise ValueError("Format string did not match")
    # Return a dict with all of our keywords and their values
    return {x: matches.group(x) for x in keywords}
How about:
id = int(pn.split('-')[1])
This splits the part number at the dash, takes the second component and converts it to integer.
P.S. I've kept id as the variable name so that the connection to your question is clear. It is a good idea to rename that variable so that it doesn't shadow the built-in function.
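If you would rather validate the whole string than split on the dash, a regular expression can mirror the format string. This is a sketch with hypothetical names (PART_NUMBER_RE, format_part_number, parse_part_number), and it assumes non-negative IDs of at most nine digits:

```python
import re

# Mirrors 'PN-{:0>9}' for non-negative ids of at most nine digits (assumption)
PART_NUMBER_RE = re.compile(r'PN-(\d{9})')


def format_part_number(part_id):
    """Format an integer ID as a part number string."""
    return 'PN-{:0>9}'.format(part_id)


def parse_part_number(pn):
    """Extract the integer ID, raising ValueError for anything that is not a part number."""
    match = PART_NUMBER_RE.fullmatch(pn)
    if match is None:
        raise ValueError('not a part number: {!r}'.format(pn))
    return int(match.group(1))


pn = format_part_number(123)
print(pn)                     # PN-000000123
print(parse_part_number(pn))  # 123
```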
Use lucidity
import lucidity
template = lucidity.Template('model', '/jobs/{job}/assets/{asset_name}/model/{lod}/{asset_name}_{lod}_v{version}.{filetype}')
path = '/jobs/monty/assets/circus/model/high/circus_high_v001.abc'
data = template.parse(path)
print(data)
# Output
# {'job': 'monty',
# 'asset_name': 'circus',
# 'lod': 'high',
# 'version': '001',
# 'filetype': 'abc'}