Grouping the characters and performing substitution - python

I want to replace characters in my string based on the values in my dictionary. I want to try this with a regular expression.
d = { 't':'ch' , 'r' : 'gh'}
s = ' Text to replace '
m = re.search('#a pattern to just get each character ',s)
m.group() # this should get me 'T' 'e' 'x' 't' .....
# How can I replace each character in string s with its corresponding value in my dictionary? I looked at re.sub() but couldn't figure out how it can be used here.
I want to generate an output -> Texch cho gheplace

Using re.sub:
>>> d = { 't':'ch' , 'r' : 'gh'}
>>> s = ' Text to replace '
>>> import re
>>> pattern = '|'.join(map(re.escape, d))
>>> re.sub(pattern, lambda m: d[m.group()], s)
' Texch cho gheplace '
The second argument to the re.sub can be a function. The return value of the function is used as a replacement string.

If no character in the dictionary's values also appears as a key, this is fairly simple. You can use the str.replace function directly, like this:
for char in d:
    s = s.replace(char, d[char])
print(s) # Texch cho gheplace
Even simpler, you can use the following, which works even if the keys appear in some of the values in the dictionary, as long as every key is a single character:
s, d = ' Text to replace ', { 't':'ch' , 'r' : 'gh'}
print("".join(d.get(char, char) for char in s)) # Texch cho gheplace

Related

python replace multiple double characters using a dictionary

I am looking to replace character pairs in a string using a dictionary in python.
It works for single characters but not doubles.
txt = "1122"
def processString6(txt):
    dictionary = {'11': 'a', '22':'b'}
    transTable = txt.maketrans(dictionary)
    txt = txt.translate(transTable)
    print(txt)
processString6(txt)
Error Message:
ValueError: string keys in translate table must be of length 1
Desired output:
ab
I've also tried
s = ' 11 22 '
d = {' 11 ':'a', ' 22 ':'b'}
print( ''.join(d[c] if c in d else c for c in s))
but that doesn't work either.
I'm looking to use a dictionary rather than .replace() because I want to scan the string only once, whereas .replace() does a scan for each key-value pair.
You can use this piece of code to replace strings of any length:
import re
txt = "1122"
def processString6(txt):
    dictionary = {'11': 'a', '22':'b'}
    # Longest keys first, so a longer key is tried before any key that is its prefix
    pattern = re.compile(
        '|'.join(map(re.escape, sorted(dictionary, key=len, reverse=True))))
    result = pattern.sub(lambda x: dictionary[x.group()], txt)
    return result
print(processString6(txt))
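Sorting the keys longest-first only matters when one key is a prefix of another, and escaping only matters when a key contains regex metacharacters. A small sketch of both cases, assuming a hypothetical helper `replace_all` and made-up mappings that are not part of the question:

```python
import re

def replace_all(text, mapping):
    """Replace every key of `mapping` found in `text` in a single scan."""
    # Longest keys first, so '11' wins over '1' at the same position.
    keys = sorted(mapping, key=len, reverse=True)
    # re.escape keeps keys such as '+' or '.' from acting as regex syntax.
    pattern = re.compile('|'.join(re.escape(k) for k in keys))
    return pattern.sub(lambda m: mapping[m.group()], text)

print(replace_all('1122', {'11': 'a', '22': 'b'}))  # ab
print(replace_all('1112', {'1': 'x', '11': 'a'}))   # ax2
print(replace_all('a+b', {'+': ' plus '}))          # a plus b
```

Without the length sort, the alternation `1|11` would match the single `1` first and the key `'11'` would never be used.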

Python Combining f-string with r-string and curly braces in regex

Given a single word (x), return the possible n-grams that can be found in that word.
You can modify the n-gram value as you want;
it is in the curly braces in the pat variable.
The default n-gram value is 4.
For example, for the word (x):
x = 'abcdef'
The possible 4-grams are:
['abcd', 'bcde', 'cdef']
import re
def ngram_finder(x):
    pat = r'(?=(\S{4}))'
    xx = re.findall(pat, x)
    return xx
The question is:
How can I combine an f-string with an r-string in the regex, given that the quantifier itself uses curly braces?
You can use this string to combine the n value into your regexp, using double curly brackets to create a single one in the output:
fr'(?=(\S{{{n}}}))'
The regex needs to have {} to make a quantifier (as you had in your original regex {4}). However f strings use {} to indicate an expression replacement so you need to "escape" the {} required by the regex in the f string. That is done by using {{ and }} which in the output create { and }. So {{{n}}} (where n=4) generates '{' + '4' + '}' = '{4}' as required.
Complete code:
import re
def ngram_finder(x, n):
    pat = fr'(?=(\S{{{n}}}))'
    return re.findall(pat, x)
x = 'abcdef'
print(ngram_finder(x, 4))
print(ngram_finder(x, 5))
Output:
['abcd', 'bcde', 'cdef']
['abcde', 'bcdef']
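To see what the triple braces expand to, it can help to print the pattern itself. The sketch below also builds the same pattern by plain concatenation, which avoids brace doubling altogether, and cross-checks the result against simple slicing. The names `pat_concat` and `ngrams_by_slicing` are illustrative only, and the slicing version agrees with the regex only for input without whitespace:

```python
import re

n = 4
# The doubled braces collapse to single ones; {n} is substituted.
pat = fr'(?=(\S{{{n}}}))'
print(pat)  # (?=(\S{4}))

# The same pattern built without an f-string, so no brace doubling is needed.
pat_concat = r'(?=(\S{' + str(n) + r'}))'

def ngrams_by_slicing(x, n):
    # Plain slicing; agrees with the regex only when x has no whitespace.
    return [x[i:i + n] for i in range(len(x) - n + 1)]

print(pat_concat == pat)               # True
print(re.findall(pat, 'abcdef'))       # ['abcd', 'bcde', 'cdef']
print(ngrams_by_slicing('abcdef', 4))  # ['abcd', 'bcde', 'cdef']
```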

Replace multiple characters in a string

Is there a simple way in Python to replace multiple characters with another?
For instance, I would like to change:
name1_22:3-3(+):Pos_bos
to
name1_22_3-3_+__Pos_bos
So basically replace all "(",")",":" with "_".
I only know to do it with:
str.replace(":","_")
str.replace(")","_")
str.replace("(","_")
You could use re.sub to replace multiple characters with one pattern:
import re
s = 'name1_22:3-3(+):Pos_bos '
re.sub(r'[():]', '_', s)
Output
'name1_22_3-3_+__Pos_bos '
Use a translation table. In Python 2, maketrans is defined in the string module.
>>> import string
>>> table = string.maketrans("():", "___")
In Python 3, it is a str class method.
>>> table = str.maketrans("():", "___")
In both, the table is passed as the argument to str.translate.
>>> 'name1_22:3-3(+):Pos_bos'.translate(table)
'name1_22_3-3_+__Pos_bos'
In Python 3, you can also pass a single dict mapping input characters to output characters to maketrans:
table = str.maketrans({"(": "_", ")": "_", ":": "_"})
Sticking to your current approach of using replace():
s = "name1_22:3-3(+):Pos_bos"
for e in ((":", "_"), ("(", "_"), (")", "_")):
    s = s.replace(*e)
print(s)
OUTPUT:
name1_22_3-3_+__Pos_bos
EDIT: (for readability)
s = "name1_22:3-3(+):Pos_bos"
replaceList = [(":", "_"), ("(", "_"), (")", "_")]
for elem in replaceList:
    print(*elem) # : _, ( _, ) _ (for each iteration)
    s = s.replace(*elem)
print(s)
OR
s = "name1_22:3-3(+):Pos_bos"
repList = [':','(',')'] # list of all the chars to replace
rChar = '_' # the char to replace with
for elem in repList:
    s = s.replace(elem, rChar)
print(s)
Another possibility is to use a list comprehension combined with the ternary conditional operator, in the following way:
text = 'name1_22:3-3(+):Pos_bos '
out = ''.join(['_' if i in ':)(' else i for i in text])
print(out) #name1_22_3-3_+__Pos_bos
Since the comprehension produces a list, I use ''.join to turn the list of characters (strs of length 1) into a str.
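As a quick cross-check, the regex, translation-table, and join approaches can be run side by side on the same input. This is only a sketch: the variable names are illustrative, and the translation-table line assumes Python 3, where `str.maketrans` accepts a dict.

```python
import re

s = 'name1_22:3-3(+):Pos_bos'
chars = '():'  # characters to replace

# 1. Regex character class; re.escape guards characters special inside [...]
by_regex = re.sub('[' + re.escape(chars) + ']', '_', s)
# 2. Translation table built from a dict (Python 3)
by_table = s.translate(str.maketrans(dict.fromkeys(chars, '_')))
# 3. Character-by-character join
by_join = ''.join('_' if c in chars else c for c in s)

print(by_regex)                         # name1_22_3-3_+__Pos_bos
print(by_regex == by_table == by_join)  # True
```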

Python removing delimiters from strings

I have two related questions/issues.
def remove_delimiters (delimiters, s):
    for d in delimiters:
        ind = s.find(d)
        while ind != -1:
            s = s[:ind] + s[ind+1:]
            ind = s.find(d)
    return ' '.join(s.split())
delimiters = [",", ".", "!", "?", "/", "&", "-", ":", ";", "#", "'", "..."]
d_dataset_list = ['hey-you...are you ok?']
d_list = []
for d in d_dataset_list:
    d_list.append(remove_delimiters(delimiters, d))
print(d_list)
Output = 'heyyouare you ok'
What is the best way of avoiding strings being combined together when a delimiter is removed? For example, so that the output is 'hey you are you ok'?
There may be a number of different sequences of dots, for example .. or .........., etc. How does one go about implementing some form of rule where, if more than one . appears in a row, the whole run is removed? I want to avoid hard-coding all sequences in my delimiters list. Thank you.
You could try something like this:
Given delimiters d, join them to a regular expression
>>> d = ",.!?/&-:;#'..."
>>> "["+"\\".join(d)+"]"
"[,\\.\\!\\?\\/\\&\\-\\:\\;\\#\\'\\.\\.\\.]"
Split the string using this regex with re.split
>>> import re
>>> s = 'hey-you...are you ok?'
>>> re.split("["+"\\".join(d)+"]", s)
['hey', 'you', '', '', 'are you ok', '']
Join all the non-empty fragments back together
>>> ' '.join(w for w in re.split("["+"\\".join(d)+"]", s) if w)
'hey you are you ok'
Also, if you just want to remove all non-word characters, you can just use the character group \W instead of manually enumerating all the delimiters:
>>> ' '.join(w for w in re.split(r"\W", s) if w)
'hey you are you ok'
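The same idea can be wrapped in a function that takes the asker's delimiter list. This is a sketch under assumptions: `strip_delimiters` is a hypothetical name, and it uses `re.escape` plus a `+` quantifier instead of the manual backslash join, so that a run of any length collapses to one space.

```python
import re

def strip_delimiters(delimiters, s):
    """Replace each run of delimiter characters with a single space."""
    # Keep only single-character delimiters; a run such as '...' is
    # already covered by '.' together with the '+' quantifier below.
    chars = ''.join(d for d in delimiters if len(d) == 1)
    pattern = '[' + re.escape(chars) + ']+'
    # split()/join then collapses any leftover runs of whitespace.
    return ' '.join(re.sub(pattern, ' ', s).split())

delimiters = [",", ".", "!", "?", "/", "&", "-", ":", ";", "#", "'", "..."]
print(strip_delimiters(delimiters, 'hey-you...are you ok?'))  # hey you are you ok
print(strip_delimiters(delimiters, 'wait..........what'))     # wait what
```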
First of all, your function for removing delimiters could be simplified greatly by using the replace function (http://www.tutorialspoint.com/python/string_replace.htm).
This would help solve your first question. Instead of just removing the delimiters, replace each one with a space, then collapse the extra spaces using the pattern you already used (split() with no arguments treats consecutive whitespace as one separator).
A better function, which does this, would be:
def remove_delimiters (delimiters, s):
    new_s = s
    for i in delimiters: #replace each delimiter in turn with a space
        new_s = new_s.replace(i, ' ')
    return ' '.join(new_s.split())
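To answer your second question, I'd say it's time for regular expressions. The pattern \.+ matches a run of one or more dots, and split() then collapses the leftover spaces:
>>> import re
>>> ss = 'hey ... you are ....... what?'
>>> print(' '.join(re.sub(r'\.+', ' ', ss).split()))
hey you are what?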
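A pattern such as `[.+]` is easy to confuse with `\.+`, but the two behave differently: inside square brackets, `.` and `+` are literal characters, so the class matches one character at a time. A short sketch of the difference (variable names are illustrative):

```python
import re

ss = 'hey ... you are ....... what?'

# '[.+]' is a character class: it matches ONE '.' or ONE '+' at a time,
# so each dot becomes its own space.
per_char = re.sub(r'[.+]', ' ', ss)
# r'\.+' matches a whole run of dots and replaces it with one space.
per_run = re.sub(r'\.+', ' ', ss)

print(repr(per_char))
print(repr(per_run))              # 'hey   you are   what?'
print(' '.join(per_run.split()))  # hey you are what?
```

Either way, the spaces that surrounded the dots remain, which is why the final split/join step is still needed.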

How to strip characters from the right side of every word in Python?

Say, if I have a text like
text='a!a b! c!!!'
I want an outcome like this:
text='a!a b c'
So, if a word ends with '!', I want to get rid of it. If there are multiple '!' at the end of a word, all of them should be removed.
print(" ".join(word.rstrip("!") for word in text.split()))
As an alternative to the split/strip approach
" ".join(x.rstrip("!") for x in text.split())
which won't preserve whitespace exactly, you could perhaps use a regex such as
re.sub(r"!+\B", "", text)
which blanks out all exclamation marks that aren't immediately followed by the start of a word.
>>> import re
>>> testWord = 'a!a b! c!!!'
>>> re.sub(r'(!+)(?=\s|$)', '', testWord)
'a!a b c'
This preserves any extra spaces you may have in your string, which does not happen with str.split().
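The three approaches agree on the question's example but can differ on other input. A sketch with a made-up string that has a double space and a comma after one of the words (neither is in the original question):

```python
import re

text = 'a!a  b!, c!!!'  # two spaces after 'a!a', and a comma after 'b!'

# '!+\B': removes '!' runs that are not followed by a word character.
by_nonboundary = re.sub(r'!+\B', '', text)
# '!+(?=\s|$)': removes '!' runs only before whitespace or end of string.
by_lookahead = re.sub(r'!+(?=\s|$)', '', text)
# split/rstrip: strips only trailing '!', and collapses whitespace.
by_split = ' '.join(w.rstrip('!') for w in text.split())

print(by_nonboundary)  # a!a  b, c
print(by_lookahead)    # a!a  b!, c
print(by_split)        # a!a b!, c
```

Which behaviour is right depends on whether an exclamation mark followed by other punctuation should count as the end of a word.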
Here's a non-regex, non-split based approach:
from itertools import groupby
def word_rstrip(s, to_rstrip):
    words = (''.join(g) for k,g in groupby(s, str.isspace))
    new_words = (w.rstrip(to_rstrip) for w in words)
    return ''.join(new_words)
This works first by using itertools.groupby to group together contiguous characters based on whether or not they're whitespace:
>>> s = "a!a b! c!!"
>>> [''.join(g) for k,g in groupby(s, str.isspace)]
['a!a', ' ', 'b!', ' ', 'c!!']
Effectively, this is like a whitespace-preserving .split(). Once we've got this, we can use rstrip as we always would, and then recombine:
>>> [''.join(g).rstrip("!") for k,g in groupby(s, str.isspace)]
['a!a', ' ', 'b', ' ', 'c']
>>> ''.join(''.join(g).rstrip("!") for k,g in groupby(s, str.isspace))
'a!a b c'
We can also pass whatever we like:
>>> word_rstrip("a!! this_apostrophe_won't_vanish these_ones_will'''", "!'")
"a this_apostrophe_won't_vanish these_ones_will"
