Python tokenize sentence with optional key/val pairs


I'm trying to parse a sentence (or line of text) that is optionally followed by some key/val pairs on the same line. Not only are the key/value pairs optional, they are dynamic. I'm looking for a result like this:
Input:
"There was a cow at home. home=mary cowname=betsy date=10-jan-2013"
Output:
Values = {'theSentence' : "There was a cow at home.",
'home' : "mary",
'cowname' : "betsy",
'date' : "10-jan-2013"
}
Input:
"Mike ordered a large hamburger. lastname=Smith store=burgerville"
Output:
Values = {'theSentence' : "Mike ordered a large hamburger.",
'lastname' : "Smith",
'store' : "burgerville"
}
Input:
"Sam is nice."
Output:
Values = {'theSentence' : "Sam is nice."}
Thanks for any input/direction. I know the example sentences make this look like a homework problem, but I'm just a Python newbie. I suspect it's a regex solution, but I'm not the best at regex.

I'd use re.sub:
import re
s = "There was a cow at home. home=mary cowname=betsy date=10-jan-2013"
d = {}
def add(m):
    # store the key/value pair, then replace the whole match with nothing
    d[m.group(1)] = m.group(2)
    return ''
s = re.sub(r'(\w+)=(\S+)', add, s)
d['theSentence'] = s.strip()
print(d)
Here's a more compact version if you prefer:
d = {}
d['theSentence'] = re.sub(r'(\w+)=(\S+)',
                          lambda m: d.setdefault(m.group(1), m.group(2)) and '',
                          s).strip()
Or, maybe, findall is a better option:
rx = r'(\w+)=(\S+)|(\S.+?)(?=\w+=|$)'
d = {
    a or 'theSentence': (b or c).strip()
    for a, b, c in re.findall(rx, s)
}
print(d)
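A further option, sketched here under the assumption that the key=value pairs always sit at the very end of the line: anchor the pairs to the end with $, so that an = earlier in the text is never mistaken for metadata. The names parse_line, TRAILING_PAIRS and PAIR are made up for this illustration.

```python
import re

# Hypothetical pattern: one or more whitespace-separated key=value tokens
# running all the way to the end of the line.
TRAILING_PAIRS = re.compile(r'(?:\s+\w+=\S+)+\s*$')
PAIR = re.compile(r'(\w+)=(\S+)')

def parse_line(line):
    """Split a line into the sentence and a dict of trailing key=value pairs."""
    m = TRAILING_PAIRS.search(line)
    if m is None:
        # No trailing pairs: the whole line is the sentence.
        return {'theSentence': line.strip()}
    values = {'theSentence': line[:m.start()].strip()}
    # findall returns (key, value) tuples, which dict.update accepts directly.
    values.update(PAIR.findall(m.group(0)))
    return values

print(parse_line("Mike ordered a large hamburger. lastname=Smith store=burgerville"))
# {'theSentence': 'Mike ordered a large hamburger.', 'lastname': 'Smith', 'store': 'burgerville'}
```

Because the search only succeeds where the pairs run to the end of the line, the third example input ("Sam is nice.") simply falls through to the no-pairs branch.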

If your sentence is guaranteed to end with a period (and contains no other period), you could use the following approach.
>>> Values = {}
>>> testList = inputString.split('.')
>>> Values['theSentence'] = testList[0] + '.'
For the rest of the values, just do:
>>> for elem in testList[1].split():
...     key, val = elem.split('=')
...     Values[key] = val
...
Giving you a Values like so (Values2 and Values3 are the results for the other two example inputs):
>>> Values
{'date': '10-jan-2013', 'home': 'mary', 'cowname': 'betsy', 'theSentence': 'There was a cow at home.'}
>>> Values2
{'lastname': 'Smith', 'theSentence': 'Mike ordered a large hamburger.', 'store': 'burgerville'}
>>> Values3
{'theSentence': 'Sam is nice.'}

The first step is to do
inputStr = "There was a cow at home. home=mary cowname=betsy date=10-jan-2013"
theSentence, others = inputStr.split('.', 1)
You're going to then want to break up "others". Play around with split() (the argument you pass in tells Python what to split the string on), and see what you can do. :)

Assuming there is only one period, separating the sentence from the assignment pairs:
text = "There was a cow at home. home=mary cowname=betsy date=10-jan-2013"
# partition() also copes with a line that has no pairs after the sentence
sentence, _, assignments = text.partition(". ")
result = {'theSentence': sentence.rstrip(".") + "."}
for item in assignments.split():
    key, value = item.split("=")
    result[key] = value
print(result)
prints:
{'date': '10-jan-2013',
'home': 'mary',
'cowname': 'betsy',
'theSentence': 'There was a cow at home.'}

Assuming = doesn't appear in the sentence itself. This seems like a safer assumption than assuming the sentence ends with a period.
s = "There was a cow at home. home=mary cowname=betsy date=10-jan-2013"
eq_loc = s.find('=')
if eq_loc > -1:
    # the metadata starts after the last space before the first '='
    meta_loc = s[:eq_loc].rfind(' ')
    # slice out the metadata before truncating s
    metastr = s[meta_loc + 1:]
    s = s[:meta_loc]
    metadict = dict(m.split('=') for m in metastr.split())
else:
    metadict = {}
metadict["theSentence"] = s
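The same "= marks the metadata" idea can also be run from the right-hand end of the line, which additionally tolerates an = inside the sentence as long as the sentence's last word has none. A minimal sketch, assuming whitespace-separated tokens; split_trailing_pairs is a hypothetical name:

```python
def split_trailing_pairs(line):
    """Hypothetical helper: peel key=value tokens off the end of a line."""
    tokens = line.split()
    i = len(tokens)
    # Walk backwards while tokens contain '='; stop at the first that doesn't.
    while i > 0 and '=' in tokens[i - 1]:
        i -= 1
    # Joining the tokens collapses any repeated whitespace in the sentence.
    values = {'theSentence': ' '.join(tokens[:i])}
    for token in tokens[i:]:
        # partition splits on the first '=' only, so a value may contain '='.
        key, _, value = token.partition('=')
        values[key] = value
    return values

print(split_trailing_pairs("There was a cow at home. home=mary cowname=betsy date=10-jan-2013"))
```

The trade-off is that the sentence is rebuilt from its words, so unusual spacing inside it is not preserved.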

As usual, there are a bunch of ways to do this. Here's a regexp-based approach that looks for key=value pairs:
import re
sentence = "..."
values = {}
for match in re.finditer(r"(\w+)=(\S+)", sentence):
    if not values:
        # everything left of the first key/value pair is the sentence
        values["theSentence"] = sentence[:match.start()].strip()
    key, value = match.groups()
    values[key] = value
if not values:
    # no key/value pairs, keep the entire sentence
    values["theSentence"] = sentence
This assumes that the key is a Python-style identifier, and that the value consists of one or more non-whitespace characters.

Supposing that the first period separates the sentence from the values, you can use something like this:
#! /usr/bin/python3
a = "There was a cow at home. home=mary cowname=betsy date=10-jan-2013"
values = (lambda s, tail: (lambda d, kv: (d, d.update (kv) ) ) ( {'theSentence': s + '.'}, {k: v for k, v in (x.split ('=') for x in tail.split () ) } ) ) (*a.split ('.', 1) ) [0]
print (values)

Nobody posted a comprehensible one-liner. The question is answered, but gotta do it in one line, it's the Python way!
values = {"theSentence": sentence.split(".")[0] + ".", **{item.split("=")[0]: item.split("=")[1] for item in sentence.split(".")[1].split()}}
Eh, not super elegant, but it's totally in one line. No imports even.

Use re.findall. The first capture group is the sentence: word characters and whitespace ending in a period. The | introduces the alternative: one or more whitespace characters, then a key made of word characters, the equals sign, and a value of one or more non-whitespace characters, with key and value captured separately.
import re
s = "There was a cow at home. home=mary cowname=betsy date=10-jan-2013"
all_matches = re.findall(r'([\w\s]+\.)|\s+(\w+)=(\S+)', s)
d = {}
for sentence, key, value in all_matches:
    if sentence:
        d["theSentence"] = sentence
    else:
        d[key] = value
print(d)
output:
{'theSentence': 'There was a cow at home.', 'home': 'mary', 'cowname': 'betsy', 'date': '10-jan-2013'}

Related

How to use multiple patterns for multiple replacements with Python module re? [duplicate]

I would like to use the .replace function to replace multiple strings.
I currently have
string.replace("condition1", "")
but would like to have something like
string.replace("condition1", "").replace("condition2", "text")
although that does not feel like good syntax.
What is the proper way to do this? Something like how in grep/regex you can use \1 and \2 to replace fields with certain search strings.

Here is a short example that should do the trick with regular expressions:
import re
rep = {"condition1": "", "condition2": "text"} # define desired replacements here
# use these three lines to do the replacement
rep = dict((re.escape(k), v) for k, v in rep.items())
# On Python 2, use rep.iteritems() instead of rep.items()
pattern = re.compile("|".join(rep.keys()))
text = pattern.sub(lambda m: rep[re.escape(m.group(0))], text)
For example:
>>> pattern.sub(lambda m: rep[re.escape(m.group(0))], "(condition1) and --condition2--")
'() and --text--'

You could just make a nice little looping function.
def replace_all(text, dic):
    for i, j in dic.items():
        text = text.replace(i, j)
    return text
where text is the complete string and dic is a dictionary: each key is a term to find, and its value is the string that will replace it.
Note: on Python 2, use dic.iteritems() instead of dic.items().
Careful: Python dictionaries don't have a reliable order for iteration. This solution only solves your problem if:
order of replacements is irrelevant
it's ok for a replacement to change the results of previous replacements
Update: The ordering caveat above does not apply to Python 3.7 and later (or CPython 3.6), where standard dicts iterate in insertion order.
For instance:
d = { "cat": "dog", "dog": "pig"}
my_sentence = "This is my cat and this is my dog."
my_sentence = replace_all(my_sentence, d)
print(my_sentence)
Possible output #1:
"This is my pig and this is my pig."
Possible output #2
"This is my dog and this is my pig."
One possible fix is to use an OrderedDict.
from collections import OrderedDict
def replace_all(text, dic):
    for i, j in dic.items():
        text = text.replace(i, j)
    return text
od = OrderedDict([("cat", "dog"), ("dog", "pig")])
my_sentence = "This is my cat and this is my dog."
my_sentence = replace_all(my_sentence, od)
print(my_sentence)
Output:
"This is my pig and this is my pig."
Careful #2: Every pair triggers a full pass over the string, so this is inefficient for very large texts or for dictionaries with many pairs.
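To make the ordering caveat concrete, here is a small sketch contrasting chained str.replace calls with a single regex pass (the technique used by the re-based answers in this thread). Both function names are hypothetical:

```python
import re

def replace_sequential(text, pairs):
    # Each pair is applied to the output of the previous one.
    for old, new in pairs:
        text = text.replace(old, new)
    return text

def replace_single_pass(text, pairs):
    # One regex alternation; each match is looked up in the mapping,
    # so text that was already replaced is never scanned again.
    mapping = dict(pairs)
    pattern = re.compile('|'.join(map(re.escape, mapping)))
    return pattern.sub(lambda m: mapping[m.group(0)], text)

pairs = [("cat", "dog"), ("dog", "pig")]
sentence = "This is my cat and this is my dog."
print(replace_sequential(sentence, pairs))   # This is my pig and this is my pig.
print(replace_single_pass(sentence, pairs))  # This is my dog and this is my pig.
```

Which result is "right" depends on whether you want replacements to feed into each other; the single-pass version never does.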

Why not one solution like this?
s = "The quick brown fox jumps over the lazy dog"
for r in (("brown", "red"), ("lazy", "quick")):
    s = s.replace(*r)
#output will be: The quick red fox jumps over the quick dog

Here is a variant of the first solution using reduce, in case you like being functional. :)
from functools import reduce  # a builtin on Python 2, must be imported on Python 3
repls = {'hello' : 'goodbye', 'world' : 'earth'}
s = 'hello, world'
reduce(lambda a, kv: a.replace(*kv), repls.items(), s)
martineau's even better version:
repls = ('hello', 'goodbye'), ('world', 'earth')
s = 'hello, world'
reduce(lambda a, kv: a.replace(*kv), repls, s)

This is just a more concise recap of F.J's and MiniQuark's great answers, plus a last but decisive improvement by bgusach. All you need to achieve multiple simultaneous string replacements is the following function:
import re
def multiple_replace(string, rep_dict):
    # longest keys first, so a key wins over any shorter key that is its prefix
    pattern = re.compile("|".join([re.escape(k) for k in sorted(rep_dict, key=len, reverse=True)]), flags=re.DOTALL)
    return pattern.sub(lambda x: rep_dict[x.group(0)], string)
Usage:
>>> multiple_replace("Do you like cafe? No, I prefer tea.", {'cafe':'tea', 'tea':'cafe', 'like':'prefer'})
'Do you prefer tea? No, I prefer cafe.'
If you wish, you can make your own dedicated replacement functions starting from this simpler one.

Starting with Python 3.8 and the introduction of assignment expressions (PEP 572, the := operator), we can apply the replacements within a list comprehension:
text = "The quick brown fox jumps over the lazy dog"
replacements = [("brown", "red"), ("lazy", "quick")]
[text := text.replace(a, b) for a, b in replacements]
# text = 'The quick red fox jumps over the quick dog'

I built this upon F.J's excellent answer:
import re
def multiple_replacer(*key_values):
    replace_dict = dict(key_values)
    replacement_function = lambda match: replace_dict[match.group(0)]
    pattern = re.compile("|".join([re.escape(k) for k, v in key_values]), re.M)
    return lambda string: pattern.sub(replacement_function, string)
def multiple_replace(string, *key_values):
    return multiple_replacer(*key_values)(string)
One shot usage:
>>> replacements = (u"café", u"tea"), (u"tea", u"café"), (u"like", u"love")
>>> print(multiple_replace(u"Do you like café? No, I prefer tea.", *replacements))
Do you love tea? No, I prefer café.
Note that since replacement is done in just one pass, "café" changes to "tea", but it does not change back to "café".
If you need to do the same replacement many times, you can create a replacement function easily:
>>> my_escaper = multiple_replacer(('"','\\"'), ('\t', '\\t'))
>>> many_many_strings = (u'This text will be escaped by "my_escaper"',
...                      u'Does this work?\tYes it does',
...                      u'And can we span\nmultiple lines?\t"Yes\twe\tcan!"')
>>> for line in many_many_strings:
...     print(my_escaper(line))
...
This text will be escaped by \"my_escaper\"
Does this work?\tYes it does
And can we span
multiple lines?\t\"Yes\twe\tcan!\"
Improvements:
turned code into a function
added multiline support
fixed a bug in escaping
easy to create a function for a specific multiple replacement
Enjoy! :-)

I would like to propose the usage of string templates. Just place the string to be replaced in a dictionary and all is set! Example from docs.python.org
>>> from string import Template
>>> s = Template('$who likes $what')
>>> s.substitute(who='tim', what='kung pao')
'tim likes kung pao'
>>> d = dict(who='tim')
>>> Template('Give $who $100').substitute(d)
Traceback (most recent call last):
[...]
ValueError: Invalid placeholder in string: line 1, col 10
>>> Template('$who likes $what').substitute(d)
Traceback (most recent call last):
[...]
KeyError: 'what'
>>> Template('$who likes $what').safe_substitute(d)
'tim likes $what'

In my case, I needed a simple replacement of unique keys with names, so I thought this up:
a = 'This is a test string.'
b = {'i': 'I', 's': 'S'}
for x, y in b.items():
    a = a.replace(x, y)
>>> a
'ThIS IS a teSt StrIng.'

Here's my $0.02. It is based on Andrew Clark's answer, just a little bit clearer, and it also covers the case when a string to replace is a substring of another string to replace (the longer string wins).
import re
def multireplace(string, replacements):
    """
    Given a string and a replacement map, it returns the replaced string.
    :param str string: string to execute replacements on
    :param dict replacements: replacement dictionary {value to find: value to replace}
    :rtype: str
    """
    # Place longer ones first to keep shorter substrings from matching
    # where the longer ones should take place
    # For instance given the replacements {'ab': 'AB', 'abc': 'ABC'} against
    # the string 'hey abc', it should produce 'hey ABC' and not 'hey ABc'
    substrs = sorted(replacements, key=len, reverse=True)
    # Create a big OR regex that matches any of the substrings to replace
    regexp = re.compile('|'.join(map(re.escape, substrs)))
    # For each match, look up the new string in the replacements
    return regexp.sub(lambda match: replacements[match.group(0)], string)
It is in this gist; feel free to modify it if you have any suggestions.
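The reason for sorting longest-first is that Python's regex alternation takes the first alternative that matches at a given position, not the longest one. A minimal sketch showing the effect; sub_with_order is a hypothetical helper:

```python
import re

replacements = {'ab': 'AB', 'abc': 'ABC'}

def sub_with_order(keys, text):
    # Alternatives are tried left to right, so the order of keys matters.
    pattern = re.compile('|'.join(map(re.escape, keys)))
    return pattern.sub(lambda m: replacements[m.group(0)], text)

print(sub_with_order(['ab', 'abc'], 'hey abc'))  # hey ABc
print(sub_with_order(sorted(replacements, key=len, reverse=True), 'hey abc'))  # hey ABC
```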

I needed a solution where the strings to be replaced can be regular expressions,
for example to help normalize a long text by replacing runs of whitespace characters with a single space. Building on a chain of answers from others, including MiniQuark and mmj, this is what I came up with:
import re
def multiple_replace(string, reps, re_flags = 0):
    """ Transforms string, replacing keys from reps with values.
    reps: dictionary, or list of key-value pairs (to enforce ordering;
          earlier items have higher priority).
          Keys are used as regular expressions.
    re_flags: interpretation of regular expressions, such as re.DOTALL
    """
    if isinstance(reps, dict):
        reps = list(reps.items())  # dict views are not indexable on Python 3
    pattern = re.compile("|".join("(?P<_%d>%s)" % (i, re_str[0])
                                  for i, re_str in enumerate(reps)),
                         re_flags)
    # lastgroup is the name (_0, _1, ...) of the alternative that matched
    return pattern.sub(lambda x: reps[int(x.lastgroup[1:])][1], string)
It works for the examples given in other answers, for example:
>>> multiple_replace("(condition1) and --condition2--",
... {"condition1": "", "condition2": "text"})
'() and --text--'
>>> multiple_replace('hello, world', {'hello' : 'goodbye', 'world' : 'earth'})
'goodbye, earth'
>>> multiple_replace("Do you like cafe? No, I prefer tea.",
... {'cafe': 'tea', 'tea': 'cafe', 'like': 'prefer'})
'Do you prefer tea? No, I prefer cafe.'
The main thing for me is that you can use regular expressions as well, for example to replace whole words only, or to normalize white space:
>>> s = "I don't want to change this name:\n Philip II of Spain"
>>> re_str_dict = {r'\bI\b': 'You', r'[\n\t ]+': ' '}
>>> multiple_replace(s, re_str_dict)
"You don't want to change this name: Philip II of Spain"
If you want to use the dictionary keys as normal strings,
you can escape those before calling multiple_replace using e.g. this function:
def escape_keys(d):
    """ transform dictionary d by applying re.escape to the keys """
    return dict((re.escape(k), v) for k, v in d.items())
>>> multiple_replace(s, escape_keys(re_str_dict))
"I don't want to change this name:\n Philip II of Spain"
The following function can help in finding erroneous regular expressions among your dictionary keys (since the error message from multiple_replace isn't very telling):
def check_re_list(re_list):
    """ Checks if each regular expression in list is well-formed. """
    for i, e in enumerate(re_list):
        try:
            re.compile(e)
        except (TypeError, re.error):
            print("Invalid regular expression string "
                  "at position {}: '{}'".format(i, e))
>>> check_re_list(re_str_dict.keys())
Note that it does not chain the replacements but instead performs them simultaneously. This makes it more efficient without constraining what it can do. To mimic the effect of chaining, you may just need to add more string-replacement pairs and ensure the expected ordering of the pairs:
>>> multiple_replace("button", {"but": "mut", "mutton": "lamb"})
'mutton'
>>> multiple_replace("button", [("button", "lamb"),
... ("but", "mut"), ("mutton", "lamb")])
'lamb'
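To see how the named groups let multiple_replace know which key matched, here is a small sketch that wraps each pattern the same way and reports Match.lastgroup for every hit. The helper which_pattern is hypothetical:

```python
import re

patterns = [r'\bI\b', r'[\n\t ]+']
# Wrap each pattern in a named group _0, _1, ... as multiple_replace does.
combined = re.compile('|'.join('(?P<_%d>%s)' % (i, p) for i, p in enumerate(patterns)))

def which_pattern(text):
    """Return (matched text, index of the pattern that produced it) pairs."""
    return [(m.group(0), int(m.lastgroup[1:])) for m in combined.finditer(text)]

print(which_pattern("I am\n  here"))  # [('I', 0), (' ', 1), ('\n  ', 1)]
```

This relies on the named group being the outermost group of each alternative, which is what the "_%d" wrapping guarantees.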

Note: Test this on your own case.
Here's a sample which is more efficient on long strings with many small replacements.
import re
source = "Here is foo, it does moo!"
replacements = {
    'is': 'was',  # replace 'is' with 'was'
    'does': 'did',
    '!': '?'
}
def replace(source, replacements):
    finder = re.compile("|".join(re.escape(k) for k in replacements.keys()))  # matches every string we want replaced
    result = []
    pos = 0
    while True:
        match = finder.search(source, pos)
        if match:
            # cut off the part up until match
            result.append(source[pos : match.start()])
            # cut off the matched part and replace it in place
            result.append(replacements[source[match.start() : match.end()]])
            pos = match.end()
        else:
            # the rest after the last match
            result.append(source[pos:])
            break
    return "".join(result)
print(replace(source, replacements))
The point is in avoiding many concatenations of long strings. We chop the source string to fragments, replacing some of the fragments as we form the list, and then join the whole thing back into a string.

I was doing a similar exercise in one of my school assignments. This was my solution:
dictionary = {1: ['hate', 'love'],
              2: ['salad', 'burger'],
              3: ['vegetables', 'pizza']}
def normalize(text):
    for i in dictionary:
        text = text.replace(dictionary[i][0], dictionary[i][1])
    return text
See the result yourself on a test string:
string_to_change = 'I hate salad and vegetables'
print(normalize(string_to_change))

You can use the pandas library and its replace function, which supports both exact matches and regex replacements. For example:
import pandas as pd
df = pd.DataFrame({'text': ['Billy is going to visit Rome in November', 'I was born in 10/10/2010', 'I will be there at 20:00']})
to_replace=['Billy','Rome','January|February|March|April|May|June|July|August|September|October|November|December', r'\d{2}:\d{2}', r'\d{2}/\d{2}/\d{4}']
replace_with=['name','city','month','time', 'date']
print(df.text.replace(to_replace, replace_with, regex=True))
And the modified text is:
0 name is going to visit city in month
1 I was born in date
2 I will be there at time
You can find an example here. Notice that the replacements are applied in the order they appear in the lists.

I was struggling with this problem as well. With many substitutions, regular expressions struggle: they were about four times slower than looping string.replace (under my experimental conditions).
You should absolutely try the Flashtext library (blog post here, GitHub here). In my case it was a bit over two orders of magnitude faster, going from 1.8 s to 0.015 s per document (regular expressions took 7.7 s).
It is easy to find usage examples in the links above, but this is a working example (my_dict holds the replacements and string is the text to transform):
from flashtext import KeywordProcessor
processor = KeywordProcessor(case_sensitive=False)
for k, v in my_dict.items():
    processor.add_keyword(k, v)
new_string = processor.replace_keywords(string)
Note that Flashtext makes substitutions in a single pass (to avoid a --> b and b --> c translating 'a' into 'c'). Flashtext also looks for whole words (so 'is' will not match 'this'). It works fine if your target is several words (replacing 'This is' by 'Hello').
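If installing Flashtext is not an option, the whole-word, single-pass, case-insensitive behaviour described above can be approximated with the standard re module. This is not Flashtext's algorithm, just a sketch under the assumption that keys start and end with word characters (so \b boundaries apply); replace_whole_words is a hypothetical name:

```python
import re

def replace_whole_words(text, mapping):
    # Lowercase the keys so lookups ignore case, matching re.IGNORECASE below.
    lowered = {k.lower(): v for k, v in mapping.items()}
    # Longest keys first, so 'this is' is tried before 'is'.
    keys = sorted(lowered, key=len, reverse=True)
    pattern = re.compile(r'\b(?:%s)\b' % '|'.join(map(re.escape, keys)), re.IGNORECASE)
    return pattern.sub(lambda m: lowered[m.group(0).lower()], text)

print(replace_whole_words("This is it. Is this?", {"is": "was", "This is": "Hello"}))
# Hello it. was this?
```

Note that 'is' inside 'this' is left alone because of the word boundaries.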

I faced a similar problem today, where I had to use the .replace() method multiple times, and it didn't feel right to me. So I did something like this:
REPLACEMENTS = {'<': '&lt;', '>': '&gt;', '&': '&amp;'}
event_title = ''.join([REPLACEMENTS.get(c,c) for c in event['summary']])
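A per-character lookup like this can be checked against the standard library's html.escape, assuming the goal is HTML escaping. Because every character is looked up exactly once, the '&' produced by escaping '<' is never escaped a second time. The function name escape_chars and the sample title are illustrative:

```python
import html

REPLACEMENTS = {'<': '&lt;', '>': '&gt;', '&': '&amp;'}

def escape_chars(text):
    # One lookup per character; characters not in the table pass through.
    return ''.join(REPLACEMENTS.get(c, c) for c in text)

title = 'Tom & Jerry <live>'
print(escape_chars(title))              # Tom &amp; Jerry &lt;live&gt;
print(html.escape(title, quote=False))  # Tom &amp; Jerry &lt;live&gt;
```

With the default quote=True, html.escape also escapes quote characters, which the table above does not.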

I feel this question needs a single-line recursive lambda function answer for completeness, just because. So there:
>>> mrep = lambda s, d: s if not d else mrep(s.replace(*d.popitem()), d)
Usage:
>>> mrep('abcabc', {'a': '1', 'c': '2'})
'1b21b2'
Notes:
This consumes the input dictionary.
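Python dicts preserve key order as of 3.7 (CPython 3.6); the corresponding caveats in other answers no longer apply. Note that popitem() removes the most recently inserted pair, so the replacements are applied in reverse order. For backward compatibility one could resort to a list-of-tuples version: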
>>> mrep = lambda s, d: s if not d else mrep(s.replace(*d.pop()), d)
>>> mrep('abcabc', [('a', '1'), ('c', '2')])
Note: As with all recursive functions in python, too large recursion depth (i.e. too large replacement dictionaries) will result in an error. See e.g. here.

You should really not do it this way, but I just find it way too cool:
>>> replacements = {'cond1':'text1', 'cond2':'text2'}
>>> cmd = 'answer = s'
>>> for k, v in replacements.items():
...     cmd += ".replace(%r, %r)" % (k, v)
...
>>> exec(cmd)
Now, answer is the result of all the replacements applied in turn to s.
Again, this is very hacky and is not something that you should be using regularly. But it's nice to know that you can do something like this if you ever need to.

For replacing single characters, str.translate with str.maketrans is my favorite method.
tl;dr: result_string = your_string.translate(str.maketrans(dict_mapping))
demo
my_string = 'This is a test string.'
dict_mapping = {'i': 's', 's': 'S'}
result_good = my_string.translate(str.maketrans(dict_mapping))
result_bad = my_string
for x, y in dict_mapping.items():
    result_bad = result_bad.replace(x, y)
print(result_good) # ThsS sS a teSt Strsng.
print(result_bad) # ThSS SS a teSt StrSng.
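str.maketrans also has a string form: two equal-length strings map character to character, and an optional third string lists characters to delete. A short sketch; note that dictionary keys passed to str.maketrans must be single characters, which is why this approach is limited to one-character replacements. The helper name is illustrative:

```python
# Same mapping as {'i': 's', 's': 'S'}, plus deletion of '.'
table = str.maketrans('is', 'sS', '.')
print('This is a test string.'.translate(table))  # ThsS sS a teSt Strsng

def maketrans_rejects_long_keys():
    # Multi-character keys are not allowed in a translation table.
    try:
        str.maketrans({'ab': 'x'})
    except ValueError:
        return True
    return False

print(maketrans_rejects_long_keys())  # True
```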

I don't know about speed, but this is my workaday quick fix:
from functools import reduce
reduce(lambda a, b: a.replace(*b)
    , [('o','W'), ('t','X')] #iterable of pairs: (oldval, newval)
    , 'tomato' #The string from which to replace values
    )
... but I like the #1 regex answer above. Note: the pairs are applied in order, so if a new value contains another pair's old value, the result depends on the order of the pairs.

Here is a version with support for basic regex replacement. The main restriction is that expressions must not contain subgroups, and there may be some edge cases:
Code based on @bgusach and others
import re
class StringReplacer:
    def __init__(self, replacements, ignore_case=False):
        patterns = sorted(replacements, key=len, reverse=True)
        self.replacements = [replacements[k] for k in patterns]
        re_mode = re.IGNORECASE if ignore_case else 0
        self.pattern = re.compile('|'.join(("({})".format(p) for p in patterns)), re_mode)
        def tr(matcher):
            index = next((index for index, value in enumerate(matcher.groups()) if value), None)
            return self.replacements[index]
        self.tr = tr
    def __call__(self, string):
        return self.pattern.sub(self.tr, string)
Tests
table = {
"aaa" : "[This is three a]",
"b+" : "[This is one or more b]",
r"<\w+>" : "[This is a tag]"
}
replacer = StringReplacer(table, True)
sample1 = "whatever bb, aaa, <star> BBB <end>"
print(replacer(sample1))
# output:
# whatever [This is one or more b], [This is three a], [This is a tag] [This is one or more b] [This is a tag]
The trick is to identify the matched group by its position. It is not super efficient (each match scans all groups, so O(n) in the number of patterns), but it works.
index = next((index for index,value in enumerate(matcher.groups()) if value), None)
Replacement is done in one pass.
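Since the patterns must not contain subgroups anyway, the position scan can be replaced by Match.lastindex, which holds the number of the capturing group that matched. A minimal variant under that same no-subgroups assumption; make_replacer is a hypothetical name:

```python
import re

def make_replacer(replacements, ignore_case=False):
    patterns = sorted(replacements, key=len, reverse=True)
    values = [replacements[p] for p in patterns]
    flags = re.IGNORECASE if ignore_case else 0
    combined = re.compile('|'.join('({})'.format(p) for p in patterns), flags)
    # With one top-level group per pattern, lastindex is the 1-based
    # number of the alternative that matched.
    return lambda text: combined.sub(lambda m: values[m.lastindex - 1], text)

table = {
    "aaa": "[This is three a]",
    "b+": "[This is one or more b]",
    r"<\w+>": "[This is a tag]",
}
replacer = make_replacer(table, True)
print(replacer("whatever bb, aaa, <star> BBB <end>"))
```

This avoids iterating over every group on each match; if a pattern did contain its own groups, lastindex would no longer line up with the pattern list.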

Starting from Andrew's valuable answer, I developed a script that loads the dictionary from a file and processes all matching files in the current folder to do the replacements. The script loads the mappings from an external file in which you can set the separator. I'm a beginner, but I found this script very useful when doing multiple substitutions in multiple files. It loaded a dictionary with more than 1000 entries in seconds. It is not elegant, but it worked for me.
import glob
import re
mapfile = input("Enter map file name with extension eg. codifica.txt: ")
sep = input("Enter map file column separator eg. |: ")
mask = input("Enter search mask with extension eg. 2010*txt for all files to be processed: ")
suff = input("Enter suffix with extension eg. _NEW.txt for newly generated files: ")
rep = {}  # creation of empty dictionary
with open(mapfile) as temprep:  # load the definitions into the dictionary, using the prompted separator
    for line in temprep:
        (key, val) = line.strip('\n').split(sep)
        rep[key] = val
# keys are deliberately not passed through re.escape, so the mapping may use re reserved characters
pattern = re.compile("|".join(rep.keys()))
for filename in glob.iglob(mask):  # iterate over all files matching the prompted mask
    with open(filename, "r") as textfile:  # load each file into the variable text
        text = textfile.read()
    # start replacement
    text = pattern.sub(lambda m: rep[m.group(0)], text)
    # write the output file with the prompted suffix
    with open(filename[:-4] + suff, "w") as target:
        target.write(text)

This is my solution to the problem. I used it in a chatbot to replace different words at once.
def mass_replace(text, dct):
    new_string = ""
    old_string = text
    while len(old_string) > 0:
        s = ""
        sk = ""
        for k in dct.keys():
            if old_string.startswith(k):
                s = dct[k]
                sk = k
        if sk:  # test the matched key, so empty replacement values also work
            new_string += s
            old_string = old_string[len(sk):]
        else:
            new_string += old_string[0]
            old_string = old_string[1:]
    return new_string
print(mass_replace("The dog hunts the cat", {"dog":"cat", "cat":"dog"}))
This will print: The cat hunts the dog

Another example:
Input list
error_list = ['[br]', '[ex]', 'Something']
words = ['how', 'much[ex]', 'is[br]', 'the', 'fish[br]', 'noSomething', 'really']
The desired output would be
words = ['how', 'much', 'is', 'the', 'fish', 'no', 'really']
Code:
[n[0][0] if len(n[0]) else n[1] for n in [[[w.replace(e,"") for e in error_list if e in w],w] for w in words]]

My approach would be to first tokenize the string, then decide for each token whether to include it or not.
This might be more performant, assuming O(1) lookup in a hash set:
remove_words = {"we", "this"}
target_sent = "we should modify this string"
target_sent_words = target_sent.split()
filtered_sent = " ".join(list(filter(lambda word: word not in remove_words, target_sent_words)))
filtered_sent is now 'should modify string'

Or just for a fast hack:
for line in to_read:
    read_buffer = line
    stripped_buffer1 = read_buffer.replace("term1", " ")
    stripped_buffer2 = stripped_buffer1.replace("term2", " ")
    write_to_file = to_write.write(stripped_buffer2)

Here is another way of doing it with a dictionary:
listA = "The cat jumped over the house".split()
modify = {word: word for word in listA}
modify["cat"], modify["jumped"] = "dog", "walked"
print(" ".join(modify[x] for x in listA))

sentence = 'its some sentence with a something text'
def replaceAll(f, Array1, Array2):
    if len(Array1) == len(Array2):
        for x in range(len(Array1)):
            # apply every pair before returning, not just the first one
            f = f.replace(Array1[x], Array2[x])
    return f
newSentence = replaceAll(sentence, ['a', 'sentence', 'something'], ['another', 'sentence', 'something something'])
print(newSentence)

Can I pass a list of strings as "old" in str.replace(old, new[, count])? [duplicate]

I would like to use the .replace function to replace multiple strings.
I currently have
string.replace("condition1", "")
but would like to have something like
string.replace("condition1", "").replace("condition2", "text")
although that does not feel like good syntax
what is the proper way to do this? kind of like how in grep/regex you can do \1 and \2 to replace fields to certain search strings

Here is a short example that should do the trick with regular expressions:
import re
rep = {"condition1": "", "condition2": "text"} # define desired replacements here
# use these three lines to do the replacement
rep = dict((re.escape(k), v) for k, v in rep.iteritems())
#Python 3 renamed dict.iteritems to dict.items so use rep.items() for latest versions
pattern = re.compile("|".join(rep.keys()))
text = pattern.sub(lambda m: rep[re.escape(m.group(0))], text)
For example:
>>> pattern.sub(lambda m: rep[re.escape(m.group(0))], "(condition1) and --condition2--")
'() and --text--'

You could just make a nice little looping function.
def replace_all(text, dic):
for i, j in dic.iteritems():
text = text.replace(i, j)
return text
where text is the complete string and dic is a dictionary — each definition is a string that will replace a match to the term.
Note: in Python 3, iteritems() has been replaced with items()
Careful: Python dictionaries don't have a reliable order for iteration. This solution only solves your problem if:
order of replacements is irrelevant
it's ok for a replacement to change the results of previous replacements
Update: The above statement related to ordering of insertion does not apply to Python versions greater than or equal to 3.6, as standard dicts were changed to use insertion ordering for iteration.
For instance:
d = { "cat": "dog", "dog": "pig"}
my_sentence = "This is my cat and this is my dog."
replace_all(my_sentence, d)
print(my_sentence)
Possible output #1:
"This is my pig and this is my pig."
Possible output #2
"This is my dog and this is my pig."
One possible fix is to use an OrderedDict.
from collections import OrderedDict
def replace_all(text, dic):
for i, j in dic.items():
text = text.replace(i, j)
return text
od = OrderedDict([("cat", "dog"), ("dog", "pig")])
my_sentence = "This is my cat and this is my dog."
replace_all(my_sentence, od)
print(my_sentence)
Output:
"This is my pig and this is my pig."
Careful #2: Inefficient if your text string is too big or there are many pairs in the dictionary.

Why not one solution like this?
s = "The quick brown fox jumps over the lazy dog"
for r in (("brown", "red"), ("lazy", "quick")):
s = s.replace(*r)
#output will be: The quick red fox jumps over the quick dog

Here is a variant of the first solution using reduce, in case you like being functional. :)
repls = {'hello' : 'goodbye', 'world' : 'earth'}
s = 'hello, world'
reduce(lambda a, kv: a.replace(*kv), repls.iteritems(), s)
martineau's even better version:
repls = ('hello', 'goodbye'), ('world', 'earth')
s = 'hello, world'
reduce(lambda a, kv: a.replace(*kv), repls, s)

This is just a more concise recap of F.J and MiniQuark great answers and last but decisive improvement by bgusach. All you need to achieve multiple simultaneous string replacements is the following function:
def multiple_replace(string, rep_dict):
pattern = re.compile("|".join([re.escape(k) for k in sorted(rep_dict,key=len,reverse=True)]), flags=re.DOTALL)
return pattern.sub(lambda x: rep_dict[x.group(0)], string)
Usage:
>>>multiple_replace("Do you like cafe? No, I prefer tea.", {'cafe':'tea', 'tea':'cafe', 'like':'prefer'})
'Do you prefer tea? No, I prefer cafe.'
If you wish, you can make your own dedicated replacement functions starting from this simpler one.

Starting Python 3.8, and the introduction of assignment expressions (PEP 572) (:= operator), we can apply the replacements within a list comprehension:
# text = "The quick brown fox jumps over the lazy dog"
# replacements = [("brown", "red"), ("lazy", "quick")]
[text := text.replace(a, b) for a, b in replacements]
# text = 'The quick red fox jumps over the quick dog'

I built this upon F.J.s excellent answer:
import re
def multiple_replacer(*key_values):
replace_dict = dict(key_values)
replacement_function = lambda match: replace_dict[match.group(0)]
pattern = re.compile("|".join([re.escape(k) for k, v in key_values]), re.M)
return lambda string: pattern.sub(replacement_function, string)
def multiple_replace(string, *key_values):
return multiple_replacer(*key_values)(string)
One shot usage:
>>> replacements = (u"café", u"tea"), (u"tea", u"café"), (u"like", u"love")
>>> print multiple_replace(u"Do you like café? No, I prefer tea.", *replacements)
Do you love tea? No, I prefer café.
Note that since replacement is done in just one pass, "café" changes to "tea", but it does not change back to "café".
If you need to do the same replacement many times, you can create a replacement function easily:
>>> my_escaper = multiple_replacer(('"','\\"'), ('\t', '\\t'))
>>> many_many_strings = (u'This text will be escaped by "my_escaper"',
u'Does this work?\tYes it does',
u'And can we span\nmultiple lines?\t"Yes\twe\tcan!"')
>>> for line in many_many_strings:
... print my_escaper(line)
...
This text will be escaped by \"my_escaper\"
Does this work?\tYes it does
And can we span
multiple lines?\t\"Yes\twe\tcan!\"
Improvements:
turned code into a function
added multiline support
fixed a bug in escaping
easy to create a function for a specific multiple replacement
Enjoy! :-)

I would like to propose the usage of string templates. Just place the string to be replaced in a dictionary and all is set! Example from docs.python.org
>>> from string import Template
>>> s = Template('$who likes $what')
>>> s.substitute(who='tim', what='kung pao')
'tim likes kung pao'
>>> d = dict(who='tim')
>>> Template('Give $who $100').substitute(d)
Traceback (most recent call last):
[...]
ValueError: Invalid placeholder in string: line 1, col 10
>>> Template('$who likes $what').substitute(d)
Traceback (most recent call last):
[...]
KeyError: 'what'
>>> Template('$who likes $what').safe_substitute(d)
'tim likes $what'
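Note that Template only substitutes $-prefixed identifiers (or ${name}). The text therefore has to be written with placeholders in advance, and Template will not find arbitrary substrings such as condition1. Here is a small sketch; the template strings and values are illustrative:

```python
from string import Template

values = {"cond1": "", "cond2": "text"}

# $-placeholders are looked up in the mapping and substituted...
with_placeholders = Template("($cond1) and --$cond2--").substitute(values)

# ...but plain words are left alone, because they are not placeholders.
plain_words = Template("(cond1) and --cond2--").substitute(values)

print(with_placeholders)  # () and --text--
print(plain_words)        # (cond1) and --cond2--
```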

In my case, I needed a simple replacement of unique keys with names, so I came up with this:
a = 'This is a test string.'
b = {'i': 'I', 's': 'S'}
for x,y in b.items():
    a = a.replace(x, y)
>>> a
'ThIS IS a teSt StrIng.'

Here's my $0.02. It is based on Andrew Clark's answer, just a little bit clearer, and it also covers the case where a string to replace is a substring of another string to replace (the longer string wins):
import re
def multireplace(string, replacements):
    """
    Given a string and a replacement map, it returns the replaced string.
    :param str string: string to execute replacements on
    :param dict replacements: replacement dictionary {value to find: value to replace}
    :rtype: str
    """
    # Place longer ones first to keep shorter substrings from matching
    # where the longer ones should take place
    # For instance given the replacements {'ab': 'AB', 'abc': 'ABC'} against
    # the string 'hey abc', it should produce 'hey ABC' and not 'hey ABc'
    substrs = sorted(replacements, key=len, reverse=True)
    # Create a big OR regex that matches any of the substrings to replace
    regexp = re.compile('|'.join(map(re.escape, substrs)))
    # For each match, look up the new string in the replacements
    return regexp.sub(lambda match: replacements[match.group(0)], string)
It is in this gist; feel free to modify it if you have any proposals.
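The sorting step matters because regex alternation tries its alternatives from left to right and takes the first one that matches, not the longest one. Here is a minimal sketch of that behavior. The sub_with helper is hypothetical and exists only for this illustration:

```python
import re

replacements = {"ab": "AB", "abc": "ABC"}

def sub_with(keys, text):
    # Alternation stops at the first alternative that matches,
    # so the order of `keys` decides which one wins.
    pattern = re.compile("|".join(map(re.escape, keys)))
    return pattern.sub(lambda m: replacements[m.group(0)], text)

print(sub_with(["ab", "abc"], "hey abc"))  # hey ABc  (short key wins)
print(sub_with(sorted(replacements, key=len, reverse=True), "hey abc"))  # hey ABC
```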

I needed a solution where the strings to be replaced can be regular expressions,
for example to help normalize a long text by replacing runs of whitespace characters with a single space. Building on a chain of answers from others, including MiniQuark and mmj, this is what I came up with:
import re
def multiple_replace(string, reps, re_flags = 0):
    """ Transforms string, replacing keys from reps with values.
    reps: dictionary, or list of key-value pairs (to enforce ordering;
          earlier items have higher priority).
          Keys are used as regular expressions.
    re_flags: interpretation of regular expressions, such as re.DOTALL
    """
    if isinstance(reps, dict):
        # list() is needed in Python 3, where dict.items() is not indexable
        reps = list(reps.items())
    pattern = re.compile("|".join("(?P<_%d>%s)" % (i, re_str[0])
                                  for i, re_str in enumerate(reps)),
                         re_flags)
    return pattern.sub(lambda x: reps[int(x.lastgroup[1:])][1], string)
It works for the examples given in other answers, for example:
>>> multiple_replace("(condition1) and --condition2--",
... {"condition1": "", "condition2": "text"})
'() and --text--'
>>> multiple_replace('hello, world', {'hello' : 'goodbye', 'world' : 'earth'})
'goodbye, earth'
>>> multiple_replace("Do you like cafe? No, I prefer tea.",
... {'cafe': 'tea', 'tea': 'cafe', 'like': 'prefer'})
'Do you prefer tea? No, I prefer cafe.'
The main thing for me is that you can use regular expressions as well, for example to replace whole words only, or to normalize white space:
>>> s = "I don't want to change this name:\n Philip II of Spain"
>>> re_str_dict = {r'\bI\b': 'You', r'[\n\t ]+': ' '}
>>> multiple_replace(s, re_str_dict)
"You don't want to change this name: Philip II of Spain"
If you want to use the dictionary keys as normal strings,
you can escape those before calling multiple_replace using e.g. this function:
def escape_keys(d):
    """ transform dictionary d by applying re.escape to the keys """
    return dict((re.escape(k), v) for k, v in d.items())
>>> multiple_replace(s, escape_keys(re_str_dict))
"I don't want to change this name:\n Philip II of Spain"
The following function can help in finding erroneous regular expressions among your dictionary keys (since the error message from multiple_replace isn't very telling):
def check_re_list(re_list):
    """ Checks if each regular expression in list is well-formed. """
    for i, e in enumerate(re_list):
        try:
            re.compile(e)
        except (TypeError, re.error):
            print("Invalid regular expression string "
                  "at position {}: '{}'".format(i, e))
>>> check_re_list(re_str_dict.keys())
Note that it does not chain the replacements; instead, it performs them simultaneously. This makes it more efficient without constraining what it can do. To mimic the effect of chaining, you may just need to add more string-replacement pairs and ensure the expected ordering of the pairs:
>>> multiple_replace("button", {"but": "mut", "mutton": "lamb"})
'mutton'
>>> multiple_replace("button", [("button", "lamb"),
... ("but", "mut"), ("mutton", "lamb")])
'lamb'
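The function identifies which pair matched through the group names _0, _1, and so on. match.lastgroup returns the name of the last matched group, and the number after the underscore is the index into reps. Here is a minimal illustration of just that lookup. The patterns are chosen purely for illustration:

```python
import re

pairs = [(r"\d+", "<num>"), (r"[a-z]+", "<word>")]
pattern = re.compile("|".join("(?P<_%d>%s)" % (i, p) for i, (p, _) in enumerate(pairs)))

# lastgroup names the alternative that matched; strip the "_" to get the index.
names = [m.lastgroup for m in pattern.finditer("abc 123")]
print(names)  # ['_1', '_0']

result = pattern.sub(lambda m: pairs[int(m.lastgroup[1:])][1], "abc 123")
print(result)  # <word> <num>
```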

Note: Test your case, see comments.
Here's a sample which is more efficient on long strings with many small replacements.
import re
source = "Here is foo, it does moo!"
replacements = {
    'is': 'was', # replace 'is' with 'was'
    'does': 'did',
    '!': '?'
}
def replace(source, replacements):
    finder = re.compile("|".join(re.escape(k) for k in replacements.keys())) # matches every string we want replaced
    result = []
    pos = 0
    while True:
        match = finder.search(source, pos)
        if match:
            # cut off the part up until match
            result.append(source[pos : match.start()])
            # cut off the matched part and replace it in place
            result.append(replacements[source[match.start() : match.end()]])
            pos = match.end()
        else:
            # the rest after the last match
            result.append(source[pos:])
            break
    return "".join(result)
print(replace(source, replacements))
The point is to avoid many concatenations of long strings: we chop the source string into fragments, replacing some of the fragments as we build the list, and then join everything back into a single string.
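The same list-and-join idea can be written a little more compactly with re.finditer, which yields the matches in order. This is a sketch that repeats the example data so it runs on its own:

```python
import re

source = "Here is foo, it does moo!"
replacements = {"is": "was", "does": "did", "!": "?"}
finder = re.compile("|".join(re.escape(k) for k in replacements))

# Collect the unchanged gaps and the replaced matches as list items,
# then join once at the end instead of concatenating strings repeatedly.
parts = []
pos = 0
for m in finder.finditer(source):
    parts.append(source[pos:m.start()])
    parts.append(replacements[m.group(0)])
    pos = m.end()
parts.append(source[pos:])
result = "".join(parts)
print(result)  # Here was foo, it did moo?
```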

I was doing a similar exercise in one of my school homework assignments. This was my solution:
dictionary = {1: ['hate', 'love'],
              2: ['salad', 'burger'],
              3: ['vegetables', 'pizza']}
def normalize(text):
    for i in dictionary:
        text = text.replace(dictionary[i][0], dictionary[i][1])
    return text
See the result yourself on a test string:
string_to_change = 'I hate salad and vegetables'
print(normalize(string_to_change))

You can use the pandas library and its replace function, which supports both exact matches and regex replacements. For example:
import pandas as pd
df = pd.DataFrame({'text': ['Billy is going to visit Rome in November', 'I was born in 10/10/2010', 'I will be there at 20:00']})
to_replace=['Billy','Rome','January|February|March|April|May|June|July|August|September|October|November|December', r'\d{2}:\d{2}', r'\d{2}/\d{2}/\d{4}']
replace_with=['name','city','month','time', 'date']
print(df.text.replace(to_replace, replace_with, regex=True))
And the modified text is:
0 name is going to visit city in month
1 I was born in date
2 I will be there at time
You can find an example here. Notice that the replacements are applied to the text in the order in which they appear in the lists.

I was struggling with this problem as well. With many substitutions, regular expressions struggle: in my experiments they were about four times slower than looping over string.replace.
You should absolutely try using the Flashtext library (blog post here, Github here). In my case it was a bit over two orders of magnitude faster, from 1.8 s to 0.015 s (regular expressions took 7.7 s) for each document.
It is easy to find usage examples in the links above, but here is a working example (my_dict is your replacement dictionary and string is the text to process):
from flashtext import KeywordProcessor
processor = KeywordProcessor(case_sensitive=False)
for k, v in my_dict.items():
    processor.add_keyword(k, v)
new_string = processor.replace_keywords(string)
Note that Flashtext makes substitutions in a single pass (to avoid a --> b and b --> c translating 'a' into 'c'). Flashtext also looks for whole words (so 'is' will not match 'this'). It works fine if your target is several words (replacing 'This is' by 'Hello').

I faced a similar problem today, where I had to use the .replace() method multiple times, but it didn't feel right. So I did something like this:
REPLACEMENTS = {'<': '&lt;', '>': '&gt;', '&': '&amp;'}
event_title = ''.join([REPLACEMENTS.get(c,c) for c in event['summary']])
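For this particular case (HTML-escaping), the standard library already provides html.escape. With quote=False it escapes only &, < and >, which matches the dictionary above. Here is a short comparison; a made-up string stands in for event['summary']:

```python
import html

REPLACEMENTS = {'<': '&lt;', '>': '&gt;', '&': '&amp;'}
summary = "Tom & Jerry <live>"  # stand-in for event['summary']

# Character-by-character lookup: each input character is examined once,
# so the '&' inside an already produced '&lt;' is never escaped again.
manual = ''.join(REPLACEMENTS.get(c, c) for c in summary)
builtin = html.escape(summary, quote=False)

print(manual)  # Tom &amp; Jerry &lt;live&gt;
print(manual == builtin)  # True
```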

I feel this question needs a single-line recursive lambda function answer for completeness, just because. So here it is:
>>> mrep = lambda s, d: s if not d else mrep(s.replace(*d.popitem()), d)
Usage:
>>> mrep('abcabc', {'a': '1', 'c': '2'})
'1b21b2'
Notes:
This consumes the input dictionary.
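Python dicts preserve insertion order as of 3.7 (3.6 in CPython), so the corresponding caveats in other answers no longer apply. Note that popitem() removes the most recently inserted pair, so the replacements are applied in reverse order. For backward compatibility one could resort to a tuple-based version: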
>>> mrep = lambda s, d: s if not d else mrep(s.replace(*d.pop()), d)
>>> mrep('abcabc', [('a', '1'), ('c', '2')])
Note: As with all recursive functions in Python, too large a recursion depth (i.e. too large a replacement dictionary) will result in an error. See e.g. here.

You should really not do it this way, but I just find it way too cool:
>>> replacements = {'cond1':'text1', 'cond2':'text2'}
>>> cmd = 'answer = s'
>>> for k, v in replacements.items():
...     cmd += ".replace(%r, %r)" % (k, v)
...
>>> exec(cmd)
Now, answer is the result of all the replacements in turn.
Again, this is very hacky and is not something you should use regularly. But it's nice to know that you can do something like this if you ever need to.

To replace single characters, using translate with str.maketrans is my favorite method.
tl;dr: result_string = your_string.translate(str.maketrans(dict_mapping))
demo
my_string = 'This is a test string.'
dict_mapping = {'i': 's', 's': 'S'}
result_good = my_string.translate(str.maketrans(dict_mapping))
result_bad = my_string
for x, y in dict_mapping.items():
result_bad = result_bad.replace(x, y)
print(result_good) # ThsS sS a teSt Strsng.
print(result_bad) # ThSS SS a teSt StrSng.
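When str.maketrans is given a dictionary, the keys must be single characters. The values, however, may be strings of any length, or None to delete the character. Here is a short sketch with an illustrative mapping:

```python
# Keys: single characters. Values: any string, or None to delete the character.
table = str.maketrans({'&': 'and', '!': None, 'i': 'I'})
result = "fish & chips!".translate(table)
print(result)  # fIsh and chIps

# A multi-character key is rejected when the table is built.
rejected = False
try:
    str.maketrans({'ab': 'x'})
except ValueError:
    rejected = True
print(rejected)  # True
```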

I don't know about speed, but this is my workaday quick fix:
from functools import reduce  # needed in Python 3
reduce(lambda a, b: a.replace(*b)
       , [('o','W'), ('t','X')] #iterable of pairs: (oldval, newval)
       , 'tomato' #The string from which to replace values
       )
... but I like the #1 regex answer above. Note: the pairs are applied in order, so if a new value contains an old value that is replaced later, the result depends on the order of the pairs.

Here is a version with support for basic regex replacement. The main restriction is that expressions must not contain subgroups, and there may be some edge cases:
Code based on @bgusach and others
import re
class StringReplacer:
    def __init__(self, replacements, ignore_case=False):
        patterns = sorted(replacements, key=len, reverse=True)
        self.replacements = [replacements[k] for k in patterns]
        re_mode = re.IGNORECASE if ignore_case else 0
        self.pattern = re.compile('|'.join(("({})".format(p) for p in patterns)), re_mode)
        def tr(matcher):
            index = next((index for index,value in enumerate(matcher.groups()) if value), None)
            return self.replacements[index]
        self.tr = tr
    def __call__(self, string):
        return self.pattern.sub(self.tr, string)
Tests
table = {
"aaa" : "[This is three a]",
"b+" : "[This is one or more b]",
r"<\w+>" : "[This is a tag]"
}
replacer = StringReplacer(table, True)
sample1 = "whatever bb, aaa, <star> BBB <end>"
print(replacer(sample1))
# output:
# whatever [This is one or more b], [This is three a], [This is a tag] [This is one or more b] [This is a tag]
The trick is to identify the matched group by its position. It is not super efficient (O(n) in the number of patterns for each match), but it works.
index = next((index for index,value in enumerate(matcher.groups()) if value), None)
Replacement is done in one pass.
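The patterns must not contain subgroups, so each alternative is exactly one group. That means match.lastindex (the 1-based number of the last matched group) already identifies the alternative without scanning groups(). Here is a sketch of that variation, which is not the answer's code; the patterns and labels are illustrative:

```python
import re

patterns = ["aaa", "b+", r"<\w+>"]
labels = ["[three a]", "[one or more b]", "[tag]"]
pattern = re.compile("|".join("({})".format(p) for p in patterns), re.IGNORECASE)

# Without nested groups, lastindex is the number of the alternative that matched.
result = pattern.sub(lambda m: labels[m.lastindex - 1], "bb, aaa, <star> BBB")
print(result)  # [one or more b], [three a], [tag] [one or more b]
```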

Starting from Andrew's valuable answer, I developed a script that loads the dictionary from a file and processes all the matching files in the current folder to do the replacements. The script loads the mappings from an external file in which you can set the separator. I'm a beginner, but I found this script very useful when doing multiple substitutions in multiple files. It loaded a dictionary with more than 1000 entries in seconds. It is not elegant, but it worked for me.
import glob
import re
mapfile = input("Enter map file name with extension eg. codifica.txt: ")
sep = input("Enter map file column separator eg. |: ")
mask = input("Enter search mask with extension eg. 2010*txt for all files to be processed: ")
suff = input("Enter suffix with extension eg. _NEW.txt for newly generated files: ")
rep = {}  # creation of an empty dictionary
with open(mapfile) as temprep:  # load the definitions into the dictionary from the input file, using the prompted separator
    for line in temprep:
        (key, val) = line.strip('\n').split(sep)
        rep[key] = val
for filename in glob.iglob(mask):  # loop over all the files matching the prompted mask
    with open(filename, "r") as textfile:  # load each file into the variable text
        text = textfile.read()
    # start replacement
    #rep = dict((re.escape(k), v) for k, v in rep.items()) commented out to allow regex reserved characters in the mapping
    pattern = re.compile("|".join(rep.keys()))
    text = pattern.sub(lambda m: rep[m.group(0)], text)
    # write the output files with the prompted suffix
    with open(filename[:-4] + suff, "w") as target:
        target.write(text)

This is my solution to the problem. I used it in a chatbot to replace different words at once.
def mass_replace(text, dct):
    new_string = ""
    old_string = text
    while len(old_string) > 0:
        s = ""
        sk = ""
        for k in dct.keys():
            if old_string.startswith(k):
                s = dct[k]
                sk = k
        if sk:  # a key matched (this also works when its replacement is empty)
            new_string+=s
            old_string = old_string[len(sk):]
        else:
            new_string+=old_string[0]
            old_string = old_string[1:]
    return new_string
print(mass_replace("The dog hunts the cat", {"dog":"cat", "cat":"dog"}))
This prints: The cat hunts the dog

Another example:
Input list
error_list = ['[br]', '[ex]', 'Something']
words = ['how', 'much[ex]', 'is[br]', 'the', 'fish[br]', 'noSomething', 'really']
The desired output would be
words = ['how', 'much', 'is', 'the', 'fish', 'no', 'really']
Code :
[n[0][0] if len(n[0]) else n[1] for n in [[[w.replace(e,"") for e in error_list if e in w],w] for w in words]]

My approach would be to first tokenize the string, then decide for each token whether to include it or not.
This is potentially more performant, if we can assume O(1) lookup in a hashmap/set:
remove_words = {"we", "this"}
target_sent = "we should modify this string"
target_sent_words = target_sent.split()
filtered_sent = " ".join(list(filter(lambda word: word not in remove_words, target_sent_words)))
filtered_sent is now 'should modify string'

Or just for a fast hack:
for line in to_read:
    read_buffer = line
    stripped_buffer1 = read_buffer.replace("term1", " ")
    stripped_buffer2 = stripped_buffer1.replace("term2", " ")
    write_to_file = to_write.write(stripped_buffer2)

Here is another way of doing it with a dictionary:
listA = "The cat jumped over the house".split()
modify = {word: word for word in listA}
modify["cat"], modify["jumped"] = "dog", "walked"
print(" ".join(modify[x] for x in listA))

sentence='its some sentence with a something text'
def replaceAll(f,Array1,Array2):
    if len(Array1)==len(Array2):
        for x in range(len(Array1)):
            # keep the result of each replacement and continue with the next pair
            f = f.replace(Array1[x],Array2[x])
    return f
newSentence=replaceAll(sentence,['a','sentence','something'],['another','sentence','something something'])
print(newSentence)

Python dictionary replacement with space in key

I have a string and a dictionary, and I have to replace every occurrence of the dict keys in that text.
text = 'I have a smartphone and a Smart TV'
dict = {
'smartphone': 'toy',
'smart tv': 'junk'
}
If there were no spaces in the keys, I would break the text into words and compare them one by one with the dict. That looks like O(n). But now the keys have spaces inside them, so things are more complicated. Please suggest a good way to do this, and please note that the keys may not match the case of the text.
Update
I have thought of this solution, but it is not efficient: O(m*n) or more...
for k, v in dict.items():
    text = text.replace(k, v)  # or regex...

If the keywords in the text are not right next to each other (keyword, other keyword), we can do it like this, in O(n):
def dict_replace(dictionary, text, strip_chars=None, replace_func=None):
    """
    Replace a word or word phrase in text with the keyword from dictionary.
    Arguments:
        dictionary: dict with key:value, key should be in lower case
        text: string to replace
        strip_chars: string containing characters to strip from each word
        replace_func: optional function that transforms the final replacement.
                      Must take 2 params: key and value
    Return:
        string
    Example:
        my_dict = {
            "hello": "hallo",
            "hallo": "hello",  # Only one pass, don't worry
            "smart tv": "http://google.com?q=smart+tv"
        }
        dict_replace(my_dict, "hello google smart tv",
                     replace_func=lambda k,v: '[%s](%s)'%(k,v))
    """
    # First break word phrases in dictionary into single words
    dictionary = dictionary.copy()
    for key in list(dictionary.keys()):  # list() because the dict grows inside the loop
        if ' ' in key:
            key_parts = key.split()
            for part in key_parts:
                # Mark single word with False
                if part not in dictionary:
                    dictionary[part] = False
    # Break text into words and compare one by one
    result = []
    words = text.split()
    words.append('')
    last_match = ''  # Last keyword (lower) match
    original = ''  # Last match in original
    for word in words:
        key_word = word.lower().strip(strip_chars) if \
            strip_chars is not None else word.lower()
        if key_word in dictionary:
            last_match = last_match + ' ' + key_word if \
                last_match != '' else key_word
            original = original + ' ' + word if \
                original != '' else word
        else:
            if last_match != '':
                # If match whole word
                if last_match in dictionary and dictionary[last_match] != False:
                    if replace_func is not None:
                        result.append(replace_func(original, dictionary[last_match]))
                    else:
                        result.append(dictionary[last_match])
                else:
                    # Only match partial of keyword
                    match_parts = last_match.split(' ')
                    match_original = original.split(' ')
                    for i in range(0, len(match_parts)):
                        if match_parts[i] in dictionary and \
                                dictionary[match_parts[i]] != False:
                            if replace_func is not None:
                                result.append(replace_func(match_original[i], dictionary[match_parts[i]]))
                            else:
                                result.append(dictionary[match_parts[i]])
            result.append(word)
            last_match = ''
            original = ''
    return ' '.join(result)

If your keys have no spaces:
output = [dct[i] if i in dct else i for i in text.split()]
' '.join(output)
You should use dct instead of dict so it doesn't shadow the built-in dict type.
This makes use of a list comprehension and a conditional expression
to filter the data.
If your keys do have spaces, you are correct:
for k, v in dct.items():
    text = text.replace(k, v)
And yes, the time complexity will be O(m*n), as you have to scan the string once for each key in dct.

Convert all dictionary keys and the input text to lower case, so the comparisons are easy. Now ...
for entry in my_dict:
    if entry in text:
        pass  # process the match
This assumes that the dictionary is small enough that checking every entry against the text is acceptable. If, instead, the dictionary is large and the text is small, you'll need to take each word, then each 2-word phrase, and see whether they're in the dictionary.
Is that enough to get you going?

You need to test all the neighbor permutations (contiguous runs of words), from 1 (each individual word) to len(array) (the entire string). You can generate them this way:
text = 'I have a smartphone and a Smart TV'
array = text.lower().split()
key_permutations = [" ".join(array[j:j + i]) for i in range(1, len(array) + 1) for j in range(0, len(array) - (i - 1))]
>>> key_permutations
['i', 'have', 'a', 'smartphone', 'and', 'a', 'smart', 'tv', 'i have', 'have a', 'a smartphone', 'smartphone and', 'and a', 'a smart', 'smart tv', 'i have a', 'have a smartphone', 'a smartphone and', 'smartphone and a', 'and a smart', 'a smart tv', 'i have a smartphone', 'have a smartphone and', 'a smartphone and a', 'smartphone and a smart', 'and a smart tv', 'i have a smartphone and', 'have a smartphone and a', 'a smartphone and a smart', 'smartphone and a smart tv', 'i have a smartphone and a', 'have a smartphone and a smart', 'a smartphone and a smart tv', 'i have a smartphone and a smart', 'have a smartphone and a smart tv', 'i have a smartphone and a smart tv']
Now we substitute through the dictionary:
import re
for permutation in key_permutations:
    if permutation in dict:
        text = re.sub(re.escape(permutation), dict[permutation], text, flags=re.IGNORECASE)
>>> text
'I have a toy and a junk'
Though you'll likely want to try the permutations in the reverse order, longest first, so more specific phrases have precedence over individual words.
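Here is a minimal sketch of the longest-first idea. It assumes the dictionary keys are lower case and use single spaces between words; replace_phrases is a hypothetical helper name. The function walks the words from left to right and, at each position, tries the longest possible phrase first. Words with attached punctuation (such as "TV.") will not match, and runs of whitespace are collapsed to single spaces:

```python
def replace_phrases(text, mapping):
    # Keys are assumed to be lower case; matching is case-insensitive.
    words = text.split()
    lower = [w.lower() for w in words]
    max_len = max((len(k.split()) for k in mapping), default=0)
    out = []
    i = 0
    while i < len(words):
        # Try the longest phrase starting at position i first.
        for n in range(min(max_len, len(words) - i), 0, -1):
            phrase = " ".join(lower[i:i + n])
            if phrase in mapping:
                out.append(mapping[phrase])
                i += n
                break
        else:
            # No phrase starts here: keep the original word unchanged.
            out.append(words[i])
            i += 1
    return " ".join(out)

print(replace_phrases("I have a smartphone and a Smart TV",
                      {"smartphone": "toy", "smart tv": "junk"}))
# I have a toy and a junk
```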

You can do this pretty easily with regular expressions.
import re
text = 'I have a smartphone and a Smart TV'
dict = {
'smartphone': 'toy',
'smart tv': 'junk'
}
for k, v in dict.items():
    regex = re.compile(re.escape(k), flags=re.I)
    text = regex.sub(v, text)
It still suffers from the problem of depending on processing order of the dict keys, if the replacement value for one item is part of the search term for another item.
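A single-pass variant avoids that ordering problem and still handles the case-insensitivity the question asks for. It assumes the keys begin and end with word characters (so that \b works) and stay unique after lower-casing; dict_replace_ci is a hypothetical helper name. Sorting the keys longest first lets a longer phrase win over a shorter overlapping key. The text is scanned only once, so a replacement value is never rescanned for other keys:

```python
import re

def dict_replace_ci(text, mapping):
    # Look up matches by their lower-cased form.
    lookup = {k.lower(): v for k, v in mapping.items()}
    if not lookup:
        return text
    keys = sorted(lookup, key=len, reverse=True)  # longest first
    pattern = re.compile(r"\b(?:%s)\b" % "|".join(map(re.escape, keys)), re.IGNORECASE)
    # One pass over the text; replaced values are never rescanned.
    return pattern.sub(lambda m: lookup[m.group(0).lower()], text)

print(dict_replace_ci("I have a smartphone and a Smart TV",
                      {"smartphone": "toy", "smart tv": "junk"}))
# I have a toy and a junk
```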

Replacement of characters in a file (python) [duplicate]

I would like to use the .replace function to replace multiple strings.
I currently have
string.replace("condition1", "")
but would like to have something like
string.replace("condition1", "").replace("condition2", "text")
although that does not feel like good syntax.
What is the proper way to do this? Something like how in grep/regex you can use \1 and \2 to replace fields with certain search strings.

Here is a short example that should do the trick with regular expressions:
import re
rep = {"condition1": "", "condition2": "text"} # define desired replacements here
# use these three lines to do the replacement
rep = dict((re.escape(k), v) for k, v in rep.items())
# Python 2: use rep.iteritems() instead of rep.items()
pattern = re.compile("|".join(rep.keys()))
text = pattern.sub(lambda m: rep[re.escape(m.group(0))], text)
For example:
>>> pattern.sub(lambda m: rep[re.escape(m.group(0))], "(condition1) and --condition2--")
'() and --text--'

You could just make a nice little looping function.
def replace_all(text, dic):
    for i, j in dic.items():
        text = text.replace(i, j)
    return text
where text is the complete string and dic is a dictionary whose keys are the terms to find and whose values are the strings that replace them.
Note: in Python 2 you can use dic.iteritems() instead of dic.items().
Careful: Python dictionaries don't have a reliable order for iteration. This solution only solves your problem if:
order of replacements is irrelevant
it's ok for a replacement to change the results of previous replacements
Update: The above statement about iteration order does not apply to Python 3.7 and later (3.6 in CPython), as standard dicts now iterate in insertion order.
For instance:
d = { "cat": "dog", "dog": "pig"}
my_sentence = "This is my cat and this is my dog."
my_sentence = replace_all(my_sentence, d)
print(my_sentence)
Possible output #1:
"This is my pig and this is my pig."
Possible output #2:
"This is my dog and this is my pig."
One possible fix is to use an OrderedDict.
from collections import OrderedDict
def replace_all(text, dic):
    for i, j in dic.items():
        text = text.replace(i, j)
    return text
od = OrderedDict([("cat", "dog"), ("dog", "pig")])
my_sentence = "This is my cat and this is my dog."
my_sentence = replace_all(my_sentence, od)
print(my_sentence)
Output:
"This is my pig and this is my pig."
Careful #2: Inefficient if your text string is too big or there are many pairs in the dictionary.

Why not one solution like this?
s = "The quick brown fox jumps over the lazy dog"
for r in (("brown", "red"), ("lazy", "quick")):
    s = s.replace(*r)
#output will be: The quick red fox jumps over the quick dog

Here is a variant of the first solution using reduce, in case you like being functional. :)
from functools import reduce  # needed in Python 3
repls = {'hello' : 'goodbye', 'world' : 'earth'}
s = 'hello, world'
reduce(lambda a, kv: a.replace(*kv), repls.items(), s)
martineau's even better version:
repls = ('hello', 'goodbye'), ('world', 'earth')
s = 'hello, world'
reduce(lambda a, kv: a.replace(*kv), repls, s)

This is just a more concise recap of F.J's and MiniQuark's great answers, plus a last but decisive improvement by bgusach. All you need to achieve multiple simultaneous string replacements is the following function:
import re
def multiple_replace(string, rep_dict):
    pattern = re.compile("|".join([re.escape(k) for k in sorted(rep_dict,key=len,reverse=True)]), flags=re.DOTALL)
    return pattern.sub(lambda x: rep_dict[x.group(0)], string)
Usage:
>>> multiple_replace("Do you like cafe? No, I prefer tea.", {'cafe':'tea', 'tea':'cafe', 'like':'prefer'})
'Do you prefer tea? No, I prefer cafe.'
If you wish, you can make your own dedicated replacement functions starting from this simpler one.

Starting Python 3.8, and the introduction of assignment expressions (PEP 572) (:= operator), we can apply the replacements within a list comprehension:
# text = "The quick brown fox jumps over the lazy dog"
# replacements = [("brown", "red"), ("lazy", "quick")]
[text := text.replace(a, b) for a, b in replacements]
# text = 'The quick red fox jumps over the quick dog'

I built this upon F.J.s excellent answer:
import re
def multiple_replacer(*key_values):
replace_dict = dict(key_values)
replacement_function = lambda match: replace_dict[match.group(0)]
pattern = re.compile("|".join([re.escape(k) for k, v in key_values]), re.M)
return lambda string: pattern.sub(replacement_function, string)
def multiple_replace(string, *key_values):
return multiple_replacer(*key_values)(string)
One shot usage:
>>> replacements = (u"café", u"tea"), (u"tea", u"café"), (u"like", u"love")
>>> print multiple_replace(u"Do you like café? No, I prefer tea.", *replacements)
Do you love tea? No, I prefer café.
Note that since replacement is done in just one pass, "café" changes to "tea", but it does not change back to "café".
If you need to do the same replacement many times, you can create a replacement function easily:
>>> my_escaper = multiple_replacer(('"','\\"'), ('\t', '\\t'))
>>> many_many_strings = (u'This text will be escaped by "my_escaper"',
u'Does this work?\tYes it does',
u'And can we span\nmultiple lines?\t"Yes\twe\tcan!"')
>>> for line in many_many_strings:
... print my_escaper(line)
...
This text will be escaped by \"my_escaper\"
Does this work?\tYes it does
And can we span
multiple lines?\t\"Yes\twe\tcan!\"
Improvements:
turned code into a function
added multiline support
fixed a bug in escaping
easy to create a function for a specific multiple replacement
Enjoy! :-)

I would like to propose the usage of string templates. Just place the string to be replaced in a dictionary and all is set! Example from docs.python.org
>>> from string import Template
>>> s = Template('$who likes $what')
>>> s.substitute(who='tim', what='kung pao')
'tim likes kung pao'
>>> d = dict(who='tim')
>>> Template('Give $who $100').substitute(d)
Traceback (most recent call last):
[...]
ValueError: Invalid placeholder in string: line 1, col 10
>>> Template('$who likes $what').substitute(d)
Traceback (most recent call last):
[...]
KeyError: 'what'
>>> Template('$who likes $what').safe_substitute(d)
'tim likes $what'

In my case, I needed a simple replacing of unique keys with names, so I thought this up:
a = 'This is a test string.'
b = {'i': 'I', 's': 'S'}
for x,y in b.items():
a = a.replace(x, y)
>>> a
'ThIS IS a teSt StrIng.'

Here my $0.02. It is based on Andrew Clark's answer, just a little bit clearer, and it also covers the case when a string to replace is a substring of another string to replace (longer string wins)
def multireplace(string, replacements):
"""
Given a string and a replacement map, it returns the replaced string.
:param str string: string to execute replacements on
:param dict replacements: replacement dictionary {value to find: value to replace}
:rtype: str
"""
# Place longer ones first to keep shorter substrings from matching
# where the longer ones should take place
# For instance given the replacements {'ab': 'AB', 'abc': 'ABC'} against
# the string 'hey abc', it should produce 'hey ABC' and not 'hey ABc'
substrs = sorted(replacements, key=len, reverse=True)
# Create a big OR regex that matches any of the substrings to replace
regexp = re.compile('|'.join(map(re.escape, substrs)))
# For each match, look up the new string in the replacements
return regexp.sub(lambda match: replacements[match.group(0)], string)
It is in this this gist, feel free to modify it if you have any proposal.

I needed a solution where the strings to be replaced can be a regular expressions,
for example to help in normalizing a long text by replacing multiple whitespace characters with a single one. Building on a chain of answers from others, including MiniQuark and mmj, this is what I came up with:
def multiple_replace(string, reps, re_flags = 0):
""" Transforms string, replacing keys from re_str_dict with values.
reps: dictionary, or list of key-value pairs (to enforce ordering;
earlier items have higher priority).
Keys are used as regular expressions.
re_flags: interpretation of regular expressions, such as re.DOTALL
"""
if isinstance(reps, dict):
reps = reps.items()
pattern = re.compile("|".join("(?P<_%d>%s)" % (i, re_str[0])
for i, re_str in enumerate(reps)),
re_flags)
return pattern.sub(lambda x: reps[int(x.lastgroup[1:])][1], string)
It works for the examples given in other answers, for example:
>>> multiple_replace("(condition1) and --condition2--",
... {"condition1": "", "condition2": "text"})
'() and --text--'
>>> multiple_replace('hello, world', {'hello' : 'goodbye', 'world' : 'earth'})
'goodbye, earth'
>>> multiple_replace("Do you like cafe? No, I prefer tea.",
... {'cafe': 'tea', 'tea': 'cafe', 'like': 'prefer'})
'Do you prefer tea? No, I prefer cafe.'
The main thing for me is that you can use regular expressions as well, for example to replace whole words only, or to normalize white space:
>>> s = "I don't want to change this name:\n Philip II of Spain"
>>> re_str_dict = {r'\bI\b': 'You', r'[\n\t ]+': ' '}
>>> multiple_replace(s, re_str_dict)
"You don't want to change this name: Philip II of Spain"
If you want to use the dictionary keys as normal strings,
you can escape those before calling multiple_replace using e.g. this function:
def escape_keys(d):
""" transform dictionary d by applying re.escape to the keys """
return dict((re.escape(k), v) for k, v in d.items())
>>> multiple_replace(s, escape_keys(re_str_dict))
"I don't want to change this name:\n Philip II of Spain"
The following function can help in finding erroneous regular expressions among your dictionary keys (since the error message from multiple_replace isn't very telling):
def check_re_list(re_list):
""" Checks if each regular expression in list is well-formed. """
for i, e in enumerate(re_list):
try:
re.compile(e)
except (TypeError, re.error):
print("Invalid regular expression string "
"at position {}: '{}'".format(i, e))
>>> check_re_list(re_str_dict.keys())
Note that it does not chain the replacements, instead performs them simultaneously. This makes it more efficient without constraining what it can do. To mimic the effect of chaining, you may just need to add more string-replacement pairs and ensure the expected ordering of the pairs:
>>> multiple_replace("button", {"but": "mut", "mutton": "lamb"})
'mutton'
>>> multiple_replace("button", [("button", "lamb"),
... ("but", "mut"), ("mutton", "lamb")])
'lamb'

Note: Test your case, see comments.
Here's a sample which is more efficient on long strings with many small replacements.
import re
source = "Here is foo, it does moo!"
replacements = {
    'is': 'was',  # replace 'is' with 'was'
    'does': 'did',
    '!': '?'
}
def replace(source, replacements):
    # matches every string we want replaced
    finder = re.compile("|".join(re.escape(k) for k in replacements.keys()))
    result = []
    pos = 0
    while True:
        match = finder.search(source, pos)
        if match:
            # keep the part up until the match
            result.append(source[pos:match.start()])
            # append the replacement for the matched part
            result.append(replacements[match.group(0)])
            pos = match.end()
        else:
            # the rest after the last match
            result.append(source[pos:])
            break
    return "".join(result)
print(replace(source, replacements))
The point is to avoid many concatenations of long strings. We chop the source string into fragments, replace some of the fragments as we build the list, and then join everything back into a single string.
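When `re.sub` is given a function as the replacement, it does the same fragment-and-join work internally, so the loop can be shortened. The sketch below assumes the keys are literal strings, not regexes. It also sorts the keys longest first, which the loop above does not do; that only matters when one key is a prefix of another. `replace_with_sub` is a hypothetical name.

```python
import re

def replace_with_sub(source, replacements):
    # Longer keys first, so a longer key wins when two keys start at the same position.
    keys = sorted(replacements, key=len, reverse=True)
    finder = re.compile("|".join(re.escape(k) for k in keys))
    # re.sub calls the function once per match and joins the pieces itself.
    return finder.sub(lambda m: replacements[m.group(0)], source)

replacements = {'is': 'was', 'does': 'did', '!': '?'}
print(replace_with_sub("Here is foo, it does moo!", replacements))  # Here was foo, it did moo?
```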

I did a similar exercise for one of my school assignments. This was my solution:
dictionary = {1: ['hate', 'love'],
              2: ['salad', 'burger'],
              3: ['vegetables', 'pizza']}
def normalize(text):
    for i in dictionary:
        text = text.replace(dictionary[i][0], dictionary[i][1])
    return text
Try it on a test string:
string_to_change = 'I hate salad and vegetables'
print(normalize(string_to_change))  # I love burger and pizza

You can use the pandas library and its replace function, which supports both exact matches and regex replacements. For example:
import pandas as pd
df = pd.DataFrame({'text': ['Billy is going to visit Rome in November', 'I was born in 10/10/2010', 'I will be there at 20:00']})
to_replace = ['Billy', 'Rome', 'January|February|March|April|May|June|July|August|September|October|November|December', r'\d{2}:\d{2}', r'\d{2}/\d{2}/\d{4}']
replace_with = ['name', 'city', 'month', 'time', 'date']
print(df.text.replace(to_replace, replace_with, regex=True))
And the modified text is:
0 name is going to visit city in month
1 I was born in date
2 I will be there at time
You can find an example here. Notice that the replacements are applied to the text in the order in which they appear in the lists.
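Without pandas, the same ordered list of regex rules can be applied with the standard `re` module. The sketch below assumes, as the answer states, that the pairs are applied in list order, so it should reproduce the output shown above. `apply_rules` is a hypothetical helper name.

```python
import re

# Ordered (pattern, replacement) pairs; raw strings keep \d from being
# treated as an invalid string escape.
rules = [
    (r'Billy', 'name'),
    (r'Rome', 'city'),
    (r'January|February|March|April|May|June|July|August|September|October|November|December', 'month'),
    (r'\d{2}:\d{2}', 'time'),
    (r'\d{2}/\d{2}/\d{4}', 'date'),
]

def apply_rules(text, rules):
    # Each rule sees the output of the previous one, so list order matters.
    for pattern, replacement in rules:
        text = re.sub(pattern, replacement, text)
    return text

texts = ['Billy is going to visit Rome in November', 'I was born in 10/10/2010', 'I will be there at 20:00']
for t in texts:
    print(apply_rules(t, rules))
```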

I was struggling with this problem as well. With many substitutions regular expressions struggle, and are about four times slower than looping string.replace (in my experiment conditions).
You should absolutely try using the Flashtext library (blog post here, Github here). In my case it was a bit over two orders of magnitude faster, from 1.8 s to 0.015 s (regular expressions took 7.7 s) for each document.
It is easy to find use examples in the links above, but this is a working example:
from flashtext import KeywordProcessor
processor = KeywordProcessor(case_sensitive=False)
for k, v in my_dict.items():  # my_dict is your {old: new} mapping
    processor.add_keyword(k, v)
new_string = processor.replace_keywords(string)
Note that Flashtext makes substitutions in a single pass (to avoid a --> b and b --> c translating 'a' into 'c'). Flashtext also looks for whole words (so 'is' will not match 'this'). It works fine if your target is several words (replacing 'This is' by 'Hello').
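If installing Flashtext is not an option, the behaviour described above can be roughly approximated with the standard `re` module: a single pass, whole words only, and (matching `case_sensitive=False`) case-insensitive matching. This is only a sketch of that behaviour. It is not Flashtext's trie-based algorithm and will not match its speed. It also assumes every key starts and ends with a word character, because `\b` behaves differently next to punctuation. `replace_whole_words` is a hypothetical helper.

```python
import re

def replace_whole_words(text, mapping):
    # Case-insensitive lookup table keyed by the lowercased search term.
    table = {k.lower(): v for k, v in mapping.items()}
    # Longest keys first, so multi-word keys such as "this is" win over "is".
    keys = sorted(mapping, key=len, reverse=True)
    # \b on both sides restricts matches to whole words; one pass over the text.
    pattern = re.compile(r"\b(?:" + "|".join(re.escape(k) for k in keys) + r")\b", re.IGNORECASE)
    return pattern.sub(lambda m: table[m.group(0).lower()], text)

print(replace_whole_words("This is it. Is this a test?", {"is": "was", "this is": "Hello"}))
# Hello it. was this a test?
```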

I faced a similar problem today: I had to use the .replace() method multiple times, and it didn't feel right. So I did something like this:
REPLACEMENTS = {'<': '&lt;', '>': '&gt;', '&': '&amp;'}
event_title = ''.join([REPLACEMENTS.get(c, c) for c in event['summary']])
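Every key in this mapping is a single character, so the per-character lookup can also be done with `str.translate`. For these three HTML characters, the standard library's `html.escape(..., quote=False)` does the same job. In the sketch below, `summary` is a made-up stand-in for `event['summary']`.

```python
import html

REPLACEMENTS = {'<': '&lt;', '>': '&gt;', '&': '&amp;'}
summary = 'Tom & Jerry <live>'  # hypothetical example input

# Character-by-character lookup, as in the answer above.
by_join = ''.join(REPLACEMENTS.get(c, c) for c in summary)
# str.translate applies the same single-character mapping.
by_translate = summary.translate(str.maketrans(REPLACEMENTS))
# For this particular mapping, the standard library already has html.escape.
by_html = html.escape(summary, quote=False)

print(by_join)  # Tom &amp; Jerry &lt;live&gt;
```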

I feel this question needs a single-line recursive lambda function answer for completeness, just because. So there:
>>> mrep = lambda s, d: s if not d else mrep(s.replace(*d.popitem()), d)
Usage:
>>> mrep('abcabc', {'a': '1', 'c': '2'})
'1b21b2'
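Notes:
This consumes the input dictionary; pass a copy (e.g. dict(d)) if you still need it.
Python dicts preserve insertion order (guaranteed since Python 3.7, and in CPython 3.6 as an implementation detail), so the corresponding caveats in other answers are no longer relevant. Note, however, that popitem() removes the most recently inserted pair first, so the replacements are applied from last to first. For backward compatibility, one could resort to a tuple-based version (list.pop() likewise takes the last pair first):
>>> mrep = lambda s, d: s if not d else mrep(s.replace(*d.pop()), d)
>>> mrep('abcabc', [('a', '1'), ('c', '2')])
'1b21b2'
Note: As with all recursive functions in Python, too large a recursion depth (i.e. too many replacement pairs) will result in a RecursionError. See e.g. here.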
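The reverse order matters when a replacement produces text that another pair searches for. Below is a minimal iterative sketch that applies the pairs first to last, never consumes its argument, and has no recursion limit. `mrep_in_order` is a hypothetical name.

```python
mrep = lambda s, d: s if not d else mrep(s.replace(*d.popitem()), d)

def mrep_in_order(s, pairs):
    # Accepts a dict or a list of (old, new) pairs and applies them first to last;
    # the argument itself is left untouched.
    items = pairs.items() if isinstance(pairs, dict) else pairs
    for old, new in items:
        s = s.replace(old, new)
    return s

d = {'a': 'b', 'b': 'c'}
print(mrep('abc', dict(d)))     # bcc  ('b' -> 'c' runs first)
print(mrep_in_order('abc', d))  # ccc
print(d)                        # {'a': 'b', 'b': 'c'}, unchanged
```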

You should really not do it this way, but I just find it way too cool:
>>> s = 'cond1 and cond2'  # example input
>>> replacements = {'cond1':'text1', 'cond2':'text2'}
>>> cmd = 'answer = s'
>>> for k, v in replacements.items():
...     cmd += ".replace(%r, %r)" % (k, v)
...
>>> exec(cmd)
Now answer holds the result of all the replacements applied in turn ('text1 and text2' here).
Again, this is very hacky and not something you should use regularly, but it's nice to know that you can do something like this if you ever need to.

For replacing single characters, str.translate combined with str.maketrans is my favorite method.
tl;dr: result_string = your_string.translate(str.maketrans(dict_mapping))
Demo:
my_string = 'This is a test string.'
dict_mapping = {'i': 's', 's': 'S'}
result_good = my_string.translate(str.maketrans(dict_mapping))
result_bad = my_string
for x, y in dict_mapping.items():
    result_bad = result_bad.replace(x, y)
print(result_good)  # ThsS sS a teSt Strsng.
print(result_bad)  # ThSS SS a teSt StrSng.
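str.maketrans also accepts `None` as a value, which deletes the character, and it has a three-argument form. It only accepts single-character keys, which is why this approach is limited to one-character replacements. A short sketch of those behaviours:

```python
# Dict keys for str.maketrans must be single characters (or ordinals); a value
# may be a string of any length, or None to delete the character.
table = str.maketrans({'i': 's', 's': 'S', '.': None})
print('This is a test string.'.translate(table))  # ThsS sS a teSt Strsng

# Three-argument form: map 'a'->'x' and 'b'->'y' character by character, and delete '!'.
print('abba!'.translate(str.maketrans('ab', 'xy', '!')))  # xyyx

# Multi-character keys are rejected.
try:
    str.maketrans({'is': 'was'})
except ValueError as exc:
    print('ValueError:', exc)
```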

I don't know about speed, but this is my workaday quick fix:
from functools import reduce  # required in Python 3
reduce(lambda a, b: a.replace(*b),
       [('o', 'W'), ('t', 'X')],  # iterable of pairs: (oldval, newval)
       'tomato')  # the string in which to replace values
... but I like the top regex answer above. Note: if a new value contains a string that a later pair searches for, the operation is not commutative; the result depends on the order of the pairs.
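Here is a small sketch of that order dependence, wrapping the same reduce call in a helper. `replace_all` is an illustrative name.

```python
from functools import reduce  # reduce is not a builtin in Python 3

def replace_all(text, pairs):
    # Apply (old, new) pairs left to right, each on the previous result.
    return reduce(lambda acc, pair: acc.replace(*pair), pairs, text)

print(replace_all('tomato', [('o', 'W'), ('t', 'X')]))  # XWmaXW
# 'a' -> 'b' creates text that the next pair then replaces:
print(replace_all('ab', [('a', 'b'), ('b', 'c')]))  # cc
print(replace_all('ab', [('b', 'c'), ('a', 'b')]))  # bc
```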

Here is a version with support for basic regex replacement. The main restriction is that expressions must not contain subgroups, and there may be some edge cases:
Code based on bgusach's answer and others
import re
class StringReplacer:
    def __init__(self, replacements, ignore_case=False):
        # longest patterns first (by the length of the pattern text)
        patterns = sorted(replacements, key=len, reverse=True)
        self.replacements = [replacements[k] for k in patterns]
        re_mode = re.IGNORECASE if ignore_case else 0
        # wrap every pattern in one capturing group so a match can be traced back to it
        self.pattern = re.compile('|'.join(("({})".format(p) for p in patterns)), re_mode)
        def tr(matcher):
            # position of the first group that took part in the match
            index = next((index for index, value in enumerate(matcher.groups()) if value is not None), None)
            return self.replacements[index]
        self.tr = tr
    def __call__(self, string):
        return self.pattern.sub(self.tr, string)
Tests
table = {
    "aaa": "[This is three a]",
    "b+": "[This is one or more b]",
    r"<\w+>": "[This is a tag]"
}
replacer = StringReplacer(table, True)
sample1 = "whatever bb, aaa, <star> BBB <end>"
print(replacer(sample1))
# output:
# whatever [This is one or more b], [This is three a], [This is a tag] [This is one or more b] [This is a tag]
The trick is to identify the matched group by its position. It is not super efficient (each match scans the groups, so the cost is linear in the number of patterns), but it works.
index = next((index for index, value in enumerate(matcher.groups()) if value is not None), None)
Replacement is done in one pass.
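Every alternative is wrapped in exactly one capturing group, so the match object's `lastindex` attribute already gives the number of the group that matched. That removes the need to scan `groups()`. The sketch below works under the same restriction: patterns must not contain their own capturing groups, although non-capturing `(?:...)` groups are fine. `make_replacer` is an illustrative name, not part of the answer.

```python
import re

def make_replacer(replacements, ignore_case=False):
    # Longest pattern text first, mirroring StringReplacer above.
    patterns = sorted(replacements, key=len, reverse=True)
    values = [replacements[p] for p in patterns]
    flags = re.IGNORECASE if ignore_case else 0
    # One capturing group per alternative: group n corresponds to patterns[n - 1].
    compiled = re.compile("|".join("({})".format(p) for p in patterns), flags)
    # m.lastindex is the number of the group that matched, so no linear scan is needed.
    return lambda text: compiled.sub(lambda m: values[m.lastindex - 1], text)

table = {
    "aaa": "[This is three a]",
    "b+": "[This is one or more b]",
    r"<\w+>": "[This is a tag]",
}
replacer = make_replacer(table, ignore_case=True)
print(replacer("whatever bb, aaa, <star> BBB <end>"))
```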

Starting from Andrew's valuable answer, I developed a script that loads the dictionary from a file and processes all the matching files in the current folder. The mappings are read from an external file whose column separator you can choose. I'm a beginner, but I found this script very useful for doing multiple substitutions in multiple files; it loaded a dictionary with more than 1000 entries in seconds. It is not elegant, but it worked for me:
import glob
import os
import re
mapfile = input("Enter map file name with extension, e.g. codifica.txt: ")
sep = input("Enter map file column separator, e.g. |: ")
mask = input("Enter search mask, e.g. 2010*txt for all files to be processed: ")
suff = input("Enter suffix with extension, e.g. _NEW.txt for newly generated files: ")
rep = {}  # mapping read from the map file: search string -> replacement
with open(mapfile) as temprep:  # load the definitions using the prompted separator
    for line in temprep:
        line = line.rstrip('\n')
        if line:  # skip blank lines
            key, val = line.split(sep, 1)
            rep[key] = val
# Keys are deliberately not passed through re.escape, so regex special characters can be used
# in the map file; the lookup rep[m.group(0)] below only works for keys that match themselves literally.
pattern = re.compile("|".join(rep.keys()))
for filename in glob.iglob(mask):  # loop over all files matching the prompted mask
    with open(filename, "r") as textfile:  # load each file into the variable text
        text = textfile.read()
    text = pattern.sub(lambda m: rep[m.group(0)], text)
    # write the output file, replacing the extension with the prompted suffix
    with open(os.path.splitext(filename)[0] + suff, "w") as target:
        target.write(text)

This is my solution to the problem. I used it in a chatbot to replace different words at once.
def mass_replace(text, dct):
    new_string = ""
    old_string = text
    while len(old_string) > 0:
        s = ""
        sk = ""
        # find a key that the remaining text starts with (the last matching key wins)
        for k in dct.keys():
            if old_string.startswith(k):
                s = dct[k]
                sk = k
        if sk:
            # a key matched: emit its replacement (possibly empty) and skip past the key
            new_string += s
            old_string = old_string[len(sk):]
        else:
            # no key matched: copy one character unchanged
            new_string += old_string[0]
            old_string = old_string[1:]
    return new_string
print(mass_replace("The dog hunts the cat", {"dog": "cat", "cat": "dog"}))
This prints: The cat hunts the dog

Another example:
Input lists:
error_list = ['[br]', '[ex]', 'Something']
words = ['how', 'much[ex]', 'is[br]', 'the', 'fish[br]', 'noSomething', 'really']
The desired output is:
words = ['how', 'much', 'is', 'the', 'fish', 'no', 'really']
Code:
[n[0][0] if len(n[0]) else n[1] for n in [[[w.replace(e, "") for e in error_list if e in w], w] for w in words]]
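For each word, the one-liner keeps only the result of removing the first marker it finds, so a word carrying two markers keeps the second one. A plain loop avoids that and is easier to read. `strip_markers` is a hypothetical helper name.

```python
def strip_markers(words, markers):
    # Remove every marker from every word, applying the markers one after another.
    cleaned = []
    for word in words:
        for marker in markers:
            word = word.replace(marker, "")
        cleaned.append(word)
    return cleaned

error_list = ['[br]', '[ex]', 'Something']
words = ['how', 'much[ex]', 'is[br]', 'the', 'fish[br]', 'noSomething', 'really']
print(strip_markers(words, error_list))            # ['how', 'much', 'is', 'the', 'fish', 'no', 'really']
print(strip_markers(['fish[br][ex]'], error_list))  # ['fish']
```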

My approach would be to first tokenize the string, then decide for each token whether to include it or not.
This might be more performant, since lookups in a hashmap/set are O(1) on average:
remove_words = {"we", "this"}
target_sent = "we should modify this string"
target_sent_words = target_sent.split()
filtered_sent = " ".join(list(filter(lambda word: word not in remove_words, target_sent_words)))
filtered_sent is now 'should modify string'
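The same filter reads a little more directly as a generator expression. One caveat of splitting on whitespace is that punctuation and capitalization stay attached to the tokens, so they may need normalizing before the lookup. The second sentence in this sketch is a made-up example.

```python
import string

remove_words = {"we", "this"}
target_sent = "we should modify this string"

# Generator expression instead of list(filter(lambda ...)).
filtered_sent = " ".join(w for w in target_sent.split() if w not in remove_words)
print(filtered_sent)  # should modify string

# "This," only matches after stripping punctuation and lowercasing for the lookup.
sent = "This, we think, should stay."
kept = [w for w in sent.split() if w.strip(string.punctuation).lower() not in remove_words]
print(" ".join(kept))  # think, should stay.
```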

Or, as a quick hack (to_read and to_write are already-open input and output files):
for line in to_read:
    read_buffer = line
    stripped_buffer1 = read_buffer.replace("term1", " ")
    stripped_buffer2 = stripped_buffer1.replace("term2", " ")
    to_write.write(stripped_buffer2)

Here is another way of doing it with a dictionary:
listA = "The cat jumped over the house".split()
modify = {word: word for word in listA}
modify["cat"], modify["jumped"] = "dog", "walked"
print(" ".join(modify[x] for x in listA))
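The identity mapping isn't strictly needed. `dict.get(word, word)` returns the word itself when it has no entry, so only the changed words have to be listed. A short sketch of that variant:

```python
text = "The cat jumped over the house"
# Only the words that change are in the mapping; dict.get falls back to the word itself.
modify = {"cat": "dog", "jumped": "walked"}
result = " ".join(modify.get(w, w) for w in text.split())
print(result)  # The dog walked over the house
```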

sentence = 'its some sentence with a something text'
def replaceAll(f, Array1, Array2):
    if len(Array1) == len(Array2):
        for x in range(len(Array1)):
            # keep the result and continue (returning here stopped after the first pair)
            f = f.replace(Array1[x], Array2[x])
    return f
newSentence = replaceAll(sentence, ['a', 'sentence', 'something'], ['another', 'sentence', 'something something'])
print(newSentence)

How to replace multiple substrings of a string?

I would like to use the .replace function to replace multiple strings.
I currently have
string.replace("condition1", "")
but would like to have something like
string.replace("condition1", "").replace("condition2", "text")
although that does not feel like good syntax.
What is the proper way to do this? Something like how in grep/regex you can use \1 and \2 to replace fields with certain search strings.

Here is a short example that should do the trick with regular expressions:
import re
rep = {"condition1": "", "condition2": "text"} # define desired replacements here
# use these three lines to do the replacement
rep = dict((re.escape(k), v) for k, v in rep.items())
# Python 2 used rep.iteritems() here; Python 3 renamed it to rep.items()
pattern = re.compile("|".join(rep.keys()))
text = pattern.sub(lambda m: rep[re.escape(m.group(0))], text)
For example:
>>> pattern.sub(lambda m: rep[re.escape(m.group(0))], "(condition1) and --condition2--")
'() and --text--'

You could just make a nice little looping function.
def replace_all(text, dic):
    for i, j in dic.items():
        text = text.replace(i, j)
    return text
where text is the complete string and dic is a dictionary: each key is a search term, and its value is the string that replaces it.
Note: in Python 2 this loop used dic.iteritems(); in Python 3, iteritems() has been replaced with items().
Careful: Python dictionaries don't have a reliable order for iteration. This solution only solves your problem if:
order of replacements is irrelevant
it's ok for a replacement to change the results of previous replacements
Update: The above statement about iteration order does not apply to Python 3.7 and later (or CPython 3.6), as standard dicts now iterate in insertion order.
For instance:
d = {"cat": "dog", "dog": "pig"}
my_sentence = "This is my cat and this is my dog."
my_sentence = replace_all(my_sentence, d)
print(my_sentence)
Possible output #1:
"This is my pig and this is my pig."
Possible output #2
"This is my dog and this is my pig."
One possible fix is to use an OrderedDict.
from collections import OrderedDict
def replace_all(text, dic):
    for i, j in dic.items():
        text = text.replace(i, j)
    return text
od = OrderedDict([("cat", "dog"), ("dog", "pig")])
my_sentence = "This is my cat and this is my dog."
my_sentence = replace_all(my_sentence, od)
print(my_sentence)
Output:
"This is my pig and this is my pig."
Careful #2: Inefficient if your text string is too big or there are many pairs in the dictionary.

Why not one solution like this?
s = "The quick brown fox jumps over the lazy dog"
for r in (("brown", "red"), ("lazy", "quick")):
    s = s.replace(*r)
#output will be: The quick red fox jumps over the quick dog

Here is a variant of the first solution using reduce, in case you like being functional. :)
from functools import reduce  # a builtin in Python 2, moved to functools in Python 3
repls = {'hello' : 'goodbye', 'world' : 'earth'}
s = 'hello, world'
reduce(lambda a, kv: a.replace(*kv), repls.items(), s)
martineau's even better version:
repls = ('hello', 'goodbye'), ('world', 'earth')
s = 'hello, world'
reduce(lambda a, kv: a.replace(*kv), repls, s)

This is just a more concise recap of F.J's and MiniQuark's great answers, plus the last but decisive improvement by bgusach. All you need to achieve multiple simultaneous string replacements is the following function:
import re
def multiple_replace(string, rep_dict):
    pattern = re.compile("|".join([re.escape(k) for k in sorted(rep_dict, key=len, reverse=True)]), flags=re.DOTALL)
    return pattern.sub(lambda x: rep_dict[x.group(0)], string)
Usage:
>>> multiple_replace("Do you like cafe? No, I prefer tea.", {'cafe':'tea', 'tea':'cafe', 'like':'prefer'})
'Do you prefer tea? No, I prefer cafe.'
If you wish, you can make your own dedicated replacement functions starting from this simpler one.

Starting with Python 3.8 and the introduction of assignment expressions (PEP 572, the := operator), we can apply the replacements within a list comprehension:
# text = "The quick brown fox jumps over the lazy dog"
# replacements = [("brown", "red"), ("lazy", "quick")]
[text := text.replace(a, b) for a, b in replacements]
# text = 'The quick red fox jumps over the quick dog'

I built this upon F.J.'s excellent answer:
import re
def multiple_replacer(*key_values):
    replace_dict = dict(key_values)
    replacement_function = lambda match: replace_dict[match.group(0)]
    pattern = re.compile("|".join([re.escape(k) for k, v in key_values]), re.M)
    return lambda string: pattern.sub(replacement_function, string)
def multiple_replace(string, *key_values):
    return multiple_replacer(*key_values)(string)
One shot usage:
>>> replacements = (u"café", u"tea"), (u"tea", u"café"), (u"like", u"love")
>>> print(multiple_replace(u"Do you like café? No, I prefer tea.", *replacements))
Do you love tea? No, I prefer café.
Note that since replacement is done in just one pass, "café" changes to "tea", but it does not change back to "café".
If you need to do the same replacement many times, you can create a replacement function easily:
>>> my_escaper = multiple_replacer(('"','\\"'), ('\t', '\\t'))
>>> many_many_strings = (u'This text will be escaped by "my_escaper"',
...                      u'Does this work?\tYes it does',
...                      u'And can we span\nmultiple lines?\t"Yes\twe\tcan!"')
>>> for line in many_many_strings:
...     print(my_escaper(line))
...
This text will be escaped by \"my_escaper\"
Does this work?\tYes it does
And can we span
multiple lines?\t\"Yes\twe\tcan!\"
Improvements:
turned code into a function
added multiline support
fixed a bug in escaping
easy to create a function for a specific multiple replacement
Enjoy! :-)

I would like to propose using string templates: mark the parts to be replaced as $placeholders, put the values in a dictionary, and all is set! Example from docs.python.org:
>>> from string import Template
>>> s = Template('$who likes $what')
>>> s.substitute(who='tim', what='kung pao')
'tim likes kung pao'
>>> d = dict(who='tim')
>>> Template('Give $who $100').substitute(d)
Traceback (most recent call last):
[...]
ValueError: Invalid placeholder in string: line 1, col 11
>>> Template('$who likes $what').substitute(d)
Traceback (most recent call last):
[...]
KeyError: 'what'
>>> Template('$who likes $what').safe_substitute(d)
'tim likes $what'

In my case, I needed a simple replacement of unique keys with names, so I came up with this:
a = 'This is a test string.'
b = {'i': 'I', 's': 'S'}
for x, y in b.items():
    a = a.replace(x, y)
>>> a
'ThIS IS a teSt StrIng.'

Here's my $0.02. It is based on Andrew Clark's answer, just a little bit clearer, and it also covers the case when a string to replace is a substring of another string to replace (the longer string wins):
import re
def multireplace(string, replacements):
    """
    Given a string and a replacement map, it returns the replaced string.
    :param str string: string to execute replacements on
    :param dict replacements: replacement dictionary {value to find: value to replace}
    :rtype: str
    """
    # Place longer ones first to keep shorter substrings from matching
    # where the longer ones should take place
    # For instance given the replacements {'ab': 'AB', 'abc': 'ABC'} against
    # the string 'hey abc', it should produce 'hey ABC' and not 'hey ABc'
    substrs = sorted(replacements, key=len, reverse=True)
    # Create a big OR regex that matches any of the substrings to replace
    regexp = re.compile('|'.join(map(re.escape, substrs)))
    # For each match, look up the new string in the replacements
    return regexp.sub(lambda match: replacements[match.group(0)], string)
It is in this gist; feel free to modify it if you have any proposal.
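This sketch checks the sorting claim from the comments and adds one guard. With an empty mapping the joined pattern is the empty regex, which matches everywhere, and the lookup then raises KeyError. The guard assumes that returning the input unchanged is the desired behaviour in that case. Otherwise this is the same function as above.

```python
import re

def multireplace(string, replacements):
    # Same approach as above: longest keys first, one compiled alternation.
    if not replacements:
        return string  # an empty alternation would match "" at every position
    substrs = sorted(replacements, key=len, reverse=True)
    regexp = re.compile('|'.join(map(re.escape, substrs)))
    return regexp.sub(lambda match: replacements[match.group(0)], string)

reps = {'ab': 'AB', 'abc': 'ABC'}
print(multireplace('hey abc', reps))  # hey ABC

# Without the sort, the alternation tries 'ab' first and stops there:
naive = re.compile('|'.join(map(re.escape, ['ab', 'abc']))).sub(lambda m: reps[m.group(0)], 'hey abc')
print(naive)  # hey ABc
```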

I needed a solution where the strings to be replaced can be regular expressions, for example to help normalize a long text by replacing multiple whitespace characters with a single one. Building on a chain of answers from others, including MiniQuark and mmj, this is what I came up with:
import re
def multiple_replace(string, reps, re_flags = 0):
    """ Transforms string, replacing keys from reps with values.
    reps: dictionary, or list of key-value pairs (to enforce ordering;
    earlier items have higher priority).
    Keys are used as regular expressions.
    re_flags: interpretation of regular expressions, such as re.DOTALL
    """
    if isinstance(reps, dict):
        reps = list(reps.items())  # list() so the pairs can be indexed in Python 3
    pattern = re.compile("|".join("(?P<_%d>%s)" % (i, re_str[0])
                                  for i, re_str in enumerate(reps)),
                         re_flags)
    # x.lastgroup is the name "_<i>" of the alternative that matched
    return pattern.sub(lambda x: reps[int(x.lastgroup[1:])][1], string)
It works for the examples given in other answers, for example:
>>> multiple_replace("(condition1) and --condition2--",
... {"condition1": "", "condition2": "text"})
'() and --text--'
>>> multiple_replace('hello, world', {'hello' : 'goodbye', 'world' : 'earth'})
'goodbye, earth'
>>> multiple_replace("Do you like cafe? No, I prefer tea.",
... {'cafe': 'tea', 'tea': 'cafe', 'like': 'prefer'})
'Do you prefer tea? No, I prefer cafe.'
The main thing for me is that you can use regular expressions as well, for example to replace whole words only, or to normalize white space:
>>> s = "I don't want to change this name:\n Philip II of Spain"
>>> re_str_dict = {r'\bI\b': 'You', r'[\n\t ]+': ' '}
>>> multiple_replace(s, re_str_dict)
"You don't want to change this name: Philip II of Spain"
