How to parse and evaluate a math expression with Pandas Dataframe columns?


What I would like to do is parse an expression such as this one:
result = A + B + sqrt(B + 4)
Here A and B are columns of a dataframe. So I would have to parse the expression like this in order to get the result:
new_col = df.B + 4
result = df.A + df.B + new_col.apply(sqrt)
where df is the dataframe.
I have tried re.sub, but it only works when the expression contains column names and no functions, because it wraps every name (including sqrt) in df['...'], like this:
import re
def repl(match):
    inner_word = match.group(1)
    new_var = "df['{}']".format(inner_word)
    return new_var
eq = 'A + 3 / B'
new_eq = re.sub('([a-zA-Z_]+)', repl, eq)
result = eval(new_eq)
So, my questions are:
Is there a Python library to do this? If not, how can I achieve this in a simple way?
Could a recursive function be the solution?
Would using reverse Polish notation simplify the parsing?
Would I have to use the ast module?

Pandas DataFrames do have an eval function. Using your example equation:
import pandas as pd
# create an example DataFrame to work with
df = pd.DataFrame({"A": [1, 2], "B": [3, 4]})
# define equation
eq = 'A + 3 / B'
# actual computation
df.eval(eq)
# more complicated equation
eq = "A + B + sqrt(B + 4)"
df.eval(eq)
Warning
Keep in mind that eval can run arbitrary code, which can make you vulnerable to code injection if you pass user input to this function.
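If the expression may come from user input, one alternative to eval (and an answer to the question about the ast module) is to walk the parsed expression tree yourself and accept only a whitelist of operators, column names and functions. The sketch below is stdlib-only and assumes the columns are given as equal-length Python lists in a non-empty dict; the names safe_eval, BIN_OPS, UNARY_OPS and ALLOWED_FUNCS are hypothetical, and this is not a pandas API. It evaluates the expression row by row; with pandas you could instead map names to Series and "sqrt" to numpy.sqrt and evaluate once per column.

```python
import ast
import math
import operator

# Whitelisted operators; anything not listed here is rejected.
BIN_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Pow: operator.pow,
}
UNARY_OPS = {ast.USub: operator.neg, ast.UAdd: operator.pos}
# Whitelisted function names (a hypothetical policy; extend as needed).
ALLOWED_FUNCS = {"sqrt": math.sqrt, "log": math.log, "exp": math.exp}


def safe_eval(expr, columns):
    """Evaluate expr once per row; bare names refer to keys of `columns`."""
    tree = ast.parse(expr, mode="eval")

    def ev(node, row):
        if isinstance(node, ast.Expression):
            return ev(node.body, row)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.Name):
            if node.id not in columns:
                raise NameError(f"unknown column {node.id!r}")
            return columns[node.id][row]
        if isinstance(node, ast.BinOp) and type(node.op) in BIN_OPS:
            return BIN_OPS[type(node.op)](ev(node.left, row), ev(node.right, row))
        if isinstance(node, ast.UnaryOp) and type(node.op) in UNARY_OPS:
            return UNARY_OPS[type(node.op)](ev(node.operand, row))
        if (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
                and node.func.id in ALLOWED_FUNCS and not node.keywords):
            args = [ev(arg, row) for arg in node.args]
            return ALLOWED_FUNCS[node.func.id](*args)
        # Attributes, subscripts, unknown calls, strings, etc. all end up here.
        raise ValueError(f"disallowed syntax: {ast.dump(node)}")

    n_rows = len(next(iter(columns.values())))
    return [ev(tree, row) for row in range(n_rows)]


print(safe_eval("A + B + sqrt(B + 4)", {"A": [1, 2], "B": [5, 12]}))  # [9.0, 18.0]
```

Anything outside the whitelist, such as __import__('os') or attribute access, raises ValueError instead of being executed.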

Following the example provided by @uuazed, a faster way (at least in this benchmark) would be to use numexpr directly:
import pandas as pd
import numpy as np
import numexpr as ne
df = pd.DataFrame(np.random.randn(int(1e6), 2), columns=['A', 'B'])
eq = "A + B + sqrt(B + 4)"
%timeit df.eval(eq)
# 15.9 ms ± 177 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit A=df.A; B=df.B; ne.evaluate(eq)
# 6.24 ms ± 396 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
numexpr may also support more operations.

Related

Numba slower when optionally returning arrays

I am trying to write a numba function that optionally returns arrays created inside the function. I found that merely implementing this option slows down execution compared to the same function without the option of returning the arrays.
Here is an example:
import numpy as np
from numba import jit
@jit(nopython=True, nogil=True, cache=True)
def testfun1(array_in, arrays_out=np.empty((0,0), np.float64)):
    array_add_1 = array_in + 1
    array_add_2 = array_in + 2
    array_sum_1 = sum(array_add_1)
    array_sum_2 = sum(array_add_2)
    if arrays_out.shape == (10,len(array_in)):
        # write to an array
        arrays_out[0,:] = array_add_1
        arrays_out[1,:] = array_add_1
        arrays_out[2,:] = array_add_1
        arrays_out[3,:] = array_add_1
        arrays_out[4,:] = array_add_1
        arrays_out[5,:] = array_add_2
        arrays_out[6,:] = array_add_2
        arrays_out[7,:] = array_add_2
        arrays_out[8,:] = array_add_2
        arrays_out[9,:] = array_add_2
    return (array_sum_1, array_sum_2)
@jit(nopython=True, nogil=True, cache=True)
def testfun2(array_in, arrays_out=np.empty((0,0), np.float64)):
    array_add_1 = array_in + 1
    array_add_2 = array_in + 2
    array_sum_1 = sum(array_add_1)
    array_sum_2 = sum(array_add_2)
    if arrays_out.shape == (10,len(array_in)):
        array_add_1 = 0 # do something
        array_add_2 = 0 # do something
    return (array_sum_1, array_sum_2)
array_in = np.arange(0,10000)
Timing gives me:
%timeit testfun1(array_in) # 70.1 µs ± 346 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%timeit testfun2(array_in) # 69.6 µs ± 217 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
(Please note that this is a small-scale example. In my actual code, the time difference is much more noticeable because there are more and bigger arrays to optionally return.)
So both functions effectively do the same work, since no arrays_out argument of the correct size is passed to either of them. Why, then, is testfun1 slower?
Is there another way to write a function that optionally returns arrays but does not slow down execution when the arrays are not needed?
Thanks and best regards,
Scooba

convert time string XhYmZs to seconds in python

I have a string which comes in three forms:
XhYmZs or YmZs or Zs
where h, m and s stand for hours, minutes and seconds, and X, Y and Z are the corresponding values.
How do I efficiently convert these strings to seconds in Python 2.7?
I guess I can do something like:
s="XhYmZs"
if "h" in s:
    hours=s.split("h")
elif "m" in s:
    mins=s.split("m")[0][-1]
... but this does not seem very efficient to me :(

Split on the delimiters you're interested in, then parse each resulting element into an integer and multiply as needed:
import re
def hms(s):
    l = list(map(int, re.split('[hms]', s)[:-1]))
    if len(l) == 3:
        return l[0]*3600 + l[1]*60 + l[2]
    elif len(l) == 2:
        return l[0]*60 + l[1]
    else:
        return l[0]
This produces a duration normalized to seconds.
>>> hms('3h4m5s')
11045
>>> 3*3600+4*60+5
11045
>>> hms('70m5s')
4205
>>> 70*60+5
4205
>>> hms('300s')
300
You can also make this one line by turning the re.split() result around and multiplying by 60 raised to an incrementing power based on the element's position in the list:
def hms2(s):
    return sum(int(x)*60**i for i,x in enumerate(re.split('[hms]', s)[-2::-1]))

>>> import datetime
>>> datetime.datetime.strptime('3h4m5s', '%Hh%Mm%Ss').time()
datetime.time(3, 4, 5)
Since it varies which fields are in your strings, you may have to build a matching format string.
>>> def parse(s):
...     fmt=''.join('%'+c.upper()+c for c in 'hms' if c in s)
...     return datetime.datetime.strptime(s, fmt).time()
The datetime module is the standard library way to handle times.
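Since parse() returns a datetime.time rather than a number, one more step is needed to get seconds. A minimal sketch reusing the same format-string trick (hms_seconds is a hypothetical name). Note that strptime only accepts 0-23 for %H and 0-59 for %M, so a duration such as '70m5s' raises ValueError with this approach.

```python
import datetime


def hms_seconds(s):
    # Build e.g. '%Hh%Mm%Ss' from whichever of h/m/s appear in s.
    fmt = ''.join('%' + c.upper() + c for c in 'hms' if c in s)
    t = datetime.datetime.strptime(s, fmt).time()
    # Convert the clock time back into a number of seconds.
    return t.hour * 3600 + t.minute * 60 + t.second


print(hms_seconds('3h4m5s'))  # 11045
print(hms_seconds('5s'))      # 5
```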
Asking to do this "efficiently" is a bit of a fool's errand. String parsing in an interpreted language isn't fast; aim for clarity. In addition, seeming efficient isn't very meaningful; either analyze the algorithm or benchmark, otherwise it's speculation.

I don't know how efficient this is, but this is how I would do it:
import re
test_data = [
    '1h2m3s',
    '1m2s',
    '1s',
    '3s1h2m',
]
HMS_REGEX = re.compile(r'^(\d+)h(\d+)m(\d+)s$')
MS_REGEX = re.compile(r'^(\d+)m(\d+)s$')
S_REGEX = re.compile(r'^(\d+)s$')
def total_seconds(hms_string):
    found = HMS_REGEX.match(hms_string)
    if found:
        return 3600 * int(found.group(1)) + \
            60 * int(found.group(2)) + \
            int(found.group(3))
    found = MS_REGEX.match(hms_string)
    if found:
        return 60 * int(found.group(1)) + int(found.group(2))
    found = S_REGEX.match(hms_string)
    if found:
        return int(found.group(1))
    raise ValueError('Could not convert ' + hms_string)
for datum in test_data:
    try:
        print(total_seconds(datum))
    except ValueError as exc:
        print(exc)
Or, going to a single match and riffing on TigerhawkT3's one-liner while retaining the error checking for non-matching strings:
HMS_REGEX = re.compile(r'^(\d+)h(\d+)m(\d+)s$|^(\d+)m(\d+)s$|^(\d+)s$')
def total_seconds(hms_string):
    found = HMS_REGEX.match(hms_string)
    if found:
        return sum(
            int(x or 0) * 60 ** i for i, x in enumerate(
                (y for y in reversed(found.groups()) if y is not None)))
    raise ValueError('Could not convert ' + hms_string)

My fellow Pythonistas, please stop using regular expressions for everything. Regular expressions are not needed for such simple tasks. Python is considered a slow language not because of the GIL or the interpreter, but because of this kind of misuse.
In [1]: import re
...: def hms(s):
...:     l = list(map(int, re.split('[hms]', s)[:-1]))
...:     if len(l) == 3:
...:         return l[0]*3600 + l[1]*60 + l[2]
...:     elif len(l) == 2:
...:         return l[0]*60 + l[1]
...:     else:
...:         return l[0]
In [2]: %timeit hms("6h7m8s")
5.62 µs ± 722 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [6]: def ehms(s):
...:     bases=dict(h=3600, m=60, s=1)
...:     secs = 0
...:     num = 0
...:     for c in s:
...:         if c.isdigit():
...:             num = num * 10 + int(c)
...:         else:
...:             secs += bases[c] * num
...:             num = 0
...:     return secs
In [7]: %timeit ehms("6h7m8s")
2.07 µs ± 70.3 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [8]: %timeit hms("8s")
2.35 µs ± 124 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [9]: %timeit ehms("8s")
1.06 µs ± 118 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [10]: bases=dict(h=3600, m=60, s=1)
In [15]: a = ord('0')
In [16]: def eehms(s):
...:     secs = 0
...:     num = 0
...:     for c in s:
...:         if c.isdigit():
...:             num = num * 10 + ord(c) - a
...:         else:
...:             secs += bases[c] * num
...:             num = 0
...:     return secs
In [17]: %timeit eehms("6h7m8s")
1.45 µs ± 30 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
See, almost 4 times as fast.

There's a library, python-dateutil (pip install python-dateutil), whose parser takes a string and returns a datetime.datetime.
It can parse values such as 5h 30m, 0.5h 30m or 0.5h, with or without spaces.
from datetime import datetime
from dateutil import parser
time = '5h15m50s'
midnight_plus_time = parser.parse(time)
midnight: datetime = datetime.combine(datetime.today(), datetime.min.time())
timedelta = midnight_plus_time - midnight
print(timedelta.seconds) # 18950
It can't parse more than 24h at once though.
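If you need durations of 24 hours or more without a third-party dependency, one option is to pull the optional h/m/s fields out with a single regular expression and let datetime.timedelta do the arithmetic. This is a sketch under the assumption that the fields are integers and appear in h, m, s order; DURATION_RE and duration_seconds are hypothetical names.

```python
import re
from datetime import timedelta

# Each of the h, m and s fields is optional, but they must appear in that order.
DURATION_RE = re.compile(r'(?:(\d+)h)?(?:(\d+)m)?(?:(\d+)s)?')


def duration_seconds(text):
    match = DURATION_RE.fullmatch(text)
    # Reject non-matching input and the empty string (all groups missing).
    if match is None or not any(match.groups()):
        raise ValueError('Could not convert ' + repr(text))
    hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    delta = timedelta(hours=hours, minutes=minutes, seconds=seconds)
    return int(delta.total_seconds())


print(duration_seconds('5h15m50s'))  # 18950
print(duration_seconds('25h0m1s'))   # 90001
```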

How to turn a 1D radial profile into a 2D array in python

I have a list that models a phenomenon that is a function of radius. I want to convert this to a 2D array. I wrote some code that does exactly what I want, but since it uses nested for loops, it is quite slow.
l = len(profile1D)/2
critDim = int((l**2 /2.)**(1/2.))
profile2D = np.empty([critDim, critDim])
for x in xrange(0, critDim):
    for y in xrange(0,critDim):
        r = ((x**2 + y**2)**(1/2.))
        profile2D[x,y] = profile1D[int(l+r)]
Is there a more efficient way to do the same thing by avoiding these loops?

Here's a vectorized approach using broadcasting -
a = np.arange(critDim)**2
r2D = np.sqrt(a[:,None] + a)
out = profile1D[(l+r2D).astype(int)]
If there are many repeated indices generated by l+r2D, we can use np.take for some further performance boost, like so -
out = np.take(profile1D,(l+r2D).astype(int))
Runtime test
Function definitions -
def org_app(profile1D,l,critDim):
    profile2D = np.empty([critDim, critDim])
    for x in xrange(0, critDim):
        for y in xrange(0,critDim):
            r = ((x**2 + y**2)**(1/2.))
            profile2D[x,y] = profile1D[int(l+r)]
    return profile2D
def vect_app1(profile1D,l,critDim):
    a = np.arange(critDim)**2
    r2D = np.sqrt(a[:,None] + a)
    out = profile1D[(l+r2D).astype(int)]
    return out
def vect_app2(profile1D,l,critDim):
    a = np.arange(critDim)**2
    r2D = np.sqrt(a[:,None] + a)
    out = np.take(profile1D,(l+r2D).astype(int))
    return out
Timings and verification -
In [25]: # Setup input array and params
...: profile1D = np.random.randint(0,9,(1000))
...: l = len(profile1D)/2
...: critDim = int((l**2 /2.)**(1/2.))
...:
In [26]: np.allclose(org_app(profile1D,l,critDim),vect_app1(profile1D,l,critDim))
Out[26]: True
In [27]: np.allclose(org_app(profile1D,l,critDim),vect_app2(profile1D,l,critDim))
Out[27]: True
In [28]: %timeit org_app(profile1D,l,critDim)
10 loops, best of 3: 154 ms per loop
In [29]: %timeit vect_app1(profile1D,l,critDim)
1000 loops, best of 3: 1.69 ms per loop
In [30]: %timeit vect_app2(profile1D,l,critDim)
1000 loops, best of 3: 1.68 ms per loop
In [31]: # Setup input array and params
...: profile1D = np.random.randint(0,9,(5000))
...: l = len(profile1D)/2
...: critDim = int((l**2 /2.)**(1/2.))
...:
In [32]: %timeit org_app(profile1D,l,critDim)
1 loops, best of 3: 3.76 s per loop
In [33]: %timeit vect_app1(profile1D,l,critDim)
10 loops, best of 3: 59.8 ms per loop
In [34]: %timeit vect_app2(profile1D,l,critDim)
10 loops, best of 3: 59.5 ms per loop
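Written out as a formula, both the loop and the broadcast version compute the index mapping below, with $l$ = len(profile1D)/2 and $d$ = critDim; it also shows why critDim is chosen the way it is.

$$
\mathrm{profile2D}[x, y] = \mathrm{profile1D}\!\left[\left\lfloor l + \sqrt{x^2 + y^2} \right\rfloor\right], \qquad 0 \le x, y < d, \qquad d = \left\lfloor \sqrt{l^2/2} \right\rfloor = \left\lfloor l/\sqrt{2} \right\rfloor
$$

The broadcast sum a[:, None] + a is the $d \times d$ matrix whose $(x, y)$ entry is $x^2 + y^2$, so a single np.sqrt gives every radius at once. Since $x, y \le d - 1 < l/\sqrt{2}$, the largest radius satisfies $\sqrt{2}\,(d - 1) < l$, so every index $\lfloor l + r \rfloor$ is below $2l \le$ len(profile1D) and stays in bounds.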

Faster loop operating on two values of an array

Consider the following function:
def dostuff(n, f):
    array = numpy.arange(0, n)
    for i in range(1, n): # Line 1
        array[i] = f(array[i-1], array[i]) # Line 2
    return numpy.sum(array)
How can I rewrite the Line 1/Line 2 to make the loop faster in python 3 (without using cython)?

I encourage you to check the Stack Overflow question "generalized cumulative functions in NumPy/SciPy?", since you want a generalized cumulative function.
Also check the NumPy documentation for the function frompyfunc.
func = np.frompyfunc(f , 2 , 1)
def dostuff(n,f):
    final_array = func.accumulate(np.arange(0,n), dtype=object).astype(int)
    return np.sum(final_array)
Example
In [86]:
def f(num1 , num2):
    return num1 + num2
In [87]:
func = np.frompyfunc(f , 2 , 1)
In [88]:
def dostuff(n,f):
    final_array = func.accumulate(np.arange(0,n), dtype=object).astype(int)
    return np.sum(final_array)
In [108]:
dostuff(15,f)
Out[108]:
560
In [109]:
dostuff(10,f)
Out[109]:
165
Benchmarks
def dostuff1(n, f):
    array = np.arange(0, n)
    for i in range(1, n): # Line 1
        array[i] = f(array[i-1], array[i]) # Line 2
    return np.sum(array)
def dostuff2(n,f):
    final_array = func.accumulate(np.arange(0,n), dtype=object).astype(int)
    return np.sum(final_array)
In [126]:
%timeit dostuff1(100,f)
10000 loops, best of 3: 40.6 µs per loop
In [127]:
%timeit dostuff2(100,f)
The slowest run took 4.98 times longer than the fastest. This could mean that an intermediate result is being cached
10000 loops, best of 3: 23.8 µs per loop
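For comparison, the standard library already has a generalized cumulative function: itertools.accumulate(iterable, func) yields func(previous_result, next_item) at each step, which is the same recurrence as Line 1/Line 2. A minimal pure-Python sketch; dostuff_accumulate is a hypothetical name, it was not part of the benchmarks above, and one difference is that the original writes results back into an integer NumPy array (truncating non-integer results of f), which this version does not.

```python
import itertools
import operator


def dostuff_accumulate(n, f):
    # accumulate(range(n), f) yields 0, f(0, 1), f(f(0, 1), 2), ...,
    # matching array[i] = f(array[i-1], array[i]) from the original loop.
    return sum(itertools.accumulate(range(n), f))


print(dostuff_accumulate(15, operator.add))  # 560
print(dostuff_accumulate(10, operator.add))  # 165
```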

Best way to replace multiple characters in a string?

I need to replace some characters as follows: & ➔ \&, # ➔ \#, ...
I coded it as follows, but I suspect there is a better way. Any hints?
strs = strs.replace('&', '\&')
strs = strs.replace('#', '\#')
...

Replacing two characters
I timed all the methods in the current answers along with one extra.
With an input string of abc&def#ghi and replacing & -> \& and # -> \#, the fastest way was to chain together the replacements like this: text.replace('&', '\&').replace('#', '\#').
Timings for each function:
a) 1000000 loops, best of 3: 1.47 μs per loop
b) 1000000 loops, best of 3: 1.51 μs per loop
c) 100000 loops, best of 3: 12.3 μs per loop
d) 100000 loops, best of 3: 12 μs per loop
e) 100000 loops, best of 3: 3.27 μs per loop
f) 1000000 loops, best of 3: 0.817 μs per loop
g) 100000 loops, best of 3: 3.64 μs per loop
h) 1000000 loops, best of 3: 0.927 μs per loop
i) 1000000 loops, best of 3: 0.814 μs per loop
Here are the functions:
def a(text):
    chars = "&#"
    for c in chars:
        text = text.replace(c, "\\" + c)
def b(text):
    for ch in ['&','#']:
        if ch in text:
            text = text.replace(ch,"\\"+ch)
import re
def c(text):
    rx = re.compile('([&#])')
    text = rx.sub(r'\\\1', text)
RX = re.compile('([&#])')
def d(text):
    text = RX.sub(r'\\\1', text)
def mk_esc(esc_chars):
    return lambda s: ''.join(['\\' + c if c in esc_chars else c for c in s])
esc = mk_esc('&#')
def e(text):
    esc(text)
def f(text):
    text = text.replace('&', '\&').replace('#', '\#')
def g(text):
    replacements = {"&": "\&", "#": "\#"}
    text = "".join([replacements.get(c, c) for c in text])
def h(text):
    text = text.replace('&', r'\&')
    text = text.replace('#', r'\#')
def i(text):
    text = text.replace('&', r'\&').replace('#', r'\#')
Timed like this:
python -mtimeit -s"import time_functions" "time_functions.a('abc&def#ghi')"
python -mtimeit -s"import time_functions" "time_functions.b('abc&def#ghi')"
python -mtimeit -s"import time_functions" "time_functions.c('abc&def#ghi')"
python -mtimeit -s"import time_functions" "time_functions.d('abc&def#ghi')"
python -mtimeit -s"import time_functions" "time_functions.e('abc&def#ghi')"
python -mtimeit -s"import time_functions" "time_functions.f('abc&def#ghi')"
python -mtimeit -s"import time_functions" "time_functions.g('abc&def#ghi')"
python -mtimeit -s"import time_functions" "time_functions.h('abc&def#ghi')"
python -mtimeit -s"import time_functions" "time_functions.i('abc&def#ghi')"
Replacing 17 characters
Here's similar code to do the same but with more characters to escape (\`*_{}[]()>#+-.!$):
def a(text):
    chars = "\\`*_{}[]()>#+-.!$"
    for c in chars:
        text = text.replace(c, "\\" + c)
def b(text):
    for ch in ['\\','`','*','_','{','}','[',']','(',')','>','#','+','-','.','!','$','\'']:
        if ch in text:
            text = text.replace(ch,"\\"+ch)
import re
def c(text):
    rx = re.compile('([&#])')
    text = rx.sub(r'\\\1', text)
RX = re.compile('([\\`*_{}[]()>#+-.!$])')
def d(text):
    text = RX.sub(r'\\\1', text)
def mk_esc(esc_chars):
    return lambda s: ''.join(['\\' + c if c in esc_chars else c for c in s])
esc = mk_esc('\\`*_{}[]()>#+-.!$')
def e(text):
    esc(text)
def f(text):
    text = text.replace('\\', '\\\\').replace('`', '\`').replace('*', '\*').replace('_', '\_').replace('{', '\{').replace('}', '\}').replace('[', '\[').replace(']', '\]').replace('(', '\(').replace(')', '\)').replace('>', '\>').replace('#', '\#').replace('+', '\+').replace('-', '\-').replace('.', '\.').replace('!', '\!').replace('$', '\$')
def g(text):
    replacements = {
        "\\": "\\\\",
        "`": "\`",
        "*": "\*",
        "_": "\_",
        "{": "\{",
        "}": "\}",
        "[": "\[",
        "]": "\]",
        "(": "\(",
        ")": "\)",
        ">": "\>",
        "#": "\#",
        "+": "\+",
        "-": "\-",
        ".": "\.",
        "!": "\!",
        "$": "\$",
    }
    text = "".join([replacements.get(c, c) for c in text])
def h(text):
    text = text.replace('\\', r'\\')
    text = text.replace('`', r'\`')
    text = text.replace('*', r'\*')
    text = text.replace('_', r'\_')
    text = text.replace('{', r'\{')
    text = text.replace('}', r'\}')
    text = text.replace('[', r'\[')
    text = text.replace(']', r'\]')
    text = text.replace('(', r'\(')
    text = text.replace(')', r'\)')
    text = text.replace('>', r'\>')
    text = text.replace('#', r'\#')
    text = text.replace('+', r'\+')
    text = text.replace('-', r'\-')
    text = text.replace('.', r'\.')
    text = text.replace('!', r'\!')
    text = text.replace('$', r'\$')
def i(text):
    text = text.replace('\\', r'\\').replace('`', r'\`').replace('*', r'\*').replace('_', r'\_').replace('{', r'\{').replace('}', r'\}').replace('[', r'\[').replace(']', r'\]').replace('(', r'\(').replace(')', r'\)').replace('>', r'\>').replace('#', r'\#').replace('+', r'\+').replace('-', r'\-').replace('.', r'\.').replace('!', r'\!').replace('$', r'\$')
Here are the results for the same input string abc&def#ghi:
a) 100000 loops, best of 3: 6.72 μs per loop
b) 100000 loops, best of 3: 2.64 μs per loop
c) 100000 loops, best of 3: 11.9 μs per loop
d) 100000 loops, best of 3: 4.92 μs per loop
e) 100000 loops, best of 3: 2.96 μs per loop
f) 100000 loops, best of 3: 4.29 μs per loop
g) 100000 loops, best of 3: 4.68 μs per loop
h) 100000 loops, best of 3: 4.73 μs per loop
i) 100000 loops, best of 3: 4.24 μs per loop
And with a longer input string (## *Something* and [another] thing in a longer sentence with {more} things to replace$):
a) 100000 loops, best of 3: 7.59 μs per loop
b) 100000 loops, best of 3: 6.54 μs per loop
c) 100000 loops, best of 3: 16.9 μs per loop
d) 100000 loops, best of 3: 7.29 μs per loop
e) 100000 loops, best of 3: 12.2 μs per loop
f) 100000 loops, best of 3: 5.38 μs per loop
g) 10000 loops, best of 3: 21.7 μs per loop
h) 100000 loops, best of 3: 5.7 μs per loop
i) 100000 loops, best of 3: 5.13 μs per loop
Adding a couple of variants:
def ab(text):
    for ch in ['\\','`','*','_','{','}','[',']','(',')','>','#','+','-','.','!','$','\'']:
        text = text.replace(ch,"\\"+ch)
def ba(text):
    chars = "\\`*_{}[]()>#+-.!$"
    for c in chars:
        if c in text:
            text = text.replace(c, "\\" + c)
With the shorter input:
ab) 100000 loops, best of 3: 7.05 μs per loop
ba) 100000 loops, best of 3: 2.4 μs per loop
With the longer input:
ab) 100000 loops, best of 3: 7.71 μs per loop
ba) 100000 loops, best of 3: 6.08 μs per loop
So I'm going to use ba for readability and speed.
Addendum
Prompted by haccks in the comments, one difference between ab and ba is the if c in text: check. Let's test them against two more variants:
def ab_with_check(text):
    for ch in ['\\','`','*','_','{','}','[',']','(',')','>','#','+','-','.','!','$','\'']:
        if ch in text:
            text = text.replace(ch,"\\"+ch)
def ba_without_check(text):
    chars = "\\`*_{}[]()>#+-.!$"
    for c in chars:
        text = text.replace(c, "\\" + c)
Times are in μs per loop on Python 2.7.14 and 3.6.3, measured on a different machine from the earlier set, so they cannot be compared directly.
╭────────────╥──────┬───────────────┬──────┬──────────────────╮
│ Py, input ║ ab │ ab_with_check │ ba │ ba_without_check │
╞════════════╬══════╪═══════════════╪══════╪══════════════════╡
│ Py2, short ║ 8.81 │ 4.22 │ 3.45 │ 8.01 │
│ Py3, short ║ 5.54 │ 1.34 │ 1.46 │ 5.34 │
├────────────╫──────┼───────────────┼──────┼──────────────────┤
│ Py2, long ║ 9.3 │ 7.15 │ 6.85 │ 8.55 │
│ Py3, long ║ 7.43 │ 4.38 │ 4.41 │ 7.02 │
└────────────╨──────┴───────────────┴──────┴──────────────────┘
We can conclude that:
Those with the check are up to 4x faster than those without the check
ab_with_check is slightly in the lead on Python 3, but ba (with check) has a greater lead on Python 2
However, the biggest lesson here is that Python 3 is up to 3x faster than Python 2! There's not a huge difference between the slowest on Python 3 and the fastest on Python 2!
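Because these numbers shift between Python versions and machines, it may be worth rerunning a comparison on your own setup. The sketch below uses only timeit from the standard library; unlike the timing-only functions above, its candidates return their result so the harness can first check that they agree. The names escape_chain, escape_loop_check, escape_translate and compare are hypothetical, and no timings are claimed here.

```python
import timeit

ESCAPE = "&#"
# Translation table mapping each character to its backslash-escaped form.
TABLE = str.maketrans({c: "\\" + c for c in ESCAPE})


def escape_chain(text):
    # Chained str.replace calls (like f/i above).
    return text.replace("&", r"\&").replace("#", r"\#")


def escape_loop_check(text):
    # Loop with an `in` check before each replace (like ba above).
    for c in ESCAPE:
        if c in text:
            text = text.replace(c, "\\" + c)
    return text


def escape_translate(text):
    # Single pass using a translation table.
    return text.translate(TABLE)


CANDIDATES = [escape_chain, escape_loop_check, escape_translate]


def compare(text, number=100_000):
    # Refuse to time functions that do not produce the same output.
    assert len({fn(text) for fn in CANDIDATES}) == 1, "candidates disagree"
    for fn in CANDIDATES:
        best = min(timeit.repeat(lambda: fn(text), number=number, repeat=3))
        print(f"{fn.__name__}: {best / number * 1e6:.3f} µs per call")


if __name__ == "__main__":
    compare("abc&def#ghi")
```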

>>> string="abc&def#ghi"
>>> for ch in ['&','#']:
...     if ch in string:
...         string=string.replace(ch,"\\"+ch)
...
>>> print string
abc\&def\#ghi

Here is a python3 method using str.translate and str.maketrans:
s = "abc&def#ghi"
print(s.translate(str.maketrans({'&': '\&', '#': '\#'})))
The printed string is abc\&def\#ghi.

Simply chain the replace functions like this
strs = "abc&def#ghi"
print strs.replace('&', '\&').replace('#', '\#')
# abc\&def\#ghi
If there are many replacements, you can do it in this generic way:
strs, replacements = "abc&def#ghi", {"&": "\&", "#": "\#"}
print "".join([replacements.get(c, c) for c in strs])
# abc\&def\#ghi

Late to the party, but I lost a lot of time with this issue until I found my answer.
Short and sweet: translate is superior to replace. If you care more about functionality than about time optimization, do not use replace.
Also use translate if you don't know whether the set of characters to be replaced overlaps the set of characters used to replace them.
Case in point:
Using replace, you would naively expect the snippet "1234".replace("1", "2").replace("2", "3").replace("3", "4") to return "2344", but it will in fact return "4444".
translate does what the OP originally wanted: each character is mapped once, so the replacements do not cascade.
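A short check of that example: str.maketrans builds a table that str.translate applies to each character of the original string exactly once, so one replacement never feeds into the next. The variable names below are illustrative only.

```python
# Chained replace: each step sees the output of the previous one.
chained = "1234".replace("1", "2").replace("2", "3").replace("3", "4")
# translate: every character is looked up once in the table.
translated = "1234".translate(str.maketrans("123", "234"))
print(chained)     # 4444
print(translated)  # 2344

# The same mechanism handles the original escaping problem.
escaped = "abc&def#ghi".translate(str.maketrans({"&": r"\&", "#": r"\#"}))
print(escaped)     # abc\&def\#ghi
```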

Are you always going to prepend a backslash? If so, try
import re
rx = re.compile('([&#])')
# ^^ fill in the characters here.
strs = rx.sub('\\\\\\1', strs)
It may not be the most efficient method but I think it is the easiest.

You may consider writing a generic escape function:
def mk_esc(esc_chars):
    return lambda s: ''.join(['\\' + c if c in esc_chars else c for c in s])
>>> esc = mk_esc('&#')
>>> print esc('Learn & be #1')
Learn \& be \#1
This way you can make your function configurable with a list of characters that should be escaped.

For Python 3.8 and above, one can use assignment expressions
[text := text.replace(s, f"\\{s}") for s in "&#" if s in text];
I am not sure whether this counts as "appropriate use" of assignment expressions as described in PEP 572, but it looks clean and reads quite well (to my eyes). The semicolon at the end suppresses output if you run this in IPython or Jupyter.
This would be "appropriate" if you wanted all intermediate strings as well. For example, (removing all lowercase vowels):
text = "Lorem ipsum dolor sit amet"
intermediates = [text := text.replace(i, "") for i in "aeiou" if i in text]
['Lorem ipsum dolor sit met',
'Lorm ipsum dolor sit mt',
'Lorm psum dolor st mt',
'Lrm psum dlr st mt',
'Lrm psm dlr st mt']
Performance-wise, it makes the same str.replace calls as the loop-based approaches in the accepted answer, so it should behave similarly as the string length and the number of substitutions grow.
The code for comparing the two is below. I am using random strings to make my life a bit simpler, and the characters to replace are chosen randomly from the string itself. (Note: I am using IPython's %timeit magic here, so run this in IPython/Jupyter.)
import random, string
def make_txt(length):
    "makes a random string of a given length"
    return "".join(random.choices(string.printable, k=length))
def get_substring(s, num):
    "gets a substring"
    return "".join(random.choices(s, k=num))
def a(text, replace): # one of the better performing approaches from the accepted answer
    for i in replace:
        if i in text:
            text = text.replace(i, "")
def b(text, replace):
    # a list comprehension (not a lazy generator expression), so the replacements actually run
    _ = [text := text.replace(i, "") for i in replace if i in text]
def compare(strlen, replace_length):
    "use ipython / jupyter for the %timeit functionality"
    times_a, times_b = [], []
    for i in range(*strlen):
        el = make_txt(i)
        et = get_substring(el, replace_length)
        res_a = %timeit -n 1000 -o a(el, et) # ipython magic
        el = make_txt(i)
        et = get_substring(el, replace_length)
        res_b = %timeit -n 1000 -o b(el, et) # ipython magic
        times_a.append(res_a.average * 1e6)
        times_b.append(res_b.average * 1e6)
    return times_a, times_b
#----run
t2 = compare((2*2, 1000, 50), 2)
t10 = compare((2*10, 1000, 50), 10)

FYI, this is of little or no use to the OP but it may be of use to other readers (please do not downvote, I'm aware of this).
As a somewhat ridiculous but interesting exercise, I wanted to see if I could use Python functional programming to replace multiple chars. I'm pretty sure this does NOT beat just calling replace() twice. And if performance were an issue, you could easily beat this in Rust, C, Julia, Perl, Java, JavaScript and maybe even awk. It uses an external 'helpers' package called pytoolz, accelerated via Cython (cytoolz, which is a PyPI package).
from functools import partial
from cytoolz.functoolz import compose
from cytoolz.itertoolz import chain,sliding_window
from itertools import starmap,imap,ifilter
from operator import itemgetter,contains
text='&hello#hi&yo&'
char_index_iter=compose(partial(imap, itemgetter(0)), partial(ifilter, compose(partial(contains, '#&'), itemgetter(1))), enumerate)
print '\\'.join(imap(text.__getitem__, starmap(slice, sliding_window(2, chain((0,), char_index_iter(text), (len(text),))))))
I'm not even going to explain this because no one would bother using this to accomplish multiple replace. Nevertheless, I felt somewhat accomplished in doing this and thought it might inspire other readers or win a code obfuscation contest.

How about this?
def replace_all(dict, str):
    for key in dict:
        str = str.replace(key, dict[key])
    return str
then
print(replace_all({"&":"\&", "#":"\#"}, "&#"))
output
\&\#
similar to answer

Using reduce, which is available in Python 2.7 and Python 3.*, you can easily replace multiple substrings in a clean and Pythonic way.
from functools import reduce  # built in on Python 2, must be imported on Python 3
# Let's define a helper method to make it easy to use
def replacer(text, replacements):
    return reduce(
        lambda text, ptuple: text.replace(ptuple[0], ptuple[1]),
        replacements, text
    )
if __name__ == '__main__':
    uncleaned_str = "abc&def#ghi"
    cleaned_str = replacer(uncleaned_str, [("&","\&"),("#","\#")])
    print(cleaned_str) # "abc\&def\#ghi"
In python2.7 you don't have to import reduce but in python3.* you have to import it from the functools module.

A more advanced way, using a regex:
import re
text = "hello ,world!"
replaces = {"hello": "hi", "world":" 2020", "!":"."}
# re.escape makes keys containing regex metacharacters (such as "." or "+") match literally
regex = re.sub("|".join(map(re.escape, replaces)), lambda match: replaces[match.group(0)], text)
print(regex)

>>> a = '&#'
>>> print a.replace('&', r'\&')
\&#
>>> print a.replace('#', r'\#')
&\#
>>>
You want to use a 'raw' string (denoted by the 'r' prefixing the replacement string), since raw strings do not treat the backslash specially.

Maybe a simple loop for chars to replace:
a = '&#'
to_replace = ['&', '#']
for char in to_replace:
    a = a.replace(char, "\\"+char)
print(a)
# \&\#

This will help someone looking for a simple solution.
def replacemany(our_str, to_be_replaced:tuple, replace_with:str):
    for nextchar in to_be_replaced:
        our_str = our_str.replace(nextchar, replace_with)
    return our_str
os = 'the rain in spain falls mainly on the plain ttttttttt sssssssssss nnnnnnnnnn'
tbr = ('a','t','s','n')
rw = ''
print(replacemany(os,tbr,rw))
Output:
he ri i pi fll mily o he pli
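When the goal is only to delete single characters, as here, str.maketrans also accepts a third argument listing characters to map to None, so one translate call removes them all. A sketch using the same input; remove_chars is a hypothetical helper name.

```python
def remove_chars(text, chars):
    # Characters in the third argument of str.maketrans are deleted by translate.
    return text.translate(str.maketrans("", "", chars))


sentence = 'the rain in spain falls mainly on the plain ttttttttt sssssssssss nnnnnnnnnn'
print(remove_chars(sentence, "atsn"))
```

This produces the same output as replacemany above when replace_with is the empty string.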

The example below uses a regex alternation (|): it deletes every ' and , from the given string. You can pass as many characters as you want, separated by |.
import re
test = re.sub("('|,)","",str(jsonAtrList))
