Remove specific word in Python [duplicate] - python

I have a set of strings and all the strings have one of two specific substrings which I want to remove:
set1 = {'Apple.good', 'Orange.good', 'Pear.bad', 'Pear.good', 'Banana.bad', 'Potato.bad'}
I want the ".good" and ".bad" substrings removed from all the strings. I tried this:
for x in set1:
    x.replace('.good', '')
    x.replace('.bad', '')
but it doesn't seem to work, set1 stays exactly the same. I tried using for x in list(set1) instead but that doesn't change anything.

Strings are immutable. str.replace creates a new string. This is stated in the documentation:
str.replace(old, new[, count])
Return a copy of the string with all occurrences of substring old replaced by new. [...]
This means you have to rebind the name to a new set or re-populate the existing one (building a new set is easier with a set comprehension):
new_set = {x.replace('.good', '').replace('.bad', '') for x in set1}
P.S. if you want to change the prefix or suffix of a string and you're using Python 3.9 or newer, use str.removeprefix() or str.removesuffix() instead:
new_set = {x.removesuffix('.good').removesuffix('.bad') for x in set1}
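One caveat with the chained calls: a string ending in '.bad.good' loses both suffixes, and str.removesuffix needs Python 3.9 or newer. A minimal sketch of a helper that removes at most one suffix and also runs on older versions (strip_one_suffix is a hypothetical name for illustration, not a standard-library function):

```python
def strip_one_suffix(text, suffixes):
    """Return text without the first matching suffix, or unchanged if none match."""
    for suffix in suffixes:
        # Skip empty suffixes: text[:-0] would return an empty string.
        if suffix and text.endswith(suffix):
            return text[:-len(suffix)]
    return text


set1 = {'Apple.good', 'Orange.good', 'Pear.bad', 'Pear.good', 'Banana.bad', 'Potato.bad'}
new_set = {strip_one_suffix(x, ('.good', '.bad')) for x in set1}
print(new_set)
```

Because the result is a set, 'Pear.bad' and 'Pear.good' collapse into a single 'Pear'.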

>>> x = 'Pear.good'
>>> y = x.replace('.good','')
>>> y
'Pear'
>>> x
'Pear.good'
.replace doesn't change the string, it returns a copy of the string with the replacement. You can't change the string directly because strings are immutable.
You need to take the return values from x.replace and put them in a new set.

In Python 3.9+ you could remove the suffix using str.removesuffix('mysuffix'). From the docs:
If the string ends with the suffix string and that suffix is not empty, return string[:-len(suffix)]. Otherwise, return a copy of the original string
So you can either create a new empty set and add each element without the suffix to it:
set1 = {'Apple.good', 'Orange.good', 'Pear.bad', 'Pear.good', 'Banana.bad', 'Potato.bad'}
set2 = set()
for s in set1:
    set2.add(s.removesuffix(".good").removesuffix(".bad"))
Or create the new set using a set comprehension:
set2 = {s.removesuffix(".good").removesuffix(".bad") for s in set1}
print(set2)
Output:
{'Orange', 'Pear', 'Apple', 'Banana', 'Potato'}

All you need is a bit of black magic!
>>> a = ["cherry.bad","pear.good", "apple.good"]
>>> a = list(map(lambda x: x.replace('.good','').replace('.bad',''),a))
>>> a
['cherry', 'pear', 'apple']

When there are multiple substrings to remove, one simple and effective option is to use re.sub with a compiled pattern that involves joining all the substrings-to-remove using the regex OR (|) pipe.
import re
to_remove = ['.good', '.bad']
strings = ['Apple.good','Orange.good','Pear.bad']
p = re.compile('|'.join(map(re.escape, to_remove))) # escape to handle metachars
[p.sub('', s) for s in strings]
# ['Apple', 'Orange', 'Pear']
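Note that this pattern removes the substrings wherever they occur. If only trailing occurrences should go, one option is to anchor the alternation with $. A sketch under that assumption (the sample string 'not.bad.idea.good' is made up for illustration):

```python
import re

to_remove = ['.good', '.bad']
strings = ['Apple.good', 'Pear.bad', 'not.bad.idea.good']

alternation = '|'.join(map(re.escape, to_remove))
anywhere = re.compile(alternation)           # matches at any position
at_end = re.compile(f'(?:{alternation})$')   # matches only at the end of the string

removed_anywhere = [anywhere.sub('', s) for s in strings]
removed_at_end = [at_end.sub('', s) for s in strings]
print(removed_anywhere)  # ['Apple', 'Pear', 'not.idea']
print(removed_at_end)    # ['Apple', 'Pear', 'not.bad.idea']
```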

You could do this:
import re
set1 = {'Apple.good', 'Orange.good', 'Pear.bad', 'Pear.good', 'Banana.bad', 'Potato.bad'}
for x in set1:
    # the $ anchor makes sure only a trailing suffix is removed
    x = re.sub(r'\.good$', '', x)
    x = re.sub(r'\.bad$', '', x)
    print(x)

# Practice 2
text = "Amin Is A Good Programmer"  # avoid naming a variable str, which shadows the built-in
new_text = text.replace('Good ', '')
print(new_text)
# prints: Amin Is A Programmer

I ran a test (not with your example) and a set comprehension returned the data neither in order nor complete, because sets are unordered and drop duplicates:
>>> ind = ['p5','p1','p8','p4','p2','p8']
>>> newind = {x.replace('p','') for x in ind}
>>> newind
{'1', '2', '8', '5', '4'}
A list comprehension keeps both the order and the duplicates:
>>> ind = ['p5','p1','p8','p4','p2','p8']
>>> newind = [x.replace('p','') for x in ind]
>>> newind
['5', '1', '8', '4', '2', '8']
or
>>> newind = []
>>> ind = ['p5','p1','p8','p4','p2','p8']
>>> for x in ind:
...     newind.append(x.replace('p',''))
...
>>> newind
['5', '1', '8', '4', '2', '8']
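If duplicates should be dropped but the original order kept, a middle ground between the set and the list is dict.fromkeys, since dictionaries preserve insertion order in Python 3.7+. A small sketch using the same sample data:

```python
ind = ['p5', 'p1', 'p8', 'p4', 'p2', 'p8']

# dict keys are unique and keep first-seen order, so this deduplicates stably
unique_in_order = list(dict.fromkeys(x.replace('p', '') for x in ind))
print(unique_in_order)  # ['5', '1', '8', '4', '2']
```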

If you have a list
If you are working with a list of strings and want to remove every line that contains a certain substring, you can do this:
import re
def RemoveInList(sub, LinSplitUnOr):
    # indices of the lines that match the pattern
    indices = [i for i, x in enumerate(LinSplitUnOr) if re.search(sub, x)]
    # keep only the lines whose index did not match
    A = [i for j, i in enumerate(LinSplitUnOr) if j not in indices]
    return A
where sub is a pattern that you do not want in the list of lines LinSplitUnOr.
For example:
A = ['Apple.good', 'Orange.good', 'Pear.bad', 'Pear.good', 'Banana.bad', 'Potato.bad']
sub = 'good'
A = RemoveInList(sub, A)
Then A will be
['Pear.bad', 'Banana.bad', 'Potato.bad']
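The same filter can be written without the index bookkeeping. A minimal sketch, assuming a plain substring test is enough (it uses the in operator rather than re.search, so regex metacharacters are not interpreted; remove_containing is a hypothetical name):

```python
def remove_containing(lines, sub):
    """Return the lines that do not contain the substring sub."""
    return [line for line in lines if sub not in line]


A = ['Apple.good', 'Orange.good', 'Pear.bad', 'Pear.good', 'Banana.bad', 'Potato.bad']
filtered = remove_containing(A, 'good')
print(filtered)  # ['Pear.bad', 'Banana.bad', 'Potato.bad']
```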

Related

Python - How to get only the numbers from a string which has got commas in between

Consider like I have a string :
stringA = "values-are-10,20,30,40,50"
stringB = "values-are-10"
I need to get only the number part of each string.
Desired output:
for stringA: 10,20,30,40,50
for stringB: 10
I tried int(''.join(filter(str.isdigit, stringA))),
but it removed all the commas. Please let me know how to get the output in that format.
re.findall is your friend here:
import re
stringA = "values-are-10,20,30,40,50"
stringB = "values-are-10"
strings = [stringA, stringB]
output = [re.findall(r'\d+(?:,\d+)*', s)[0] for s in strings]
print(output) # ['10,20,30,40,50', '10']
[int(v) for v in stringA.rsplit("-", 1)[-1].split(",")]
rsplit splits from the right - all numbers appear after the last "-". Then we split by ","
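Since the question asks for the comma-separated text rather than a list of integers, the rsplit idea can return both. A sketch, assuming every input ends with its numbers after the last "-" (numbers_after_last_dash is a hypothetical helper name):

```python
def numbers_after_last_dash(text):
    """Return the text after the last '-' and the integers it contains."""
    tail = text.rsplit("-", 1)[-1]          # e.g. "10,20,30,40,50"
    values = [int(v) for v in tail.split(",")]
    return tail, values


tail_a, values_a = numbers_after_last_dash("values-are-10,20,30,40,50")
tail_b, values_b = numbers_after_last_dash("values-are-10")
print(tail_a, values_a)
print(tail_b, values_b)
```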
If it's guaranteed that the initial string always has the format
prefix + values, why not just slice the prefix off:
prefix = "values-are-"
stringA, stringB = stringA[len(prefix):], stringB[len(prefix):]
and then
list1, list2 = [int(num) for num in stringA.split(',')], [int(num) for num in stringB.split(',')]
should do the job.
# Add regex package
import re
stringA = "values-are-10,20,30,40,50"
stringB = "values-are-10"
#search using regex
A = re.findall('[0-9]+', stringA)
print(A) # ['10', '20', '30', '40', '50']
B = re.findall('[0-9]+', stringB)
print(B) # ['10']

pattern match get list and dict from string

I have the string below, and I want to get the lists, dicts, and variables from it.
How can I split this string into a specific format?
s = 'list_c=[1,2],a=3,b=1.3,c=abch,list_a=[1,2],dict_a={a:2,b:3}'
import re
m1 = re.findall (r'(?=.*,)(.*?=\[.+?\],?)',s)
for i in m1:
    print('m1:', i)
Only the first result is correct.
Does anyone know how to do this?
m1: list_c=[1,2],
m1: a=3,b=1.3,c=abch,list_a=[1,2],
Split on '=' instead; then you can work with each variable name and its value.
You still need to handle type casting for the values (regex, split, or try with casting may help).
Also, as others commented, a dict may be easier to handle:
s = 'list_c=[1,2],a=3,b=1.3,c=abch,list_a=[1,2],dict_a={a:2,b:3}'
al = s.split('=')
var_l = [al[0]]
value_l = []
for a in al[1:-1]:
    var_l.append(a.split(',')[-1])
    value_l.append(','.join(a.split(',')[:-1]))
value_l.append(al[-1])
output = dict(zip(var_l, value_l))
print(output)
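For the type casting mentioned above, one possible approach is ast.literal_eval with a fallback to the raw string. This is a sketch, not part of the original answer: cast_value is a hypothetical helper, and raw is assumed to mirror the shape of the output dictionary built above.

```python
import ast


def cast_value(text):
    """Convert text to a Python literal if possible, else return it unchanged."""
    try:
        return ast.literal_eval(text)
    except (ValueError, SyntaxError):
        # Bare identifiers such as abch, or {a:2,b:3} with unquoted keys,
        # are not valid literals, so they stay as strings.
        return text


raw = {'list_c': '[1,2]', 'a': '3', 'b': '1.3', 'c': 'abch', 'dict_a': '{a:2,b:3}'}
typed = {name: cast_value(value) for name, value in raw.items()}
print(typed)
```

Here the list and the two numbers become real Python objects, while 'abch' and '{a:2,b:3}' remain strings.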
You may have better luck if you more or less explicitly describe the right-hand side expressions: numbers, lists, dictionaries, and identifiers:
re.findall(r"([^=]+)=" # LHS and assignment operator
+r"([+-]?\d+(?:\.\d+)?|" # Numbers
+r"[+-]?\d+\.|" # More numbers
+r"\[[^]]+\]|" # Lists
+r"{[^}]+}|" # Dictionaries
+r"[a-zA-Z_][a-zA-Z_\d]*)", # Idents
s)
# [('list_c', '[1,2]'), ('a', '3'), ('b', '1.3'), ('c', 'abch'),
# ('list_a', '[1,2]'), ('dict_a', '{a:2,b:3}')]
The final solution looks like this:
import re
from pprint import pprint
s = 'list_c=[1,2],a=3,b=1.3,c=abch,list_a=[1],Save,Record,dict_a={a:2,b:3}'
m1 = re.findall(r"([^=]+)=" # LHS and assignment operator
+r"([+-]?\d+(?:\.\d+)?|" # Numbers
+r"[+-]?\d+\.|" # More numbers
+r"\[[^]]+\]|" # Lists
+r"{[^}]+}|" # Dictionaries
+r"[a-zA-Z_][a-zA-Z_\d]*)", # Idents
s)
temp_d = {}
for i,j in m1:
    temp = i.strip(',').split(',')
    if len(temp)>1:
        for k in temp[:-1]:
            temp_d[k]=''
        temp_d[temp[-1]] = j
    else:
        temp_d[temp[0]] = j
pprint(temp_d)
Output:
{'Record': '',
 'Save': '',
 'a': '3',
 'b': '1.3',
 'c': 'abch',
 'dict_a': '{a:2,b:3}',
 'list_a': '[1]',
 'list_c': '[1,2]'}
Instead of picking out the types, you can start by capturing the identifiers. Here's a regex that captures all the identifiers in the string (for lowercase only, but see note):
regex = re.compile(r'([a-z]|_)+=')
# note: for all valid variable names, use r'[A-Za-z_][A-Za-z0-9_]*='
cases = [x.group() for x in re.finditer(regex, s)]
This gives a list of all the identifiers in the string:
['list_c=', 'a=', 'b=', 'c=', 'list_a=', 'dict_a=']
We can now define a function that chops up s step by step, using the
above list to partition the string:
def chop(mystr, mylist):
    temp = mystr.partition(mylist[0])[2]
    cut = temp.find(mylist[1]) #strip leading bits
    return mystr.partition(mylist[0])[2][cut:], mylist[1:]
mystr = s[:]
temp = [mystr]
mylist = cases[:]
while len(mylist) > 1:
    mystr, mylist = chop(mystr, mylist)
    temp.append(mystr)
This (convoluted) slicing operation gives this list of strings:
['list_c=[1,2],a=3,b=1.3,c=abch,list_a=[1,2],dict_a={a:2,b:3}',
'a=3,b=1.3,c=abch,list_a=[1,2],dict_a={a:2,b:3}',
'b=1.3,c=abch,list_a=[1,2],dict_a={a:2,b:3}',
'c=abch,list_a=[1,2],dict_a={a:2,b:3}',
'list_a=[1,2],dict_a={a:2,b:3}',
'dict_a={a:2,b:3}']
Now cut off the ends using each successive entry:
result = []
for x in range(len(temp) - 1):
    cut = temp[x].find(temp[x+1]) - 1 #-1 to remove commas
    result.append(temp[x][:cut])
result.append(temp.pop()) #get the last item
Now we have the full list:
['list_c=[1,2]', 'a=3', 'b=1.3', 'c=abch', 'list_a=[1,2]', 'dict_a={a:2,b:3}']
Each element is easily parsable into key:value pairs (and is also executable via exec).
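A different technique that yields the same list is to split only on commas outside brackets, tracking nesting depth. This is a sketch of that alternative, not the slicing method above; split_top_level is a hypothetical name, and brackets are assumed to be balanced.

```python
def split_top_level(text, sep=","):
    """Split text on sep, ignoring separators nested inside [], {} or ()."""
    parts, current, depth = [], [], 0
    for ch in text:
        if ch in "[{(":
            depth += 1
        elif ch in "]})":
            depth -= 1
        if ch == sep and depth == 0:
            # A top-level separator ends the current piece.
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return parts


s = 'list_c=[1,2],a=3,b=1.3,c=abch,list_a=[1,2],dict_a={a:2,b:3}'
pieces = split_top_level(s)
print(pieces)

# Each piece splits cleanly into a name and a value on the first '='.
pairs = dict(piece.split("=", 1) for piece in pieces)
print(pairs)
```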

What's the best way to "periodically" replace characters in a string in Python?

I have a string where a character ('#') needs to be replaced by characters from a list of one or more characters "in order" and "periodically".
So for example I have
'ab#cde##fghi#jk#lmno###p#qrs#tuvwxy#z'
and want
'ab1cde23fghi1jk2lmno312p3qrs1tuvwxy2z'
for replace_chars = ['1', '2', '3']
The problem is that in this example there are more # in the string
than I have replacers.
This is my try:
result = ''
replace_chars = ['1', '2', '3']
string = 'ab#cde##fghi#jk#lmno###p#qrs#tuvwxy#z'
i = 0
for char in string:
    if char == '#':
        result += replace_chars[i]
        i += 1
    else:
        result += char
print(result)
but this only works of course if there are not more than three # in the original string and otherwise I get IndexError.
Edit: Thanks for the answers!
Your code could be fixed by adding the line i = i%len(replace_chars) as the last line of your if clause. This way you will be taking the remainder from the division of i by the length of your list of replacement characters.
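Applied to the loop from the question, the modulo fix might look like this (a sketch that folds the increment and the wrap-around into one line):

```python
replace_chars = ['1', '2', '3']
string = 'ab#cde##fghi#jk#lmno###p#qrs#tuvwxy#z'

result = ''
i = 0
for char in string:
    if char == '#':
        result += replace_chars[i]
        # Wrap back to 0 after the last replacement character.
        i = (i + 1) % len(replace_chars)
    else:
        result += char

print(result)  # ab1cde23fghi1jk2lmno312p3qrs1tuvwxy2z
```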
The shorter solution is to use a generator that periodically spits out replacement characters.
>>> from itertools import cycle
>>> s = 'ab#cde##fghi#jk#lmno###p#qrs#tuvwxy#z'
>>> replace_chars = ['1', '2', '3']
>>>
>>> replacer = cycle(replace_chars)
>>> ''.join([next(replacer) if c == '#' else c for c in s])
'ab1cde23fghi1jk2lmno312p3qrs1tuvwxy2z'
For each character c in your string s, we get the next replacement character from the replacer generator if the character is an '#', otherwise it just gives you the original character.
I used a list comprehension instead of a generator expression because str.join has to materialize its argument into a sequence anyway, so passing a list is typically a little faster.
Generators are fun.
def gen():
    replace_chars = ['1', '2', '3']
    while True:
        for rc in replace_chars:
            yield rc
# a generator is not a context manager, so just call it to get the iterator
g = gen()
s = 'ab#cde##fghi#jk#lmno###p#qrs#tuvwxy#z'
s = ''.join(next(g) if c == '#' else c for c in s)
As PM 2Ring suggested, this is functionally the same as itertools.cycle. The difference is that itertools.cycle will hold an extra copy of the list in memory which may not be necessary.
itertools.cycle source:
def cycle(iterable):
    saved = []
    for element in iterable:
        yield element
        saved.append(element)
    while saved:
        for element in saved:
            yield element
You could also keep your index logic with modulo, using itertools.count inside a list comprehension to keep track of where you are:
from itertools import count
cn, ln = count(), len(replace_chars)
print("".join([replace_chars[next(cn) % ln] if c == "#" else c for c in string]))
ab1cde23fghi1jk2lmno312p3qrs1tuvwxy2z
I think it is better not to iterate character by character, especially for long strings with lengthy stretches that contain no #.
from itertools import cycle, chain
s = 'ab#cde##fghi#jk#lmno###p#qrs#tuvwxy#z'
replace_chars = ['1', '2', '3']
result = ''.join(chain.from_iterable(zip(s.split('#'), cycle(replace_chars))))[:-1]
I don't know of a more efficient way to drop the trailing character than [:-1].
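Another way to avoid both the character-by-character loop and the trailing slice is re.sub with a callable replacement, which is invoked once per match. A sketch combining it with itertools.cycle:

```python
import re
from itertools import cycle

s = 'ab#cde##fghi#jk#lmno###p#qrs#tuvwxy#z'
replace_chars = ['1', '2', '3']

replacer = cycle(replace_chars)
# The lambda ignores the match object and returns the next replacement character.
result = re.sub('#', lambda match: next(replacer), s)
print(result)  # ab1cde23fghi1jk2lmno312p3qrs1tuvwxy2z
```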

How to replace string to the other string in list (python)

What is the best way to replace every string in the list?
For example if I have a list:
a = ['123.txt', '1234.txt', '654.txt']
and I would like to have:
a = ['123', '1234', '654']
Assuming that sample input is similar to what you actually have, use os.path.splitext() to remove file extensions:
>>> import os
>>> a = ['123.txt', '1234.txt', '654.txt']
>>> [os.path.splitext(item)[0] for item in a]
['123', '1234', '654']
Use a list comprehension as follows:
a = ['123.txt', '1234.txt', '654.txt']
answer = [item.replace('.txt', '') for item in a]
print(answer)
Output
['123', '1234', '654']
Assuming that all your strings end with '.txt', just slice the last four characters off.
>>> a = ['123.txt', '1234.txt', '654.txt']
>>> a = [x[:-4] for x in a]
>>> a
['123', '1234', '654']
This will also work if you have some weird names like 'some.txtfile.txt'
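The approaches in this thread behave differently on such a name. A small comparison sketch, using the 'some.txtfile.txt' example from above:

```python
import os

name = 'some.txtfile.txt'

results = {
    'replace': name.replace('.txt', ''),     # removes every '.txt', not just the last
    'split': name.split('.')[0],             # keeps only the part before the first dot
    'slice': name[:-4],                      # drops the last four characters
    'splitext': os.path.splitext(name)[0],   # drops only the final extension
}
for method, value in results.items():
    print(method, '->', value)
```

Only the slice and os.path.splitext keep 'some.txtfile', and the slice is only safe when every name is known to end in '.txt'.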
You could split on the . separator and take the first item:
In [486]: [x.split('.')[0] for x in a]
Out[486]: ['123', '1234', '654']
Another way to do this:
a = [x[: -len("txt")-1] for x in a]
What is the best way to replace every string in the list?
That completely depends on how you define 'best'. I, for example, like regular expressions:
import re
a = ['123.txt', '1234.txt', '654.txt']
answer = [re.sub(r'^(\w+)\..*', r'\g<1>', item) for item in a]
#print(answer)
#['123', '1234', '654']
Depending on the content of the strings, you could adjust it:
\w+ vs [0-9]+ for only digits
\..* vs \.txt if all strings end with .txt
data.colname = [item.replace('anythingtoreplace', 'desiredoutput') for item in data.colname]
Note that here data is the DataFrame and colname is the name of a column in it. This also handles spaces, if you want to remove them from a string or number. This was quite useful for me. It does not change the column's datatype, so you may have to do that separately if required.
