Python Regex: Ignore Brackets [duplicate] - python

This question already has answers here:
How can I remove text within parentheses with a regex?
(9 answers)
Closed 4 years ago.
LIST = ['ichenbsdr1.chen.com', 'ichenbsds1(SSI15170CCD)',
'ichenbsds1', 'ichenbsdm2.chen.com',
'ichenbsdm2.chen.com(ABQB344DEGH)', 'ichenbsdm2']
I need to filter the list above with a regex: wherever an entry contains
brackets, the brackets and their contents should be removed. For example,
LIST[1] is 'ichenbsds1(SSI15170CCD)'; I need to remove "(SSI15170CCD)" and
keep only 'ichenbsds1'. The same applies to LIST[4].
I have the regex r'(.*?)\(.*\)' to remove the brackets and whatever is
inside them, but when I run it in the script below it does not give the
expected output.
import re
sws = []
for line in LIST:
    Type = re.search(r'(.*?)\(.*\)', line)
    sws.append(Type)
print(sws)
Expected Output:
['ichenbsdr1.chen.com', 'ichenbsds1', 'ichenbsds1', 'ichenbsdm2.chen.com', 'ichenbsdm2.chen.com', 'ichenbsdm2']

Use re.sub to remove everything between the parentheses, including the parentheses themselves:
>>> [re.sub(r'\(.*?\)', '', s) for s in LIST]
['ichenbsdr1.chen.com', 'ichenbsds1', 'ichenbsds1', 'ichenbsdm2.chen.com', 'ichenbsdm2.chen.com', 'ichenbsdm2']
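For context, the original loop does not produce strings because re.search returns a Match object (or None when the line has no parentheses). The sketch below shows that difference, plus one possible alternative that keeps everything before the first "(". The names first_try and cleaned are illustrative and do not come from the question.

```python
import re

LIST = ['ichenbsdr1.chen.com', 'ichenbsds1(SSI15170CCD)',
        'ichenbsds1', 'ichenbsdm2.chen.com',
        'ichenbsdm2.chen.com(ABQB344DEGH)', 'ichenbsdm2']

# re.search yields Match objects (or None), never plain strings.
first_try = [re.search(r'(.*?)\(.*\)', line) for line in LIST]

# [^(]* matches the (possibly empty) run of characters before the first "(",
# so re.match always succeeds and group(0) is the name without the suffix.
cleaned = [re.match(r'[^(]*', line).group(0) for line in LIST]
print(cleaned)
```

Entries without parentheses pass through unchanged, so no if statement is needed.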

Related

How to check and remove '/' and '-' from a list of words at the same time [duplicate]

This question already has answers here:
Split Strings into words with multiple word boundary delimiters
(31 answers)
Split string with multiple delimiters in Python [duplicate]
(5 answers)
Closed 3 years ago.
I have the following words in a list
listx=['info/base','tri-gen']
I am trying to remove both the '/' and '-' at the same time.
Currently I have two separate blocks of code (shown below) that achieve this:
listx=['info/base','tri-gen']
if '/' in listx:
    listmain= '/'.join(listx).split('/')
    listmain = list(filter(None, listmain))
if '-' in listx:
    listmain= '-'.join(listx).split('-')
    listmain = list(filter(None, listmain))
How do I achieve this in a single if condition, or is there a way to include several conditions, e.g. like below?
'-','/'.join(listx).split('-','/')
Expected output
listx=['info base','tri gen']
The quick way to do this is using the re module, which provides you with regex magic. Feel free to read the documentation: https://docs.python.org/3/library/re.html
import re
listx=['info/base','tri-gen']
[re.sub(r"[/-]", " ", i) for i in listx]
Output:
['info base', 'tri gen']
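If a regex feels heavier than needed for single-character replacements, str.translate is another option. This is a minimal sketch of that alternative, not the approach used in the answer. The names table and result are illustrative.

```python
listx = ['info/base', 'tri-gen']

# Map each unwanted character to a space; extend the dict for more characters.
table = str.maketrans({'/': ' ', '-': ' '})

result = [word.translate(table) for word in listx]
print(result)
```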
EDIT
For your comment, I think you can get away without an if statement.
This regex finds all the words you need while skipping the ones in parentheses:
\b\w+\b(?![\(\w+\)])
See it at work: https://regex101.com/r/YqhJDb/1
You can implement something like this:
[" ".join(re.findall(r"\b\w+\b(?![\(\w+\)])", i)) for i in listx]
Output:
['info base', 'tri gen', 'century tech limited']

re.findall() returning empty element on using \w*$ while it is expected to return only last word [duplicate]

This question already has answers here:
String.replaceAll(regex) makes the same replacement twice
(2 answers)
Python regex: greedy pattern returning multiple empty matches
(1 answer)
Closed 3 years ago.
I was experimenting with Python regular expressions and trying to get the first and last word of a string using ^\w* and \w*$, but I am getting the following results:
>>> re.findall(r'^\w*', 'This is a test string')
['This']
#to get the last word
>>> re.findall(r'\w*$', 'This is a test string')
['string', '']
Can someone explain how this regexp works and why I am getting the empty element after 'string' (['string', ''])?
Note: It works with ^\w+ and \w+$.
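One way to see where the empty element comes from is to print the span of each match. The sketch below uses re.finditer (not part of the question) on the same string. After 'string' is matched, the scan resumes at the very end of the input, where \w* can still match zero characters and $ still holds, so a second, empty match is reported. With \w+ at least one character is required, so that second match cannot happen.

```python
import re

text = 'This is a test string'

# Each tuple holds the matched text and its (start, end) position.
spans = [(m.group(0), m.span()) for m in re.finditer(r'\w*$', text)]
print(spans)

# Requiring one or more word characters rules out the empty match.
with_plus = re.findall(r'\w+$', text)
print(with_plus)
```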

What is wrong with this Python regular expression? [duplicate]

This question already has answers here:
re.findall behaves weird
(3 answers)
Closed 4 years ago.
Given a string, I want to find all the substrings consisting of two or three '4,'.
For example, given '1,4,3,2,1,1,4,4,3,2,1,4,4,3,2,1,4,4,4,3,2,'
I want to get ['4,4,', '4,4,', '4,4,4'].
str_ = '1,4,4,3,2,1,1,4,4,3,2,1,4,4,3,2,1,4,4,3,2,'
m = re.findall(r"(4,){2,3}", str_)
What I get is:
['4,', '4,', '4,', '4,']
What's wrong?
It seems to me that the parentheses wrapping '4,' are interpreted as a capturing group rather than just telling Python that '4' and ',' should occur together. However, I don't know how to do this.
Use a non-capturing group. When a pattern contains a capturing group, re.findall returns the text of that group instead of the whole match:
import re
s = '1,4,3,2,1,1,4,4,3,2,1,4,4,3,2,1,4,4,4,3,2,'
print(re.findall(r'(?:4,?){2,3}', s))
Prints:
['4,4,', '4,4,', '4,4,4,']
EDIT:
Edited regex to capture 2 or 3 elements "4,"
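To make the difference concrete, here is a small side-by-side sketch on the answer's string. It uses the stricter (?:4,) with a required comma, which is a slight variation on the answer's pattern and is an assumption about what the asker wants. The variable names are illustrative.

```python
import re

s = '1,4,3,2,1,1,4,4,3,2,1,4,4,3,2,1,4,4,4,3,2,'

# Capturing group: findall reports only the last repetition of the group.
capturing = re.findall(r'(4,){2,3}', s)

# Non-capturing group: findall reports the whole match.
non_capturing = re.findall(r'(?:4,){2,3}', s)

print(capturing)
print(non_capturing)
```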

How to remove a character from a string until certain index? [duplicate]

This question already has answers here:
Remove characters from beginning and end or only end of line
(5 answers)
Closed 4 years ago.
I have the string "........my.python.string" and I want to remove all the leading "." characters up to the first alphanumeric character. Is there a way to achieve this other than converting the string to a list and working from there?
You can use re.sub:
import re
s = "........my.python.string"
new_s = re.sub(r'^\.+', '', s)
print(new_s)
Output:
my.python.string
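When only one specific leading character has to go, a regex is not strictly required. As a minimal sketch of an alternative (not the answer's method), str.lstrip removes every leading "." and leaves the rest of the string untouched:

```python
s = "........my.python.string"

# lstrip('.') strips dots from the left only; inner dots are kept.
new_s = s.lstrip('.')
print(new_s)
```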

Removing contents within brackets in string [duplicate]

This question already has answers here:
replace string in pandas dataframe
(3 answers)
Closed 6 years ago.
I want to remove the brackets and the contents within the brackets from a string.
I tried the following code:
a['Street Name'].str.replace('\(.*)','')
But it is not working. Can anybody please tell me what is wrong with this statement?
Try this:
import re
s = "I want to remove all words in brackets( like (this) and ((this)) and ((even) this))."
while True:
    s_new = re.sub(r'\([^\(]*?\)', r'', s)
    if s_new == s:
        break
    s = s_new
print(s_new) # I want to remove all words in brackets.
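As for what is wrong with the original statement: the pattern '\(.*)' escapes the opening parenthesis but not the closing one, so the regex engine sees an unbalanced ")". The sketch below demonstrates this with the re module alone, then applies a corrected pattern to a made-up street name (the value 'Main Street (North)' is hypothetical). With pandas, the corrected pattern would presumably be passed to Series.str.replace with regex=True; that part is not shown here.

```python
import re

# The question's pattern does not compile: ")" has no matching "(".
try:
    re.compile(r'\(.*)')
    compiled = True
except re.error:
    compiled = False
print(compiled)

# Corrected pattern: both parentheses escaped, lazy match in between,
# plus any whitespace directly before the opening parenthesis.
street = 'Main Street (North)'  # hypothetical sample value
cleaned = re.sub(r'\s*\(.*?\)', '', street)
print(cleaned)
```

This single-pass pattern is enough for flat parentheses; nested ones still need a loop like the one above.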
