How to set end of string in Python regex? - python

I have the following list
lst = ['BILL_FROM:', 'MyCompany', '525._S._Lexington_Ave.', 'Burlington._NC._2725', 'United_States', 'musicjohnliofficial#gmail.com', 'BILL_TO:', 'O.Relly', '343._S._Lexington_Ave.', 'Burlington._NC._2725', 'United_States', 'musicjohnliofficial#gmail.com', 'INVOICE_number', '01', 'INVOICE_DATE', '2022-12-27', 'AMOUNT_DUE', '1.128', 'SUBTOTAL', '999.00', 'TAX_(13.0%)', '129.87', 'TOTAL', '1.128']
And I want to get its BILL_TO: field using regex.
I'm trying to do
>>> bill_to = re.compile("(\w+)to$", re.IGNORECASE)
>>> list(filter(bill_to.match, lst))
to get ['BILL_TO:'] field only, but instead getting
['Burlington._NC._2725', 'BILL_TO:', 'Burlington._NC._2725', 'SUBTOTAL']
Why is the $ symbol not working here? Or am I doing something else wrong?
Thank you

The $ will match the end of the string, but you have a : beforehand which you also need to match:
(\w+)to:$
Also, it's recommended to use a raw string so that the backslash in \w does not have to be escaped (notice the r prefix):
bill_to = re.compile(r"(\w+)to:$", re.IGNORECASE)
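As a quick check, the corrected pattern can be run against a shortened version of the list from the question. This is only a sketch: the shortened list and the fullmatch variant are illustrative additions, not part of the original answer.

```python
import re

# Shortened sample of the list from the question (illustrative subset).
lst = ['BILL_FROM:', 'MyCompany', 'Burlington._NC._2725',
       'BILL_TO:', 'SUBTOTAL', 'TOTAL']

# The trailing colon must be part of the pattern for $ to match.
bill_to = re.compile(r"(\w+)to:$", re.IGNORECASE)
matches = list(filter(bill_to.match, lst))
print(matches)

# Alternative: fullmatch anchors both ends, so no explicit $ is needed.
bill_to_full = re.compile(r"\w+to:", re.IGNORECASE)
full_matches = [item for item in lst if bill_to_full.fullmatch(item)]
print(full_matches)
```

Both variants should keep only the 'BILL_TO:' entry, because every other item either lacks "to" or has something other than ":" right after it.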

Related

extract and filter values from string list python

I have an array that looks like the one below. The "error" substring always starts with the special character "‘", so I was able to get just the errors with something like this:
a = [' 276ARDUINO_i2c.c:70:27: error: ‘ARDUINO_I2C_nI2C', ' 248rpy_i2c.h:76:40: error: ‘RPY_I2C_BASE_ADDR_LIST', ' 452rpy_i2c.c:79:77: error: ‘RPY_I2C_IRQ_LIST']
newlist = [x.split('‘')[1] for x in a]
print(newlist)
and the output would look like this
['ARDUINO_I2C_nI2C', 'RPY_I2C_BASE_ADDR_LIST', 'RPY_I2C_IRQ_LIST']
But now I also need to get the name of the file related to each error. The file name always starts with a numeric substring that I also need to remove. The output I want would look like this:
['ARDUINO_i2c.c', 'ARDUINO_I2C_nI2C'], ['rpy_i2c.h', 'RPY_I2C_BASE_ADDR_LIST'], ['rpy_i2c.c','RPY_I2C_IRQ_LIST']
I'd appreciate any suggestions. Thanks.
You could use a regular expression to capture the required parts of your string. For example, the following regex:
\d+([^:]+):.*‘(.*)$
Explanation:
-----------
\d+ : One or more digits
( ) ( ) : Capturing groups
[^:]+ : One or more non-colon characters (in capturing group 1)
: : One colon
.* : Any number of any character
‘ : The ‘ character
.* : Any number of any character (in capturing group 2)
$ : End of string
To use it:
import re
regex = re.compile(r"\d+([^:]+):.*‘(.*)$")
newlist = [regex.search(s).groups() for s in a]
which gives a list of tuples:
[('ARDUINO_i2c.c', 'ARDUINO_I2C_nI2C'),
('rpy_i2c.h', 'RPY_I2C_BASE_ADDR_LIST'),
('rpy_i2c.c', 'RPY_I2C_IRQ_LIST')]
If you really want a list of lists, you can convert the result of .groups() to a list:
newlist = [list(regex.search(s).groups()) for s in a]
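If some lines might not contain an error at all, regex.search returns None and calling .groups() on it raises an AttributeError. A minimal sketch that skips such lines, assuming that silently ignoring them is acceptable (the helper name extract_pairs and the non-matching sample line are made up for illustration):

```python
import re

regex = re.compile(r"\d+([^:]+):.*‘(.*)$")

def extract_pairs(lines):
    """Return [file_name, symbol] pairs, skipping lines that do not match."""
    pairs = []
    for line in lines:
        m = regex.search(line)
        if m is None:
            # Hypothetical policy: ignore lines without an error marker.
            continue
        pairs.append(list(m.groups()))
    return pairs

a = [' 276ARDUINO_i2c.c:70:27: error: ‘ARDUINO_I2C_nI2C',
     'a line without an error marker',
     ' 452rpy_i2c.c:79:77: error: ‘RPY_I2C_IRQ_LIST']
pairs = extract_pairs(a)
print(pairs)
```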
I have created this code to get the exact result as you like but there could be more efficient ways too. I have split the values and used regex to get the needed result.
import re
a = [' 276ARDUINO_i2c.c:70:27: error: ‘ARDUINO_I2C_nI2C', '248rpy_i2c.h:76:40: error: ‘RPY_I2C_BASE_ADDR_LIST', ' 452rpy_i2c.c:79:77: error: ‘RPY_I2C_IRQ_LIST']
r = []
for x in a:
    d = x.split(": error: ‘")
    r.append([re.sub("[0-9]{3}", "", d[0].split(":")[0].strip()), d[1]])
print(r)

Remove brackets and number inside from string Python

I've seen a lot of examples on how to remove brackets from a string in Python, but I've not seen any that allow me to remove the brackets and a number inside of the brackets from that string.
For example, suppose I've got a string such as "abc[1]". How can I remove the "[1]" from the string to return just "abc"?
I've tried the following:
stringTest = "abc[1]"
stringTestWithoutBrackets = str(stringTest).strip('[]')
but this only outputs the string without the final bracket
abc[1
I've also tried with a wildcard option:
stringTest = "abc[1]"
stringTestWithoutBrackets = str(stringTest).strip('[\w+\]')
but this also outputs the string without the final bracket
abc[1
You could use regular expressions for that, but I think the easiest way would be to use split:
>>> stringTest = "abc[1][2][3]"
>>> stringTest.split('[', maxsplit=1)[0]
'abc'
You can use regex but you need to use it with the re module:
re.sub(r'\[\d+\]', '', stringTest)
If the [<number>] part is always at the end of the string you can also strip via:
stringTest.rstrip('[0123456789]')
Though the latter version might strip beyond the [ if the previous character is in the strip list too. For example in "abc1[5]" the "1" would be stripped as well.
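One way to avoid stripping too much is to anchor the regex at the end of the string instead of using rstrip. A small sketch, assuming that only trailing "[digits]" groups should be removed (the function name strip_trailing_index is hypothetical):

```python
import re

def strip_trailing_index(text):
    # Remove one or more "[digits]" groups, but only at the very end.
    return re.sub(r'(\[\d+\])+$', '', text)

print(strip_trailing_index("abc[1]"))        # abc
print(strip_trailing_index("abc1[5]"))       # abc1 (the "1" is kept)
print(strip_trailing_index("abc[1][2][3]"))  # abc
print(strip_trailing_index("a[1]b"))         # a[1]b (not at the end, untouched)
```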
Assuming your string has the format "text[number]" and you only want to keep the "text", then you could do:
stringTest = "abc[1]"
bracketBegin = stringTest.find('[')
stringTestWithoutBrackets = stringTest[:bracketBegin]

How to count the number of double and triple repetitions of a letter in a string without the two counts overlapping? [duplicate]

I am trying to replace single $ characters with something else, and want to ignore multiple $ characters in a row, and I can't quite figure out how. I tried using lookahead:
s='$a $$b $$$c $d'
re.sub('\$(?!\$)','z',s)
This gives me:
'za $zb $$zc zd'
when what I want is
'za $$b $$$c zd'
What am I doing wrong?
Notes, if you are not using a callable as the replacement:
you need a look-ahead, because the $ must not match if it is followed by another $
you need a look-behind, because the $ must not match if it is preceded by another $
Not as elegant, but this is very readable:
>>> def dollar_repl(matchobj):
...     val = matchobj.group(0)
...     if val == '$':
...         val = 'z'
...     return val
...
>>> import re
>>> s = '$a $$b $$$c $d'
>>> re.sub(r'\$+', dollar_repl, s)
'za $$b $$$c zd'
Hmm. It looks like I can get it to work if I use both lookahead and lookbehind. Seems like there should be an easier way, though.
>>> re.sub(r'(?<!\$)\$(?!\$)', 'z', s)
'za $$b $$$c zd'
Ok, without lookaround and without callback function:
re.sub(r'(^|[^$])\$([^$]|$)', r'\1z\2', s)
An alternative with re.split:
''.join('z' if x == '$' else x for x in re.split(r'(\$+)', s))
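For reuse, the lookaround pattern can be wrapped in a small function and checked against a few edge cases. This is a sketch only: the names SINGLE_DOLLAR and replace_single_dollars are made up, and the replacement is assumed to be plain text without backslashes.

```python
import re

# A "$" that is neither preceded nor followed by another "$".
SINGLE_DOLLAR = re.compile(r'(?<!\$)\$(?!\$)')

def replace_single_dollars(text, replacement='z'):
    # Assumes `replacement` contains no backslashes or group references.
    return SINGLE_DOLLAR.sub(replacement, text)

print(replace_single_dollars('$a $$b $$$c $d'))  # za $$b $$$c zd
print(replace_single_dollars('$a$b'))            # zazb
print(replace_single_dollars('$$'))              # $$
```

Because lookarounds consume no characters, two single $ signs separated by just one character (as in '$a$b') are both replaced.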

How can split string in python and get result with delimiter?

I have code like
a = "*abc*bbc"
a.split("*")#['','abc','bbc']
#i need ["*","abc","*","bbc"]
a = "abc*bbc"
a.split("*")#['abc','bbc']
#i need ["abc","*","bbc"]
How can I get a list that includes the delimiter, using Python's split function, a regex, or partition?
I am using python 2.7 , windows
You need to use RegEx with the delimiter as a group and ignore the empty string, like this
>>> [item for item in re.split(r"(\*)", "abc*bbc") if item]
['abc', '*', 'bbc']
>>> [item for item in re.split(r"(\*)", "*abc*bbc") if item]
['*', 'abc', '*', 'bbc']
Note 1: You need to escape * with \, because * has a special meaning in RegEx. The backslash tells the RegEx engine to treat * as a normal character.
Note 2: You'll get an empty string when the delimiter is at the beginning or at the end of the string, because there is nothing before (or after) it. That is why the empty items are filtered out.
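The same idea can be generalised to any literal delimiter by letting re.escape do the escaping. A minimal sketch (the helper name split_keep_delimiter is made up for illustration):

```python
import re

def split_keep_delimiter(text, delimiter):
    # re.escape makes characters such as "*" or "." match literally;
    # the capturing group makes re.split keep the delimiter in the result.
    pattern = "(" + re.escape(delimiter) + ")"
    # Drop the empty strings produced by leading/trailing delimiters.
    return [piece for piece in re.split(pattern, text) if piece]

print(split_keep_delimiter("*abc*bbc", "*"))  # ['*', 'abc', '*', 'bbc']
print(split_keep_delimiter("abc*bbc", "*"))   # ['abc', '*', 'bbc']
```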
You have to use re.split and group the delimiter:
import re
x = "*abc*bbc"
print([part for part in re.split(r"(\*)", x) if part])
Or use re.findall:
x = "*abc*bbc"
print(re.findall(r"[^*]+|\*", x))
Use partition(). Note that it splits only at the first occurrence of the delimiter and returns a 3-tuple:
a = "abc*bbc"
print(a.partition("*"))
('abc', '*', 'bbc')

Using parentheses as delimiter in re or str.split() python

I am trying to split a string such as: add(ten)sub(one) into add(ten) sub(one).
I can't figure out how to match the close parenthesis. I have used re.sub(r'\\)', '\\) ') and every variation of escaping the parenthesis I can think of. It is hard to tell in this font, but I am trying to add a space between these commands so I can split it into a list later.
There's no need to escape ) in the replacement string. ) has a special meaning only in the regex pattern, so it needs to be escaped there in order to match it literally, but in a normal string it can be used as is.
>>> strs = "add(ten)sub(one)"
>>> re.sub(r'\)(?=\S)',r') ', strs)
'add(ten) sub(one)'
As @StevenRumbalski pointed out in the comments, the above operation can be done more simply using str.replace and str.strip:
>>> strs.replace(')',') ').strip()
'add(ten) sub(one)'
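Since the goal stated in the question is to split the commands into a list afterwards, the two steps can be combined. A small sketch, assuming the commands themselves contain no whitespace:

```python
import re

strs = "add(ten)sub(one)"

# Insert a space after every ")" that is directly followed by a
# non-whitespace character, then split on whitespace.
spaced = re.sub(r'\)(?=\S)', ') ', strs)
commands = spaced.split()
print(commands)  # ['add(ten)', 'sub(one)']
```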
d = ')'
my_str = 'add(ten)sub(one)'
result = [t+d for t in my_str.split(d) if len(t) > 0]
# result == ['add(ten)', 'sub(one)']
Create a list of all substrings:
import re
a = 'add(ten)sub(one)'
print(re.findall(r'(.+?\(.+?\))', a))
Output:
['add(ten)', 'sub(one)']
