Python Regex capture multiple sections within string

I have strings that are always of the format track-a-b, where a and b are integers.
For example:
track-12-29
track-1-210
track-56-1
How do I extract a and b from such strings in python?

If it's just a single string, I would approach this using split:
>>> s = 'track-12-29'
>>> s.split('-')[1:]
['12', '29']
If it is a multi-line string, I would use the same approach on each line:
>>> s = 'track-12-29\ntrack-1-210\ntrack-56-1'
>>> [x.split('-')[1:] for x in s.splitlines()]
[['12', '29'], ['1', '210'], ['56', '1']]
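Because the question asks for the integers a and b, while split() returns strings, here is a small sketch that unpacks each line and converts the parts with int(). The helper name parse_track is hypothetical, and it assumes a and b are non-negative (a leading minus would add an extra '-' to split on):

```python
def parse_track(s):
    """Return (a, b) as integers from a string like 'track-12-29' (hypothetical helper)."""
    # split('-') yields ['track', '12', '29']; the unpacking raises ValueError
    # unless there are exactly three parts
    prefix, a, b = s.split('-')
    if prefix != 'track':
        raise ValueError(f'unexpected prefix: {prefix!r}')
    # int() also raises ValueError if a part is not an integer
    return int(a), int(b)

text = 'track-12-29\ntrack-1-210\ntrack-56-1'
pairs = [parse_track(line) for line in text.splitlines()]
print(pairs)  # [(12, 29), (1, 210), (56, 1)]
```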

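You'll want to use re.findall() with capturing groups. Assuming data is a list of such strings, each findall call returns a list of (a, b) tuples of strings:
import re
results = [re.findall(r'track-(\d+)-(\d+)', datum) for datum in data]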
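The same capturing-group idea also works on the whole multi-line string in one call. This sketch assumes one track-a-b entry per line, uses re.MULTILINE so that ^ and $ anchor at each line, and converts the captured strings to integers:

```python
import re

text = 'track-12-29\ntrack-1-210\ntrack-56-1'

# With two capturing groups, findall returns one (a, b) tuple of strings per match.
# ^ and $ match at every line boundary because of re.MULTILINE.
raw = re.findall(r'^track-(\d+)-(\d+)$', text, re.MULTILINE)
print(raw)  # [('12', '29'), ('1', '210'), ('56', '1')]

# Convert each captured string to an int.
pairs = [(int(a), int(b)) for a, b in raw]
print(pairs)  # [(12, 29), (1, 210), (56, 1)]
```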

Related

How do I collect values into a list in Python standard regex?

I have a string with repeated parts:
s = '[1][2][5] and [3][8]'
And I want to group the numbers into two lists using re.match. The expected result is:
{'x': ['1', '2', '5'], 'y': ['3', '8']}
I tried this expression that gives a wrong result:
re.match(r'^(?:\[(?P<x>\d+)\])+ and (?:\[(?P<y>\d+)\])+$', s).groupdict()
# {'x': '5', 'y': '8'}
It looks like re.match keeps the last match only. How do I collect all the parts into a list instead of the last one only?
Of course, I know that I could split the line on the ' and ' separator and use re.findall on each part instead, but this approach is not general enough: it causes problems for more complex strings, so I would always have to work out the correct splitting separately.
We can use regular expressions here. First, iterate over the input string looking for runs like [3][8]. For each match, use re.findall to build a list of number strings, then store that list under the next key. Note that we keep a list of keys and pop each one as we use it.
import re
s = '[1][2][5] and [3][8]'
keys = ['x', 'y']
d = {}
for m in re.finditer(r'(?:\[\d+\])+', s):
    d[keys.pop(0)] = re.findall(r'\d+', m.group())
print(d)  # {'x': ['1', '2', '5'], 'y': ['3', '8']}
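A variant of the loop above pairs each run with its key via zip instead of popping from the keys list, so the list is left untouched. This is a sketch of an alternative, not the answer's own code; note that zip silently stops at whichever side runs out first, whereas pop(0) raises IndexError when there are more runs than keys:

```python
import re

s = '[1][2][5] and [3][8]'
keys = ['x', 'y']

# zip pairs each run of bracketed numbers with the next key, in order,
# and stops when either the keys or the matches run out.
runs = re.finditer(r'(?:\[\d+\])+', s)
d = {key: re.findall(r'\d+', m.group()) for key, m in zip(keys, runs)}
print(d)  # {'x': ['1', '2', '5'], 'y': ['3', '8']}
```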
If you want to use named capture groups, you can write the pattern so that each named group wraps the whole repeated run of bracketed digits:
^(?P<x>(?:\[\d+])+) and (?P<y>(?:\[\d+])+)$
First check that the pattern matches, then extract the digits from each value in groupdict() with re.findall:
Example
import re
s = '[1][2][5] and [3][8]'
m = re.match(r'^(?P<x>(?:\[\d+])+) and (?P<y>(?:\[\d+])+)$', s)
if m:
    dct = {k: re.findall(r"\d+", v) for k, v in m.groupdict().items()}
    print(dct)
Output
{'x': ['1', '2', '5'], 'y': ['3', '8']}

python split string using alphabet to get only numerical values

Using python 3.8
Given a string such as s = 'A11B11C32D34...',
I want to split it into [11, 11, 32, 34, ...], i.e. split on the letters. How can I do this?
Thanks in advance!
Check with re.findall:
>>> import re
>>> s = 'A11B11C32D34'
>>> re.findall(r'\d+', s)
['11', '11', '32', '34']
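The question asks for numbers such as [11, 11, 32, 34], but re.findall returns strings. A minimal sketch that converts each match with int():

```python
import re

s = 'A11B11C32D34'
# findall returns strings; pass each one through int() to get numbers
nums = [int(n) for n in re.findall(r'\d+', s)]
print(nums)  # [11, 11, 32, 34]
```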
I might also suggest using a regex split approach here:
import re
inp = "A11B11C32D34"
nums = [x for x in re.split(r'\D+', inp) if x]
print(nums)  # ['11', '11', '32', '34']
The idea here is to split the string on every run of one or more non-digit characters. The list comprehension removes the empty strings that re.split produces at the start or end when the string starts or ends with a non-digit character.
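To see why that filtering step is needed, here is the raw re.split output. The second input, with a trailing letter, is a made-up example:

```python
import re

inp = 'A11B11C32D34'
# The string starts with a non-digit, so the first piece is empty.
print(re.split(r'\D+', inp))   # ['', '11', '11', '32', '34']

inp2 = 'A11B11C32D34E'
# A trailing non-digit run produces an empty last piece as well.
print(re.split(r'\D+', inp2))  # ['', '11', '11', '32', '34', '']
```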

How to split algebraic expressions in a string using python?

For example, I get the following input:
-9x+5x-2-4x+5
And I need to get the following list:
['-9x', '5x', '-2', '-4x', '5']
Here is my code, but I don't know how to deal with minuses.
import re
text = '-3x-5x+2=9x-9'
text = re.split(r'\W', text)
print(text)
Note: I cannot use any libraries except re and math.
You could re.findall all groups of characters followed by + or - (or end-of-string $), then strip the + (which, like -, is still part of the following group) from the substrings.
>>> s = "-9x+5x-2-4x+5"
>>> [x.strip("+") for x in re.findall(r".+?(?=[+-]|$)", s)]
['-9x', '5x', '-2', '-4x', '5']
Similarly, for the second string with =, add that to the character group and also strip it off the substrings:
>>> s = '-3x-5x+2=9x-9'
>>> [x.strip("+=") for x in re.findall(r".+?(?=[+=-]|$)", s)]
['-3x', '-5x', '2', '9x', '-9']
Or apply the original comprehension to the substrings after splitting by =, depending on what the result should look like:
>>> [[x.strip("+") for x in re.findall(r".+?(?=[+-]|$)", s2)] for s2 in s.split("=")]
[['-3x', '-5x', '2'], ['9x', '-9']]
In fact, now that I think of it, you can also just use findall to match an optional minus, followed by some digits and an optional x, with or without splitting by = first:
>>> [re.findall(r"-?\d+x?", s2) for s2 in s.split("=")]
[['-3x', '-5x', '2'], ['9x', '-9']]
One of many possible ways:
import re
term = "-9x+5x-2-4x+5"
rx = re.compile(r'-?\d+[a-z]?')
factors = rx.findall(term)
print(factors)
This yields
['-9x', '5x', '-2', '-4x', '5']
For your example data, you might split on either a plus or an equals sign, or at the position just before a minus sign that is not at the start of the string.
[+=]|(?=(?<!^)-)
[+=] Match either + or =
| Or
(?=(?<!^)-) Positive lookahead, assert what is on the right is - but not at the start of the string
Output for both example strings
['-9x', '5x', '-2', '-4x', '5']
['-3x', '-5x', '2', '9x', '-9']
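A sketch of the corresponding Python code, using re.split with the pattern above. Splitting on a zero-width match (the lookahead alternative) requires Python 3.7 or later:

```python
import re

# Split on '+' or '=', or at the empty position just before a '-'
# that is not at the start of the string.
pattern = re.compile(r'[+=]|(?=(?<!^)-)')

for expr in ['-9x+5x-2-4x+5', '-3x-5x+2=9x-9']:
    print(pattern.split(expr))
# ['-9x', '5x', '-2', '-4x', '5']
# ['-3x', '-5x', '2', '9x', '-9']
```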

How to remove specific strings from a list

From the following list, how can I remove the elements ending with Text?
My expected result is a=['1,2,3,4']
My list is a=['1,2,3,4,5Text,6Text']
Should I use endswith to solve this problem?
Split on commas, then filter on strings that are only digits:
a = [','.join(v for v in a[0].split(',') if v.isdigit())]
Demo:
>>> a=['1,2,3,4,5Text,6Text']
>>> [','.join(v for v in a[0].split(',') if v.isdigit())]
['1,2,3,4']
It looks as if you really wanted to work with lists of more than one element though, at which point you could just filter:
a = ['1', '2', '3', '4', '5Text', '6Text']
a = list(filter(str.isdigit, a))  # filter() returns a lazy iterator in Python 3
or, using a list comprehension:
a = ['1', '2', '3', '4', '5Text', '6Text']
a = [v for v in a if v.isdigit()]
Use str.endswith to filter out such items:
>>> a = ['1,2,3,4,5Text,6Text']
>>> [','.join(x for x in a[0].split(',') if not x.endswith('Text'))]
['1,2,3,4']
Here str.split splits the string at ',' and returns a list:
>>> a[0].split(',')
['1', '2', '3', '4', '5Text', '6Text']
Now filter out items from this list and then join them back using str.join.
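If several suffixes need to be removed, str.endswith also accepts a tuple of suffixes and returns True if any of them matches. In this sketch the extra item '7Note' and the suffix 'Note' are made-up examples:

```python
items = ['1', '2', '3', '4', '5Text', '6Text', '7Note']

# endswith with a tuple checks every suffix in one call
kept = [x for x in items if not x.endswith(('Text', 'Note'))]
print(kept)  # ['1', '2', '3', '4']

# join back into a single comma-separated string, as in the original one-element list
print([','.join(kept)])  # ['1,2,3,4']
```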
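Try this. It works no matter what text the items end with, and it returns integers:
a = ['1,2,3,4,5Text,6Text']
a = a[0].split(',')
li = []
for v in a:
    try:
        li.append(int(v))
    except ValueError:  # skip items that are not plain integers, e.g. '5Text'
        pass
print(li)  # [1, 2, 3, 4]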

Splitting a string into a list (but not separating adjacent numbers) in Python

For example, I have:
string = "123ab4 5"
I want to be able to get the following list:
["123","ab","4","5"]
rather than list(string) giving me:
["1","2","3","a","b","4"," ","5"]
Find one or more adjacent digits (\d+), or if that fails find non-digit, non-space characters ([^\d\s]+).
>>> string = '123ab4 5'
>>> import re
>>> re.findall(r'\d+|[^\d\s]+', string)
['123', 'ab', '4', '5']
If you don't want the letters joined together, try this:
>>> re.findall(r'\d+|\S', string)
['123', 'a', 'b', '4', '5']
The other solutions are definitely easier. If you want something far less straightforward, you could try something like this:
>>> from itertools import groupby
>>> s = "123ab4 5"
>>> result = [''.join(v) for _, v in groupby(s, key=lambda x: x.isdigit())]
>>> result = [x for x in result if not x.isspace()]  # drop whitespace-only groups
>>> result
['123', 'ab', '4', '5']
You could do:
>>> [el for el in re.split(r'(\d+)', string) if el.strip()]
['123', 'ab', '4', '5']
This will give the split you want:
re.findall(r'\d+|[a-zA-Z]+', "123ab4 5")
['123', 'ab', '4', '5']
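The two character classes used above differ on punctuation: [^\d\s]+ keeps any non-digit, non-space run, while [a-zA-Z]+ only matches ASCII letters. A quick comparison on a made-up input containing a '-':

```python
import re

text = '123ab4 5-x'
# [^\d\s]+ keeps punctuation as part of a non-digit run
print(re.findall(r'\d+|[^\d\s]+', text))   # ['123', 'ab', '4', '5', '-x']
# [a-zA-Z]+ only matches ASCII letters, so the '-' is silently dropped
print(re.findall(r'\d+|[a-zA-Z]+', text))  # ['123', 'ab', '4', '5', 'x']
```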
You can do a few things here:
1. Iterate over the string and build groups of digits as you go, appending them to your results list. Not a great solution.
2. Use regular expressions.
Implementation of 2:
>>> import re
>>> s = "123ab4 5"
>>> re.findall(r'\d+|[^\d]', s)
['123', 'a', 'b', '4', ' ', '5']
You want to grab any group of at least one digit (\d+) or any other single character.
Edit:
John beat me to the correct solution, and it's a wonderful solution.
I will leave this here, though, because someone else might read the question the way I did and look for an answer to that. I was under the impression the OP wanted to capture only groups of numbers and leave everything else as individual characters.
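Option 1 from the answer above (iterating and grouping digits manually) can be sketched without regular expressions. The function name split_digit_runs is hypothetical; it groups consecutive digits and keeps every other character, including spaces, as its own item, reproducing the re.findall(r'\d+|[^\d]', s) result shown earlier:

```python
def split_digit_runs(s):
    """Group consecutive digits; keep every other character as its own item."""
    result = []
    digits = ''  # digits collected so far for the current number
    for ch in s:
        if ch.isdigit():
            digits += ch
            continue
        if digits:  # a non-digit ends the current number
            result.append(digits)
            digits = ''
        result.append(ch)
    if digits:  # flush a number at the end of the string
        result.append(digits)
    return result

print(split_digit_runs('123ab4 5'))  # ['123', 'a', 'b', '4', ' ', '5']
```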
