Python: split a string on letters to get only the numeric values

Using python 3.8
Given str = A11B11C32D34,....
I want to split it into [11, 11, 32, 34 ...], that is, split on the letters. How could I do this?
Thanks in advance!

Check with
s= 'A11B11C32D34'
s
Out[388]: 'A11B11C32D34'
import re
re.findall(r'\d+', s)
Out[390]: ['11', '11', '32', '34']
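Note that re.findall returns the matches as strings, while the question asks for numbers. A minimal sketch of the extra conversion step, assuming every match should become an int (the name nums is only illustrative):

```python
import re

s = 'A11B11C32D34'
# re.findall returns each run of digits as a string; int() converts it.
nums = [int(x) for x in re.findall(r'\d+', s)]
print(nums)  # [11, 11, 32, 34]
```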

I might also suggest using a regex split approach here:
import re
inp = "A11B11C32D34"
nums = [x for x in re.split(r'\D+', inp) if x]
print(nums) # ['11', '11', '32', '34']
The idea here is to split the string on any run of one or more non-digit characters. The list comprehension removes the empty leading/trailing entries that re.split produces when the string starts or ends with a non-digit character.
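To see why the filter in the list comprehension is needed, here is a small sketch that prints the raw result of re.split first (the name raw is introduced only for this illustration):

```python
import re

inp = "A11B11C32D34"
# The leading "A" is itself a separator, so re.split emits an empty
# string for the (empty) text in front of it.
raw = re.split(r'\D+', inp)
print(raw)   # ['', '11', '11', '32', '34']

# Dropping the falsy (empty) entries leaves only the digit runs.
nums = [x for x in raw if x]
print(nums)  # ['11', '11', '32', '34']
```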

Related

Extract digits from string with consecutive digit characters

I cannot use regular expressions or any library :(. I need to extract all digits from an alphanumeric string. Each consecutive sequence of digits (we can call it a "temperature") is preceded by a (+, -, or *) and will be considered a single number (all are integers, no floats). There are other non-digit characters in the string that can be ignored. I need to extract each "temperature" into a data structure.
Example String "BARN21+77-48CDAIRY87+56-12" yields [21, 77, 48, 87, 56, 12]
The data string can be many many magnitudes larger.
All solutions I can find assume there is only one sequence of digits (temperature) in the string or that the temperatures are separated by a space/delimiter. I was able to get it working by iterating through the string, adding a space before and after each digit sequence, and then using split, but that feels like cheating. I wonder if you professionals distort data for a happy solution??
incoming data "BARN21+77-48CDAIRY87+56-12"
temp is what I change data to
temp = "BARN* 21 + 77 - 48 DAIRY* 87 + 56 - 12"
result = [int(i)
          for i in temp.split()
          if i.isdigit()]
print("The result ", result)
The result [21, 77, 48, 87, 56, 12]
Here is a version which does not use regular expressions:
inp = "BARN21+77-48CDAIRY87+56-12"
inp = ''.join(' ' if not ch.isdigit() else ch for ch in inp).strip()
nums = inp.split()
print(nums) # ['21', '77', '48', '87', '56', '12']
If regex is available to you, you can use re.findall with the pattern \d+:
import re
inp = "BARN21+77-48CDAIRY87+56-12"
nums = re.findall(r'\d+', inp)
print(nums) # ['21', '77', '48', '87', '56', '12']
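Since the question rules out both regular expressions and libraries, and mentions very large inputs, a single pass that never rewrites the string may also be worth considering. This is only a sketch: extract_numbers is a hypothetical helper name, and it assumes that only the ASCII digits 0-9 count as digits.

```python
def extract_numbers(text):
    """Collect each run of consecutive digits as one int (hypothetical helper)."""
    numbers = []
    current = ""                  # digits of the run currently being read
    for ch in text:
        if ch in "0123456789":
            current += ch
        elif current:             # a non-digit character ends the current run
            numbers.append(int(current))
            current = ""
    if current:                   # the string may end in the middle of a run
        numbers.append(int(current))
    return numbers

print(extract_numbers("BARN21+77-48CDAIRY87+56-12"))  # [21, 77, 48, 87, 56, 12]
```

Unlike the split-based versions, this returns ints directly and does not build an intermediate copy of the input.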

Remove leading '0's from all strings in a list

My code
l=['99','08','096']
for i in l:
    if i.startswith('0'):
        i.replace('0','')
print(l)
Output
l=['99','08','096']
I just want to remove the leading '0's from all the strings:
l=['99','8','96']
You can use str.lstrip() for this in a list comprehension - to remove all leading '0's from each string.
>>> l = ['99', '08', '096']
>>> [x.lstrip('0') for x in l]
['99', '8', '96']
This has the added benefit of not removing instances of '0' from within the string, only from the front.
>>> l=['99', '08', '096', '0102']
>>> [x.lstrip('0') for x in l]
['99', '8', '96', '102']
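One edge case to be aware of: a string made up only of zeros is stripped down to the empty string. If such values can occur and should stay as '0', a possible sketch (the extra list entries and the names stripped and safe are assumptions for illustration) is:

```python
l = ['99', '08', '096', '0', '000']

# Plain lstrip turns all-zero strings into the empty string ...
stripped = [x.lstrip('0') for x in l]
print(stripped)  # ['99', '8', '96', '', '']

# ... so fall back to '0' when nothing is left after stripping.
safe = [x.lstrip('0') or '0' for x in l]
print(safe)      # ['99', '8', '96', '0', '0']
```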
If I understood you correctly, you want to remove the string from the list if it starts with 0, but the code you showed only replaces every 0 in the string with nothing. Note also that i.replace('0','') returns a new string and the result is discarded, so l is never changed.
One way to filter those strings out is:
new_list = [el for el in l if not el.startswith('0')]
You can use a list comprehension to correct your strings with str.lstrip:
[e.lstrip('0') for e in l]
Or you can use a for loop:
for i in range(len(l)):
    if l[i].startswith('0'):
        l[i] = l[i].lstrip('0')  # strip every leading zero, not just the first

How to split algebraic expressions in a string using python?

For example I get following input:
-9x+5x-2-4x+5
And I need to get following list:
['-9x', '5x', '-2', '-4x', '5']
Here is my code, but I don't know how to deal with minuses.
import re
text = '-3x-5x+2=9x-9'
text = re.split(r'\W', text)
print(text)
warning: I cannot use any libraries except re and math.
You could re.findall all groups of characters followed by + or - (or end-of-string $), then strip the + (which, like -, is still part of the following group) from the substrings.
>>> s = "-9x+5x-2-4x+5"
>>> [x.strip("+") for x in re.findall(r".+?(?=[+-]|$)", s)]
['-9x', '5x', '-2', '-4x', '5']
Similarly, for the second string with =, add that to the character group and also strip it off the substrings:
>>> s = '-3x-5x+2=9x-9'
>>> [x.strip("+=") for x in re.findall(r".+?(?=[+=-]|$)", s)]
['-3x', '-5x', '2', '9x', '-9']
Or apply the original comprehension to the substrings after splitting by =, depending on how the result should look:
>>> [[x.strip("+") for x in re.findall(r".+?(?=[+-]|$)", s2)] for s2 in s.split("=")]
[['-3x', '-5x', '2'], ['9x', '-9']]
In fact, now that I think of it, you can also just use findall to match an optional minus, followed by some digits and an optional x, with or without splitting by = first:
>>> [re.findall(r"-?\d+x?", s2) for s2 in s.split("=")]
[['-3x', '-5x', '2'], ['9x', '-9']]
One of many possible ways:
import re
term = "-9x+5x-2-4x+5"
rx = re.compile(r'-?\d+[a-z]?')
factors = rx.findall(term)
print(factors)
This yields
['-9x', '5x', '-2', '-4x', '5']
For your example data, you might split on either a plus or equals sign or split when asserting a minus sign on the right which is not at the start of the string.
[+=]|(?=(?<!^)-)
[+=] Match either + or =
| Or
(?=(?<!^)-) Positive lookahead, assert what is on the right is - but not at the start of the string
Output for both example strings
['-9x', '5x', '-2', '-4x', '5']
['-3x', '-5x', '2', '9x', '-9']
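The pattern above can be tried with re.split. This is a sketch that assumes Python 3.7 or later, since earlier versions do not split on empty (zero-width) matches; the names splitter, examples and results are illustrative only.

```python
import re

# Split on "+" or "=", or on the empty position just before a "-"
# that is not at the very start of the string.
splitter = re.compile(r'[+=]|(?=(?<!^)-)')

examples = ["-9x+5x-2-4x+5", "-3x-5x+2=9x-9"]
results = [splitter.split(text) for text in examples]
for terms in results:
    print(terms)
```

Because the minus sign is matched only by a lookahead, it is not consumed by the split and stays attached to the term that follows it.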

How to remove alphabets and extract numbers using regex in python?

import re
l=["098765432123 M","123456789012"]
s = re.findall(r"(?<!\d)\d{12}", l)
print(s)
Expected Output:
123456789012
If all you want is a filtered list consisting of the elements that contain only digits, use filter with str.isdigit:
list(filter(str.isdigit, l))
Or, as @tobias_k suggested, a list comprehension is always your friend:
[s for s in l if s.isdigit()]
Output:
['123456789012']
I would suggest using a negative lookahead assertion if, as stated, you want to use regex only.
import re
l = ["098765432123 M", "123456789012"]
res = []
for a in l:
    # 12 digits not preceded by a digit and not followed by a space and a letter
    s = re.search(r"(?<!\d)\d{12}(?! [a-zA-Z])", a)
    if s is not None:
        res.append(s.group(0))
The result would then be:
['123456789012']
To keep only the digits you can use re.findall(r'\d', s), but you'll get a list of single characters:
s = re.findall(r'\d', "098765432123 M")
print(s)
> ['0', '9', '8', '7', '6', '5', '4', '3', '2', '1', '2', '3']
So to be clear, you want to ignore the whole string if there is an alphabetic character in it? Or do you still want to extract the numbers from a string that contains both digits and alphabetic characters?
If you want to find all numbers, always taking the longest run of digits, use this:
regex = r"\d+"
test_str = "098765432123 M"  # any input string
matches = re.finditer(regex, test_str)
\d matches a digit and + means one or more of the preceding element; because + is greedy, each match is the longest consecutive run of digits.
If you only want to find the strings that contain no letters:
import re
regex = r"[a-zA-Z]"
test_str = ("098765432123 M", "123456789012")
for x in test_str:
    if not re.search(regex, x):
        print(x)

Python Regex capture multiple sections within string

I have strings that are always of the format track-a-b where a and b are integers.
For example:
track-12-29
track-1-210
track-56-1
How do I extract a and b from such strings in python?
If it's just a single string, I would approach this using split:
>>> s = 'track-12-29'
>>> s.split('-')[1:]
['12', '29']
If it is a multi-line string, I would use the same approach ...
>>> s = 'track-12-29\ntrack-1-210\ntrack-56-1'
>>> results = [x.split('-')[1:] for x in s.splitlines()]
>>> results
[['12', '29'], ['1', '210'], ['56', '1']]
You'll want to use re.findall() with capturing groups:
results = [re.findall(r'track-(\d+)-(\d+)', datum) for datum in data]
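With two capturing groups, re.findall returns a list of tuples for each string (for example [('12', '29')]), which adds a level of nesting. If each string holds exactly one track-a-b value, a sketch using a single match per string may be easier to work with; the data list below is an assumed sample built from the question's examples, and pairs is an illustrative name.

```python
import re

data = ['track-12-29', 'track-1-210', 'track-56-1']  # assumed sample input

pattern = re.compile(r'track-(\d+)-(\d+)')
pairs = []
for datum in data:
    m = pattern.fullmatch(datum)   # the whole string must fit the format
    if m:                          # skip strings that do not match
        a, b = map(int, m.groups())
        pairs.append((a, b))

print(pairs)  # [(12, 29), (1, 210), (56, 1)]
```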
