Extracting a string within a string and omitting the search string

I have a string:
string="soupnot$23.99dedarikjdf$44.65 notworryfence$98.44coyoteugle$33.94rock$2,300.00"
I want to extract the numbers 23.99, 44.65, 98.44, 33.94, 2,300.00. I have this regex
\$(.*[^\s])
There are 2 issues with this.
It returns the '$' sign. I only want the number.
It only works when there is a space at the end of the number but sometimes there might be letters and it won't work in that case.
Thanks.

You can use regex as shown:
import re
string="soupnot$23.99dedarikjdf$44.65 notworryfence$98.44coyoteugle$33.94rock$2,300.00"
res = re.findall(pattern=r"[\d.,]+", string=string)  # raw string keeps \d from being treated as a string escape
output:
['23.99', '44.65', '98.44', '33.94', '2,300.00']

Try this regex:
(?<=\$)\d+(?:,\d+)*(?:\.\d+)?
Explanation
(?<=\$) - positive lookbehind to find the position just preceded by a $
\d+ - matches 1+ occurrences of a digit
(?:,\d+)* - matches 0+ occurrences of a , followed by 1 or more digits
(?:\.\d+)? - matches a . followed by 1+ digits. ? in the end makes this decimal part optional
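A minimal Python sketch of the lookbehind pattern above, applied to the string from the question. The names AMOUNT_RE and extract_amounts are illustrative, and the final float conversion is only an assumption about what you might want to do with the matches afterwards.

```python
import re

# Pattern from the answer above: digits preceded by "$", with optional
# comma-separated groups and an optional decimal part.
AMOUNT_RE = re.compile(r'(?<=\$)\d+(?:,\d+)*(?:\.\d+)?')


def extract_amounts(text):
    """Return the amounts that directly follow a "$" sign, as strings."""
    return AMOUNT_RE.findall(text)


string = "soupnot$23.99dedarikjdf$44.65 notworryfence$98.44coyoteugle$33.94rock$2,300.00"
amounts = extract_amounts(string)
print(amounts)

# Optional step (an assumption, not part of the question): drop the
# thousands separators to get numeric values.
values = [float(a.replace(',', '')) for a in amounts]
print(values)
```

Because the lookbehind requires a preceding $, digits that appear elsewhere in the text are ignored, which a bare [\d.,]+ would not guarantee.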

Related

Regex - Regular expression for counting no.of digits between alphabets

Need to construct a regular expression that counts numbers between alphabets.
schowalte3rguss77ie85 - 2
xyz1zyx - 1
x1y1z1 - 2
I have constructed this, but it doesn't work for case 3.
[[a-z]+[0-9]+[a-z]]*
Any help would be appreciated. Thanks in advance.

Use this regex:
(?<=[a-z])\d+(?=[a-z])
Demo: https://regex101.com/r/tpss6x/1
[Javascript]

If you want a count only, the last part should be a lookahead assertion.
If you want to also match uppercase chars, you can make the pattern case insensitive.
[a-z]\d+(?=[a-z])
Explanation
[a-z] Match a single char a-z
\d+ Match 1+ digits
(?=[a-z]) Positive lookahead, assert a char a-z to the right
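As a rough Python sketch of the counting idea, assuming re.findall plus len is an acceptable way to count: the helper name count_digit_runs and its ignore_case parameter are made up for illustration.

```python
import re


def count_digit_runs(text, ignore_case=False):
    """Count runs of digits that have a letter on both sides."""
    flags = re.IGNORECASE if ignore_case else 0
    # The trailing letter is only asserted by the lookahead, not consumed,
    # so it can still serve as the leading letter of the next match.
    return len(re.findall(r'[a-z]\d+(?=[a-z])', text, flags))


for sample in ("schowalte3rguss77ie85", "xyz1zyx", "x1y1z1"):
    print(sample, count_digit_runs(sample))
```

If the last part were a plain [a-z] instead of a lookahead, the y in x1y1z1 would be consumed by the first match and the second digit run would be missed, which is the failure described for case 3.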

You can use
(?<=[^\W\d_])\d+(?=[^\W\d_])
If you want to only support ASCII letters, replace [^\W\d_] (which matches any Unicode letter) with [a-zA-Z].
Details:
(?<=[^\W\d_]) - immediately before the current location, there must be any Unicode letter
\d+ - one or more digits
(?=[^\W\d_]) - immediately after the current location, there must be any Unicode letter.
Counting can be done with len(...), see this Python demo:
import re
text = "schowalte3rguss77ie85"
matches = re.findall(r'(?<=[^\W\d_])\d+(?=[^\W\d_])', text)
print(len(matches)) # => 2

How do I write a Regex in Python to remove leading zeros for a number in the middle of a string

I have a string composed of letters followed by a number, and I need to remove all the letters, as well as the leading zeros in the number.
For example: in the test string U012034, I want to match the U and the 0 at the beginning of 012034.
So far I have [^0-9] to match any characters that aren't digits, but I can't figure out how to also remove the leading zeros in the number.
I know I could do this in multiple steps with something like int(re.sub("[^0-9]", "", test_string)) but I need this process to be done in one regex.

You can use
re.sub(r'^\D*0*', '', text)
Details:
^ - start of string
\D* - any zero or more non-digit chars
0* - zero or more zeros.
In Python:
import re
text = "U012034"
print( re.sub(r'^\D*0*', '', text) )
# => 12034
If there is more text after the first number, use
print( re.sub(r'^\D*0*(\d+).*', r'\1', text) )
Details:
^ - start of string
\D* - zero or more non-digits
0* - zero or more zeros
(\d+) - Group 1: one or more digits (use (\d+(?:\.\d+)?) to match float or int values)
.* - the rest of the string.
The replacement is the Group 1 value.
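If a single re.sub call is not a hard requirement, the same pattern prefix can be used with re.match to pull out the number directly. This is an alternative sketch rather than the approach from the answer above, and the function name first_number_without_leading_zeros is hypothetical.

```python
import re


def first_number_without_leading_zeros(text):
    """Return the first number in text without its leading zeros, or None."""
    # \D* skips leading non-digits, 0* skips leading zeros, and (\d+)
    # captures what is left. Backtracking keeps one "0" for an all-zero number.
    match = re.match(r'\D*0*(\d+)', text)
    return match.group(1) if match else None


print(first_number_without_leading_zeros("U012034"))
print(first_number_without_leading_zeros("U012034 and more text"))
print(first_number_without_leading_zeros("U000"))
print(first_number_without_leading_zeros("no digits"))
```

One difference worth knowing: re.sub returns the input unchanged when the pattern does not match, whereas this helper returns None, which makes the "no number found" case explicit.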

You may use this re.sub in Python:
string = re.sub(r'^[a-zA-Z]*0*|[a-zA-Z]+', '', string)
Explanation:
^: Start
[a-zA-Z]*: Match 0 or more letters
0*: Match 0 or more zeroes
|: OR
[a-zA-Z]+: Match 1+ of letters

Does this do what you need?
re.sub("[^0-9]+0*", "", "U0123")
>>> '123'
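The three re.sub patterns suggested in this thread agree on the input from the question but behave differently once letters appear after the first digits. The sketch below puts them side by side; the function names strip_v1, strip_v2 and strip_v3 and the second test input are made up for the comparison.

```python
import re


def strip_v1(text):
    # Removes leading non-digits and leading zeros at the start only.
    return re.sub(r'^\D*0*', '', text)


def strip_v2(text):
    # Removes leading letters and zeros at the start, plus letters anywhere.
    return re.sub(r'^[a-zA-Z]*0*|[a-zA-Z]+', '', text)


def strip_v3(text):
    # Removes every run of non-digits together with the zeros that follow it.
    return re.sub(r'[^0-9]+0*', '', text)


for text in ("U012034", "U0120x034"):
    print(text, strip_v1(text), strip_v2(text), strip_v3(text))
```

For strings that really are "letters followed by a number" all three are interchangeable; the choice only matters if the input can contain letters in the middle.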

Not able get desired output after string parsing through regex

input =
6:/BENM/Gravity Exports/REM//INV: 3267/FEB20:65:ghgh
6:/BENM/Tabuler Trading/REM//IMP/2020-341
original_regex = 6:[A-Za-z0-9 \/\.\-:] - but this matches the full string 6:/BENM/Gravity Exports/REM//INV: 3267/FEB20:65:ghgh
modified_regex_pattern = 6:[A-Za-z0-9 \/\.\-:]{1,}[\/-:]
For the first string I want the output up to
6:/BENM/Gravity Exports/REM//INV: 3267/FEB20
but it matches up to :65:
Can anyone suggest a better way to write this?
Example as below
https://regex101.com/r/pAduvy/1

You could for example use a capturing group with an optional part at the end to match the :digits:a-z part.
(6:[A-Za-z0-9 \/.:-]+?)(?::\d+:[a-z]+)?$
( Capture group 1
6:[A-Za-z0-9 \/.:-]+? Match 6: and then any of the chars listed in the character class, as few as possible
) Close group 1
(?::\d+:[a-z]+)? optionally match the part at the end that you don't want to include
$ End of string
Note: Not sure if it is intended, but the last part of your pattern, [\/-:], denotes a range covering ASCII codes 47 to 58 (/, the digits 0-9, and :).
Or a more precise pattern to get the match only
6:/\w+/\w+ \w+/[A-Z]+//[A-Z]+(?:: \d+)?/[A-Z]*\d+(?:-\d+)?
6:/\w+/\w+ Match 6:, then twice a / followed by 1+ word chars, then a space
\w+/[A-Z]+//[A-Z]+ Match 1+ word chars, / and uppercase chars, // and again uppercase chars
(?:: \d+)? Optionally match a colon, a space and 1+ digits
/[A-Z]*\d+ Match /, optional uppercase chars and 1+ digits
(?:-\d+)? Optionally match - and 1+ digits
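Both patterns can be tried in Python against the two input lines from the question. This is a sketch under the assumption that each line is processed separately; GROUP_RE, PRECISE_RE and extract_prefix are illustrative names, and the slash is written unescaped because Python's re module does not require escaping it.

```python
import re

lines = [
    "6:/BENM/Gravity Exports/REM//INV: 3267/FEB20:65:ghgh",
    "6:/BENM/Tabuler Trading/REM//IMP/2020-341",
]

# Capturing-group variant: group 1 is the wanted part, the optional
# non-capturing group swallows a trailing ":digits:letters" suffix.
GROUP_RE = re.compile(r'(6:[A-Za-z0-9 /.:-]+?)(?::\d+:[a-z]+)?$')

# More precise variant: the whole match is the wanted part.
PRECISE_RE = re.compile(
    r'6:/\w+/\w+ \w+/[A-Z]+//[A-Z]+(?:: \d+)?/[A-Z]*\d+(?:-\d+)?'
)


def extract_prefix(line):
    """Return the wanted prefix using the capturing-group pattern, or None."""
    match = GROUP_RE.search(line)
    return match.group(1) if match else None


group_results = [extract_prefix(line) for line in lines]
precise_results = [PRECISE_RE.search(line).group(0) for line in lines]
print(group_results)
print(precise_results)
```

The first variant is more tolerant of unseen input shapes; the second fails loudly (no match) when a line does not follow the expected layout.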

RegEx for matching two digits and everything except new lines and dot

Using Python 3, I'm trying to find a string only if it contains a one- or two-digit number (and no more digits than that in the same number) along with everything else following it. The match breaks on periods or new lines.
\d{1,2}[^.\n]+ is almost right except it returns numbers greater than two digits.
For example:
"5+years {} experience. stop.
10 asdasdas . 255
1abc1
5555afasfasf++++s()(jn."
Should return:
5+years {} experience
10 asdasdas
1abc1

Based upon your description and your sample data, you can use following regex to match the intended strings and discard others,
^\d[^\d.]*\d?[^\d.\n]*(?=\.|$)
Regex Explanation:
^ - Start of line
\d - Matches a digit
[^\d.]* - This matches any character other than digit or dot zero or more times. This basically allows optionally matching of non-digit non-dot characters.
\d? - As you want to allow one or two digits, this is the second digit which is optional hence \d followed by ?
[^\d.\n]* - This matches any character other than a digit, dot or newline, zero or more times
(?=\.|$) - This positive lookahead ensures the match is followed by either a dot or the end of the line
Also note that multiline mode is enabled, as ^ and $ need to match at the start and end of each line.
Code:
import re
s = '''5+years {} experience. stop.
10 asdasdas . 255
1abc1
5555afasfasf++++s()(2jn.'''
print(re.findall(r'(?m)^\d[^\d.]*\d?[^\d.\n]*(?=\.|$)', s))
Prints:
['5+years {} experience', '10 asdasdas ', '1abc1']
If matching lines don't necessarily start with a digit, you can use the regex below instead. Take group 1 if the result should start at the first digit; take the whole match if the leading non-digit characters should be included.
^[^\d\n]*(\d[^\d.]*\d?[^\d.\n]*)(?=\.|$)
Regex Explanation:
^ - Start of line
[^\d\n]* - Allows zero or more non-digit characters before first digit
( - Starts first grouping pattern to capture the string starting with first digit
\d - Matches a digit
[^\d.]* - This matches any character other than digit or dot zero or more times. This basically allows optionally matching of non-digit non-dot characters.
\d? - As you want to allow one or two digits, this is the second digit which is optional hence \d followed by ?
[^\d.\n]* - This matches any character other than a digit, dot or newline, zero or more times
) - End of first capturing pattern
(?=\.|$) - This positive lookahead ensures the match is followed by either a dot or the end of the line
Multiline mode must be enabled, either by placing the inline modifier (?m) at the start of the regex or by passing re.MULTILINE as the flags argument (the third argument of re.search or re.findall).
Code:
import re
s = '''5+years {} experience. stop.
10 asdasdas . 255
1abc1
aaa1abc1
aa2aa1abc1
5555afasfasf++++s()(2jn.'''
print(re.findall(r'(?m)^[^\d\n]*(\d[^\d.]*\d?[^\d.\n]*)(?=\.|$)', s))
Prints:
['5+years {} experience', '10 asdasdas ', '1abc1', '1abc1']
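The two ways of enabling multiline mode mentioned above give the same result. The short sketch below checks that on the first pattern, and also strips the trailing space from the second match, which is an extra step assumed here because the question's expected output has no trailing whitespace.

```python
import re

s = '''5+years {} experience. stop.
10 asdasdas . 255
1abc1
5555afasfasf++++s()(2jn.'''

pattern = r'^\d[^\d.]*\d?[^\d.\n]*(?=\.|$)'

# Inline modifier at the start of the pattern...
inline = re.findall('(?m)' + pattern, s)
# ...or the equivalent flags argument.
flagged = re.findall(pattern, s, flags=re.MULTILINE)

# Assumed post-processing: remove surrounding whitespace from each match.
cleaned = [m.strip() for m in flagged]
print(inline == flagged)
print(cleaned)
```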

Regex matching digit in between?

I would like to get the account number from each of these strings.
strings = ["point_right: account ISLAMIC: 860328 9221 asdsad",
"account 723123123",
"account823123213",
"account 823.123.213",
"account 823-123-213",
"account:123213123 ",
"account: 123213123 asdasdsad 017-299906",
"account: 123213123",
"point_right: account ISLAMIC: 860328 9221"
]
Result would be
[860328 9221,723123123, 823123213, 823.123.213, 823-123-213, 123213123, 123213123, 123213123]
And i can do processing later to make them into number. So far my strategy is to get everything after pattern and anything before a letter. I have tried:
for string in strings:
    print(re.findall("(?<=account)(.*)", string.lower()))
Please help to give some pointers on the regex match.

Try this pattern:
(?=[^0-9]*)[0-9][0-9 .-]*[0-9]
Breakdown:
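(?=[^0-9]*) Lookahead for non-digit characters, such as "account", without consuming them. Note that [^0-9]* can match the empty string, so this lookahead always succeeds and does not restrict the match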
[0-9] Find a digit
[0-9 .-]* Find any number of digits or special characters (in your strings you have spaces, dashes, periods so I included those)
[0-9] Find another digit (to prevent spaces at the end)
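A quick Python check of this pattern on one of the question's strings, offered as a sketch: WITH_LOOKAHEAD and WITHOUT_LOOKAHEAD are illustrative names. It shows that the pattern returns every digit run in the string, not only the one after "account", and that dropping the lookahead gives the same matches.

```python
import re

WITH_LOOKAHEAD = re.compile(r'(?=[^0-9]*)[0-9][0-9 .-]*[0-9]')
WITHOUT_LOOKAHEAD = re.compile(r'[0-9][0-9 .-]*[0-9]')

text = "account: 123213123 asdasdsad 017-299906"

with_matches = WITH_LOOKAHEAD.findall(text)
without_matches = WITHOUT_LOOKAHEAD.findall(text)
print(with_matches)
print(without_matches)
```

Since the pattern needs a first and a last digit, it also requires at least two digits per match, so a single-digit number would be skipped.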

(?!\W)([\d\s.-]+)(?<!\s)
The negative lookahead and lookbehind seem like overkill here, but I wasn't able to get a clean match otherwise.
(?!\W) Negative lookahead to exclude any non-word characters [^a-zA-Z0-9_]
([\d\s.-]+) The capturing group for your numbers
(?<!\s) Negative lookbehind to exclude whitespace characters [\r\n\t\f\v ]

If the numbers must be the first numbers after the account substring use
re.findall(r"account\D*([\d\s.-]*\d)", s)
Pattern details
account - a literal substring
\D* - 0+ chars other than digits
([\d\s.-]*\d) - Capturing group 1 (the value returned by re.findall): 0 or more digits, whitespaces, . and - chars followed with a digit.
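Applied to the full list from the question, the pattern can be wrapped as below. This is a sketch: ACCOUNT_RE and account_number are hypothetical names, lowercasing the input follows the question's own attempt, and the digits-only cleanup at the end is an assumption about the "processing later" step the question mentions.

```python
import re

strings = [
    "point_right: account ISLAMIC: 860328 9221 asdsad",
    "account 723123123",
    "account823123213",
    "account 823.123.213",
    "account 823-123-213",
    "account:123213123 ",
    "account: 123213123 asdasdsad 017-299906",
    "account: 123213123",
    "point_right: account ISLAMIC: 860328 9221",
]

ACCOUNT_RE = re.compile(r'account\D*([\d\s.-]*\d)')


def account_number(text):
    """Return the first number after "account", or None if there is none."""
    match = ACCOUNT_RE.search(text.lower())
    return match.group(1) if match else None


results = [account_number(s) for s in strings]
print(results)

# Assumed follow-up step: keep only the digits of each result.
digits_only = [re.sub(r'\D', '', r) for r in results if r is not None]
print(digits_only)
```

Because the match is anchored on the word account, the second number in "account: 123213123 asdasdsad 017-299906" is not returned.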
