RegEx phone number doesn't match - python

I am trying to match the following formats:
06142/898-301
+49 6142 898-301
with this
(([+][\d]{2}[ ])|0)([\d]{4}/)([/d]{2,}[.-])+
Debuggex Demo
But it stops matching after the area code, around the /. Why?

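Do you mean this?
(([+][\d]{2}[ ])|0)([\d]{4}/)([\d]{2,}[.-])+
What I changed in your expression:
[/d]{2,} -> [\d]{2,} (actually, \d{2,} would do too). Inside a character class, /d is not the digit shorthand: [/d] matches a literal slash or the letter d.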
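A small sketch of the difference, using only the sample numbers from the question. It also suggests that the corrected expression is still incomplete for these inputs: every digit run must be followed by . or -, and the slash after the area code is mandatory.

```python
import re

# [/d] is a character class holding a literal slash and the letter d,
# so it does not match a digit; [\d] does.
slash_d_matches_digit = bool(re.fullmatch(r'[/d]', '5'))
backslash_d_matches_digit = bool(re.fullmatch(r'[\d]', '5'))
print(slash_d_matches_digit, backslash_d_matches_digit)

# The corrected expression from the answer above.
fixed = re.compile(r'(([+][\d]{2}[ ])|0)([\d]{4}/)([\d]{2,}[.-])+')

m = fixed.match('06142/898-301')
print(m.group())  # the trailing 301 has no separator after it, so it is left out

# The second format uses a space instead of a slash after the area code.
second = fixed.match('+49 6142 898-301')
print(second)
```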

Looks like you want something more like this:
^(\+\d{2} |0)\d{4}[/ ]\d{3}[.-]\d{3}$
Example: http://regex101.com/r/qG2zY2
You don't need character classes defined for a single character, and you probably don't need all the capture groups either. I also added the anchor characters in there (^, $) but you can remove them if you're trying to pick this out of a larger string.
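A quick way to check that pattern against both formats from the question. The third sample is a made-up negative case, and the name PHONE is only illustrative.

```python
import re

# Pattern from the answer above; ^ and $ require the whole string to be a number.
PHONE = re.compile(r'^(\+\d{2} |0)\d{4}[/ ]\d{3}[.-]\d{3}$')

samples = ['06142/898-301', '+49 6142 898-301', '06142/898-30']
results = {s: bool(PHONE.match(s)) for s in samples}

for sample, ok in results.items():
    print(sample, ok)
```

To pick numbers out of a larger string, dropping the anchors and using PHONE.search or re.finditer would be the usual adjustment.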

Related

Python regex expression example

I have an input that is valid if it has these parts:
starts with letters (upper and lower case), numbers and some of the following characters (!,#,#,$,?)
then a part that begins with = and contains only numbers
then a part that begins with "<<" and may contain anything
example: !!Hel##lo!#=7<<vbnfhfg
What is the right regex in Python to identify whether the input is valid?
I am trying with
pattern= r"([a-zA-Z0-9|!|#|#|$|?]{2,})([=]{1})([0-9]{1})([<]{2})([a-zA-Z0-9]{1,})/+"
but apparently I am wrong.
For testing regexes I can really recommend regex101. It makes it much easier to understand what your regex is doing and which strings it matches.
Now, for your regex pattern and the example you provided, you need to remove the /+ at the end. Then it matches your example string. However, it splits it into five capture groups and not into three, as I understand you want from your list. To split it into three capture groups you could use this:
"([a-zA-Z0-9!##$?]{2,})([=]{1}[0-9]+)(<<.*)"
This returns the capture groups:
!!Hel##lo!#
=7
<<vbnfhfg
Notice I simplified your last group a little bit, using a dot instead of the list of characters. A dot matches anything, so change that back to your approach in case you don't want to match special characters.
Here is a link to your regex in regex101: link.
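As a minimal sketch, here is that three-group pattern applied to the example string. It is slightly simplified (the duplicate # and the [=]{1} wrapper are dropped), and using fullmatch is an assumption that the whole input has to be valid, not just a prefix of it.

```python
import re

# Three groups: leading characters, "=" plus digits, "<<" plus anything.
pattern = re.compile(r'([a-zA-Z0-9!#$?]{2,})(=[0-9]+)(<<.*)')

m = pattern.fullmatch('!!Hel##lo!#=7<<vbnfhfg')
print(m.groups())

# An input without digits after "=" is rejected.
bad = pattern.fullmatch('Hello=<<x')
print(bad)
```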

Regex: Another way to match structure separated by commas

I want to know if a string is a collection of, for example, numbers ([0-9]).
In this case, I'm using the regular expression [0-9](,[0-9])* to find one or more numbers separated by commas (a collection of numbers).
Is there a better way to do it? I mean a shorter expression perhaps.
I would suggest the following pattern:
(?<=^|,|\s)(\d+)
(?<=...) is a lookbehind assertion: what it matches is neither captured in a group nor included in the matched string. It is used to identify the starting position of the number to be matched.
You can try the above pattern interactively on the following website:
https://regex101.com/r/IKGWtA/1
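One caveat if you use Python's built-in re module: it only accepts fixed-width lookbehinds, so the pattern above, whose alternatives ^ and , have different widths, is rejected at compile time. The sketch below shows that and one possible Python-friendly variation; the variation is my assumption about an equivalent form, not part of the original answer.

```python
import re

# Python's re requires every lookbehind to have a single fixed width.
try:
    re.compile(r'(?<=^|,|\s)(\d+)')
    compiled_ok = True
except re.error as exc:
    compiled_ok = False
    print('rejected:', exc)

# Variation: move the start-of-string anchor out of the lookbehind.
NUMBER = re.compile(r'(?:^|(?<=[,\s]))(\d+)')

found = NUMBER.findall('1,22 333')
print(found)
```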
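\d*(,\d*)* will catch the situation where you have multiple digits before and after a comma, e.g. 100,000. Your original [0-9](,[0-9])* will only grab 0,0 around the comma in that same number. Note that \d* also accepts an empty run of digits, so use \d+ if every item must contain at least one digit.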
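To answer the "is this string a collection of numbers" question directly, a sketch along these lines may help. It assumes each item needs at least one digit, and the helper name is_number_list is made up for illustration.

```python
import re

# One or more digit runs separated by single commas.
NUMBER_LIST = re.compile(r'\d+(?:,\d+)*')

def is_number_list(text):
    """Return True if the whole string is digits separated by commas."""
    return NUMBER_LIST.fullmatch(text) is not None

for sample in ['1,2,3', '100,000', '', '1,,2', '1,']:
    print(repr(sample), is_number_list(sample))
```

fullmatch is used so that the pattern has to cover the entire string, which avoids adding ^ and $ by hand.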

Regex (python) exclude some part of the replacement

For sure there are other ways to solve this, but I'm interested if this can be solved exclusively via regex. I have lines of text like this:
9,A
11,B
22,>
33,B
72,A
91,<
112,A
162,B
When I try to apply this replacement to basically "join" or erase the part between arrows and replace them with "+++":
re.sub(r'\>(\n\d.+)+<','+++',string_above)
I get this, which is fine:
9,A
11,B
22,+++
112,A
162,B
But what if I want to keep that last number before the "<" sign and put, say, "X" after it, to get something like this:
9,A
11,B
22,+++
91,X
112,A
162,B
How can I do that?
In this concrete case, you may replace with
r'+++\1X'
See the regex demo
If X is a digit, replace with
r'+++\g<1>X'
because \1 followed directly by a digit would be read as a different group number.
The \1 and \g<1> are called replacement backreferences; they refer to the value of capturing group #1.
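Put together with the pattern from the question, the replacement can be sketched like this. It relies on a repeated capturing group keeping only its last repetition, which here is the line break plus "91,". The digit 5 in the second call is an arbitrary stand-in for a digit X.

```python
import re

text = "9,A\n11,B\n22,>\n33,B\n72,A\n91,<\n112,A\n162,B"

# Group 1 is repeated once per line; after the match it holds the last
# repetition only, i.e. "\n91," (everything before the closing "<").
pattern = r'>(\n\d.+)+<'

result = re.sub(pattern, r'+++\1X', text)
print(result)

# With a digit instead of X, \g<1> keeps the group number unambiguous.
digit_result = re.sub(pattern, r'+++\g<1>5', text)
print(digit_result)
```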

Regex to match all occurences of dots within enclosing dollar signs

I need to replace all dots in a string which are enclosed by dollar signs.
There is no nested structure so I think regular expressions are the right tool for this.
An example string looks like this:
asdf $asdf.asdf.$ $..asdf$
The regex I came up with matches the part within the dollar signs, but I want a match for each dot within the dollar signs (example):
\$([^$]*)\$
so for the example string it should yield four matches. How can I achieve that?
Since you are using Python, the easiest solution is to use your pattern to match the substrings from $ to $, and replace . with anything you want with a lambda:
import re
s = "a.sdf $asdf.asdf.$. . .$..asdf$"
r = re.compile(r'\$([^$]*)\$')
print(r.sub(lambda m: m.group().replace('.',''), s))
# => a.sdf $asdfasdf$. . .$asdf$
See the IDEONE demo
I think this will be very hard with a single regex, since you have to count the dollar signs in some sense (you only want to call every second gap between two dollar signs "enclosed" and the others "outside", right?).
So it seems easier to do the following (Python example):
do a mystring.split('$'), which gives you a list
then take every second item, e.g. newlist = oldlist[1::2]
count the dots (''.join(newlist).count('.'))
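The three steps above can be sketched as follows, using the example string from the question. This assumes the dollar signs are balanced; with an odd number of them, the last "enclosed" piece would really be unterminated text.

```python
s = "asdf $asdf.asdf.$ $..asdf$"

# Splitting on "$" alternates between outside and enclosed pieces,
# starting with an outside piece.
parts = s.split('$')

# Every second item (index 1, 3, ...) was between a pair of dollar signs.
enclosed = parts[1::2]

dot_count = ''.join(enclosed).count('.')
print(enclosed, dot_count)
```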

Regular expressions and how to get captured values

import re
custom = 'number=value1;user=value2;yr=value3'
number = re.findall(r'number=(.+?);', custom)
user = re.findall(r'user=(.+?);', custom)
yr = re.findall(r'yr=(.+?)[;\w]', custom)
outcome:
print(number)
['value1']
I am trying to extract the values of number, user, and yr from custom. It works except for 'yr': since 'yr' is the last item, it does not end with ';'. I tried adding \w, but that is not working. Is there a way to say "ends with either ';' or the end of the string"? I could search for custom[-1], but I just want to know how to do it in regex, and yr is not always last; number or user can be last sometimes.
You can leverage the regex lookbehind and use a regex like this:
(?<==)(\w+)
Working demo
So, you can use this regex for each case:
(?<=number=)(\w+)
(?<=user=)(\w+)
(?<=yr=)(\w+)
You can have your code as this:
import re
custom = 'number=value1;user=value2;yr=value3'
number = re.findall(r'(?<=number=)(\w+)', custom)
user = re.findall(r'(?<=user=)(\w+)', custom)
yr = re.findall(r'(?<=yr=)(\w+)', custom)
outcome:
print(number)
['value1']
Update: as CommuSoft pointed out in his comment, the regex won't capture the full value if it contains spaces. So, you can improve the regex by using:
(?<==)([^;]+)
So, you can have for each parameter something like this:
(?<=number=)([^;]+)
(?<=user=)([^;]+)
(?<=yr=)([^;]+)
Working demo
\w matches any word character but you want to match the end of the string.
You can use instead:
yr=(.+?)(?:;|$)
Also for learning/debugging regexes there are regex testers like this one:
https://regex101.com/
\w means a word character. Since you made the quantifier lazy ("ungreedy"), the regex ends the group as soon as possible, so the group captures only the first character (v) and [;\w] then matches the next one (a). You can instead use:
(;|$)
So this results in:
yr = re.findall(r'yr=(.+?)(?:;|$)', custom)
which gives the correct result.
The ?: at the front makes the group non-capturing, so it does not show up in the output.
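A minimal sketch of that idea wrapped in a helper, so the same pattern works whichever key comes last. The function name get_value is hypothetical, and the reordered string is a made-up test input.

```python
import re

def get_value(key, text):
    """Return the value after 'key=', up to the next ';' or the end of the string."""
    # (?:;|$) ends the lazy capture at a semicolon or at the end of the string.
    m = re.search(re.escape(key) + r'=(.+?)(?:;|$)', text)
    return m.group(1) if m else None

custom = 'number=value1;user=value2;yr=value3'
print(get_value('number', custom), get_value('user', custom), get_value('yr', custom))

# The same call still works when yr is not the last item.
reordered = 'yr=value3;number=value1;user=value2'
print(get_value('yr', reordered), get_value('user', reordered))
```

Note that this searches for the key anywhere in the string, so a key that is a suffix of another key would need a stricter prefix such as (?:^|;).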
Try this:
number, user, yr = re.findall(r'(?<==)[^;]+', custom)
print(number, user, yr)
Result: value1 value2 value3
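Unpacking into three names depends on the keys always arriving in the same order. Since the question says the order can vary, a dictionary may be a safer target. This is a sketch that assumes keys consist of word characters and values contain no semicolons.

```python
import re

custom = 'number=value1;user=value2;yr=value3'

# Each match yields a (key, value) tuple, which dict() accepts directly.
fields = dict(re.findall(r'(\w+)=([^;]+)', custom))
print(fields['number'], fields['user'], fields['yr'])

# The order of the pairs no longer matters.
reordered = dict(re.findall(r'(\w+)=([^;]+)', 'yr=value3;number=value1;user=value2'))
print(reordered == fields)
```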
