Regular expression to find even/odd number - python

I used the code below to find the even numbers in a string, but it returned nothing.
Could anyone tell me what I missed? Thank you very much.
import re
str2 = "adbv345hj43hvb42"
even_number = re.findall('/^[0-9]*[02468]$/', str2 )

In Python you should not wrap the expression in slashes ('/^[0-9]*[02468]$/' -> '^[0-9]*[02468]$').
^ and $ match the beginning and the end of the string (or of a line in a MULTILINE regex), but your example doesn't look like it needs them ('^[0-9]*[02468]$' -> '[0-9]*[02468]').
After that, you need to stop the pattern from matching only a prefix of a number ('[0-9]*[02468]' -> r'[0-9]*[02468](?![0-9])').
That's it :)
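A minimal sketch of those three steps, run against the string from the question (the variable names are only illustrative):

```python
import re

str2 = "adbv345hj43hvb42"

# Anchored: requires the whole string to consist of digits, so nothing matches.
anchored = re.findall(r'^[0-9]*[02468]$', str2)

# Unanchored: prefixes of odd numbers sneak in ("34" out of "345").
unanchored = re.findall(r'[0-9]*[02468]', str2)

# Negative lookahead: reject a match that is followed by another digit.
even_numbers = re.findall(r'[0-9]*[02468](?![0-9])', str2)

print(anchored)      # []
print(unanchored)    # ['34', '4', '42']
print(even_numbers)  # ['42']
```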

Your re matches:
Start of string
0 or more digits 0,1,2,3,4,5,6,7,8 or 9
One even digit
End of string
That does not match your string, so you should drop the start-of-string ^ and end-of-string $ markers.
To find an even number, just match any number of digits ending in an even digit: '[0-9]*[02468]'

I'm not sure exactly what you want to extract from the string, but to match single even digits, use the character class [02468] (it matches any one of the characters listed).
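As a small sketch of that character class on the string from the question (note that it returns individual even digits, not whole even numbers):

```python
import re

str2 = "adbv345hj43hvb42"

# [02468] matches exactly one character out of the listed set.
even_digits = re.findall(r'[02468]', str2)

print(even_digits)  # ['4', '4', '4', '2']
```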

Related

How to extract date in yyyy-yyyy format using regex python

I know this is basic, but can someone please provide a regex solution to extract "1234-5678" out of "abcfd1234-5678gfvjh"? The leading and trailing strings can be anything, and they might not always be there, i.e. the string can be just "1234-5678" as well. It is guaranteed that there will be no letters between the numbers; only "-" can appear there. There is one more format of the string, "1234-56", i.e. the second number can be of length 2 or 4. Please see the examples below:
input :a = "abcfd1234-5678gfvjh"
output :"1234-5678"
input :a = "abcfd1234-56gfvjh"
output :"1234-56"
input :a = "1234-5678hgjg"
output :"1234-5678"
input :a = "abcfd1234-5678"
output :"1234-5678"
input :a = "1234-56"
output :"1234-56"
\d{4}[-–](?:\d{4}|\d{2})
See an explanation here: https://regex101.com/r/kocRuY/2
Basically we say to search for four digits then a hyphen then either (using a non-capturing group to bracket) four digits or, failing that, two digits.
You should use the regex "search" method rather than "match" method as the processor will have to find where the sequence starts in the string. If you are restricted to matching from the start with "match", then you could add some sort of quantifier at the start to gobble up the start characters.
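A rough sketch of both options, using the inputs from the question. The lazy `.*?` prefix in the `re.match` variant is just one possible way to "gobble up" the leading characters, not the only one:

```python
import re

PATTERN = re.compile(r'\d{4}[-–](?:\d{4}|\d{2})')

samples = [
    "abcfd1234-5678gfvjh",
    "abcfd1234-56gfvjh",
    "1234-5678hgjg",
    "abcfd1234-5678",
    "1234-56",
]

# search() scans the string for the first position where the pattern matches.
results = [PATTERN.search(s).group(0) for s in samples]
print(results)
# ['1234-5678', '1234-56', '1234-5678', '1234-5678', '1234-56']

# match() only tries position 0, so a lazy prefix consumes the leading text
# and a capturing group isolates the part we want.
m = re.match(r'.*?(\d{4}[-–](?:\d{4}|\d{2}))', "abcfd1234-56gfvjh")
via_match = m.group(1)
print(via_match)  # 1234-56
```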
>>> import re
>>> re.findall('\d+-\d+', "abcfd1234-5678gfvjh")
['1234-5678']
You can try different regexes at https://regex101.com/
Surely a dozen duplicates on StackOverflow.
As the request occurs very often, there's a module called datefinder (pip install datefinder). You'd then call it like this:
import datefinder
matches = datefinder.find_dates(your_string_here)
for match in matches:
    print(match)

regex expression to get all digits before full stop

I want to do what my title says, but I can't seem to manage it.
string = "tex3591.45" #please be aware that my digit is in half-width
text_temp = re.findall("(\d.)", string)
My current output is:
['35', '91', '45']
My expected output is:
['3591.'] # with the "." at the end of the integer, no matter how many digits come before the full stop
You need to escape the .:
text_temp = re.findall(r"\d+\.", string)
since . is a special character in regex, which matches any character. Added the + also to match 1 or more digits.
Or if you actually are using 'FULLWIDTH FULL STOP' (U+FF0E) you can just use the special character in the regex without escaping it:
text_temp = re.findall(r"\d+．", string)
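To make the difference concrete, here is a short sketch comparing the escaped and unescaped ASCII full stop, plus the fullwidth case. The fullwidth sample string is an assumption made up for illustration; the question's own string uses an ASCII full stop.

```python
import re

string = "tex3591.45"

# Escaped: "\." matches only a literal full stop.
escaped = re.findall(r"\d+\.", string)

# Unescaped: "." matches any character, so "45" also qualifies ("4" + any char).
unescaped = re.findall(r"\d+.", string)

# FULLWIDTH FULL STOP (U+FF0E) is not a regex metacharacter, so no escape is needed.
FULLWIDTH_STOP = "\uFF0E"
fullwidth_string = "tex3591" + FULLWIDTH_STOP + "45"  # hypothetical input
fullwidth = re.findall(r"\d+" + FULLWIDTH_STOP, fullwidth_string)

print(escaped)    # ['3591.']
print(unescaped)  # ['3591.', '45']
print(fullwidth == ["3591" + FULLWIDTH_STOP])  # True
```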
You can use either of these regexes with re.findall to get your desired result:
\d(?=.*?\.)
This generates individual digits as the answer.
Demo in regex 101
\d+(?=.*?\.)
Demo2
This generates each run of digits as one string.
I used a positive lookahead, plus greedy matching of the digits in the second pattern, to check whether there is a full stop somewhere after a digit before including it in the output. Hope this helps :).
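A quick sketch of what the two lookahead patterns return, with the full stop escaped so that it matches a literal ".". Because a lookahead does not consume characters, the "." itself is not part of the result, which differs slightly from the expected output in the question:

```python
import re

string = "tex3591.45"

# One digit at a time, as long as a literal "." appears somewhere later.
single_digits = re.findall(r"\d(?=.*?\.)", string)

# Whole runs of digits that have a literal "." somewhere later.
digit_runs = re.findall(r"\d+(?=.*?\.)", string)

print(single_digits)  # ['3', '5', '9', '1']
print(digit_runs)     # ['3591']
```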

Dynamically Removing string with regex python

I am currently having trouble removing the end of strings using regex. I have tried using .partition with unsuccessful results. I am now trying to use regex unsuccessfully. All the strings follow the format of some random words **X*.* Some more words. Where * is a digit and X is a literal X. For Example 21X2.5. Everything after this dynamic string should be removed. I am trying to use re.sub('\d\d\X\d.\d', string). Can someone point me in the right direction with regex and how to split the string?
The expected output should read:
some random words 21X2.5
Thanks!
Use the following regex:
re.search(r"(.*?\d\dX\d\.\d)", "some random words 21X2.5 Some more words").groups()[0]
Output:
'some random words 21X2.5'
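One caveat: re.search returns None when nothing matches, so calling .groups() directly would raise an AttributeError on such input. A minimal sketch with a guard, where the helper name `keep_through_pattern` is hypothetical:

```python
import re

def keep_through_pattern(text):
    """Return text up to and including the first NNXN.N token.

    If no such token exists, the text is returned unchanged.
    """
    m = re.search(r"(.*?\d\dX\d\.\d)", text)
    return m.group(1) if m else text

trimmed = keep_through_pattern("some random words 21X2.5 Some more words")
untouched = keep_through_pattern("no token here")

print(repr(trimmed))    # 'some random words 21X2.5'
print(repr(untouched))  # 'no token here'
```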
Your regex is not correct. The biggest problem is that you need to escape the period. Otherwise, the regex treats the period as a match to any character. To match just that pattern, you can use something like:
re.findall(r'[\d]{2}X\d\.\d', 'asb12X4.4abc')
[\d]{2} matches a sequence of two digits, X matches the literal X, \d matches a single digit, \. matches the literal ., and \d matches the final digit.
This will match and return only 12X4.4.
It sounds like you instead want to remove everything after the matched expression. To get your desired output, you can do something like:
re.split(r'(.*?[\d]{2}X\d\.\d)', 'some random words 21X2.5 Some more words')[1]
which will return some random words 21X2.5. This expression pulls everything before and including the matched regex and returns it, discarding the end.
Let me know if this works.
To remove everything after the pattern, i.e. to do exactly as you say...:
s = re.sub(r'(\d\dX\d\.\d).*', r'\1', s)
Of course, if you mean something other than what you said, something different will be needed! E.g., if you also want to remove the pattern itself, not just (as you said) what's after it:
s = re.sub(r'\d\dX\d\.\d.*', r'', s)
and so forth, depending on what, exactly, are your specs!-)
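For reference, a sketch of both substitutions on the sample string. The multi-line input at the end is an added assumption, included to show that `.` does not cross newlines unless re.DOTALL is passed:

```python
import re

s = 'some random words 21X2.5 Some more words'

# Keep the token (captured as group 1), drop everything after it.
kept = re.sub(r'(\d\dX\d\.\d).*', r'\1', s)

# Drop the token as well; note the trailing space that remains.
dropped = re.sub(r'\d\dX\d\.\d.*', r'', s)

print(repr(kept))     # 'some random words 21X2.5'
print(repr(dropped))  # 'some random words '

# Hypothetical multi-line input: DOTALL lets ".*" run past the newline.
multi = 'some random words 21X2.5 Some more\nwords'
multi_kept = re.sub(r'(\d\dX\d\.\d).*', r'\1', multi, flags=re.DOTALL)
print(repr(multi_kept))  # 'some random words 21X2.5'
```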

Python regex for int with at least 4 digits

I am just learning regex and I'm a bit confused here. I've got a string from which I want to extract an int with at least 4 digits and at most 7 digits. I tried it as follows:
>>> import re
>>> teststring = 'abcd123efg123456'
>>> re.match(r"[0-9]{4,7}$", teststring)
Where I was expecting 123456, unfortunately this results in nothing at all. Could anybody help me out a little bit here?
@ExplosionPills is correct, but there would still be two problems with your regex.
First, $ matches the end of the string. I'm guessing you'd like to be able to extract an int in the middle of the string as well, e.g. abcd123456efg789 to return 123456. To fix that, you want this:
r"[0-9]{4,7}(?![0-9])"
            ^^^^^^^^^
The added portion is a negative lookahead assertion, meaning, "...not followed by any more numbers." Let me simplify that by the use of \d though:
r"\d{4,7}(?!\d)"
That's better. Now, the second problem. You have no constraint on the left side of your regex, so given a string like abcd123efg123456789, you'd actually match 3456789. So, you need a negative lookbehind assertion as well:
r"(?<!\d)\d{4,7}(?!\d)"
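The effect of each assertion can be checked with a short sketch on the strings mentioned above (the variable names are only illustrative):

```python
import re

no_guard = r"\d{4,7}"
right_only = r"\d{4,7}(?!\d)"
both_sides = r"(?<!\d)\d{4,7}(?!\d)"

long_run = "abcd123efg123456789"   # contains a 9-digit run
mid_string = "abcd123456efg789"    # contains a 6-digit run mid-string

# No guards: grabs the first 7 digits of the 9-digit run.
first = re.findall(no_guard, long_run)

# Lookahead only: grabs the last 7 digits of the 9-digit run.
second = re.findall(right_only, long_run)

# Both guards: the 9-digit run is rejected entirely.
third = re.findall(both_sides, long_run)

# Both guards still accept a genuine 4-to-7-digit number.
fourth = re.findall(both_sides, mid_string)

print(first)   # ['1234567']
print(second)  # ['3456789']
print(third)   # []
print(fourth)  # ['123456']
```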
.match will only match if the string starts with the pattern. Use .search.
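A minimal sketch of the difference, using the pattern and string from the question unchanged:

```python
import re

teststring = 'abcd123efg123456'

# match() only tries at position 0, where the string starts with letters.
matched = re.match(r"[0-9]{4,7}$", teststring)

# search() scans forward until the pattern fits somewhere.
found = re.search(r"[0-9]{4,7}$", teststring)

print(matched)         # None
print(found.group(0))  # 123456
```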
You can also use:
re.findall(r"[0-9]{4,7}", teststring)
Which will return a list of all substrings that match your regex, in your case ['123456']
If you're interested in just the first matched substring, then you can write this as:
next(iter(re.findall(r"[0-9]{4,7}", teststring)), None)

python regular expression matching either all uppercase letter and number or just numbers

import re
re.compile(([0-9]|[A-Z0-9]))
Is this the correct way about doing it?
Thank you!
You need to provide re.compile() with a string, and your current regular expression will only match a single character. Try changing it to the following:
import re
pattern = re.compile(r'^[A-Z\d]+$')
Now you can test strings to see if they match this pattern by using pattern.match(some_string).
Note that I used a raw string literal, which ensures the proper handling of backslashes.
The ^ at the beginning and $ at the end are called anchors: ^ matches only at the beginning of the string and $ matches only at the end of the string. They are necessary because you specified that you only want to match strings that consist entirely of uppercase letters or digits; without them, the pattern could match just a substring.
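As a rough sketch, here is the pattern applied to a few made-up test strings. One detail worth knowing is that `$` also matches just before a trailing newline; `re.fullmatch` is shown as a stricter alternative for input where that matters:

```python
import re

pattern = re.compile(r'^[A-Z\d]+$')

# Hypothetical test inputs.
candidates = ['ABC123', '2024', 'abc123', 'ABC 123', '']
results = {s: bool(pattern.match(s)) for s in candidates}
print(results)
# {'ABC123': True, '2024': True, 'abc123': False, 'ABC 123': False, '': False}

# "$" tolerates one trailing newline, so this still counts as a match.
trailing_newline = bool(pattern.match('ABC123\n'))
print(trailing_newline)  # True

# fullmatch() requires the entire string, newline included, to fit the pattern.
strict = re.fullmatch(r'[A-Z\d]+', 'ABC123\n')
print(strict)  # None
```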
Correct way is:
re.compile(r'^[A-Z\d]+$')
