How to match regex to line ending in python [duplicate] - python

This question already has an answer here:
Regular expression works on regex101.com, but not on prod
(1 answer)
Closed 2 years ago.
I'm trying to get a Python regex to match the end of a string (primarily because I want to remove a common section from the end of the string). I have the following code, which I think follows what the docs describe, but it's not behaving as I expect:
import re
input_value = "Non-numeric qty or weight, from 00|XFX|201912192009"
pattern = ", from .*$"
match = re.match(pattern , input_value)
print(match)
The result is None, however I'm expecting to have matched something. I've also tested these values with an online regex tool: https://regex101.com/ using the python flavour, and it works as expected.
What am I doing wrong?

match = re.match(".*, from.*$", input_value)
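re.match only matches at the beginning of the string, so you should put .* in front of the pattern; otherwise the match has to start at the very first character. Alternatively, use re.search, which looks for the pattern anywhere in the string.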
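A minimal sketch of the difference, reusing the input from the question. Since the stated goal is to strip the trailing section, re.sub is shown as one possible way to do that in a single step; the names `found` and `cleaned` are only illustrative.

```python
import re

input_value = "Non-numeric qty or weight, from 00|XFX|201912192009"
pattern = r", from .*$"

# re.match anchors at the start of the string, so this finds nothing.
print(re.match(pattern, input_value))  # None

# re.search scans forward and finds the pattern wherever it occurs.
found = re.search(pattern, input_value)
print(found.group(0))  # , from 00|XFX|201912192009

# Removing the trailing section directly.
cleaned = re.sub(pattern, "", input_value)
print(cleaned)  # Non-numeric qty or weight
```

This probably also explains the regex101 result: the online tool searches the whole input, which corresponds to re.search rather than re.match.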

Related

Find string, extract value [duplicate]

This question already has answers here:
Extract part of a regex match
(11 answers)
Closed 3 years ago.
I'm trying to parse HTML in Python that has an inline script in it. I need to find a string inside of the script, then extract the value. I've been trying to do this in regex for the past few hours, but I'm still not convinced this is the correct approach.
Here is a sample:
['key_to_search_for']['post_date'] = '10 days ago';
The result I want to extract is: 10 days ago
This regex gets me part of the way, but I can't figure out the full match:
^\[\'key_to_search_for\'\]\[\'post_date\'\] = '(\d{1,2})+( \w)
However, even once I can match with regex, I'm not sure the best way to get only the value. I was thinking of just replacing the keys with blanks, like .replace('['key_to_search_for']['post_date'] = '',''), but that seems inefficient.
Should I be matching the regex then replacing? Is there a better way to handle this?
You can extract the value using a single capturing group and match the 2 words using a quantifier for \w+.
The value is in capture group 1.
^\['key_to_search_for'\]\['post_date'\] = '(\d{1,2} \w+ \w+)';$
Or use a negated character class that matches any character except a ':
^\['key_to_search_for'\]\['post_date'\] = '([^']+)';$
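On the question of how to get only the value: reading capture group 1 from the match object avoids any replacing. The sketch below assumes the script text has already been pulled out of the HTML into a string; the `script` contents and the `other_key` line are made up for illustration. Because the pattern uses ^ and $, re.MULTILINE is passed so that they match at line boundaries inside a multi-line script.

```python
import re

# Hypothetical stand-in for the inline script text extracted from the HTML.
script = """var data = {};
['key_to_search_for']['post_date'] = '10 days ago';
['other_key']['post_date'] = '3 days ago';
"""

# Negated character class variant: capture everything up to the closing quote.
pattern = re.compile(
    r"^\['key_to_search_for'\]\['post_date'\] = '([^']+)';$",
    re.MULTILINE,  # let ^ and $ match at the start and end of each line
)

m = pattern.search(script)
post_date = m.group(1) if m else None  # None when the key is absent
print(post_date)  # 10 days ago
```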

Remove everything after regex pattern match but keep pattern [duplicate]

This question already has answers here:
Using regex to remove all text after the last number in a string
(2 answers)
Closed 4 years ago.
I was searching for a way to remove all characters after a certain pattern match. I know there are many similar questions here on SO, but I was unable to find one that works for me. Basically I have a fixed pattern (\w\w\d\d\d\d), and I want to remove everything after it but keep the pattern.
I've tried using:
test = 'PP1909dfgdfgd'
done = re.sub ('(\w\w\d\d\d\d/w*)', '\w\w\d\d\d\d/', test)
but I still get the same string.
example:
dirty = 'AA1001dirtydata'
dirty2 = 'AA1001222%^&*'
Desired output:
clean = 'AA1001'
You can use re.match() instead of re.sub():
re.match(r'\w\w\d\d\d\d', dirty).group(0)  # returns 'AA1001'
Note: match will look for the regular expression at the beginning of the string you provide and only "match" the characters corresponding to the pattern. If you want to find the pattern partway through the string you can use re.search().
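If re.sub is preferred, the attempt in the question can be repaired along these lines. Two things stood in its way: `/w` is a literal slash followed by w rather than the `\w` class, and a replacement string cannot contain pattern syntax, so the text to keep has to be referenced through a capture group. This is only a sketch, and `strip_after_code` is a hypothetical helper name.

```python
import re

# \w\w\d\d\d\d written compactly; the group captures the part to keep.
CODE_THEN_REST = re.compile(r"(\w\w\d{4}).*")

def strip_after_code(text):
    """Keep the first two-word-chars-plus-four-digits code, drop what follows."""
    # \1 in the replacement refers back to the captured code.
    return CODE_THEN_REST.sub(r"\1", text, count=1)

for dirty in ("AA1001dirtydata", "AA1001222%^&*"):
    print(strip_after_code(dirty))  # AA1001 both times
```

Note that `.` does not match newlines by default, so for multi-line input the re.DOTALL flag would be needed to remove everything after the code.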

Make part of a regex match in python optional [duplicate]

This question already has answers here:
How to use regex with optional characters in python?
(5 answers)
Closed 5 years ago.
I'm trying to match a URL using re but am having trouble in regards to making part of the match optional.
import re
x = raw_input('Link: ')
reg = '(http|https)://(iski|www\.iskis|iskis)\.(in|com)/[A-Za-z0-9?&=/?_]+'
if re.match(reg, x):
    print 'True'
Currently, the above code would match something like:
https://iskis.com/?loc=shop_view_item&item=220503032
I would like to alter the regular expression to make the last part, [A-Za-z0-9?&=/?_]+, optional, so that anything after the slash isn't required and the following also matches:
https://iskis.com
I'm sure there is a simple solution but I don't know how to go about solving this.
reg = '(http|https)://(iski|www\.iskis|iskis)\.(in|com)(/[A-Za-z0-9?&=/?_]+)?$'
Should do it. Surround the character class with () so it's a group, put a ? after it to make the text match 0-1 instances of that group, and put a $ at the end so that the regex will match to the end.
EDIT:
Come to think of it, you could use the optional match elsewhere in your regex.
reg = '(https?)://(www\.)?(iskis?)\.(in|com)(/[A-Za-z0-9?&=/?_]+)?$'
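A quick way to check the first version of the pattern against the URLs from the question. This sketch uses Python 3 syntax (unlike the Python 2 code in the question), and the third URL is a made-up negative case.

```python
import re

# The first pattern from the answer, written as a raw string.
reg = re.compile(
    r"(http|https)://(iski|www\.iskis|iskis)\.(in|com)(/[A-Za-z0-9?&=/?_]+)?$"
)

urls = [
    "https://iskis.com",
    "https://iskis.com/?loc=shop_view_item&item=220503032",
    "https://iskis.com/!bad",  # hypothetical URL that should be rejected
]

# re.match anchors at the start; the trailing $ anchors the end.
results = {url: bool(reg.match(url)) for url in urls}
for url, ok in results.items():
    print(url, ok)
```

The first two URLs match and the third does not, because `!` is not in the character class and the `$` prevents a partial match from being accepted.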

regular expression match using python for string with multiple spaces and special character [duplicate]

This question already has answers here:
What is the difference between re.search and re.match?
(9 answers)
Closed 7 years ago.
re.match() returns None for the following string:
string = "(branch=MAIN). See the error log at /home/aswamy/run/test_upgrade/2.0-285979.customer_deployment.22499/test.log.2\n"
m = re.match("See the error",string)
print m  # always prints None
But if I use the same string without the leading "(branch=MAIN). " part, the match succeeds, as below:
string = "See the error log at /home/kperiyaswamy/runmass/mass_test_upgrade/7.2.0-285979.customer_deployment.22499/infoblox.log.2\n"
m = re.match("See the error",string)
print m  # works properly: <_sre.SRE_Match object at 0x7fe813825030>
So it seems that when there are multiple whitespace characters, the pattern match doesn't work. Please let me know how to solve this issue.
It's not about the whitespace. match always starts matching from the beginning of the string. In the second case the text is at the start of the string, so you get a match. In the first case it isn't, so you don't. Use re.search (or findall) if you want to find the pattern anywhere in the string.
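The behaviour can be reproduced with a shortened version of the log line. The path below is a made-up placeholder, not the one from the question, and the sketch uses Python 3 print calls.

```python
import re

# Shortened, hypothetical version of the log line from the question.
string = "(branch=MAIN). See the error log at /home/user/test.log.2\n"

# match: the pattern must begin at index 0, where the string has "(".
print(re.match("See the error", string))  # None

# search: the pattern may begin anywhere in the string.
m = re.search("See the error", string)
print(m.start(), m.group(0))

# findall: returns every non-overlapping occurrence as a list of strings.
all_hits = re.findall("See the error", string)
print(all_hits)
```

re.search is usually the closer replacement for re.match here, since it returns a match object (or None) in the same way, while findall returns a list.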

understanding this python regular expression re.compile(r'[ :]') [duplicate]

This question already has an answer here:
Reference - What does this regex mean?
(1 answer)
Closed 8 years ago.
Hi, I am trying to understand Python code that contains the regular expression re.compile(r'[ :]'). I tried quite a few strings and couldn't find one that matches. Can someone please give an example of text that matches this pattern?
The expression simply matches a single space or a single : (or rather, a string containing either). That’s it. […] is a character class.
The [] matches any of the characters in the brackets. So [ :] will match one character that is either a space or a colon.
So these strings would have a match:
"Hello World"
"Field 1:"
etc...
These would not:
"This_string_has_no_spaces_or_colons"
"100100101"
Edit:
For more info on regular expressions: https://docs.python.org/2/library/re.html
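The example strings above can be checked directly. Note that re.search is used, because the space or colon may appear anywhere in the string. The split call at the end is only a guess at how such a pattern might be used in the code being read (for instance to break up a timestamp); the question does not say.

```python
import re

pattern = re.compile(r"[ :]")  # one character: either a space or a colon

samples = [
    "Hello World",
    "Field 1:",
    "This_string_has_no_spaces_or_colons",
    "100100101",
]

# search returns a match object if any single space or colon is present.
hits = {s: bool(pattern.search(s)) for s in samples}
for s, ok in hits.items():
    print(repr(s), ok)

# Assumed use case: splitting on either separator.
parts = pattern.split("12:30 PM")
print(parts)  # ['12', '30', 'PM']
```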
