Regular expression to match an empty string? - python

I want to match and capture any of these words:
aboutus/, race/, cruise/, westerlies/, weather/, reach/, gear/, or the empty string.
Here is a pattern that works for the words but does not match the empty string:
^(aboutus|race|cruise|westerlies|weather|reach|gear)/$
So my question is: how do I include the empty string in this match?
I still haven't found a good solution,
so for now I have added a second regex just for the empty string, i.e. ^$.
Note: this regular expression is for a Django urls.py.
Update: it would be better if the capturing group did not contain the /.

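Try this (the word becomes optional, but the slash is still required, so it matches "/" rather than the empty string):
^(aboutus|race|cruise|westerlies|weather|reach|gear)?/$
Edit:
If the '/' is present in every case except the empty string, try this: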
^((aboutus|race|cruise|westerlies|weather|reach|gear)(/))?$

Use this:
^$|^(aboutus|race|cruise|westerlies|weather|reach|gear)/$

You can make the capturing group optional (the trailing / is still required, so this matches a bare / rather than the empty string):
^(aboutus|race|cruise|westerlies|weather|reach|gear)?/$

import re
rgx = re.compile('^((aboutus|race|cruise|westerlies'
                 '|weather|reach|gear)/|)$')
# or
li = ['aboutus', 'race', 'cruise', 'westerlies',
      'weather', 'reach', 'gear']
rgx = re.compile('^((%s)/|)$' % '|'.join(li))
for s in ('aboutus/',
          'westerlies/',
          'westerlies/ ',
          ''):
    m = rgx.search(s)
    # Show the input and either the matched text or None
    print('%-21r%r' % (s, m.group() if m else m))
Result:
'aboutus/'           'aboutus/'
'westerlies/'        'westerlies/'
'westerlies/ '       None
''                   ''
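For the update in the question (a capturing group that does not contain the slash), one option is to wrap the word and its slash in an optional non-capturing group. This is a minimal sketch using plain `re`, not a Django-specific recipe; the names `PATTERN` and `section` are made up for illustration.

```python
import re

# Optional non-capturing group (?:...)? around "word/" so that the empty
# string matches too; the inner capturing group holds only the word.
PATTERN = re.compile(
    r'^(?:(aboutus|race|cruise|westerlies|weather|reach|gear)/)?$'
)

def section(path):
    """Return the captured word, '' for the empty string, or None if no match."""
    m = PATTERN.match(path)
    if m is None:
        return None
    # group(1) is None when the optional group did not participate
    return m.group(1) or ''

for path in ('aboutus/', 'westerlies/', 'westerlies/ ', '', '/'):
    print(repr(path), '->', repr(section(path)))
```

Unlike the variants with an optional word followed by a mandatory slash, this one rejects a bare `/`.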

Related

Regex : replace url inside string

I have:
string = 'Server:xxx-zzzzzzzzz.eeeeeeeeeee.frPIPELININGSIZE'
I need a Python regular expression that identifies xxx-zzzzzzzzz.eeeeeeeeeee.fr so that I can strip that substring out.
Expected output:
string : 'Server:PIPELININGSIZE'
The URL is embedded in a longer string, and I have tried a lot of regular expressions.
Not sure if this helps, because your question was quite vaguely formulated. :)
import re
string = 'Server:xxx-zzzzzzzzz.eeeeeeeeeee.frPIPELININGSIZE'
string_1 = re.search('[a-z.-]+([A-Z]+)', string).group(1)
print(f'string: Server:{string_1}')
Output:
string: Server:PIPELININGSIZE
No regex needed: just split on your target substring.
string = 'Server:xxx-zzzzzzzzz.eeeeeeeeeee.frPIPELININGSIZE'
last = string.split("fr", 1)[1]
first = string[:string.index(":")]
print(f'{first}:{last}')
Gives:
Server:PIPELININGSIZE
The wording of the question suggests that you wish to find the hostname in the string, but the expected output suggests that you want to remove it. The following regular expression will create a tuple and allow you to do either.
import re
# Named text rather than str to avoid shadowing the built-in type
text = "Server:xxx-zzzzzzzzz.eeeeeeeeeee.frPIPELININGSIZE"
p = re.compile(r'^([A-Za-z]+:)(.*?)([A-Z]+)$')
m = p.search(text)
result = m.groups()
# ('Server:', 'xxx-zzzzzzzzz.eeeeeeeeeee.fr', 'PIPELININGSIZE')
Remove the hostname:
print(f'{result[0]}{result[2]}')
# Output: Server:PIPELININGSIZE
Extract the hostname:
print(result[1])
# Output: xxx-zzzzzzzzz.eeeeeeeeeee.fr
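If the goal is only to remove the hostname, `re.sub` can do it in one step. The sketch below assumes the hostname follows a colon and consists of lowercase letters, digits, dots, and hyphens, while the text after it starts with an uppercase letter; `strip_host` is a hypothetical helper name.

```python
import re

def strip_host(s):
    # (?<=:) anchors just after a colon; the character class then consumes
    # the lowercase hostname and stops at the first uppercase letter.
    return re.sub(r'(?<=:)[a-z0-9.-]+', '', s, count=1)

print(strip_host('Server:xxx-zzzzzzzzz.eeeeeeeeeee.frPIPELININGSIZE'))
# Server:PIPELININGSIZE
```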

Delete substring not matching regex in Python

I have a string like:
'class="a", class="b", class="ab", class="body", class="etc"'
I want to delete everything except class="a" and class="b".
How can I do it? I think the problem is easy but I'm stuck.
Here is one of my attempts, but it didn't solve my problem:
re.sub(r'class="also"|class="etc"', '', a)
My string is a very long piece of HTML with a lot of classes, and I want to keep only two of them and drop all the others.
Sometimes it's good to take a break. I found a solution that works for me using bleach:
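import bleach
def filter_class(name, value):
    # Keep only class attributes whose value is exactly 'aaa'.
    # Note: as far as I know, bleach 2.0 and later call this with (tag, name, value).
    if name == 'class' and value == 'aaa':
        return True
attrs = {
    'div': filter_class,
}
# tags must be a collection; ('div') without a comma is just the string 'div'
bleach.clean(html, tags=['div'], attributes=attrs, strip_comments=True)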
You tried to explicitly enumerate those substrings you wanted to delete. Rather than writing such long patterns, you can just use negative lookaheads that provide a means to add exclusions to some more generic pattern.
Here is a regex you can use to remove those substrings cleanly, regardless of their order:
,? ?\bclass="(?![ab]")[^"]+"
Here, (?![ab]")[^"]+ matches one or more characters other than " (the [^"]+ part), but the negative lookahead (?![ab]") rejects the match when the value is exactly a or b.
Here is some sample code:
import re
p = re.compile(r',? ?\bclass="(?![ab]")[^"]+"')
test_str = "class=\"a\", class=\"b\", class=\"ab\", class=\"body\", class=\"etc\"\nclass=\"b\", class=\"ab\", class=\"body\", class=\"etc\", class=\"a\"\nclass=\"b\", class=\"ab\", class=\"body\", class=\"a\", class=\"etc\""
result = re.sub(p, '', test_str)
print(result)
NOTE: If instead of a and b you have longer names, use a non-capturing group such as (?:a|b) in the lookahead instead of a character class:
,? ?\bclass="(?!(?:arbuz|baklazhan)")[^"]+"
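The lookahead pattern can also be built from a list of class names to keep. This is only a sketch: it assumes the same `class="..."` layout as in the question and that a wanted class comes before the unwanted ones (otherwise a leading comma is left behind). `drop_other_classes` is a hypothetical name.

```python
import re

def drop_other_classes(text, keep):
    # re.escape guards against regex metacharacters in class names
    wanted = '|'.join(re.escape(name) for name in keep)
    # The negative lookahead rejects values that are exactly one of the kept names
    pattern = re.compile(r',? ?\bclass="(?!(?:%s)")[^"]+"' % wanted)
    return pattern.sub('', text)

s = 'class="a", class="b", class="ab", class="body", class="etc"'
print(drop_other_classes(s, ['a', 'b']))
# class="a", class="b"
```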
Another pretty simple solution. Good luck.
import re
st = 'class="a", class="b", class="ab", class="body", class="etc"'
res = re.findall(r'class="[a-b]"', st)
print(res)
['class="a"', 'class="b"']
You can also use re.sub very easily, as long as the classes to keep come first (the greedy .* removes everything from the first unwanted class to the last quote):
res = re.sub(r',\s*class="[a-zA-Z][a-zA-Z].*"', "", st)
print(res)
class="a", class="b"
If you only wanted to keep the first two entries, one approach would be to use the split() function. This will split your string into a list at given separator points. In your case, this could be a comma. The first two list elements can then be joined back together with commas.
text = 'class="a", class="b", class="ab", class="body", class="etc"'
print(",".join(text.split(",")[:2]))
This gives class="a", class="b"
If the entries can be anywhere, and for an arbitrary list of wanted classes:
import re
def keep(text, keep_list):
    # Collect every class value, then keep only the wanted ones
    found = set(re.findall(r"class\s*=\s*[\"'](.*?)[\"']", text))
    keep_set = found.intersection(keep_list)
    # sorted() makes the output order deterministic
    output_list = ['class="%s"' % a_class for a_class in sorted(keep_set)]
    return ', '.join(output_list)
print(keep('class="a", class="b", class="ab", class="body", class="etc"', ["a", "b"]))
print(keep('class="a", class="b", class="ab", class="body", class="etc"', ["body", "header"]))
This would print:
class="a", class="b"
class="body"

RegEx For Multiple Search & Replace

I'm trying to do a search and replace (for multiple chars) in the following string:
VAR=%2FlkdMu9zkpE8w7UKDOtkkHhJlYZ6CaEaxqmsA%2B7G3e8%3D&
One or more of these characters: %3D, %2F, %2B, %23, can be found anywhere (beginning, middle, or end of the string) and ideally, I'd like to search for all of them at once (using one regex) and replace them with = or / or + or # respectively, then return the final string.
Example 1:
VAR=%2FlkdMu9zkpE8w7UKDOtkkHhJlYZ6CaEaxqmsA%2B7G3e8%3D&
Should return
VAR=/lkdMu9zkpE8w7UKDOtkkHhJlYZ6CaEaxqmsA+7G3e8=&
Example 2:
VAR=s2P0n6I%2Flonpj6uCKvYn8PCjp%2F4PUE2TPsltCdmA%3DRQPY%3D&
Should return
VAR=s2P0n6I/lonpj6uCKvYn8PCjp/4PUE2TPsltCdmA=RQPY=&
I'm not convinced you need regex for this, but it's fairly easy to do with Python:
import re
x = 'VAR=%2FlkdMu9zkpE8w7UKDOtkkHhJlYZ6CaEaxqmsA%2B7G3e8%3D&'
MAPPING = {
    '%3D': '=',
    '%2F': '/',
    '%2B': '+',
    '%23': '#',
}
def replace(match):
    # Look up the replacement; leave unknown %XX sequences unchanged
    return MAPPING.get(match.group(0), match.group(0))
print(x)
print(re.sub('%[A-Z0-9]{2}', replace, x))
Output:
VAR=%2FlkdMu9zkpE8w7UKDOtkkHhJlYZ6CaEaxqmsA%2B7G3e8%3D&
VAR=/lkdMu9zkpE8w7UKDOtkkHhJlYZ6CaEaxqmsA+7G3e8=&
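The pattern can also be generated from the mapping itself, so that only the listed sequences are matched and every match is known to have a replacement. A sketch with the same `MAPPING` as above; `PATTERN` and `decode` are names chosen for this example.

```python
import re

MAPPING = {'%3D': '=', '%2F': '/', '%2B': '+', '%23': '#'}

# A single alternation built from the (escaped) keys of MAPPING
PATTERN = re.compile('|'.join(re.escape(key) for key in MAPPING))

def decode(s):
    # Every match is one of the keys, so the lookup cannot fail
    return PATTERN.sub(lambda m: MAPPING[m.group(0)], s)

print(decode('VAR=s2P0n6I%2Flonpj6uCKvYn8PCjp%2F4PUE2TPsltCdmA%3DRQPY%3D&'))
# VAR=s2P0n6I/lonpj6uCKvYn8PCjp/4PUE2TPsltCdmA=RQPY=&
```

Adding a new sequence then only requires a new dictionary entry.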
There is no need for a regex to do that in your example. A simple replace method will do:
def rep(s):
    for pat, txt in [['%2F','/'], ['%2B','+'], ['%3D','='], ['%23','#']]:
        s = s.replace(pat, txt)
    return s
I'm also not convinced you need regex, but there's a better way to URL-decode with one. Basically, every sequence of the form %XX should be converted into the character it represents. This can be done with re.sub() like so:
>>> import re
>>> VAR="%2FlkdMu9zkpE8w7UKDOtkkHhJlYZ6CaEaxqmsA%2B7G3e8%3D&"
>>> re.sub(r'%..', lambda x: chr(int(x.group()[1:], 16)), VAR)
'/lkdMu9zkpE8w7UKDOtkkHhJlYZ6CaEaxqmsA+7G3e8=&'
Enjoy.
var = "VAR=s2P0n6I%2Flonpj6uCKvYn8PCjp%2F4PUE2TPsltCdmA%3DRQPY%3D&"
var = var.replace("%2F", "/")
var = var.replace("%2B", "+")
var = var.replace("%3D", "=")
But you get the same result with urllib2.unquote in Python 2:
import urllib2
var = "VAR=s2P0n6I%2Flonpj6uCKvYn8PCjp%2F4PUE2TPsltCdmA%3DRQPY%3D&"
var = urllib2.unquote(var)
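In Python 3, `urllib2` no longer exists and `unquote` is available from `urllib.parse`. A short sketch; note that it decodes every percent-escape, not only the four listed in the question.

```python
from urllib.parse import unquote

var = "VAR=s2P0n6I%2Flonpj6uCKvYn8PCjp%2F4PUE2TPsltCdmA%3DRQPY%3D&"
decoded = unquote(var)
print(decoded)
# VAR=s2P0n6I/lonpj6uCKvYn8PCjp/4PUE2TPsltCdmA=RQPY=&
```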
This can't be done with a regex alone, because there's no way to write a conditional replacement inside a pattern. A regular expression only answers the question "Does this string match this pattern?"; it cannot perform the operation "If this part matches this pattern, replace it with this; if it matches that pattern, replace it with that." Choosing a different replacement per match has to happen in the surrounding code.

Using parentheses as delimiter in re or str.split() python

I am trying to split a string such as: add(ten)sub(one) into add(ten) sub(one).
I can't figure out how to match the closing parenthesis. I have tried re.sub(r'\\)', '\\) ') and every variation of escaping the parentheses I can think of. It is hard to tell in this font, but I am trying to add a space between these commands so I can split the string into a list later.
There's no need to escape ) in the replacement string. ) has a special meaning only in the regex pattern, so it must be escaped there to match a literal parenthesis; in an ordinary string it can be used as is.
>>> strs = "add(ten)sub(one)"
>>> re.sub(r'\)(?=\S)',r') ', strs)
'add(ten) sub(one)'
As @StevenRumbalski pointed out in the comments, the above operation can be done more simply using str.replace and str.strip:
>>> strs.replace(')',') ').strip()
'add(ten) sub(one)'
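Since the stated goal is a list of commands, the substitution and the split can be combined. A minimal sketch, assuming the commands themselves contain no whitespace; `split_commands` is a hypothetical helper.

```python
import re

def split_commands(s):
    # Insert a space after each ')' that is followed by a non-space
    # character, then split the result on whitespace.
    return re.sub(r'\)(?=\S)', ') ', s).split()

print(split_commands('add(ten)sub(one)'))
# ['add(ten)', 'sub(one)']
```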
d = ')'
my_str = 'add(ten)sub(one)'
# Split on ')' and put the delimiter back on each non-empty piece
result = [t + d for t in my_str.split(d) if len(t) > 0]
# result == ['add(ten)', 'sub(one)']
Create a list of all substrings:
import re
a = 'add(ten)sub(one)'
print(re.findall(r'(.+?\(.+?\))', a))
Output:
['add(ten)', 'sub(one)']

Regex for fixed string followed by characters up to a delimiter character

I've exhausted myself trying either to build the right regex or just to remove the comma from the file that I'm parsing. Essentially, I am matching a specific string, then anything that follows that string, up to the comma. I need to get the substring before the comma, not including the comma. I suppose I can do this either with the regex or by removing the comma further down in the code.
I'm pretty new at this, so this is probably basic stuff, but I can't seem to find the right thing in my searches.
Here is my code:
import re
str = "FullName=TECIBW04 TECIBW04, TargetResource=k2vFe6yPvBoEdrmrE9t3i5UE2muLVW,"
match = re.search(r'FullName=.+?,', str)
if match:
    print(match.group())  # found a match
else:
    print('ainnussin zer')
I get:
FullName=TECIBW04 TECIBW04,
Great...I'm getting back what I need (and a little extra). I actually don't want the comma.
What's the best method to get rid of or not include that sucker?
Since the comma is the delimiter here, just exclude it with a negated character class in your regex:
match = re.search(r'FullName=[^,]+', str)
Put everything except the comma in a capturing group:
match = re.search(r'(FullName=.+?),', str)
if match:
    print(match.group(1))  # found a match
else:
    print('ainnussin zer')
prints
FullName=TECIBW04 TECIBW04
How about using split on ,?
str.split(',')[0]
Edit
These are ways to do it without the regex.
For checking if the string starts with another substring, you can use
if str.startswith("FullName="):
    print(str.split(',')[0])
else:
    print("ainnussin zer")
For doing this in one line, you can try
print(str.split(',')[0] if str.startswith("FullName=") else "ainnussin zer")
You can also use str.partition:
>>> str = "FullName=TECIBW04 TECIBW04, TargetResource=k2vFe6yPvBoEdrmrE9t3i5UE2muLVW,"
>>> str.partition(',')
('FullName=TECIBW04 TECIBW04', ',', ' TargetResource=k2vFe6yPvBoEdrmrE9t3i5UE2muLVW,')
>>> str.partition(',')[0]
'FullName=TECIBW04 TECIBW04'
If you are going to use a regex, I would use this:
match = re.search(r'^FullName=[^,]+', str)
if match:
    print(match.group(0))  # found a match
else:
    print('ainnussin zer')
Or this if you are just trying to capture the right-hand side of the =:
match = re.search(r'^FullName=([^,]+)', str)
if match:
    print(match.group(1))  # found a match
else:
    print('ainnussin zer')
Place what you want to match inside of a capturing group.
m = re.search(r'(FullName=.*?),', str)
if m:
    print(m.group(1))
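The negated character class generalizes to any field in the line. The sketch below assumes fields look like `Name=value` and are terminated by commas, as in the sample string; `field_value` is a hypothetical helper, not part of the original code.

```python
import re

def field_value(text, name):
    # [^,]+ consumes everything up to, but not including, the next comma
    m = re.search(r'\b' + re.escape(name) + r'=([^,]+)', text)
    return m.group(1) if m else None

line = "FullName=TECIBW04 TECIBW04, TargetResource=k2vFe6yPvBoEdrmrE9t3i5UE2muLVW,"
print(field_value(line, 'FullName'))        # TECIBW04 TECIBW04
print(field_value(line, 'TargetResource'))  # k2vFe6yPvBoEdrmrE9t3i5UE2muLVW
```

Returning None for a missing field keeps the caller's "no match" branch explicit.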
