Python REGEX to exclude beginning of string [duplicate] - python

This question already has answers here:
Match text between two strings with regular expression
(3 answers)
Closed 3 years ago.
Given the following string:
dpkg.log.looker.test.2019-09-25
I'd want to be able to extract:
looker.test
or
looker.
I have tried multiple combinations, but none of them extracts only the hostname. If I try to exclude the whole beginning of the string (dpkg.log.), it also drops some of the characters that follow:
/[^dpkg.log].+(?=.[0-9]{4}-[0-9]{2}-[0-9]{2})/
returns:
er.test
Is there a way to ignore the whole string "dpkg.log" without ignoring the subsequent repeated characters?

The following expression should work with re.findall:
[^.]+\.[^.]+\.(.+)\.\d{2,4}-\d{2}-\d{2}
Test
import re
regex = r'[^.]+\.[^.]+\.(.+)\.\d{2,4}-\d{2}-\d{2}'
string = '''
dpkg.log.looker.test.2019-09-25
dpkg.log.looker.test1.test2.2019-09-25
'''
print(re.findall(regex, string))
Output
['looker.test', 'looker.test1.test2']
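The attempt in the question returns er.test because [^dpkg.log] is a character class. It matches one character that is not d, p, k, g, ., l or o, and does not exclude the literal string dpkg.log, so the first character it accepts is the e in looker. The sketch below reproduces that behavior. It then shows a fixed-width lookbehind, which is an alternative technique to the capturing group above, that excludes the literal prefix instead. It assumes the prefix is always exactly dpkg.log.

```python
import re

name = "dpkg.log.looker.test.2019-09-25"

# The question's pattern: [^dpkg.log] rejects single characters from the set
# {d, p, k, g, ., l, o}; it does not exclude the string "dpkg.log".
bad = re.search(r"[^dpkg.log].+(?=.[0-9]{4}-[0-9]{2}-[0-9]{2})", name)
print(bad.group(0))   # er.test

# A fixed-width lookbehind requires the literal prefix "dpkg.log." right
# before the match without including it in the result.
good = re.search(r"(?<=dpkg\.log\.).+(?=\.\d{4}-\d{2}-\d{2})", name)
print(good.group(0))  # looker.test
```

Python only supports fixed-width lookbehinds, which is fine here because the prefix never changes length.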

Related

How to find a character in string and replace it and the following one in python [duplicate]

This question already has answers here:
How to input a regex in string.replace?
(7 answers)
Closed 1 year ago.
I'm using an API that returns § characters followed by a color code (1-9 or a-h), which I want to eliminate (both the § and the character after it). They are meant for color formatting, but I'm not using that. My current method iterates through the string to remove them, but it feels too hacky and buggy, and I could not find a better way. Is there something like a parameter for str.replace that also removes the character after the found one?
You can "eliminate" the precise pattern with regular expressions using the sub method:
import re
def clean_string(s):
    return re.sub(r"§[1-9a-h]", "", s)
clean_string("foo §4 - bar §f")
# > 'foo  - bar '
If you want more flexibility, you can match any non-whitespace character following § with \S:
import re
def clean_string(s):
    return re.sub(r"§\S", "", s)
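When many strings from the API need cleaning, the pattern can be compiled once and reused. This is a minimal sketch, assuming the codes are exactly § followed by one of 1-9 or a-h as the question describes. strip_color_codes is a hypothetical helper name. Codes outside that range (such as §r below) are left in place, and that is where the \S variant behaves differently.

```python
import re

# § followed by a single color-code character in the range the question gives.
COLOR_CODE = re.compile(r"§[1-9a-h]")

def strip_color_codes(text: str) -> str:
    # Hypothetical helper: delete every § + code pair in one pass.
    return COLOR_CODE.sub("", text)

print(strip_color_codes("§aHello §4world§r!"))  # Hello world§r!
```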

Remove everything after regex pattern match but keep pattern [duplicate]

This question already has answers here:
Using regex to remove all text after the last number in a string
(2 answers)
Closed 4 years ago.
I was searching for a way to remove all characters past a certain pattern match. I know that there are many similar questions here on SO, but I was unable to find one that works for me. Basically, I have a fixed pattern (\w\w\d\d\d\d), and I want to remove everything after it but keep the pattern itself.
I've tried using:
test = 'PP1909dfgdfgd'
done = re.sub ('(\w\w\d\d\d\d/w*)', '\w\w\d\d\d\d/', test)
but I still get the same string back.
Examples:
dirty = 'AA1001dirtydata'
dirty2 = 'AA1001222%^&*'
Desired output:
clean = 'AA1001'
You can use re.match() instead of re.sub():
re.match(r'\w\w\d\d\d\d', dirty).group(0)  # returns 'AA1001'
Note: re.match only looks for the pattern at the beginning of the string you provide, and the match contains only the characters the pattern covers. If the pattern can appear partway through the string, use re.search() instead.
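The question's own attempt used re.sub, and that approach also works. Capture the pattern, consume everything after it with .*, and then write back only the captured group through the \1 backreference. The sketch below uses clean as a hypothetical helper name and assumes the input contains no newlines, because . does not match \n without re.DOTALL.

```python
import re

def clean(s):
    # Group 1 captures the fixed pattern; .* swallows the rest of the line.
    # The replacement r'\1' writes back only the captured part.
    return re.sub(r'(\w\w\d\d\d\d).*', r'\1', s)

print(clean('AA1001dirtydata'))   # AA1001
print(clean('AA1001222%^&*'))     # AA1001

# re.match returns None when the string does not start with the pattern,
# so check the result before calling .group().
m = re.match(r'\w\w\d\d\d\d', 'no code here')
print(m.group(0) if m else 'no match')  # no match
```

Unlike the re.match version, this keeps any text that comes before the pattern.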

Make part of a regex match in python optional [duplicate]

This question already has answers here:
How to use regex with optional characters in python?
(5 answers)
Closed 5 years ago.
I'm trying to match a URL using re, but I am having trouble making part of the match optional.
import re
x = input('Link: ')
reg = r'(http|https)://(iski|www\.iskis|iskis)\.(in|com)/[A-Za-z0-9?&=/?_]+'
if re.match(reg, x):
    print('True')
Currently, the above code would match something like:
https://iskis.com/?loc=shop_view_item&item=220503032
I would like to alter the regular expression so that the trailing [A-Za-z0-9?&=/?_]+ is optional, meaning nothing after the domain is required. The following should then also match:
https://iskis.com
I'm sure there is a simple solution but I don't know how to go about solving this.
reg = r'(http|https)://(iski|www\.iskis|iskis)\.(in|com)(/[A-Za-z0-9?&=/?_]+)?$'
That should do it. Wrap the slash and the character class in parentheses so they form a group, put a ? after the group so it matches zero or one time, and put a $ at the end so the regex has to match all the way to the end of the string.
EDIT:
Come to think of it, you could also use ? elsewhere in your regex to shorten the alternations (note that this version also accepts www.iski):
reg = r'(https?)://(www\.)?(iskis?)\.(in|com)(/[A-Za-z0-9?&=/?_]+)?$'
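A quick way to check the final regex is to run it against a few sample URLs. The sample list below is made up for illustration. A bare trailing slash (as in the third URL) is rejected because the optional group needs at least one character after the /. Changing + to * inside the group would allow it.

```python
import re

reg = r'(https?)://(www\.)?(iskis?)\.(in|com)(/[A-Za-z0-9?&=/?_]+)?$'

# Made-up sample URLs for illustration.
urls = [
    'https://iskis.com/?loc=shop_view_item&item=220503032',
    'https://iskis.com',
    'http://www.iski.in/',          # bare trailing slash
    'https://iskis.com/page#frag',  # '#' is not in the character class
]

# Map each URL to whether the whole URL matches the pattern.
results = {url: bool(re.match(reg, url)) for url in urls}
for url, ok in results.items():
    print(ok, url)
```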

Python regex matching on strings I don't want [duplicate]

This question already has answers here:
Python- how do I use re to match a whole string [duplicate]
(4 answers)
Closed 5 years ago.
This is my first attempt at using regex with Python (or at all), and it is not working as expected. I want a regex that matches an alphabetic character or underscore as the first character, followed by any number of alphanumeric characters or underscores. The regex I am using is '^[a-z_A-Z][a-z_A-Z0-9]*', which seems to do what I want on pythex.org, but in my code it matches strings that I do not want.
My code is as follows:
isMatch = re.match('^[a-z_A-Z][a-z_A-Z0-9]*', someString)
return True if isMatch else False
Two examples of strings that are matching that I don't want are: "qq-q" and "va[r". What am I doing wrong?
I think you just forgot the $ at the end of your regex to anchor it to the end of the string.
isMatch = re.match('^[a-z_A-Z][a-z_A-Z0-9]*$', someString)
Without it, re.match only has to match at the beginning of the string rather than the entire string. That explains why "qq-q" ("qq" matches) and "va[r" ("va" matches) were accepted.
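Since Python 3.4, re.fullmatch expresses "the whole string must match" directly, without ^ and $. There is one subtle difference: $ also matches just before a trailing newline, so the anchored re.match version accepts "abc\n" while re.fullmatch does not. This is a sketch, and is_match is a hypothetical name.

```python
import re

IDENT = r'[a-z_A-Z][a-z_A-Z0-9]*'

def is_match(s):
    # fullmatch succeeds only if the pattern covers the entire string.
    return re.fullmatch(IDENT, s) is not None

for s in ["qq_q", "qq-q", "va[r", "_x1", "1abc"]:
    print(repr(s), is_match(s))

# '$' can match before a final newline; fullmatch cannot.
print(bool(re.match(IDENT + '$', "abc\n")))  # True
print(is_match("abc\n"))                     # False
```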

Using regex with special characters to find a match in python [duplicate]

This question already has answers here:
What special characters must be escaped in regular expressions?
(13 answers)
Closed 5 years ago.
I am extracting a string and need to check whether it follows a particular pattern:
<![any word]>
If so, I need to replace it with an empty string. I am trying the following code:
import re
string1 = "<![if support]> hello"
string = re.sub(re.compile("<![.*?]>"),"",string1)
print(string)
But the output is unchanged:
<![if support]> hello
I want the output to be hello. What am I doing wrong here?
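[ and ] are treated as metacharacters in regex, so [.*?] in your pattern is a character class that matches a single ., * or ?. You'll need to escape them:
In [1]: re.sub(re.compile(r"<!\[.*?\]>"), "", "<![if support]> hello")
Out[1]: ' hello'
As a simplification (courtesy Wiktor Stribiżew), you can escape just the opening bracket, which shortens your regex to r"<!\[.*?]>". A ] that does not close a character class is treated as a literal.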
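The substitution above still leaves a leading space (' hello'), while the question asked for hello. The sketch below also consumes the following whitespace with \s*. It builds the pattern with re.escape so that the literal delimiters do not need hand-escaping. OPEN and CLOSE are illustrative names, not part of the original answer.

```python
import re

OPEN, CLOSE = "<![", "]>"

# re.escape backslash-escapes regex metacharacters such as '[' and ']',
# .*? lazily matches the word inside, and \s* eats the following whitespace.
pattern = re.compile(re.escape(OPEN) + r".*?" + re.escape(CLOSE) + r"\s*")

print(pattern.sub("", "<![if support]> hello"))  # hello
```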
