Problem with regexp in ansible module shell

In my Ansible playbook, I need a task that replaces marker characters in a file's text. I'm using the shell module with sed.
This is what I want to achieve:
txt.file: Some text ##VAR_NUMBER_ONE##
new txt.file: Some text {{VAR_NUMBER_ONE}}
- name: sed
  shell: sed -i 's|##\([a-zA-Z_ ]*\)##|\{{\1}}|g' txt.file
I get a fatal error:
fatal: [localhost]: FAILED! => {
"msg": "An unhandled exception occurred while templating 's|##\([a-zA-Z_ ]*\)##|\{{\1}}|g'. Error was a , original message: unexpected char u'\\' at 25"
}

Q: "replace in file text with characters"
txt.file before: Some text ##VAR_NUMBER_ONE##
txt.file after: Some text {{VAR_NUMBER_ONE}}
A: The task below does the job.
- replace:
    path: "txt.file"
    regexp: '^(.*)##(.*)##$'
    replace: '{{ "\1" + "{{" + "\2" + "}}" }}'
The regexp string explained:
^ beginning of the string
(.*) any sequence stored in \1
## matches ##
(.*) any sequence stored in \2
## matches ##
$ end of the string
The replace string is built by concatenating 4 strings because Ansible's Jinja2 templating uses {{ and }} to expand variables, so literal braces have to be passed as quoted strings inside an expression.
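To check the pattern outside Ansible, the same substitution can be sketched in Python. This is only an illustration: the function name mark_variable is hypothetical, and it assumes one ##...## marker at the end of the line, as in the question.

```python
import re

# Same pattern as the replace task. In an re.sub replacement string,
# braces have no special meaning, so {{ and }} are emitted literally.
PATTERN = re.compile(r'^(.*)##(.*)##$')

def mark_variable(line):
    """Rewrite a trailing ##NAME## marker as {{NAME}} (hypothetical helper)."""
    return PATTERN.sub(r'\1{{\2}}', line)

converted = mark_variable('Some text ##VAR_NUMBER_ONE##')
print(converted)
```

Lines without a marker are returned unchanged, because the pattern does not match them.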

Related

Custom regex pattern for matching email addresses

I have content that I am reading in that I need to collect the emails from within. However, I just want to pull the email that comes after From:
Here is an example:
Recip: fhavor#gmail.com
Subject: Report results (Gd)
Headers: Received: from daem.com (unknown [127.1.1.1])
Date: Sat, 13 Feb 2021 13:11:42 +0000 (GMT)
From: Tavon Lo <lt35#gmail.com>
As you can see, there are multiple emails, but I only want to collect the email that comes after the From: part of the content, which would be "lt35#gmail.com". So far I have a regex that collects ALL the emails within the content.
EMAIL = r"((?:^|\b)(?:[^\s]+?\#(?:.+?)\[\.\][a-zA-Z]+)(?:$|\b))"
I am new to regex patterns so any ideas or suggestions as to how to improve the above pattern to only collect the emails that come after from: would highly be appreciated!

You can use
(?m)^From:[^<>\n\r]*<([^<>#]+#[^<>]+)>
Details:
(?m) - re.M inline modifier option
^ - start of a line
From: - a literal string
[^<>\n\r]* - zero or more chars other than <, >, CR and LF
< - a < char
([^<>#]+#[^<>]+) - Group 1: one or more chars other than <, > and #, then a # char and then one or more chars other than < and >
> - a > char.
See a Python demo:
import re
rx = re.compile(r'^From:[^<>\n\r]*<([^<>#]+#[^<>]+)>', re.M) # Define the regex
with open(your_file_path, 'r') as f: # Open file for reading
    print(rx.findall(f.read())) # Get all the emails after From:
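The same regex can be tried without a file. The sketch below runs it against the sample text from the question, held in a string; the variable names sample and senders are assumptions made for the illustration.

```python
import re

rx = re.compile(r'^From:[^<>\n\r]*<([^<>#]+#[^<>]+)>', re.M)

# Sample content copied from the question (addresses use # in place of @)
sample = """Recip: fhavor#gmail.com
Subject: Report results (Gd)
Headers: Received: from daem.com (unknown [127.1.1.1])
Date: Sat, 13 Feb 2021 13:11:42 +0000 (GMT)
From: Tavon Lo <lt35#gmail.com>"""

# With a single capturing group, findall returns the group contents
senders = rx.findall(sample)
print(senders)
```

Only the address on the From: line is collected; the Recip: address is ignored because the pattern is anchored to lines starting with From:.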

Regex not working in python, but in online regex tools

I am trying to grab a hostname from configs, and sometimes a -p or -s is appended to the hostname in the config that is not really part of the hostname.
So I wrote this regex to fetch the hostname from the config file:
REGEX_HOSTNAME = re.compile('^hostname\s(?P<hostname>(\w|\W)+?)(-p|-P|-s|-S)?$\n',re.MULTILINE)
hostname = REGEX_HOSTNAME.search(config).group('hostname').lower().strip()
This is a sample part of the config that I am using the regex on:
terminal width 120
hostname IGN-HSHST-HSH-01-P
domain-name sample.com
But in my result list of hostnames there is still the -P at the end.
ign-hshst-hsh-01-p
ign-hshst-hsh-02-p
ign-hshst-hsd-10
ign-hshst-hsh-01-S
ign-hshst-hsd-11
ign-hshst-hsh-02-s
In Regex 101 online tester it works and the -P is part of the last group. In my python (2.7) script it does not work.
Strange behavior is that when I use a slightly modified 2 pass regex it works:
REGEX_HOSTNAME = re.compile(r'^hostname\s*(?P<hostname>.*?)\n?$', re.MULTILINE)
REGEXP_CLUSTERNAME = re.compile('(?P<clustername>.*?)(?:-[ps])?$')
hostname = REGEX_HOSTNAME.search(config).group('hostname').lower().strip()
clustername = REGEXP_CLUSTERNAME.match(hostname).group('clustername')
Now Hostname has the full name and the clustername the one without the optional '-P' at the end.

You may use
import re
config=r"""terminal width 120
hostname IGN-HSHST-HSH-01-P
domain-name sample.com"""
REGEX_HOSTNAME = re.compile(r'^hostname\s*(.*?)(?:-[ps])?$', re.MULTILINE|re.I)
hostnames =[ h.lower().strip() for h in REGEX_HOSTNAME.findall(config) ]
print(hostnames) # => ['ign-hshst-hsh-01']
The ^hostname\s*(.*?)(?:-[ps])?$ regex matches:
^ - start of a line (due to re.MULTILINE, it matches a position after line breaks, too)
hostname - a word (case insensitive, due to re.I)
\s* - 0+ whitespaces
(.*?) - Group 1: zero or more chars other than line break chars, as few as possible
(?:-[ps])? - an optional occurrence of - and then p or s (case insensitive!)
$ - end of a line (due to re.MULTILINE).
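To see how the pattern behaves with and without the optional suffix, here is a small sketch. The config text is made up for the illustration: it combines hostnames from the question's result list into one string.

```python
import re

REGEX_HOSTNAME = re.compile(r'^hostname\s*(.*?)(?:-[ps])?$', re.MULTILINE | re.I)

# Hypothetical config with several hostname lines, for illustration only
config = """terminal width 120
hostname IGN-HSHST-HSH-01-P
hostname IGN-HSHST-HSD-10
hostname IGN-HSHST-HSH-02-s
domain-name sample.com"""

hostnames = [h.lower().strip() for h in REGEX_HOSTNAME.findall(config)]
print(hostnames)
```

The lazy group stops as early as possible, so a trailing -p or -s (in either case) falls into the optional non-capturing group and is left out of the result.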

How to remove escape characters from string in python?

I have a string that looks like this: text = u'\xd7\nRecord has been added successfully, record id: 92'. I want to remove the characters \xd7 and \n from it so that I can use it for another purpose.
I tried str(text), but it cannot handle the character \xd7:
UnicodeEncodeError: 'ascii' codec can't encode character u'\xd7' in
position 0: ordinal not in range(128)
Is there a way to remove characters like these from the string? Thanks

You can try the following using replace:
text = u'\xd7\nRecord has been added successfully, record id: 92'
# unicode literals, so Python 2 does not try to decode them as ASCII
bad_chars = [u'\xd7', u'\n', u'\x99m', u'\xf0']
for i in bad_chars:
    text = text.replace(i, '')
text

It looks like you have a Python 2 unicode string:
inp_str = u'\xd7\nRecord has been added successfully, record id: 92'
If by "escape characters" you mean non-ASCII and other special characters, one way to keep only the ASCII characters, without a regex or a hard-coded list, is:
print inp_str.encode('ascii',errors='ignore').strip('\n')
Result: 'Record has been added successfully, record id: 92'
Encoding a unicode string to ASCII with errors='ignore' drops every character outside the ASCII range; strip('\n') then removes the leading newline.
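For readers on Python 3, where encode() returns bytes rather than a string, the same idea needs a decode step. This is a sketch of one possible equivalent, not part of the original answer; the variable name cleaned is an assumption.

```python
text = '\xd7\nRecord has been added successfully, record id: 92'

# encode() with errors='ignore' drops every non-ASCII character,
# decode() turns the resulting bytes back into a str,
# and strip() removes the leading newline.
cleaned = text.encode('ascii', errors='ignore').decode('ascii').strip()
print(cleaned)
```

Unlike the regex-based answers, this keeps ASCII punctuation such as the comma and the colon.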

I believe Regex can help
import re
text = u'\xd7\nRecord has been added successfully, record id: 92'
res = re.sub('[^A-Za-z0-9]+', ' ', text).strip()
Result:
'Record has been added successfully record id 92'

You could do it by 'slicing' the string:
string = '\xd7\nRecord has been added successfully, record id: 92'
text = string[2:]

Try regex.
import re
def escape_ansi(line):
    ansi_escape = re.compile(r'(\xd7|\n)')
    return ansi_escape.sub('', line)
text = u'\xd7\nRecord has been added successfully, record id: 92'
print(escape_ansi(text))

You could use the built-in regex library.
import re
text = u'\xd7\nRecord has been added successfully, record id: 92'
result = re.sub('[^A-Za-z0-9]+', ' ', text)
print(result)
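That prints Record has been added successfully record id 92, preceded by a single space that replaces the leading \xd7\n (call .strip() on the result to remove it).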
This seems to pass your test case if you can live without the punctuation.

Format String of Dictionary

I have a dictionary serialized as a string:
CREDENTIALS = "{\"aaaUser\": {\"attributes\": {\"pwd\": \"cisco123\", \"name\": \"admin\"}}}"
Now I want to format this string to replace the pwd and name dynamically. What I've tried is:
CREDENTIALS = "{\"aaaUser\": {\"attributes\": {\"pwd\": \"{0}\", \"name\": \"{1}\"}}}".format('password', 'username')
But this gives following error:
Traceback (most recent call last):
File ".\ll.py", line 4, in <module>
CREDENTIALS = "{\"aaaUser\": {\"attributes\": {\"pwd\": \"{0}\", \"name\": \"{1}\"}}}".format('password', 'username')
KeyError: '"aaaUser"'
It is possible to do this by loading the string as a dict using json.loads() and then setting the attributes as required, but this is not what I want. I want to format the string, so that I can use this string in other files/modules.
What am I missing here? Any help would be appreciated.

Don't try to work with the JSON string directly; decode it, update the data structure, and re-encode it:
import json
# Use single quotes instead of escaping all the double quotes
CREDENTIALS = '{"aaaUser": {"attributes": {"pwd": "cisco123", "name": "admin"}}}'
d = json.loads(CREDENTIALS)
attributes = d["aaaUser"]["attributes"]
# username and password hold the new values
attributes["name"] = username
attributes["pwd"] = password
CREDENTIALS = json.dumps(d)
With string formatting, you would need to change your string to look like
CREDENTIALS = '{{"aaaUser": {{"attributes": {{"pwd": "{0}", "name": "{1}"}}}}}}'
doubling all the literal braces so that the format method doesn't mistake them for placeholders.
However, formatting also means that the password needs to be pre-escaped if it contains anything that could be mistaken for JSON syntax, such as a double quote.
# This produces invalid JSON
NEW_CREDENTIALS = CREDENTIALS.format('new"password', 'bob')
# This produces valid JSON
NEW_CREDENTIALS = CREDENTIALS.format('new\\"password', 'bob')
It's far easier and safer to just decode and re-encode.
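As a rough sketch of the decode and re-encode approach, the steps can be wrapped in a function so that other modules can request a ready-made string. The names TEMPLATE and build_credentials are hypothetical, chosen only for this illustration.

```python
import json

TEMPLATE = '{"aaaUser": {"attributes": {"pwd": "cisco123", "name": "admin"}}}'

def build_credentials(username, password, template=TEMPLATE):
    """Return the JSON string with name and pwd replaced (hypothetical helper)."""
    data = json.loads(template)
    attributes = data["aaaUser"]["attributes"]
    attributes["name"] = username
    attributes["pwd"] = password
    # json.dumps escapes quotes and backslashes in the values for us
    return json.dumps(data)

new_credentials = build_credentials('bob', 'new"password')
print(new_credentials)
```

Because json.dumps does the escaping, a password containing a double quote still yields valid JSON that decodes back to the original value.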

str.format treats text enclosed in braces {} as a replacement field. CREDENTIALS begins with {, so str.format reads what follows as a field name, "aaaUser" (quotes included), instead of as literal text. No argument with that name was passed, which is why it raises the KeyError.
The string on which this method is called can contain literal text or replacement fields delimited by braces {}
To get a literal brace and substitute only the intended fields, double the brace:
'{{ Hey Escape }} {0}'.format(12) # O/P '{ Hey Escape } 12'
If you escape the enclosing (outer) braces this way, formatting works.
Example:
'{{Escape Me {n} }}'.format(n='Yes') # {Escape Me Yes }
Following that rule of str.format, double every literal brace in your string so that only {0} and {1} remain as replacement fields:
"{{\"aaaUser\": {{\"attributes\": {{\"pwd\": \"{0}\", \"name\": \"{1}\"}}}}}}".format('password', 'username')
#O/P '{"aaaUser": {"attributes": {"pwd": "password", "name": "username"}}}'
There is another way to make string formatting work here. It is not recommended in your case, though, because you must be sure the string always has exactly the format you showed; otherwise the result could change drastically.
The idea is to convert each placeholder from {0} to %(0)s, so that %-style formatting can be used, which does not care about braces.
'Hello %(0)s' % {'0': 'World'} # Hello World
Here re.sub replaces every occurrence:
import re
def myReplace(obj):
    # Turn a matched placeholder such as {0} into %(0)s
    found = obj.group(0)
    found = found.replace('{', '%(')
    found = found.replace('}', ')s')
    return found
template = "{\"aaaUser\": {\"attributes\": {\"pwd\": \"{0}\", \"name\": \"{1}\"}}}"
CREDENTIALS = re.sub(r'\{\d\}', myReplace, template) % {'0': 'password', '1': 'username'}
print(CREDENTIALS) # {"aaaUser": {"attributes": {"pwd": "password", "name": "username"}}}

How to read next word or words till next line from file in python?

I'm trying to read the words that follow a matched string on the same line.
To be exact, I have a file with the text below:
-- Host: localhost
-- Generation Time: Nov 15, 2006 at 09:58 AM
-- Server version: 5.0.21
-- PHP Version: 5.1.2
I want to check whether the file contains the substring 'Server version:' and, if it does, read the characters that follow it up to the end of the line, in this case '5.0.21'.
I tried the following code, but it gives the next line (-- PHP Version: 5.1.2) instead of the next word (5.0.21).
with open('/root/Desktop/test.txt', 'r+') as f:
    for line in f:
        if 'Server version:' in line:
            print f.next()

You are using f.next(), which returns the next line.
Instead you need:
with open('/root/Desktop/test.txt', 'r+') as f:
    for line in f:
        found = line.find('Server version:')
        if found != -1:
            version = line[found+len('Server version:')+1:]
            print version

You might want to replace that text like this
if 'Server version: ' in line:
    print line.rstrip().replace('-- Server version: ', '')
We do line.rstrip() because the read line will have a new line at the end and we strip that.

Might be overkill, but you could also use the regular expressions module re:
match = re.search("Server version: (.+)", line)
if match: # found a line matching this pattern
    print match.group(1) # whatever was matched for (.+)
The advantage is that you need to type the key only once, but of course you can have the same effect by wrapping any of the other solutions into a function definition. Also, you could do some additional validation.
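Wrapping the regex approach in a function, as suggested above, might look like the following sketch. The function name value_after and the in-memory sample list are assumptions made for the illustration; in practice the lines would come from iterating over the open file.

```python
import re

def value_after(key, lines):
    """Return the text following `key` on the first matching line, or None."""
    # re.escape keeps characters in the key from being read as regex syntax
    pattern = re.compile(re.escape(key) + r'\s*(.+)')
    for line in lines:
        match = pattern.search(line)
        if match:
            return match.group(1).strip()
    return None

# Lines as they would be read from the file in the question
sample = [
    "-- Host: localhost\n",
    "-- Generation Time: Nov 15, 2006 at 09:58 AM\n",
    "-- Server version: 5.0.21\n",
    "-- PHP Version: 5.1.2\n",
]

version = value_after('Server version:', sample)
print(version)
```

The key is typed only once, and the function returns None when no line contains it, which makes a missing header easy to detect.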

You can try using the split method on strings, using the string to remove (i.e. 'Server version: ') as separator:
if 'Server version: ' in line:
    print line.split('Server version: ', 1)[1]

Since you have
line='-- Server version: 5.0.21'
just:
line.split()[-1]
This gives you the last word rather than all the characters after :.
If you want all the characters after :
line.split(':', 1)[-1].strip()
Replace : with other string as needed.
