How to delimit a string based on a regular expression using Python

I have a Python string containing the data below.
--- Data-['tag']-['cli'] command ---> show date:
Current time: 2020-03-12 11:36:37 PDT
--- Data-['tag']-['shell'] command ---> show version:
OS Kernel 64-bit
[builder_stable]
--- Data-['tag']-['cli'] command ---> show host:
Model: New
I want to split the string above at every line that starts with "--- Data" and ends with ":", regardless of what appears between "--- Data" and the ":" character.
My Python code is shown below.
array = data.split("--- Data")
for word in array:
    print(word)
I want the pieces to be returned in order, with the delimiter included.
For example, the first split result should be:
--- Data-['tag']-['cli'] command ---> show date:
Current time: 2020-03-12 11:36:37 PDT
The second split result should be:
--- Data-['tag']-['shell'] command ---> show version:
OS Kernel 64-bit
[builder_stable]
And so on. Any help?

You can use re.findall with a pattern that looks for the delimiter pattern and then lazily matches any characters until the next delimiter pattern or the end of the string:
import re
s = '''--- Data-['tag']-['cli'] command ---> show date:
Current time: 2020-03-12 11:36:37 PDT
--- Data-['tag']-['shell'] command ---> show version:
OS Kernel 64-bit
[builder_stable]
--- Data-['tag']-['cli'] command ---> show host:
Model: New'''
delimiter = r'--- Data[^\n]*?:'
print(re.findall(r'{0}.*?(?={0}|$)'.format(delimiter), s, re.S))

Another solution:
import re
s = '''--- Data-['tag']-['cli'] command ---> show date:
Current time: 2020-03-12 11:36:37 PDT
--- Data-['tag']-['shell'] command ---> show version:
OS Kernel 64-bit
[builder_stable]
--- Data-['tag']-['cli'] command ---> show host:
Model: New'''
split_start = "--- Data"
l = re.split(split_start, s)
curr_split = [split_start+cs for cs in l if cs != ""]
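Neither answer above checks that the delimiter line also ends with ":", as the question asks. A minimal sketch of a third option follows, assuming Python 3.7 or later (where re.split can split on an empty match). The function name split_sections is made up for illustration. It splits at the start of every line that begins with "--- Data" and ends with ":", so each delimiter line stays attached to the text that follows it.

```python
import re

s = """--- Data-['tag']-['cli'] command ---> show date:
Current time: 2020-03-12 11:36:37 PDT
--- Data-['tag']-['shell'] command ---> show version:
OS Kernel 64-bit
[builder_stable]
--- Data-['tag']-['cli'] command ---> show host:
Model: New"""


def split_sections(text):
    """Split text in front of each '--- Data ... :' line, keeping that line."""
    # The lookahead matches an empty string at the start of a delimiter line,
    # so nothing is consumed and the delimiter remains in the result.
    parts = re.split(r"^(?=--- Data[^\n]*:$)", text, flags=re.MULTILINE)
    # The split in front of the very first line yields an empty leading piece.
    return [part for part in parts if part]


sections = split_sections(s)
for section in sections:
    print(section, end="\n\n")
```

Every section except the last keeps its trailing newline, so joining the sections reproduces the original string.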

Related

How to re-sort a list/text with Python?

My bot reads another bot's message, then temporarily saves that message, makes a few changes with .replace and then the bot is supposed to change the format of the entries it finds.
I have tried quite a few things, but have not figured it out.
The text looks like this:
06 6 452872995438985XXX
09 22 160462182344032XXX
11 17 302885091519234XXX
And I want to get the following format:
6/06 452872995438985XXX
22/09 160462182344032XXX
17/11 302885091519234XXX
I have already tried the following things:
splitsprint = test.split(' ')  # test is the string we use, e.g. the text shown above
for x in splitsprint:
    month, day, misc = x
    print(f"{day}/{month} {misc}")
---
newline = test.split('\n')
for line in newline:
    month, day, misc = line.split(' ')
    print(f"{day}/{month} {misc}")
But I always got "ValueError: too many values to unpack (expected 3)" or a similar error.
Does anyone here see my error?
It's because of the trailing whitespace in the input, I'm guessing. Use strip:
s = '''
06 6 452872995438985XXX
09 22 160462182344032XXX
11 17 302885091519234XXX
'''
lines = s.strip().split('\n')
tokens = [l.split(' ') for l in lines]
final = [f'{day}/{month} {misc}' for month, day, misc in tokens]
for f in final:
    print(f)
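If the incoming message may also contain blank lines or repeated spaces, a slightly more defensive variant could look like the sketch below. The function name reformat_entries is invented for illustration. It relies on str.split() with no argument, which splits on any run of whitespace, and it silently skips lines that do not have exactly three fields. That skipping is an assumption about what the bot should do with malformed lines.

```python
def reformat_entries(text):
    """Turn 'MM D ID' lines into 'D/MM ID' lines, skipping malformed lines."""
    result = []
    for line in text.splitlines():
        parts = line.split()  # no argument: split on any run of whitespace
        if len(parts) != 3:
            continue  # blank or unexpected line
        month, day, misc = parts
        result.append(f"{day}/{month} {misc}")
    return result


test = """
06 6 452872995438985XXX
09 22 160462182344032XXX 

11  17 302885091519234XXX
"""

for entry in reformat_entries(test):
    print(entry)
```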

Get the file name and file path, find the line where the search string occurs, and extract only the part of that line that follows the search string

Maybe it is easiest to explain with an example. I am writing my code in Python, and for the grep part I am also using bash commands.
I have a few files in which I need to grep for some pattern, let's say "INFO".
The files can be present in two different directory structures: type1 and type2.
/home/user1/logs/MAIN_JOB/121/patching/a.log (type1)
/home/user1/logs/MAIN_JOB/SUB_JOB1/121/DB:1/patching/b.log (type2)
/home/user1/logs/MAIN_JOB/SUB_JOB1/142/DB:2/patching/c.log (type2)
contents of file :
a.log :
[Thu Jan 20 21:05:00 UTC 2022]: database1: INFO: Subject1: This is subject 1.
b.log :
[Thu Jan 22 18:01:00 UTC 2022]: database1: INFO: Subject2: This is subject 2.
c.log :
[Thu Jan 22 18:01:00 UTC 2022]: database1: ERR: Subject3: This is subject 3.
So I need to know which files contain the string "INFO". If it is present, I need to get the following:
filename : a.log / b.log
filepath : /home/user1/logs/MAIN_JOB/121/patching or /home/user1/logs/MAIN_JOB/SUB_JOB1/121/DB:1/patching
immediate string after search string : Subject1 / Subject2
So I tried using the grep command with -r to find all the files that contain "INFO":
$ grep -r 'INFO' /home/user1/logs/MAIN_JOB
/home/user1/logs/MAIN_JOB/121/patching/a.log:[Thu Jan 20 21:05:00 UTC 2022]: database1: INFO: Subject1: This is subject 1.
/home/user1/logs/MAIN_JOB/SUB_JOB1/121/DB:1/patching/b.log:[Thu Jan 22 18:01:00 UTC 2022]: database1: INFO: Subject2: This is subject 2.
$
So I will store the grep output above in a Python variable and need to extract the items listed above from it.
I initially tried splitting the grep output on "\n", which gives me two separate rows:
/home/user1/logs/MAIN_JOB/121/patching/a.log:[Thu Jan 20 21:05:00 UTC 2022]: database1: INFO: Subject1: This is subject 1.
/home/user1/logs/MAIN_JOB/SUB_JOB1/121/DB:1/patching/b.log:[Thu Jan 22 18:01:00 UTC 2022]: database1: INFO: Subject2: This is subject 2.
Taking each row, I can then split on ":".
First row: I am able to split properly, as the ":" characters are in the expected places.
file_with_path : /home/user1/logs/MAIN_JOB/121/patching/a.log (I can get the file name separately with os.path.basename(file_with_path))
immediate str after search word : "Subject1"
Second row: this is where I need help. The path contains "DB:1", whose ":" breaks my split. If I split, I get the following:
file_with_path : /home/user1/logs/MAIN_JOB/SUB_JOB1/121/DB (not correct)
It should actually be /home/user1/logs/MAIN_JOB/SUB_JOB1/121/DB:1/patching/b.log
I am unable to apply split here, as it doesn't work properly for both cases.
Can you please help me with this? Any command that can do this in bash or Python would be very helpful.
Thank you in advance. Also let me know if you need more information from me.
My code is below:
# main dir
patch_log_home = '/home/user1/logs/MAIN_JOB'
cmd = "grep -r 'INFO' {0}"
patch_bug_inc = self._core.exec_os_cmd(cmd.format(patch_log_home))
# if no occurrence reported continue
if len(patch_bug_inc) == 0:
    return
if patch_bug_inc:
    patch_bug_inc = patch_bug_inc.split("\n")
    for inc in patch_bug_inc:
        print("_________________________________________________")
        inc = inc.split(":")
        # to get subject part
        patch_bug_str_index = [i for i, s in enumerate(inc) if 'INFO' in s][0]
        inc_name = inc[patch_bug_str_index+1]
        # file name
        log_file_name = os.path.basename(inc[0])
        # get file path
        log_path = os.path.split(inc[0])
        print("log_path :", log_path)
        full_path = log_path[0]
        print("FULL PATH: ", full_path)
Here's one way you could achieve this without calling out to grep which, as I said in my comment, may not be portable:
import os
import sys
for root, _, files in os.walk('/home/user1/logs/MAIN_JOB'):
    for file in files:
        if file.endswith('.log'):
            path = os.path.join(root, file)
            try:
                with open(path) as infile:
                    for line in infile:
                        if 'INFO:' in line:
                            print(path)
                            break
            except Exception:
                print(f"Unable to process {path}", file=sys.stderr)
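The loop above only prints the matching paths. The sketch below is one possible extension that also returns the file name, the directory, and the token that follows "INFO:". The names INFO_RE, find_info and parse_grep_line are made up for illustration. parse_grep_line shows how a single line of grep -r output could be split even when the path contains ":". It assumes that the log content always starts with "[", as in the sample files, and that the path never contains ":[".

```python
import os
import re

# Captures the token between "INFO:" and the next ":" (e.g. "Subject1").
INFO_RE = re.compile(r"INFO:\s*([^:]+):")


def find_info(log_root):
    """Yield (file name, directory, subject) for every INFO line under log_root."""
    for root, _, files in os.walk(log_root):
        for name in files:
            if not name.endswith(".log"):
                continue
            try:
                with open(os.path.join(root, name), errors="replace") as infile:
                    for line in infile:
                        match = INFO_RE.search(line)
                        if match:
                            yield name, root, match.group(1).strip()
            except OSError:
                continue  # unreadable file: skip it


def parse_grep_line(line):
    """Split one 'path:content' line of grep -r output, tolerating ':' in the path.

    Assumption: the content starts with '[' and the path never contains ':['.
    """
    path, sep, content = line.partition(":[")
    match = INFO_RE.search(content)
    if not sep or not match:
        return None
    return os.path.basename(path), os.path.dirname(path), match.group(1).strip()
```

For the second row of the grep output in the question, parse_grep_line should return the file name b.log, the directory ending in DB:1/patching, and Subject2.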

Python how to print text with linebreak when it contains '\n'

I have text like "sdfsdfdsf \n sdfdsfsdf \n sdfdsfdsf...". I need to print this text, but every time there is a \n I need to print a line break, so that the body of the text is broken up into separate lines.
How can I do this?
Edit1:
I am getting this text from a socket transaction and I just want to print it in a pretty manner.
b'\r\n\r\nSERIAL Debugger\r\n--------------\r\n>fyi * -\r\nThis is a test: 0 (-)\r\nthis is a test\r\nnew level(-)\r\
You have binary data, and Python doesn't know how you want to print it. So decode it, knowing the encoding the data is in (I used UTF-8):
Python 3.6.1 (default, Mar 23 2017, 16:49:06)
>>> text = b'\r\n\r\nSERIAL Debugger\r\n--------------\r\n>fyi * -\r\nThis is a test: 0 (-)\r\nthis is a test\r\nnew level(-)\r\n'
>>> print(text)
b'\r\n\r\nSERIAL Debugger\r\n--------------\r\n>fyi * -\r\nThis is a test: 0 (-)\r\nthis is a test\r\nnew level(-)\r\n'
>>> print(text.decode())
SERIAL Debugger
--------------
>fyi * -
This is a test: 0 (-)
this is a test
new level(-)
But converting the data for the sake of printing sounds wrong.
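If the goal is only tidy display, one possible approach is to decode the bytes and then let str.splitlines() handle the "\r\n" line endings. The sketch below assumes the data is UTF-8, as the answer above does. The helper name to_lines is made up for illustration. Dropping empty lines is a presentation choice, not something the protocol requires.

```python
raw = (b'\r\n\r\nSERIAL Debugger\r\n--------------\r\n>fyi * -\r\n'
       b'This is a test: 0 (-)\r\nthis is a test\r\nnew level(-)\r\n')


def to_lines(data, encoding="utf-8"):
    """Decode bytes and return the non-empty lines without line endings."""
    text = data.decode(encoding, errors="replace")
    # splitlines() treats "\r\n" as a single line break.
    return [line for line in text.splitlines() if line]


lines = to_lines(raw)
for line in lines:
    print(line)
```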

grep with python subprocess replacement

On a switch, I run ntpq -nc rv and get this output:
associd=0 status=0715 leap_none, sync_ntp, 1 event, clock_sync,
version="ntpd 4.2.6p3-RC10#1.2239-o Mon Mar 21 02:53:48 UTC 2016 (1)",
processor="x86_64", system="Linux/3.4.43.Ar-3052562.4155M", leap=00,
stratum=2, precision=-21, rootdelay=23.062, rootdisp=46.473,
refid=17.253.24.125,
reftime=dbf98d39.76cf93ad Mon, Dec 12 2016 20:55:21.464,
clock=dbf9943.026ea63c Mon, Dec 12 2016 21:28:03.009, peer=43497,
tc=10, mintc=3, offset=-0.114, frequency=27.326, sys_jitter=0.151,
clk_jitter=0.162, clk_wander=0.028
I am attempting to create a bash shell command using Python's subprocess module to extract only the value for "offset", or -0.114 in the example above.
I noticed that I can use the subprocess replacement module, sh, for this, like so:
import sh
print(sh.grep(sh.ntpq("-nc rv"), 'offset'))
and I get:
mintc=3, offset=-0.114, frequency=27.326, sys_jitter=0.151,
which is incorrect, as I just want the value for 'offset', -0.114.
Not sure what I am doing wrong here, whether it's my grep call or I am not using the sh module correctly.
grep works line by line: it returns every whole line that contains a match, not just the matching part. But I think grep is overkill. Once you have the command output, just search it for the value you want:
items = sh.ntpq("-nc rv").split(',')
for pair in items:
    # partition tolerates items that have no '=' or more than one '='
    name, _, value = pair.partition('=')
    # strip because we weren't careful with whitespace
    if name.strip() == 'offset':
        print(value.strip())
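Another option, once the output is available as a string, is a regular expression. The sketch below uses the sample output from the question as a literal so that it runs without ntpq or sh installed. The helper name get_field is invented for illustration. It assumes the value contains no commas or whitespace, which holds for numeric fields such as offset but not for quoted ones such as version.

```python
import re

sample = '''associd=0 status=0715 leap_none, sync_ntp, 1 event, clock_sync,
version="ntpd 4.2.6p3-RC10#1.2239-o Mon Mar 21 02:53:48 UTC 2016 (1)",
processor="x86_64", system="Linux/3.4.43.Ar-3052562.4155M", leap=00,
stratum=2, precision=-21, rootdelay=23.062, rootdisp=46.473,
refid=17.253.24.125,
reftime=dbf98d39.76cf93ad Mon, Dec 12 2016 20:55:21.464,
clock=dbf9943.026ea63c Mon, Dec 12 2016 21:28:03.009, peer=43497,
tc=10, mintc=3, offset=-0.114, frequency=27.326, sys_jitter=0.151,
clk_jitter=0.162, clk_wander=0.028
'''


def get_field(output, name):
    """Return the text after 'name=' up to the next comma or whitespace."""
    # \b keeps "tc" from matching inside "mintc".
    match = re.search(r"\b" + re.escape(name) + r"=([^,\s]+)", output)
    return match.group(1) if match else None


print(get_field(sample, "offset"))
```

The returned value is a string. Wrap it in float() if a number is needed.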

Python print both the matching groups in regex

I want to find two fixed patterns in a log file. Here is what a line in the log file looks like:
passed dangerb.xavier64.423181.k000.drmanhattan_resources.log Aug 23 04:19:37 84526 362
From this line, I want to extract drmanhattan and 362, the number just before the end of the line.
Here is what I have tried so far.
import sys
import re
with open("Xavier.txt") as f:
    for line in f:
        match1 = re.search(r'((\w+_\w+)|(\d+$))', line)
        if match1:
            print(match1.groups())
However, every time I run this script, I always get drmanhattan as output and not drmanhattan 362.
Is it because of the | sign?
How do I tell the regex to catch both this group and that group?
I have already consulted this and this link; however, they did not solve my problem.
line = 'Passed dangerb.xavier64.423181.r000.drmanhattan_resources.log Aug 23 04:19:37 84526 362'
match1 = re.search(r'(\w+_\w+).*?(\d+$)', line)
if match1:
    print(match1.groups())
    # ('drmanhattan_resources', '362')
If you have a test.txt file that contains the following lines:
Passed dangerb.xavier64.423181.r000.drmanhattan_resources.log Aug 23 04:19:37 84526 362
Passed dangerb.xavier64.423181.r000.drmanhattan_resources.log Aug 23 04:19:37 84526 363
Passed dangerb.xavier64.423181.r000.drmanhattan_resources.log Aug 23 04:19:37 84526 361
you can do:
with open('test.txt', 'r') as fil:
    for line in fil:
        match1 = re.search(r'(\w+_\w+).*?(\d+)\s*$', line)
        if match1:
            print(match1.groups())
            # ('drmanhattan_resources', '362')
            # ('drmanhattan_resources', '363')
            # ('drmanhattan_resources', '361')
| means OR, so your regex matches (\w+_\w+) OR (\d+$).
Maybe you want something like this:
((\w+_\w+).*?(\d+$))
With re.search you only get the first match, if any, and with | you tell re to look for either this or that pattern. As suggested in other answers, you could replace the | with .* to match "anything in between" those two patterns. Alternatively, you could use re.findall to get all matches:
>>> line = "passed dangerb.xavier64.423181.k000.drmanhattan_resources.log Aug 23 04:19:37 84526 362"
>>> re.findall(r'\w+_\w+|\d+$', line)
['drmanhattan_resources', '362']
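The answers above capture drmanhattan_resources, while the question asks for drmanhattan alone. A sketch with named groups follows, assuming every file name of interest ends in "_resources.log". That suffix is taken from the sample line and may not hold for the whole log. The names LINE_RE and parse_line are made up for illustration.

```python
import re

# "name" is the word directly before "_resources.log";
# "count" is the run of digits at the very end of the line.
LINE_RE = re.compile(r"(?P<name>[A-Za-z0-9]+)_resources\.log.*?(?P<count>\d+)\s*$")


def parse_line(line):
    """Return (name, count) for a matching log line, or None otherwise."""
    match = LINE_RE.search(line)
    if match is None:
        return None
    return match.group("name"), int(match.group("count"))


line = ("passed dangerb.xavier64.423181.k000.drmanhattan_resources.log "
        "Aug 23 04:19:37 84526 362")
print(parse_line(line))
```

Both values come from a single match, so there is no need to combine two separate searches.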
