grep with python subprocess replacement


On a switch, I run ntpq -nc rv and get this output:
associd=0 status=0715 leap_none, sync_ntp, 1 event, clock_sync,
version="ntpd 4.2.6p3-RC10#1.2239-o Mon Mar 21 02:53:48 UTC 2016 (1)",
processor="x86_64", system="Linux/3.4.43.Ar-3052562.4155M", leap=00,
stratum=2, precision=-21, rootdelay=23.062, rootdisp=46.473,
refid=17.253.24.125,
reftime=dbf98d39.76cf93ad Mon, Dec 12 2016 20:55:21.464,
clock=dbf9943.026ea63c Mon, Dec 12 2016 21:28:03.009, peer=43497,
tc=10, mintc=3, offset=-0.114, frequency=27.326, sys_jitter=0.151,
clk_jitter=0.162, clk_wander=0.028
I am trying to build a shell command with Python's subprocess module that extracts only the value of "offset" (-0.114 in the example above).
I noticed that I can use sh, a subprocess replacement module, for this:
import sh
print(sh.grep(sh.ntpq("-nc rv"), 'offset'))
and I get:
mintc=3, offset=-0.114, frequency=27.326, sys_jitter=0.151,
which is incorrect as I just want the value for 'offset', -0.114.
I'm not sure what I'm doing wrong here: whether it's my grep call or whether I'm not using the sh module correctly.

grep works line by line: it prints every line that contains a match, not just the matching part. But I think grep is overkill here. Once you have the shell output, just look for the value after offset:
import sh

output = str(sh.ntpq("-nc", "rv"))
for pair in output.split(','):
    # partition() tolerates chunks without '=' (e.g. "sync_ntp")
    name, sep, value = pair.partition('=')
    # strip because we weren't careful with whitespace
    if sep and name.strip() == 'offset':
        print(value.strip())
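The same idea can be written with a regular expression, which avoids splitting the whole output by hand. This is a minimal sketch that assumes the "name=value, name=value" layout shown in the question; parse_offset is a hypothetical helper name. (On the shell side, grep -o 'offset=[^,]*' would print only the matched text, offset=-0.114, instead of the whole line.)

```python
import re


def parse_offset(ntpq_output):
    """Return the offset from `ntpq -nc rv` output as a float, or None.

    Hypothetical helper: assumes the "name=value, name=value" layout shown above.
    """
    # \b stops us from matching inside a longer name; the value is a signed decimal
    match = re.search(r'\boffset=([-+]?[0-9.]+)', ntpq_output)
    return float(match.group(1)) if match else None


sample = (
    "stratum=2, precision=-21, rootdelay=23.062, rootdisp=46.473,\n"
    "tc=10, mintc=3, offset=-0.114, frequency=27.326, sys_jitter=0.151,\n"
)
print(parse_offset(sample))  # -0.114
```

With the live command, this could be called as parse_offset(str(sh.ntpq("-nc", "rv"))), or, using only the standard library, as parse_offset(subprocess.run(["ntpq", "-nc", "rv"], stdout=subprocess.PIPE, universal_newlines=True).stdout).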

Related

Get filename, file path, and the part of the line following the search string

Maybe I'll explain directly with an example: I am writing my code in Python, and for the grep part I'm also using bash commands.
I have a few files in which I need to grep for some pattern, let's say "INFO".
These files can live in two different directory structures: type1 and type2.
/home/user1/logs/MAIN_JOB/121/patching/a.log (type1)
/home/user1/logs/MAIN_JOB/SUB_JOB1/121/DB:1/patching/b.log (type2)
/home/user1/logs/MAIN_JOB/SUB_JOB1/142/DB:2/patching/c.log (type2)
Contents of the files:
a.log:
[Thu Jan 20 21:05:00 UTC 2022]: database1: INFO: Subject1: This is subject 1.
b.log:
[Thu Jan 22 18:01:00 UTC 2022]: database1: INFO: Subject2: This is subject 2.
c.log:
[Thu Jan 22 18:01:00 UTC 2022]: database1: ERR: Subject3: This is subject 3.
So I need to know which files contain the "INFO" string. For each file that does, I need the following:
filename: a.log / b.log
filepath: /home/user1/logs/MAIN_JOB/121/patching or /home/user1/logs/MAIN_JOB/SUB_JOB1/121/DB:1/patching
string immediately after the search string: Subject1 / Subject2
So I tried grep -r to find all the files that contain "INFO":
$ grep -r 'INFO' /home/user1/logs/MAIN_JOB
/home/user1/logs/MAIN_JOB/121/patching/a.log:[Thu Jan 20 21:05:00 UTC 2022]: database1: INFO: Subject1: This is subject 1.
/home/user1/logs/MAIN_JOB/SUB_JOB1/121/DB:1/patching/b.log:[Thu Jan 22 18:01:00 UTC 2022]: database1: INFO: Subject2: This is subject 2.
$
I store the grep output above in a Python variable and need to extract those items from it.
Initially I split the grep output on "\n", which gives two separate rows:
/home/user1/logs/MAIN_JOB/121/patching/a.log:[Thu Jan 20 21:05:00 UTC 2022]: database1: INFO: Subject1: This is subject 1.
/home/user1/logs/MAIN_JOB/SUB_JOB1/121/DB:1/patching/b.log:[Thu Jan 22 18:01:00 UTC 2022]: database1: INFO: Subject2: This is subject 2.
Then I split each row on ":".
First row: this splits properly because ":" appears only where expected.
file_with_path: /home/user1/logs/MAIN_JOB/121/patching/a.log (I can get the file name separately with os.path.basename(file_with_path))
immediate string after the search word: "Subject1"
Second row: this is where I need help. The path contains "DB:1", and that ":" breaks the split. Splitting gives:
file_with_path: /home/user1/logs/MAIN_JOB/SUB_JOB1/121/DB (not correct)
It should actually be /home/user1/logs/MAIN_JOB/SUB_JOB1/121/DB:1/patching/b.log
So a plain split doesn't work for both cases.
Can you please help me with this? Any command that can do this in bash or Python would be very helpful.
Thank you in advance, and let me know if you need more information from me.
Here is my code:
# main dir
patch_log_home = '/home/user1/logs/MAIN_JOB'
cmd = "grep -r 'INFO' {0}"
patch_bug_inc = self._core.exec_os_cmd(cmd.format(patch_log_home))
# if no occurrence is reported, stop here
if len(patch_bug_inc) == 0:
    return
if patch_bug_inc:
    patch_bug_inc = patch_bug_inc.split("\n")
    for inc in patch_bug_inc:
        print("_________________________________________________")
        inc = inc.split(":")
        # to get subject part
        patch_bug_str_index = [i for i, s in enumerate(inc) if 'INFO' in s][0]
        inc_name = inc[patch_bug_str_index+1]
        # file name
        log_file_name = os.path.basename(inc[0])
        # get file path
        log_path = os.path.split(inc[0])
        print("log_path :", log_path)
        full_path = log_path[0]
        print("FULL PATH: ", full_path)
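If you'd rather keep the grep call, one option is to split each line at the first ":[" instead of at every ":". This sketch assumes every log line starts with "[" (true for the samples above) and that no directory name contains ":["; parse_grep_line is a hypothetical name. GNU grep's -Z/--null option, which prints a NUL byte after the file name instead of ":", is another way to make that boundary unambiguous.

```python
import os


def parse_grep_line(line, keyword="INFO"):
    """Split one line of `grep -r KEYWORD DIR` output into
    (filename, directory, subject), or return None if it doesn't fit.

    Assumes each log line starts with '[', so the first ':[' separates the
    file path from the matched text even when the path contains ':' (DB:1).
    """
    path, sep, rest = line.partition(":[")
    if not sep:
        return None
    # rest looks like "Thu Jan ... 2022]: database1: INFO: Subject2: This is ..."
    _, found, after_keyword = rest.partition(keyword + ":")
    if not found:
        return None
    subject = after_keyword.split(":", 1)[0].strip()
    return os.path.basename(path), os.path.dirname(path), subject


grep_output = (
    "/home/user1/logs/MAIN_JOB/121/patching/a.log:[Thu Jan 20 21:05:00 UTC 2022]: database1: INFO: Subject1: This is subject 1.\n"
    "/home/user1/logs/MAIN_JOB/SUB_JOB1/121/DB:1/patching/b.log:[Thu Jan 22 18:01:00 UTC 2022]: database1: INFO: Subject2: This is subject 2.\n"
)
for row in grep_output.splitlines():
    print(parse_grep_line(row))
```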

Here's one way to do this without calling out to grep, which, as I said in my comment, may not be portable:
import os
import sys

for root, _, files in os.walk('/home/user1/logs/MAIN_JOB'):
    for file in files:
        if file.endswith('.log'):
            path = os.path.join(root, file)
            try:
                with open(path) as infile:
                    for line in infile:
                        if 'INFO:' in line:
                            print(path)
                            break
            except Exception:
                print(f"Unable to process {path}", file=sys.stderr)
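The loop above only prints the matching paths. The sketch below also returns the file name, its directory, and the field right after "INFO:", assuming the log line layout from the question (find_keyword is a hypothetical name). Because the path comes from os.walk rather than from parsing grep output, a ":" in a directory name such as DB:1 no longer matters. Unlike the loop above, it yields one tuple per matching line rather than stopping at the first match in each file.

```python
import os
import sys


def find_keyword(log_home, keyword="INFO"):
    """Walk *log_home* and yield (filename, directory, subject) for every
    line of a .log file that contains 'KEYWORD:'.

    The subject is assumed to be the ':'-separated field after the keyword.
    """
    marker = keyword + ":"
    for root, _, files in os.walk(log_home):
        for name in files:
            if not name.endswith(".log"):
                continue
            path = os.path.join(root, name)
            try:
                with open(path) as infile:
                    for line in infile:
                        _, found, after = line.partition(marker)
                        if found:
                            subject = after.split(":", 1)[0].strip()
                            yield name, root, subject
            except OSError:
                print(f"Unable to process {path}", file=sys.stderr)
```

For example, for name, directory, subject in find_keyword('/home/user1/logs/MAIN_JOB'): print(name, directory, subject).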

Strange behavior of TextIOWrapper.tell() with Python 3.6.9 in context of 0D/0A

ENVIRONMENT:
Intel/88-Ubuntu SMP Tue Feb 11 20:11:34 UTC 2020
Python 3.6.9
GIVEN:
A tiny program, test.py, that reads the file one character at a time and prints the stream position and the character code after each read:
fh = open("tmp.txt", "r")
while 1 + 1 == 2:
    tmp = fh.read(1)
    if not tmp: break
    print(fh.tell(), "%x" % ord(tmp))
Fill tmp.txt with some data in bash:
echo -e "\x41\x42\x3b\x0d\x0a\x0d\x0a" > tmp.txt
OUTPUT:
Running python3 test.py prints:
1 41
2 42
18446744073709551620 3b
5 a
7 a
8 a
QUESTION:
Where does the excessively high value 18446744073709551620 for fh.tell() come from? Interestingly, this does not happen in the following cases:
echo -e "\x41\x42\x3b\x0d\x0a" > tmp.txt # only one 0x0d/0x0a
echo -e "\x42\x3b\x0d\x0a\x0d\x0a" > tmp.txt # no 'A' at the beginning of the file
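For text files, the io documentation describes tell() as returning an opaque number that does not usually represent a byte count. Here the value equals 2**64 + 4, which looks like byte position 4 with decoder state packed into the upper bits (presumably the newline decoder holding a pending "\r" while it waits to see whether "\n" follows); such a value is only meant to be passed back to seek(). If real byte offsets are needed, the sketch below reads in binary mode instead (byte_positions is a hypothetical name). No newline translation happens in this mode, so "\r" shows up as its own byte.

```python
def byte_positions(path):
    """Return (offset_after_read, hex_code) pairs for every byte in *path*.

    The file is opened in binary mode, where tell() is a plain byte offset
    and no newline translation is applied.
    """
    positions = []
    with open(path, "rb") as fh:
        while True:
            byte = fh.read(1)
            if not byte:  # an empty bytes object means end of file
                break
            positions.append((fh.tell(), "%x" % byte[0]))
    return positions
```

Called on the tmp.txt from the question (seven bytes plus the newline that echo appends), this should report offsets 1 through 8, with each 0d and 0a listed separately.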

Python print both the matching groups in regex

I want to find two fixed patterns in a log file. Here is what a line in the log file looks like:
passed dangerb.xavier64.423181.k000.drmanhattan_resources.log Aug 23 04:19:37 84526 362
From this line I want to extract drmanhattan and 362, the number just before the end of the line.
Here is what I have tried so far.
import re

with open("Xavier.txt") as f:
    for line in f:
        match1 = re.search(r'((\w+_\w+)|(\d+$))', line)
        if match1:
            print(match1.groups())
However, every time I run this script I only get drmanhattan as output, not drmanhattan and 362.
Is it because of the | sign?
How do I tell the regex to capture this group and that group?
I have already looked at a couple of related links, but they did not solve my problem.

import re

line = 'Passed dangerb.xavier64.423181.r000.drmanhattan_resources.log Aug 23 04:19:37 84526 362'
match1 = re.search(r'(\w+_\w+).*?(\d+$)', line)
if match1:
    print(match1.groups())
# ('drmanhattan_resources', '362')
If you have a test.txt file that contains the following lines:
Passed dangerb.xavier64.423181.r000.drmanhattan_resources.log Aug 23 04:19:37 84526 362
Passed dangerb.xavier64.423181.r000.drmanhattan_resources.log Aug 23 04:19:37 84526 363
Passed dangerb.xavier64.423181.r000.drmanhattan_resources.log Aug 23 04:19:37 84526 361
you can do:
import re

with open('test.txt', 'r') as fil:
    for line in fil:
        match1 = re.search(r'(\w+_\w+).*?(\d+)\s*$', line)
        if match1:
            print(match1.groups())
# ('drmanhattan_resources', '362')
# ('drmanhattan_resources', '363')
# ('drmanhattan_resources', '361')

| means OR, so your regex matches either (\w+_\w+) or (\d+$).
Maybe you want something like this:
((\w+_\w+).*?(\d+$))

With re.search you only get the first match, if any, and with | you tell re to look for either this or that pattern. As suggested in other answers, you could replace the | with .* to match "anything in between" the two patterns. Alternatively, you could use re.findall to get all matches:
>>> line = "passed dangerb.xavier64.423181.k000.drmanhattan_resources.log Aug 23 04:19:37 84526 362"
>>> re.findall(r'\w+_\w+|\d+$', line)
['drmanhattan_resources', '362']

How do I get the "biggest" path?

I need to write some Python code to get the latest version of Android from a path. For example:
$ ls -l android_tools/sdk/platforms/
total 8
drwxrwxr-x 5 deqing deqing 4096 Mar 21 11:42 android-18
drwxrwxr-x 5 deqing deqing 4096 Mar 21 11:42 android-19
$
In this case I'd like to have android_tools/sdk/platforms/android-19.

The max function can take a key=myfunc parameter to specify a function that will return a comparison value. So you could do something like:
import os, re
dirname = 'android_tools/sdk/platforms'
files = os.listdir(dirname)
def mykeyfunc(fname):
    digits = re.search(r'\d+$', fname).group()
    return int(digits)
# key= must be passed by keyword; join to get the full path
print(os.path.join(dirname, max(files, key=mykeyfunc)))
Adjust that regular expression as needed for the actual files you're dealing with, and that should get you started.
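A slightly more defensive sketch, assuming the entries are named android-<number> (latest_platform and its prefix parameter are hypothetical). It skips names that don't match instead of raising AttributeError on re.search returning None, returns the full path, and compares the numbers as integers, so android-19 beats android-9 (a plain string max would pick android-9).

```python
import os
import re


def latest_platform(dirname, prefix="android-"):
    """Return the path of the entry in *dirname* named PREFIX<number> with
    the largest number, or None if there is no such entry.

    Entries that don't match the pattern (e.g. README) are ignored.
    """
    pattern = re.compile(re.escape(prefix) + r"(\d+)$")
    candidates = []
    for name in os.listdir(dirname):
        match = pattern.match(name)
        if match:
            # compare as integers so that 19 > 9
            candidates.append((int(match.group(1)), name))
    if not candidates:
        return None
    _, best = max(candidates)
    return os.path.join(dirname, best)
```

For the directory in the question, latest_platform('android_tools/sdk/platforms') would return android_tools/sdk/platforms/android-19.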

Using Sed through subprocess.call in python to conduct in file replacements

I've got a column in one file that I'd like to replace with a column in another file. I'm trying to use sed to do this from Python, but I'm not sure I'm doing it correctly. Maybe the code will make things clearer:
for line in infile1.readlines()[1:]:
    element = re.split("\t", line)
    IID.append(element[1])
    FID.append(element[0])

os.chdir(binary_dir)

for files in os.walk(binary_dir):
    for file in files:
        for name in file:
            if name.endswith(".fam"):
                infile2 = open(name, 'r+')

                for line in infile2.readlines():
                    parts = re.split(" ", line)
                    Part1.append(parts[0])
                    Part2.append(parts[1])

                for i in range(len(Part2)):
                    if Part2[i] in IID:
                        regex = '"s/\.*' + Part2[i] + '/' + Part1[i] + ' ' + Part2[i] + '/"' + ' ' + phenotype
                        print(regex)
                        subprocess.call(["sed", "-i.orig", regex], shell=True)
The program appears to hang during the sed call: it sits there for quite some time without doing anything. This is what print(regex) outputs:
"s/\.*131006/201335658-01 131006/" /Users/user1/Desktop/phenotypes2
Thanks for your help, and let me know if you need further clarification!

You don't need sed if you have Python and the re module. Here is an example of how to use re to replace a given pattern in a string.
>>> import re
>>> line = "abc def ghi"
>>> new_line = re.sub("abc", "123", line)
>>> new_line
'123 def ghi'
>>>
Of course this is only one way to do that in Python. I feel that for you str.replace() will do the job too.
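Applied to the question, the sed -i.orig call can be replaced by reading the file, substituting with re.sub, and writing it back after saving a backup copy. This is a minimal sketch; replace_in_file and its .orig default only mirror the sed invocation and are not a library API.

```python
import re
import shutil


def replace_in_file(path, pattern, replacement, backup_suffix=".orig"):
    """Apply re.sub(pattern, replacement) to the whole file, in place.

    A copy of the original is kept at path + backup_suffix, like sed -i.orig.
    """
    shutil.copy2(path, path + backup_suffix)
    with open(path) as infile:
        text = infile.read()
    new_text = re.sub(pattern, replacement, text)
    with open(path, "w") as outfile:
        outfile.write(new_text)
    return new_text
```

For the command printed in the question, that would be replace_in_file('/Users/user1/Desktop/phenotypes2', r'\.*131006', '201335658-01 131006').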

The first issue is shell=True that is used together with a list argument. Either drop shell=True or use a string argument (the complete shell command) instead:
from subprocess import check_call
check_call(["sed", "-i.orig", regex])
otherwise the arguments ('-i.orig' and regex) are passed to /bin/sh instead of sed.
The second issue is that you haven't provided an input file, so sed waits for data on stdin; that is why it appears to hang.
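Putting both fixes together, the sed script and the input file are passed as separate list items and no shell is involved. This sketch assumes sed is on PATH (the suffix-attached -i.orig form is accepted by both GNU and BSD sed); sed_in_place is a hypothetical name. check_call raises CalledProcessError if sed exits with a non-zero status.

```python
import subprocess


def sed_in_place(script, path, backup_suffix=".orig"):
    """Run `sed -i<suffix> <script> <path>` without going through a shell.

    Every argument is a separate list item, so the script needs no extra
    quoting, and sed reads the named file instead of waiting on stdin.
    """
    subprocess.check_call(["sed", "-i" + backup_suffix, script, path])
```

For the file in the question this would be sed_in_place(r"s/\.*131006/201335658-01 131006/", "/Users/user1/Desktop/phenotypes2"); the double quotes in the printed regex were only needed by the shell, so they are dropped here.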
If you want to change files in place, you could use the fileinput module:
#!/usr/bin/env python
import fileinput
import re

files = ['/Users/user1/Desktop/phenotypes2']  # if it is None it behaves like sed
for line in fileinput.input(files, backup='.orig', inplace=True):
    print(re.sub(r'\.*131006', '201335658-01 131006', line), end='')
fileinput.input() redirects stdout to the current file, i.e., print() changes the file.
end='' avoids duplicate newlines, since each line read from the file already ends with one.
