Don't want character-by-character printing - Python

File content:
aditya#aditya-virtual-machine:~/urlcat$ cat http_resp
telnet 10.192.67.40 80
Trying 10.192.67.40...
Connected to 10.192.67.40.
Escape character is '^]'.
GET /same_domain HTTP/1.1
Host: www.google.com
HTTP/1.1 200 OK
Date: Tue, 09 Feb 2016 00:25:36 GMT
Server: Apache/2.4.7 (Ubuntu)
Last-Modified: Fri, 08 Jan 2016 20:10:52 GMT
ETag: "81-528d82f2644f1"
Accept-Ranges: bytes
Content-Length: 129
My code:
f1 = open('http_resp')
read = f1.read()
for line in read:
    # line = line.rstrip()
    line = line.strip()
    if not '.com' in line:
        continue
    print line
When the if-not check is removed, the output prints only a single character per line, something like this:
t
e
l
n
e
t
1
0
.
1
9
2
.
6
7
.
4
0
8
0
T
r
y
i
n
g
I don't want character-by-character printing.

The problem is that read() returns the entire file as a string. Thus, your loop
for line in read:
iterates through the characters, one at a time. The simplest change is this:
f1 = open('http_resp')
for line in f1.readlines():   # or simply: for line in f1:
    ...                       # rest of the loop body unchanged
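Putting the fix together as a runnable sketch (Python 3 syntax; the helper name is mine, and http_resp is the filename from the question):

```python
def lines_containing(path, needle):
    # Iterating the file object yields whole lines;
    # iterating over f.read()'s string yields single characters.
    matches = []
    with open(path) as f:
        for line in f:
            line = line.strip()
            if needle in line:
                matches.append(line)
    return matches
```

With the question's http_resp file, lines_containing('http_resp', '.com') keeps only the Host: www.google.com line.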

Related

Python filter log file and sum up the filter

Jun 23 10:10:03 145.89.109.1 : %SEC-6-IPACCESSLOGDP: list 120 denied icmp 145.89.182.65 -> 145.89.109.49 (0/0), 6 packets
Jun 23 10:18:40 145.89.109.1 : %SEC-6-IPACCESSLOGDP: list 120 denied icmp 95.9.149.232 -> 145.89.109.49 (0/0), 1 packet
Jun 23 10:57:05 145.89.109.1 : %SEC-6-IPACCESSLOGDP: list 120 denied icmp 169.244.237.130 -> 145.89.109.49 (0/0), 3 packets
This is a part of the log file. My question is: how can I extract the packet counts and sum them up? So I need to sum 6, 1 and 3. How do I first filter out the packet counts and then sum them?
Input data:
data = StringIO(
"""
Jun 23 10:10:03 145.89.109.1 : %SEC-6-IPACCESSLOGDP: list 120 denied icmp 145.89.182.65 -> 145.89.109.49 (0/0), 6 packets
Jun 23 10:18:40 145.89.109.1 : %SEC-6-IPACCESSLOGDP: list 120 denied icmp 95.9.149.232 -> 145.89.109.49 (0/0), 1 packet
Jun 23 10:57:05 145.89.109.1 : %SEC-6-IPACCESSLOGDP: list 120 denied icmp 169.244.237.130 -> 145.89.109.49 (0/0), 3 packets
""")
UPDATE, without regex:
Since you don't know regex, here is a solution without it. Note that this relies on the fact that every time "%SEC-6-IPACCESSLOGDP" is found in a line, the line has to end with some kind of packet count.
with open('router1.log', 'r') as data:
    out = []
    for line in data:
        if "%SEC-6-IPACCESSLOGDP" in line:
            # split the line; the last chunk after a "," is assigned to "pack"
            *_, pack = line.split(",")
            # split the number from the word "packet(s)"
            num, word = pack.strip().split(" ")
            out.append(int(num))

result = sum(out)  # out == [6, 1, 3]
print(result)
# 10
Old answer:
You could do it with a regex search and extract the number before "packet".
(Update: converted the code into a function, as asked in the comments.)
import re

def icmp_packets(filename):
    pattern = r"(\d+)\s?packets?"
    with open(filename, 'r') as data:
        total_sum = 0
        for line in data:
            if "%SEC-6-IPACCESSLOGDP" in line:
                res = re.search(pattern, line)
                if res:
                    total_sum += int(res.group(1))
    return total_sum

result = icmp_packets('router1.log')
print(result)
# 10
You could also use a list, append each match to it, and build the sum of that list at the end if you want to keep track of the separate packet counts.
pattern = r"(\d+)\s?packets?"
out = []
with open('router1.log', 'r') as data:
    for line in data:  # search line by line for the pattern
        m = re.search(pattern, line)
        if m:  # if there is a match in a line
            out.append(int(m.group(1)))  # convert the match to an int and append it

result = sum(out)  # out == [6, 1, 3]
print(result)
# 10
You can check out that regex pattern at Regex101.
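As a variant sketch (not part of the answer above): if the whole log fits in memory, re.findall over the full text plus sum() collapses the loop into one expression, using the same pattern:

```python
import re

def total_packets(text):
    # Sum every "<N> packet(s)" count found anywhere in the text.
    return sum(int(n) for n in re.findall(r"(\d+)\s?packets?", text))
```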

Python: how to extract string from file - only once

I have the below output from router stored in a file
-#- --length-- -----date/time------ path
3 97103164 Feb 7 2016 01:36:16 +05:30 taas/NN41_R11_Golden_Image
4 1896 Sep 27 2019 14:22:08 +05:30 taas/NN41_R11_Golden_Config
5 1876 Nov 27 2017 20:07:50 +05:30 taas/nfast_default.cfg
I want to search for substring 'Golden_Image' from the file & get the complete path. So here, the required output would be this string:
taas/NN41_R11_Golden_Image
First attempt:
import re

with open("outlog.out") as f:
    for line in f:
        if "Golden_Image" in line:
            print(line)
Output:
3 97103164 Feb 7 2016 01:36:16 +05:30 taas/NN41_R11_Golden_Image
Second attempt
import re

hand = open('outlog.out')
for line in hand:
    line = line.rstrip()
    x = re.findall('.*?Golden_Image.*?', line)
    if len(x) > 0:
        print x
Output:
['3 97103164 Feb 7 2016 01:36:16 +05:30 taas/NN41_R11_Golden_Image']
Neither of these give the required output. How can I fix this?
This is actually surprisingly fiddly to do if the path can contain spaces.
You need to use the maxsplit argument of split to isolate the path field.
with open("outlog.out") as f:
    for line in f:
        fields = line.rstrip().split(None, 7)  # the path is the 8th field, even if it contains spaces
        if len(fields) == 8 and "Golden_Image" in fields[7]:
            print(fields[7])
Split the line and check whether the "Golden_Image" string exists in the split parts.
import re

with open("outlog.out") as f:
    for line in f:
        if "Golden_Image" not in line:
            continue
        print re.search(r'\S*Golden_Image\S*', line).group()
or
images = re.findall(r'\S*Golden_Image\S*', open("outlog.out").read())
Example:
>>> s = '''
-#- --length-- -----date/time------ path
3 97103164 Feb 7 2016 01:36:16 +05:30 taas/NN41_R11_Golden_Image
4 1896 Sep 27 2019 14:22:08 +05:30 taas/NN41_R11_Golden_Config
5 1876 Nov 27 2017 20:07:50 +05:30 taas/nfast_default.cfg'''.splitlines()
>>> for line in s:
...     for i in line.split():
...         if "Golden_Image" in i:
...             print i
...
taas/NN41_R11_Golden_Image
>>>
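The token scan above can be wrapped in a small helper for reuse (a sketch; the function name is mine):

```python
def find_tokens(lines, needle):
    # Return every whitespace-separated token that contains the needle,
    # mirroring the nested-loop scan above.
    return [tok for line in lines for tok in line.split() if needle in tok]
```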
Reading the full content at once and then searching is not efficient. Instead, the file can be read line by line; if a line matches the criteria, the path can be extracted with a regex, without any further splitting.
Use the following regex to get the path:
\s+(?=\S*$).*
Link: https://regex101.com/r/zuH0Zv/1
Here is working code:
import re

regex = r"\s+(?=\S*$).*"
test_str = "3 97103164 Feb 7 2016 01:36:16 +05:30 taas/NN41_R11_Golden_Image"
matches = re.search(regex, test_str)
print(matches.group().strip())
# taas/NN41_R11_Golden_Image
Following your code, if you just want to get the right output, it can be simpler:
with open("outlog.out") as f:
    for line in f:
        if "Golden_Image" in line:
            print(line.split()[-1])
The output is:
taas/NN41_R11_Golden_Image
PS: if you want some more complex operations, you may need to try the re module, as in @Avinash Raj's answer.

grep with python subprocess replacement

On a switch, I run ntpq -nc rv and get this output:
associd=0 status=0715 leap_none, sync_ntp, 1 event, clock_sync,
version="ntpd 4.2.6p3-RC10#1.2239-o Mon Mar 21 02:53:48 UTC 2016 (1)",
processor="x86_64", system="Linux/3.4.43.Ar-3052562.4155M", leap=00,
stratum=2, precision=-21, rootdelay=23.062, rootdisp=46.473,
refid=17.253.24.125,
reftime=dbf98d39.76cf93ad Mon, Dec 12 2016 20:55:21.464,
clock=dbf9943.026ea63c Mon, Dec 12 2016 21:28:03.009, peer=43497,
tc=10, mintc=3, offset=-0.114, frequency=27.326, sys_jitter=0.151,
clk_jitter=0.162, clk_wander=0.028
I am attempting to build the equivalent of a shell pipeline in Python to extract only the value for "offset", i.e. -0.114 in the example above.
I noticed that I can use sh, a subprocess replacement module, for this:
import sh
print(sh.grep(sh.ntpq("-nc rv"), 'offset'))
and I get:
mintc=3, offset=-0.114, frequency=27.326, sys_jitter=0.151,
which is incorrect as I just want the value for 'offset', -0.114.
Not sure what I am doing wrong here, whether it's my grep call or the way I am using the sh module.
grep reads line by line and returns every line matching any part of the input. But I think grep is overkill here. Once you have the command's output, just parse it directly:
items = sh.ntpq("-nc rv").split(',')
for pair in items:
    parts = pair.split('=')
    # strip because we weren't careful with whitespace; skip chunks
    # without '=' (the date fields in the output contain commas)
    if len(parts) == 2 and parts[0].strip() == 'offset':
        print(parts[1].strip())
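A regex variant (my sketch, not from the answer): capture the number after offset= directly, which sidesteps the commas inside the date fields entirely:

```python
import re

def ntp_offset(text):
    # Pull the signed decimal after "offset=" out of ntpq "rv" output;
    # returns None if the field is absent.
    m = re.search(r"\boffset=(-?\d+(?:\.\d+)?)", text)
    return float(m.group(1)) if m else None
```

You would feed it the captured command output, e.g. ntp_offset(str(sh.ntpq("-nc rv"))); how you capture the output is up to you.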

Python print both the matching groups in regex

I want to find two fixed patterns in a log file. Here is what a line in the log file looks like:
passed dangerb.xavier64.423181.k000.drmanhattan_resources.log Aug 23
04:19:37 84526 362
From this log, I want to extract drmanhattan and 362 which is a number just before the line ends.
Here is what I have tried so far.
import sys
import re

with open("Xavier.txt") as f:
    for line in f:
        match1 = re.search(r'((\w+_\w+)|(\d+$))', line)
        if match1:
            print match1.groups()
However, every time I run this script, I always get drmanhattan as output and not drmanhattan 362.
Is it because of the | sign?
How do I tell the regex to catch both this group and that group?
I have already consulted this link and this one; however, they did not solve my problem.
line = 'Passed dangerb.xavier64.423181.r000.drmanhattan_resources.log Aug 23 04:19:37 84526 362'
match1 = re.search(r'(\w+_\w+).*?(\d+$)', line)
if match1:
    print match1.groups()
    # ('drmanhattan_resources', '362')
If you have a test.txt file that contains the following lines:
Passed dangerb.xavier64.423181.r000.drmanhattan_resources.log Aug 23
04:19:37 84526 362 Passed
dangerb.xavier64.423181.r000.drmanhattan_resources.log Aug 23 04:19:37
84526 363 Passed
dangerb.xavier64.423181.r000.drmanhattan_resources.log Aug 23 04:19:37
84526 361
you can do:
with open('test.txt', 'r') as fil:
    for line in fil:
        match1 = re.search(r'(\w+_\w+).*?(\d+)\s*$', line)
        if match1:
            print match1.groups()

# ('drmanhattan_resources', '362')
# ('drmanhattan_resources', '363')
# ('drmanhattan_resources', '361')
| means OR, so your regex catches (\w+_\w+) OR (\d+$).
Maybe you want something like this:
((\w+_\w+).*?(\d+$))
With re.search you only get the first match, if any, and with | you tell re to look for either this or that pattern. As suggested in other answers, you could replace the | with .* to match "anything in between" those two patterns. Alternatively, you could use re.findall to get all matches:
>>> line = "passed dangerb.xavier64.423181.k000.drmanhattan_resources.log Aug 23 04:19:37 84526 362"
>>> re.findall(r'\w+_\w+|\d+$', line)
['drmanhattan_resources', '362']

Using Python's ftplib to get a directory listing, portably

You can use ftplib for full FTP support in Python. However, the preferred way of getting a directory listing is:
# File: ftplib-example-1.py
import ftplib
ftp = ftplib.FTP("www.python.org")
ftp.login("anonymous", "ftplib-example-1")
data = []
ftp.dir(data.append)
ftp.quit()
for line in data:
    print "-", line
Which yields:
$ python ftplib-example-1.py
- total 34
- drwxrwxr-x 11 root 4127 512 Sep 14 14:18 .
- drwxrwxr-x 11 root 4127 512 Sep 14 14:18 ..
- drwxrwxr-x 2 root 4127 512 Sep 13 15:18 RCS
- lrwxrwxrwx 1 root bin 11 Jun 29 14:34 README -> welcome.msg
- drwxr-xr-x 3 root wheel 512 May 19 1998 bin
- drwxr-sr-x 3 root 1400 512 Jun 9 1997 dev
- drwxrwxr-- 2 root 4127 512 Feb 8 1998 dup
- drwxr-xr-x 3 root wheel 512 May 19 1998 etc
...
I guess the idea is to parse the results to get the directory listing. However, this listing directly depends on the FTP server's way of formatting it. It would be very messy to write code for this, having to anticipate all the different formats FTP servers might use.
Is there a portable way to get an array filled with the directory listing?
(The array should only have the folder names.)
Try using ftp.nlst(dir).
However, note that if the folder is empty, it might throw an error:
files = []
try:
    files = ftp.nlst()
except ftplib.error_perm as resp:
    if str(resp) == "550 No files found":
        print "No files in this directory"
    else:
        raise

for f in files:
    print f
The reliable/standardized way to parse an FTP directory listing is the MLSD command, which by now should be supported by all recent/decent FTP servers.
import ftplib

f = ftplib.FTP()
f.connect("localhost")
f.login()
ls = []
f.retrlines('MLSD', ls.append)
for entry in ls:
    print entry
The code above will print:
modify=20110723201710;perm=el;size=4096;type=dir;unique=807g4e5a5; tests
modify=20111206092323;perm=el;size=4096;type=dir;unique=807g1008e0; .xchat2
modify=20111022125631;perm=el;size=4096;type=dir;unique=807g10001a; .gconfd
modify=20110808185618;perm=el;size=4096;type=dir;unique=807g160f9a; .skychart
...
Starting from Python 3.3, ftplib provides a specific method (FTP.mlsd) to do this:
http://bugs.python.org/issue11072
http://hg.python.org/cpython/file/67053b135ed9/Lib/ftplib.py#l535
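On older Pythons, the MLSD fact string is also easy to parse by hand. A sketch (per RFC 3659, each line is a list of semicolon-separated key=value facts, then a space, then the name):

```python
def parse_mlsd_line(line):
    # "modify=...;type=dir;unique=...; tests" -> ("tests", {fact: value, ...})
    facts_part, _, name = line.partition(' ')
    facts = {}
    for fact in facts_part.rstrip(';').split(';'):
        key, _, value = fact.partition('=')
        facts[key.lower()] = value
    return name, facts
```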
I found my way here while trying to get filenames, last-modified stamps, file sizes, etc., and wanted to add my code. It only took a few minutes to write a loop to parse the output of ftp.dir(dir_list.append), making use of Python stdlib string methods like strip() (to clean up each line of text) and split() (to create an array).
from ftplib import FTP

ftp = FTP('sick.domain.bro')
ftp.login()
ftp.cwd('path/to/data')

dir_list = []
ftp.dir(dir_list.append)

# main thing is identifying which char marks the start of the good stuff
# '-rw-r--r-- 1 ppsrt ppsrt 545498 Jul 23 12:07 FILENAME.FOO
# (on this server, the good stuff starts at line[29])
for line in dir_list:
    print line[29:].strip().split(' ')  # got yerself an array there bud!
    # EX: ['545498', 'Jul', '23', '12:07', 'FILENAME.FOO']
There's no standard for the layout of the LIST response. You'd have to write code to handle the most popular layouts. I'd start with Linux ls and Windows Server DIR formats. There's a lot of variety out there, though.
Fall back to the nlst method (returning the result of the NLST command) if you can't parse the longer list. For bonus points, cheat: perhaps the longest number in the line containing a known file name is its length.
I happen to be stuck with an FTP server (Rackspace Cloud Sites virtual server) that doesn't seem to support MLSD. Yet I need several fields of file information, such as size and timestamp, not just the filename, so I have to use the DIR command. On this server, the output of DIR looks very much like the OP's. In case it helps anyone, here's a little Python class that parses a line of such output to obtain the filename, size and timestamp.
import datetime

class FtpDir:
    def parse_dir_line(self, line):
        words = line.split()
        self.filename = words[8]
        self.size = int(words[4])
        t = words[7].split(':')
        ts = words[5] + '-' + words[6] + '-' + datetime.datetime.now().strftime('%Y') + ' ' + t[0] + ':' + t[1]
        self.timestamp = datetime.datetime.strptime(ts, '%b-%d-%Y %H:%M')
Not very portable, I know, but easy to extend or modify to deal with various different FTP servers.
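For reference, the same parsing can be checked standalone; a function equivalent (names are mine) applied to a typical Unix-style ls -l line:

```python
import datetime

def parse_dir_line(line):
    # Assumes the 9-column Unix "ls -l" layout used above, a filename
    # without spaces, and the current year for "Mon DD HH:MM" stamps.
    words = line.split()
    filename = words[8]
    size = int(words[4])
    hh, mm = words[7].split(':')
    year = datetime.datetime.now().strftime('%Y')
    ts = '%s-%s-%s %s:%s' % (words[5], words[6], year, hh, mm)
    return filename, size, datetime.datetime.strptime(ts, '%b-%d-%Y %H:%M')
```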
This is from the Python docs:
>>> from ftplib import FTP_TLS
>>> ftps = FTP_TLS('ftp.python.org')
>>> ftps.login()            # login anonymously before securing control channel
>>> ftps.prot_p()           # switch to secure data connection
>>> ftps.retrlines('LIST')  # list directory content securely
total 9
drwxr-xr-x 8 root wheel 1024 Jan 3 1994 .
drwxr-xr-x 8 root wheel 1024 Jan 3 1994 ..
drwxr-xr-x 2 root wheel 1024 Jan 3 1994 bin
drwxr-xr-x 2 root wheel 1024 Jan 3 1994 etc
d-wxrwxr-x 2 ftp wheel 1024 Sep 5 13:43 incoming
drwxr-xr-x 2 root wheel 1024 Nov 17 1993 lib
drwxr-xr-x 6 1094 wheel 1024 Sep 13 19:07 pub
drwxr-xr-x 3 root wheel 1024 Jan 3 1994 usr
-rw-r--r-- 1 root root 312 Aug 1 1994 welcome.msg
That helped me with my code when I tried filtering only one type of file and showing those entries on screen, by adding a condition that tests each line. Like this:
elif command == 'ls':
    print("directory of ", ftp.pwd())
    data = []
    ftp.dir(data.append)
    formats = ["gz", "zip", "rar", "tar", "bz2", "xz"]
    for line in data:
        x = line.split(".")
        if x[-1] in formats:
            print("-", line)
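The extension check can be factored out and exercised on plain strings (a sketch; the names are mine):

```python
ARCHIVE_FORMATS = {"gz", "zip", "rar", "tar", "bz2", "xz"}

def is_archive(name):
    # Same test as the loop above: the part after the last "."
    # must be a known archive extension.
    return name.rsplit(".", 1)[-1] in ARCHIVE_FORMATS
```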
