While trying to write an Apache log parser, I want to filter out IP addresses with the following code:
for r in log:
    host_line = "'",r['host'],"'"
    for line in host_line:
        if not line.startswith("178.255.20.20"):
            print line.strip()
The result of this code is:
p4fdf6780.dip0.t-ipconnect.de
'
'
79.223.103.128
'
'
p4fdf6780.dip0.t-ipconnect.de
'
'
With line.replace("'", "") I remove the single quotes.
print line.replace("'", "")
The result:
p4fdf6780.dip0.t-ipconnect.de
79.223.103.128
p4fdf6780.dip0.t-ipconnect.de
This leaves me with the two line breaks.
How can I avoid those line breaks?
And is there a workaround, or a better, more pythonic way to get what I want?
What do you want the program to do? What is the intended purpose of the for line in host_line loop?
If you're just looking to print hosts other than 178.255.20.20, would the following not work?
for r in log:
    host = str(r['host']).strip() # not sure if the str() is required, depends on type of r['host']
    if not host.startswith("178.255.20.20"):
        print host
Simply change your code like below. You don't need to go for a replace function.
for r in log:
    host_line = "'",r['host'],"'"
    for line in host_line:
        if not line.startswith("178.255.20.20"):
            if not line == "'":
                print line.strip()
One way is to use bash and a dedicated search tool like Ag, or just standard grep, which is really fast because it is written in C:
grep -v "178.255.20.20" your_log.txt | grep -v -E "^'"
If you need to stick to python, make better use of strip so that it also removes the quote character, and print the line only if it is not empty:
for r in log:
    host_line = "'",r['host'],"'"
    for line in host_line:
        if not line.startswith("178.255.20.20"):
            line = line.strip("'\n")
            if len(line) > 0:
                print line
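The answers above all come down to the same idea: strip each host value and keep it only if it does not start with the unwanted address. Here is a minimal sketch of that in Python 3 syntax. The function name filter_hosts and the sample records are assumptions made for illustration; the real log object from the question is not shown, so a list of dicts with a 'host' key stands in for it.

```python
def filter_hosts(records, blocked_prefix):
    """Return stripped host values that do not start with blocked_prefix."""
    # str() guards against non-string values; strip() removes stray newlines.
    hosts = (str(r["host"]).strip() for r in records)
    # Empty values are dropped so no blank lines get printed later.
    return [h for h in hosts if h and not h.startswith(blocked_prefix)]


# Hypothetical sample records standing in for the parsed Apache log.
sample_log = [
    {"host": "p4fdf6780.dip0.t-ipconnect.de"},
    {"host": "178.255.20.20"},
    {"host": "79.223.103.128\n"},
]

for host in filter_hosts(sample_log, "178.255.20.20"):
    print(host)
```

Note that startswith("178.255.20.20") also matches addresses such as 178.255.20.200; compare with == if only the exact address should be excluded.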
I am looking to print each matching line in a file on a Linux host together with the line before it, joined into one line.
Below is just the content from the log file:
[2020/02/18 08:25:21.229198, 1] ../source3/lib/smbldap.c:1206(get_cached_ldap_connect)
Connection to LDAP server failed for the 1 try!
[2020/02/18 08:25:21.229221, 2] ../source3/passdb/pdb_ldap_util.c:287(smbldap_search_domain_info)
smbldap_search_domain_info: Problem during LDAPsearch: Timed out
What I have tried:
I have tried the following with grep and sed, which somehow works:
$ egrep -B 1 "failed|Timed" /var/log/samba/smbd.log.old |tr -d "\n" | sed "s/--/\n/g"
[2020/02/18 08:25:21.229198, 1] ../source3/lib/smbldap.c:1206(get_cached_ldap_connect) Connection to LDAP server failed for the 1 try!
[2020/02/18 08:25:21.229221, 2] ../source3/passdb/pdb_ldap_util.c:287(smbldap_search_domain_info) smbldap_search_domain_info: Problem during LDAPsearch: Timed out
This does not look like a clean solution. I'm looking for an expert one-liner; awk, sed, grep or even python is acceptable.
It can be done with awk alone:
awk ' /Timed|failed/ { print previous, $0; }; {previous = $0;}' /var/log/samba/smbd.log.old
This might work for you (GNU sed):
sed -n 'N;/\n.*\(failed\|Timed\)/s/\n//p;D' file
Turn off implicit printing. Append the next line. If the appended line contains failed or Timed, delete the newline and print the result. Delete the first line in the pattern space and repeat.
Could you please try following tac + awk solution:
tac Input_file | awk '/failed/{found=1;val=$0;next} found && NF{print $0,val;val=found=""}'
Or, as a non-one-liner form of the same solution:
tac Input_file |
awk '
/failed/{
  found=1
  val=$0
  next
}
found && NF{
  print $0,val
  val=found=""
}
'
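Since the question also allows Python, the same "remember the previous line" idea used in the awk answer can be sketched as follows. The function name join_with_previous is an assumption for illustration, and the sample lines are copied from the log excerpt in the question.

```python
import re


def join_with_previous(lines, pattern):
    """Return 'previous line + matching line' for every line matching pattern."""
    regex = re.compile(pattern)
    previous = ""
    joined = []
    for raw in lines:
        line = raw.strip()
        if regex.search(line):
            # Join the remembered line and the current match with one space.
            joined.append((previous + " " + line).strip())
        previous = line
    return joined


sample = [
    "[2020/02/18 08:25:21.229198, 1] ../source3/lib/smbldap.c:1206(get_cached_ldap_connect)\n",
    "  Connection to LDAP server failed for the 1 try!\n",
    "[2020/02/18 08:25:21.229221, 2] ../source3/passdb/pdb_ldap_util.c:287(smbldap_search_domain_info)\n",
    "  smbldap_search_domain_info: Problem during LDAPsearch: Timed out\n",
]

for row in join_with_previous(sample, r"failed|Timed"):
    print(row)
```

To run it against the real log, pass an open file object instead of the sample list, since iterating over a file yields its lines.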
This is the code. Somehow the output is not consistent. There is a new line for the first 2 lines in ip.txt while the third is working as expected.
code.py
import subprocess
with open('ip.txt') as f:
    for IPAddr in f:
        ping = subprocess.Popen(['ping','-c','1',IPAddr],stdout=f).wait()
        if ping == 0:
            print(f'{IPAddr} is up')
        else:
            print(f'{IPAddr} is down')
ip.txt
127.0.0.1
10.0.0.1
127.0.0.1
Output
user#linux:~$ python 01.py
127.0.0.1
is up
10.0.0.1
is down
127.0.0.1 is up
user#linux:~$
Desired Output
user#linux:~$ python code.py
127.0.0.1 is up
10.0.0.1 is down
127.0.0.1 is up
user#linux:~$
What's wrong with this code and how to fix it?
Update
The following solutions work! Many thanks
IPAddr = IPAddr.replace('\n','')
IPAddr = IPAddr.rstrip("\n")
IPAddr = IPAddr.strip()
You're including the newline characters from your file in your print.
Remove the \n like this:
import subprocess
with open('ip.txt') as f:
    for IPAddr in f:
        IPAddr = IPAddr.replace('\n', '') # Remove the newline
        # Discard ping's own output via DEVNULL instead of passing the input file as stdout
        ping = subprocess.Popen(['ping','-c','1',IPAddr],stdout=subprocess.DEVNULL).wait()
        if ping == 0:
            print(f'{IPAddr} is up')
        else:
            print(f'{IPAddr} is down')
Or if you want to do it more broadly, you can remove all whitespace by using:
IPAddr = IPAddr.strip()
Or if you want to be super duper efficient, just strip the \n from the right:
IPAddr = IPAddr.rstrip("\n")
When iterating over a file line by line, each line ends with the newline marker ("\n"), so what you pass to print() is actually "127.0.0.1\n is up", not "127.0.0.1 is up".
The solution is quite simple: remove the newline:
for IPAddr in f:
    IPAddr = IPAddr.rstrip("\n")
    # etc
Note that external inputs (files, user input etc.) are totally unreliable, so you would be better off stripping all whitespace from the line and checking that it's not empty (empty lines are common in text files, especially at the end). Skip empty lines with a continue statement. For non-empty lines, you probably also want to validate that the value is a valid IP address, and skip it if it is not.
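The advice above (strip, skip empty lines, validate) could look roughly like this. It is a sketch that uses the standard library ipaddress module for the validation step; the function name clean_addresses is an assumption for illustration.

```python
import ipaddress


def clean_addresses(lines):
    """Return the stripped lines that are valid IPv4 or IPv6 addresses."""
    valid = []
    for raw in lines:
        candidate = raw.strip()
        if not candidate:
            continue  # skip empty lines, e.g. at the end of the file
        try:
            ipaddress.ip_address(candidate)
        except ValueError:
            continue  # skip anything that is not a valid IP address
        valid.append(candidate)
    return valid


print(clean_addresses(["127.0.0.1\n", "\n", "not-an-ip\n", " 10.0.0.1 \n"]))
```

In the ping script, the loop would then iterate over clean_addresses(f) instead of over f directly.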
I'm starting to work on problems for Google's Code Jam. However, there seems to be a problem with my submission. Whenever I submit, I am told "Your output should start with 'Case #1: '". My output comes from a print statement, "Case #%s: %s"%(y + 1, p), which prints Case #1: etc. when I run my code.
I looked into it and it said "Your output should start with 'Case #1: ': If you get this message, make sure you did not upload the source file in place of the output file, and that you're outputting case numbers properly. The first line of the output file should always start with "Case #1:", followed by a space or the end of the line."
So what is an output file and how would I incorporate it into my code?
Extra info: This is my code I'm saving it as GoogleCode1.py and submitting that file. I wrote it in the IDLE.
import string
firstimput = raw_input ("cases ")
for y in range(int(firstimput)):
    nextimput = raw_input ("imput ")
    firstlist = string.split(nextimput)
    firstlist.reverse()
    p = ""
    for x in range(len(firstlist)):
        p = p +firstlist[x] + " "
    p = p [:-1]
    print "Case #%s: %s"%(y + 1, p)
Run the script in a shell, and redirect the output.
python GoogleCode1.py > GoogleCode1.out
I/O redirection aside, the other way to do this is to read from and write to files directly. Look up file handling in Python.
input_file = open('/path/to/input_file')
output_file = open('/path/to/output_file', 'w')
# myFunction is a placeholder for your own solver; every input line is treated as one case here
for case_number, line in enumerate(input_file, 1):
    answer = myFunction(line)
    output_file.write("Case #%d: %s\n" % (case_number, answer))
input_file.close()
output_file.close()
Cheers
Make sure you're submitting a file containing what your code outputs -- don't submit the code itself during a practice round.
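To make the "output file" idea concrete, here is a sketch in Python 3 syntax of a solver that reads the whole input from one stream and writes the "Case #N:" lines to another. It assumes the input format implied by the question's code (first line is the number of cases, then one line per case, and each answer is the words of that line reversed). The names run and solve_case are made up for illustration.

```python
import io


def solve_case(line):
    """Reverse the order of the words in one input line."""
    return " ".join(reversed(line.split()))


def run(stream_in, stream_out):
    """Read all cases from stream_in and write 'Case #N: ...' lines to stream_out."""
    lines = stream_in.read().splitlines()
    count = int(lines[0])  # assumed: first line holds the number of cases
    for i in range(count):
        stream_out.write("Case #%d: %s\n" % (i + 1, solve_case(lines[i + 1])))


# Demonstration with in-memory streams; a real script would call
# run(sys.stdin, sys.stdout) after "import sys".
demo_out = io.StringIO()
run(io.StringIO("2\nthis is a test\nfoobar\n"), demo_out)
print(demo_out.getvalue(), end="")
```

With run(sys.stdin, sys.stdout) the script can be invoked with both streams redirected, for example python GoogleCode1.py < input.in > output.out, and the resulting output file is what gets submitted. No prompts are printed, so nothing except the case lines ends up in that file.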
I'm still learning python, and one of the first projects I decided to dive into was something to sort through large nmap logs, pull out the OPEN ports, and dump them to a separate text file in IP:Port format. It works, but is there a better way to write this? Here's what I ended up with:
import sys
import string
"""
Written 6/24/2011 to pull out OPEN ports of an nmap proxy scan
Command:
nmap 218.9-255.0-255.0-255 -p 8080,3128,1080 -M 50 -oG PLog3.txt
"""
if len(sys.argv) != 3:
    print 'Usage: python proxy.py <input file> <output file>'
    print 'nmap 218.1-255.0-255.0-255 -p 8080,3128,1080 -M 50 -oG PLog.txt'
    print 'Example: python ./proxy.py PLog.txt proxies.txt'
    sys.exit(1)
r = open(sys.argv[1], 'r')
o = open(sys.argv[2], 'w')
pat80 = '80/open/'
pat8080 = '8080/open'
pat3128 = '3128/open'
for curline in r.xreadlines():
    sift = string.split(curline, ' ')
    ip = sift[1]
    if curline.find(pat3128) >= 0:
        curport = '3128'
    elif curline.find(pat8080) >= 0:
        curport = '8080'
    elif curline.find(pat80) >= 0:
        curport = '80'
    else:
        curport = '100'
        pass
    if (curport == '3128') or (curport == '8080') or (curport == '80'):
        o.write(ip + ':' + curport + '\n')
        print ip + ':' + curport
    else:
        pass
You can loop over a file like this. There is no need to use xreadlines(). The with statement makes sure the file is closed when the block exits.
with open(sys.argv[1], 'r') as r:
    for curline in r:
        sift = string.split(curline, ' ')
        ip = sift[1]
        ...
Looking in a tuple is neater than the chain of or
if curport in ('3128', '8080', '80'):
Since I seem to remember using python to parse nmap output files was one of my first python applications, I can make a couple of recommendations:
1) If you'd like to learn XML parsing and python, using the alternate XML format of nmap would be advised. This has the advantage that the XML output is less likely to change in small but script-breaking ways, unlike the plain text output. (Basically, matching on string fields is great for a quick hack but is almost guaranteed to bite you down the road, as I found out when nmap was updated and they slightly changed the format of one of the columns I was parsing on. I think I also got bitten when we upgraded one of the Windows boxes and some of the text in the OS or services fields matched something I was matching on.) If you're interested in going down this path, I can see if I have my nmap parser using xpath lying around.
2) If you want to stick with text output and regexp, I'd suggest learning about grouping.
Specifically, rather than creating custom patterns for each port, you can define a group and check that instead.
import re
r = re.compile(r"(\d+)/open") # match one or more digits followed by /open
mm = r.search(line) # mm will either be None or a match object; if mm is not None, mm.group(1) is the port number. search() is used because match() only matches at the start of the line.
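As a small worked example of the grouping idea, the sketch below collects every open port on a line with a single pattern. The sample line is a made-up illustration in the style of nmap's grepable (-oG) output, not taken from a real scan, and the name open_ports is an assumption.

```python
import re

# One capture group for the port number, followed by the literal "/open".
OPEN_PORT = re.compile(r"(\d+)/open")


def open_ports(line):
    """Return all port numbers (as strings) that are marked open on this line."""
    return OPEN_PORT.findall(line)


# Hypothetical line in the style of "nmap -oG" output.
sample_line = (
    "Host: 218.9.0.1 ()\tPorts: 8080/open/tcp//http-proxy///, "
    "3128/closed/tcp//squid-http///, 1080/open/tcp//socks///"
)

print(open_ports(sample_line))  # ['8080', '1080']
```

Because findall returns the contents of the group for every match, no per-port pattern is needed, and a host with several open ports yields all of them.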
import sys
"""
Written 6/24/2011 to pull out OPEN ports of an nmap proxy scan
Command:
nmap 218.9-255.0-255.0-255 -p 8080,3128,1080 -M 50 -oG PLog3.txt
"""
def get_port(line):
    # Ordered list rather than a dict: '80/open/' is also a substring of
    # '8080/open/', so the longer patterns must be checked first.
    # If they're really all supposed to have the same form,
    # then we can simplify more.
    port_patterns = [
        ('3128/open', '3128'),
        ('8080/open', '8080'),
        ('80/open/', '80'),
    ]
    for pattern, port in port_patterns:
        if pattern in line:
            return port
    return None # this would be implied otherwise,
                # but "explicit is better than implicit"
                # and this function intends to return a value.
def main(in_name, out_name):
    with open(in_name, 'r') as in_file, open(out_name, 'w') as out_file:
        for line in in_file:
            port = get_port(line)
            if port is None:
                continue
            ip = line.split(' ')[1]
            output = '%s:%s' % (ip, port)
            out_file.write(output + '\n')
            print output
def usage():
    print 'Usage: python proxy.py <input file> <output file>'
    print 'nmap 218.1-255.0-255.0-255 -p 8080,3128,1080 -M 50 -oG PLog.txt'
    print 'Example: python ./proxy.py PLog.txt proxies.txt'
if __name__ == '__main__':
    if len(sys.argv) != 3: usage()
    else: main(*sys.argv[1:])
Check out argparse for handling the arguments.
Split into functions.
Use the main construct.
Look at the csv module. You can set the delimiter to a space.
Look again at the re expression. You can do it with one re expression where it is an 'or' of the different patterns.
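For the argparse suggestion in the list above, a minimal sketch in Python 3 syntax might look like this. The argument names input_file and output_file and the description text are assumptions chosen to mirror the script's usage message.

```python
import argparse


def build_parser():
    """Build a parser for the two positional arguments the script expects."""
    parser = argparse.ArgumentParser(
        description="Pull open ports out of a grepable nmap log as IP:port lines."
    )
    parser.add_argument("input_file", help="nmap log written with -oG")
    parser.add_argument("output_file", help="file to write IP:port lines to")
    return parser


# Parsing an explicit list here for demonstration; a script would call
# build_parser().parse_args() to read sys.argv instead.
args = build_parser().parse_args(["PLog.txt", "proxies.txt"])
print(args.input_file, args.output_file)
```

argparse then generates the usage message and the error for a wrong argument count, which replaces the manual len(sys.argv) check.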
I need to split the different pieces of code bundled in one file into many separate files.
The file is apparently shared by AWK's creators on their homepage.
The file is also here for easy use.
My attempt to the problem
I can get the first word of each line, which shows where each piece of code is located, with
awk '{ print $1 }'
However, I do not know how
to get the exact line numbers so that I can use them
to collect the code between the specific lines, ignoring the first word of each line
to put these separate pieces of code into new files named after the first word of the line
I am sure that the problem can be solved by AWK and with Python too. Perhaps, we need to use them together.
[edit] after the first answer
I get the following error when I try to execute it with awk
$awk awkcode.txt
awk: syntax error at source line 1
context is
>>> awkcode <<< .txt
awk: bailing out at source line 1
Did you try to:
Create a file unbundle.awk with the following content:
$1 != prev { close(prev); prev = $1 }
{ print substr($0, index($0, " ") + 1) >$1 }
Remove the following lines from the file awkcode.txt:
# unbundle - unpack a bundle into separate files
$1 != prev { close(prev); prev = $1 }
{ print substr($0, index($0, " ") + 1) >$1 }
Run the following command:
awk -f unbundle.awk awkcode.txt
Are you trying to unpack a file in that format? It's a kind of shell archive. For more information, see http://en.wikipedia.org/wiki/Shar
If you execute that program with awk, awk will create all those files. You don't need to write or rewrite much. You can simply run that awk program, and it should still work.
First, view the file in "plain" format. http://dpaste.com/12282/plain/
Second, save the plain version of the file as 'awkcode.shar'
Third, I think you need to use the following command.
awk -f awkcode.shar
If you want to replace it with a Python program, it would be something like this.
import urllib2
data = urllib2.urlopen("http://dpaste.com/12282/plain/")
currName, currFile = None, None
for line in data:
    line = line.rstrip("\n")
    if not line.strip():
        continue  # a blank line has no file name
    fileName, _, text = line.partition(' ')
    if fileName != currName:
        # a new name starts a new output file
        if currFile is not None:
            currFile.close()
        currName = fileName
        currFile = open(currName, "w")
    currFile.write(text + "\n")  # write the line without its first word
if currFile is not None:
    currFile.close()
The awk input file awkcode.txt should not contain ANY BLANK lines. If a blank line is encountered, the awk program fails, because the code has no check to filter out blank lines. I only found this out after several days of struggle.
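The unbundling logic, including the blank-line check that the awk program lacks, can be sketched in Python 3 syntax as below. To keep the example self-contained it collects the result in a dict instead of writing files. The function name unbundle and the sample lines are assumptions for illustration, and every non-blank line is assumed to start with its target file name followed by a space.

```python
def unbundle(lines):
    """Group bundle lines by their first word and return {file name: contents}."""
    files = {}
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines instead of failing on them
        # First word is the target file name; the rest is the file's content.
        name, _, text = line.partition(" ")
        files.setdefault(name, []).append(text)
    return {name: "\n".join(body) + "\n" for name, body in files.items()}


# Hypothetical bundle lines, including a blank one.
sample = [
    'hello.awk BEGIN { print "hi" }\n',
    "\n",
    "hello.awk # done\n",
    "notes.txt first line\n",
]

for name, contents in unbundle(sample).items():
    print(name, repr(contents))
```

Writing the real files is then one loop over the dict that opens each name with open(name, "w") and writes the contents.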