Find all occurrences of a pattern in a text file - python

I have a text file which looks like this
Nmap scan report for 192.168.2.1
Host is up (0.023s latency).
PORT STATE SERVICE
5001/tcp closed commplex-link
MAC Address: EC:1A:59:A2:84:80 (Belkin International)
Nmap scan report for 192.168.2.2
Host is up (0.053s latency).
PORT STATE SERVICE
5001/tcp closed commplex-link
MAC Address: 94:35:0A:F0:47:C2 (Samsung Electronics Co.)
Nmap scan report for 192.168.2.3
Host is up (0.18s latency).
PORT STATE SERVICE
5001/tcp filtered commplex-link
MAC Address: 00:13:CE:C0:E5:F3 (Intel Corporate)
Nmap scan report for 192.168.2.6
Host is up (0.062s latency).
PORT STATE SERVICE
5001/tcp closed commplex-link
MAC Address: 90:21:55:7D:53:4F (HTC)
I want to find all the IPs with port 5001 closed (not filtered). I tried the following logic to find all such IPs:
import re
fp = open('nmap_op.txt').read()
ip = re.compile(r'([0-9]+\.[0-9]+\.[0-9]+\.[0-9]+)(.*)closed', re.S)
matched = ip.findall(fp)
for item in matched:
    print(item)
I was expecting the output to be
192.168.2.1
192.168.2.2
192.168.2.6
But I'm not getting the desired output. The output is just one item which looks like this:
('192.168.2.1', '\nHost is up (0.023s latency).\nPORT STATE SERVICE\n5001/tcp closed commplex-link\nMAC Address: EC:1A:59:A2:84:80 (Belkin International)\n\nNmap scan report for 192.168.2.2\nHost is up (0.053s latency).\nPORT STATE SERVICE\n5001/tcp closed commplex-link\nMAC Address: 94:35:0A:F0:47:C2 (Samsung Electronics Co.)\n\nNmap scan report for 192.168.2.3\nHost is up (0.18s latency).\nPORT STATE SERVICE\n5001/tcp filtered commplex-link\nMAC Address: 00:13:CE:C0:E5:F3 (Intel Corporate)\n\nNmap scan report for 192.168.2.6\nHost is up (0.062s latency).\nPORT STATE SERVICE\n5001/tcp )
Where am I going wrong?
Solution:
The logic below worked for me. If anyone has a better answer, please let me know.
import re
fp = open('nmap_op.txt').read()
# Split the report into one block per host (blocks are separated by a blank line)
entries = re.split('\n\n', fp)
ip = re.compile(r'([0-9]+\.[0-9]+\.[0-9]+\.[0-9]+).*?closed', re.S)
matched = []
for item in entries:
    found = ip.search(item)
    if found:
        matched.append(found.group(1))

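The single match comes from the greedy .* combined with re.S. The s modifier changes the meaning of the dot meta-character (.) from "match everything except newline characters" to "match everything including newline characters", so (.*)closed runs from the first IP to the last "closed" in the file.
The second capturing group isn't required either. If each host's report is on a single line, you can drop both the group and re.S to have only the IPs returned:
>>> matched = re.findall(r'([0-9]+\.[0-9]+\.[0-9]+\.[0-9]+).*closed', fp)
>>> matched
['192.168.2.1', '192.168.2.2', '192.168.2.6']
With the multi-line layout shown in the question, the dot cannot cross the newlines without re.S, so split the text into per-host blocks first.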

Since the line format seems to be always the same (the IP starts at offset 21 and runs to the end of the first line), you can also do this without a regex:
data = open('nmap_op.txt').read()
for block in data.split("\n\n"):
    if block.find('5001/tcp closed') > 0:
        print(block[21:block.find('\n', 27)])

If each host's report is on a single line, you can do:
>>> re.findall(r'^Nmap.*?(\d+\.\d+\.\d+\.\d+).*?5001\/tcp closed', fp, re.M)
# ['192.168.2.1', '192.168.2.2', '192.168.2.6']
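
For the multi-line report shown in the question, a line-by-line pass avoids the greedy and cross-line matching pitfalls altogether. This is a minimal Python 3 sketch, assuming every host block starts with "Nmap scan report for <IPv4>" as in the sample; the function name hosts_with_port_state and its parameters are made up for illustration.

```python
import re

SAMPLE = """\
Nmap scan report for 192.168.2.1
Host is up (0.023s latency).
PORT STATE SERVICE
5001/tcp closed commplex-link

Nmap scan report for 192.168.2.2
Host is up (0.053s latency).
PORT STATE SERVICE
5001/tcp closed commplex-link

Nmap scan report for 192.168.2.3
Host is up (0.18s latency).
PORT STATE SERVICE
5001/tcp filtered commplex-link

Nmap scan report for 192.168.2.6
Host is up (0.062s latency).
PORT STATE SERVICE
5001/tcp closed commplex-link
"""


def hosts_with_port_state(report, port="5001/tcp", state="closed"):
    """Return the IPs whose block lists `port` in the given `state`."""
    header = re.compile(r"^Nmap scan report for (\d+\.\d+\.\d+\.\d+)")
    port_line = re.compile(r"^%s\s+(\S+)" % re.escape(port))
    current_ip = None
    result = []
    for line in report.splitlines():
        found = header.match(line)
        if found:
            # A new host block starts here; remember its IP
            current_ip = found.group(1)
            continue
        found = port_line.match(line)
        if found and found.group(1) == state and current_ip is not None:
            result.append(current_ip)
    return result


closed = hosts_with_port_state(SAMPLE)
print(closed)
```

Because the state is compared per port line, a "filtered" host can never borrow the "closed" of a later block.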

Related

Replace network address with host address in python

I have a file with IP addresses in this format:
192.168.1.9
192.168.1.10
192.168.1.8
that I read into a list like this:
with open("file.txt") as f:
    ipaddr = f.read().splitlines()
and then run some functions on.
However, I would also like to be able to put network addresses in this file, such as
192.168.0.0/25, and somehow get them expanded in the list as
192.168.0.1
192.168.0.2
192.168.0.3
I don't have a clue how to accomplish this. (Running Python 2.6.)
The netaddr library is one of the best ways to do this:
import netaddr
with open('file.txt') as f:
    for line in f:
        try:
            ip_network = netaddr.IPNetwork(line.strip())
        except netaddr.AddrFormatError:
            # Not an IP address or subnet!
            continue
        else:
            for ip_addr in ip_network:
                print(ip_addr)
For the example file of:
10.0.0.1
192.168.0.230
192.168.1.0/29
The output it gives is:
10.0.0.1
192.168.0.230
192.168.1.0
192.168.1.1
192.168.1.2
192.168.1.3
192.168.1.4
192.168.1.5
192.168.1.6
192.168.1.7
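
On Python 3.3 or later (not the Python 2.6 mentioned in the question), the standard-library ipaddress module can do the same expansion without a third-party package. A minimal sketch, where the helper name expand_entries is an assumption made for this example:

```python
import ipaddress


def expand_entries(lines):
    """Expand each line (single address or CIDR network) into addresses."""
    addresses = []
    for line in lines:
        text = line.strip()
        if not text:
            continue
        try:
            # strict=False also accepts entries with host bits set,
            # e.g. "192.168.1.5/29" is treated as 192.168.1.0/29.
            network = ipaddress.ip_network(text, strict=False)
        except ValueError:
            # Not an IP address or network: skip it
            continue
        addresses.extend(str(ip) for ip in network)
    return addresses


expanded = expand_entries(["10.0.0.1", "192.168.0.230", "192.168.1.0/29", "junk"])
print(expanded)
```

Iterating a network includes its network and broadcast addresses, as in the netaddr output above. For a network such as a /25, iterating network.hosts() instead skips those two.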
You can parse the text file with regular expressions, using Python's re module. A quick implementation of this idea is:
import re
with open("ips.txt") as f:
    ip_raw_list = f.read().splitlines()
# Only takes the digits after the '/' (the prefix length)
reg_ex_1 = r'(?<=/)[0-9]+'
# Only takes the first three octets "0.0.0." of the IP address
reg_ex_2 = r'.*\..*\..*\.'
# Only takes the last octet, just before the '/'
reg_ex_3 = r'[0-9]+(?=/)'
ip_final_list = list()
for ip_raw in ip_raw_list:
    appendix = re.findall(reg_ex_1, ip_raw)
    # An entry with no slash creates one item
    if not appendix:
        ip_final_list.append(ip_raw)
    # An entry with a slash creates several items
    else:
        # A /N prefix covers 2 ** (32 - N) addresses (assumes 24 <= N <= 32)
        count = 2 ** (32 - int(appendix[0]))
        prefix = re.findall(reg_ex_2, ip_raw)[0]
        start = int(re.findall(reg_ex_3, ip_raw)[0])
        for i in range(start, start + count):
            ip_final_list.append(prefix + str(i))
This code uses regular expressions to separate entries of the form '0.0.0.0' from entries of the form '0.0.0.0/00'. Entries of the first form go directly into the final list. For entries of the second form, a for loop appends one address per host in the network: a /N prefix covers 2 ** (32 - N) addresses, so /29 yields 8. This simple version only varies the last octet, so it handles prefixes from /24 to /32.

How to get output from a subprocess command as a list in python?

I am trying to retrieve some information about my Perforce client using a Python script. I only want to fetch information such as the server address, client root, etc., as follows:
ping_string = subprocess.Popen(['p4', 'info','ls'], stdout=subprocess.PIPE ).communicate()[0]
print(ping_string)
So I get output like this:
User name: hello
Client name: My_machine
Client host: XYZ
Current directory: c:\
Peer address: 1.2...
Client address: 1.102....
Server address: abcd
Server root: D:\scc\data
But I want to retrieve the server address, client address, etc., so I want the output in the form of a list. Please suggest how I can get the output as a list.
Using check_output simplifies getting the command output, so you could do:
out = subprocess.check_output(cmd)
lines = out.splitlines()
Note that splitlines() removes the trailing newline character from each line.
Or if you want the data after the colons:
lines = [l.split(':', 1)[1].strip() for l in out.splitlines()
         if ':' in l]
l.split(':', 1)[1] takes whatever is after the first colon.
.strip() removes the surrounding whitespace.
if ':' in l protects against lines that don't contain a colon.
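
Since the goal is to look up fields such as the server address, a dictionary keyed by the text before the colon may be more convenient than a list. A minimal Python 3 sketch, run on sample text taken from the question rather than on a live p4 call; parse_info and SAMPLE_OUTPUT are names invented for this example:

```python
SAMPLE_OUTPUT = """\
User name: hello
Client name: My_machine
Client host: XYZ
Current directory: c:\\
Server address: abcd
Server root: D:\\scc\\data
"""


def parse_info(text):
    """Map each 'key: value' line to a dictionary entry."""
    info = {}
    for line in text.splitlines():
        # partition splits at the first colon only, so "c:\" stays intact
        key, sep, value = line.partition(":")
        if sep:  # skip lines without a colon
            info[key.strip()] = value.strip()
    return info


info = parse_info(SAMPLE_OUTPUT)
print(info["Server address"])
```

With a real command on Python 3, check_output returns bytes, so passing universal_newlines=True to subprocess.check_output gives text that can be fed to this function.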

Python Valid IPs from each line on a text file

I have a text file which include many ips in this format
Host : x.x.x.x , DNS : resolved dns , Location : USA
Host : x.x.x.x , DNS : resolved dns , Location : USA
Host : x.x.x.x , DNS : resolved dns , Location : USA
I want to take the VALID IPs after the phrase "Host : ", which is the first word in each line, and write them to the file ipclear.txt, discarding any other IP on the same line and keeping just the valid IP after the phrase "Host : ".
f = open('inputfile.txt','r')
clearip = open('ipclear.txt','w')
for line in f:
    ip = line.split(',')[0].split(':')[1].strip()
    clearip.write(ip+'\n')
f.close() # you can omit this in most cases, as the destructor will call it
clearip.close()
This opens two files: the one you're reading from and the one you're writing to. It then goes through the input file line by line. Each line is split at the commas and then at the colons; assuming the file is in the format you posted, this leaves the IP address, on which we call strip() to remove any leading or trailing whitespace. We then write this IP to the output file, followed by a newline character, and finally close both files.
Python's socket module has a function that converts a valid IPv4 address in dotted-quad notation to its packed 32-bit binary form. It's called inet_aton, short for "ASCII to network".
The try/except attempts to convert the string between 'Host : ' and ' , DNS : ', and if that fails, it just quietly moves on to the next line. It's easier to leverage socket than to write your own regex to match all the possible valid IPs.
import re
import socket
ipPattern = re.compile('Host : (.*) , DNS : .*')
outfile = open('ipclear.txt', 'w')
for line in open('iplog.txt'):
    found = ipPattern.match(line)
    if not found:
        # Line is not in the expected "Host : ... , DNS : ..." format
        continue
    ipString = found.group(1)
    try:
        socket.inet_aton(ipString)
    except socket.error:
        # Not a valid IPv4 address: skip this line
        continue
    outfile.write(ipString + '\n')
outfile.close()
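
One caveat with inet_aton is that it also accepts shortened forms such as "127.1". On Python 3, the standard-library ipaddress module is stricter. A minimal sketch that works on a list of lines instead of files; valid_hosts and HOST_FIELD are names made up for this example:

```python
import ipaddress
import re

# Captures the first token after "Host :" at the start of a line
HOST_FIELD = re.compile(r"^Host\s*:\s*(\S+)")


def valid_hosts(lines):
    """Return the Host field of each line when it is a valid IP address."""
    result = []
    for line in lines:
        found = HOST_FIELD.match(line)
        if not found:
            continue
        candidate = found.group(1)
        try:
            ipaddress.ip_address(candidate)
        except ValueError:
            # Not a valid address: skip this line
            continue
        result.append(candidate)
    return result


lines = [
    "Host : 8.8.8.8 , DNS : resolved dns , Location : USA",
    "Host : 999.1.1.1 , DNS : resolved dns , Location : USA",
    "Host : 127.1 , DNS : resolved dns , Location : USA",
]
hosts = valid_hosts(lines)
print(hosts)
```

Note that ipaddress.ip_address also accepts IPv6 addresses; use ipaddress.IPv4Address instead if only IPv4 should pass.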

Extract IP address and other stuff from a bind log string using python

I have a bind log string like this one:
'09-Sep-2013 10:22:42.540 queries: info: client 10.12.12.66#39177: query: google.com IN AXFR -T (10.10.10.11)\n',
I use a regex to extract the date, IP address and query:
re.compile("(.*?) queries.*client (.*?): query: (.*?) IN")
and get the following output:
[('09-Sep-2013 10:22:42.540','10.12.12.66#39177','google.com')]
This is almost right, but I can't get rid of the hash and port tail after the IP address, like the #39177 here. Maybe someone can help me with a pattern that returns the IP address without the hash and port.
Thank you.
Try this one (just added #\d+ after the IP address capturing group):
"(.*?) queries.*client (.*?)#\d+: query: (.*?) IN"
DEMO:
>>> s = '09-Sep-2013 10:22:42.540 queries: info: client 10.12.12.66#39177: query: google.com IN AXFR -T (10.10.10.11)\n'
>>> re.search(r"(.*?) queries.*client (.*?)#\d+: query: (.*?) IN", s).groups()
('09-Sep-2013 10:22:42.540', '10.12.12.66', 'google.com')
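
If the port is also of interest, named groups keep the fields readable. This is a minimal sketch under the assumption that the client is always an IPv4 address followed by # and a port, as in the sample line; the group names and the parse_query helper are invented for illustration:

```python
import re

LOG_LINE = re.compile(
    r"(?P<date>.*?) queries.*client (?P<ip>[\d.]+)#(?P<port>\d+): "
    r"query: (?P<query>\S+) IN"
)


def parse_query(line):
    """Return the captured fields as a dict, or None if the line does not match."""
    found = LOG_LINE.search(line)
    return found.groupdict() if found else None


s = ('09-Sep-2013 10:22:42.540 queries: info: client 10.12.12.66#39177: '
     'query: google.com IN AXFR -T (10.10.10.11)\n')
record = parse_query(s)
print(record["ip"], record["query"])
```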

How to find if the user has entered a hostname or an IP address?

The user will input either hostname or the IP address. If the user enters the IP address, I want to leave as it is but if the user enters the hostname I want to convert it into IP address using the following method:
def convert(hostname):
    command = subprocess.Popen(['host', hostname],
                               stdout=subprocess.PIPE).communicate()[0]
    progress1 = re.findall(r'\d+.', command)
    progress1 = ''.join(progress1)
    return progress1
How do I do it?
To get the IP whether the input is an IP address or a hostname:
ip4 = socket.gethostbyname(ip4_or_hostname)
You can use a regex to test whether the input is an IP address or not:
test = re.compile(r'\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b')
result = test.match(hostname)
if not result:
    # no match -> must be a hostname
    convert(hostname)
That regex allows invalid IP addresses (like 999.999.999.999), so you may want to tweak it a bit; it's just a quick example.
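
To reject out-of-range values such as 999.999.999.999 without hand-tuning a regex, one option on Python 3 is the standard-library ipaddress module. A minimal sketch; is_ip_address and to_ip are hypothetical helper names, and the hostname branch needs working DNS resolution:

```python
import ipaddress
import socket


def is_ip_address(text):
    """Return True if text parses as an IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(text)
        return True
    except ValueError:
        return False


def to_ip(ip_or_hostname):
    """Leave an IP address as it is; resolve anything else as a hostname."""
    if is_ip_address(ip_or_hostname):
        return ip_or_hostname
    # Performs a DNS lookup; raises socket.gaierror if the name cannot be resolved
    return socket.gethostbyname(ip_or_hostname)


print(is_ip_address("192.168.2.1"))
print(is_ip_address("999.999.999.999"))
```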
There are already a number of questions on Stack Overflow about validating an IP address:
IP Address validation in python
Validating IP Addresses in python
I would like to ask why you are communicating with a subprocess when you can do this within the standard Python library.
I would recommend resolving a hostname into an IP address using Python's built-in functionality.
You can do this by importing and using Python's socket module.
For example, using the code found in link 1:
import socket
import re
regex = re.compile(r"^\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}$")
# address holds the user's input (hostname or IP)
result = regex.match(address)
if not result:
    address = socket.gethostbyname(address)
In my case, a hostname can only contain - as a separator, so that is the only character string_check looks for. You can uncomment the broader character class and use it instead, according to your requirements.
import re
import sys
regex = r"^((25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])\.){3}(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])$"
# string_check= re.compile('[#_!#$%^&*()<>?/\|}{~:.]')
string_check = re.compile('[-]')
ip_host_detail = {}
def is_valid_hostname_ip(IpHost):
    # pass regular expression and ip string into search() method
    if re.search(regex, IpHost):
        print("Valid Ip address")
        ip_host_detail['is_ip'] = 'True'
        ip_host_detail['is_hostname'] = 'False'
        return True
    elif string_check.search(IpHost):
        print("Contain hostname")
        ip_host_detail['is_hostname'] = 'True'
        ip_host_detail['is_ip'] = 'False'
        return True
    else:
        print("Invalid Ip address or hostname:- " + str(IpHost))
        ip_host_detail['is_hostname'] = 'False'
        ip_host_detail['is_ip'] = 'False'
        return False
IpHost = sys.argv[1]
# IpHost = 'RACDC1-VM123'
is_valid_hostname_ip(IpHost)
print(ip_host_detail)
