Python regex removing port number from IP string

I have a text file that contains lines of text and IPs with port numbers. I want to remove the port number and print just the IP.
Example text file:
77.55.211.77:8080
NoIP
79.127.57.42:80
Desired output:
77.55.211.77
79.127.57.42
My code:
import re
with open('IPs.txt', 'r') as infile:
    for ip in infile:
        ip = ip.strip('\n')
        IP_without_port_number = re.sub(r'((?::))(?:[0-9]+)$', "", ip)
        re_for_IP = re.match(r'^\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}$',ip)
        print(IP_without_port_number)
I do not understand why I see every line in the output when I print IP_without_port_number to the console.

All you need is the second match:
import re
with open('IPs.txt', 'r') as infile:
    for ip in infile:
        re_for_IP = re.match(r'(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})', ip)
        if re_for_IP:
            print(re_for_IP[0])
Output:
77.55.211.77
79.127.57.42
One-liner:
import re
ips = []
with open('IPs.txt', 'r') as infile:
    ips = [ip[0] for ip in [re.match(r'(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})', ip) for ip in infile] if ip]
print(ips)
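As for why the code in the question printed every line: re.sub returns its input unchanged when the pattern does not match, so NoIP passes straight through, and the result of re.match was never used to filter anything. A minimal sketch of both effects, with the sample lines inlined instead of read from IPs.txt (the names lines, port_re and ip_re are only for illustration):

```python
import re

# Sample data from the question, inlined so the sketch runs without IPs.txt.
lines = ["77.55.211.77:8080", "NoIP", "79.127.57.42:80"]

port_re = re.compile(r":[0-9]+$")
ip_re = re.compile(r"^\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}$")

# re.sub leaves non-matching input untouched, so "NoIP" survives this step.
stripped = [port_re.sub("", line) for line in lines]

# Testing the stripped value is what actually drops the non-IP lines.
ips_only = [s for s in stripped if ip_re.match(s)]

print(stripped)
print(ips_only)
```

Note that the original code also matched against ip (still carrying the port) rather than against the stripped string, so that match would have failed for every line anyway.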

You don't need a regex. Split each line on the : character as you read it. For a line of the form IP:port you are left with a list of two elements: the first holds only the IP address and the second holds the port.
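A rough sketch of the split approach, assuming the same sample lines as in the question. Splitting alone would still print NoIP, so the helper looks_like_ipv4 (a hypothetical name, not from the answer) does a simple check without regex:

```python
# Sample lines as they would come from iterating over the file.
lines = ["77.55.211.77:8080\n", "NoIP\n", "79.127.57.42:80\n"]


def looks_like_ipv4(text):
    """Return True if text is four dot-separated decimal numbers in 0-255."""
    parts = text.split(".")
    return len(parts) == 4 and all(
        p.isdecimal() and int(p) <= 255 for p in parts
    )


ips = []
for line in lines:
    # Everything before the first ':' is the host part; lines without
    # a ':' come back whole and are rejected by the check below.
    host = line.strip().split(":")[0]
    if looks_like_ipv4(host):
        ips.append(host)

print(ips)
```

This only covers IPv4 in host:port form; an IPv6 address contains colons itself and would need different handling.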

Try this:
import re
# One octet in the range 0-255. The full pattern is assembled on one line
# so that no newline characters end up inside the regex.
octet = r'(25[0-5]|2[0-4][0-9]|[0-1]?[0-9][0-9]?)'
regex = r'^' + r'\.'.join([octet] * 4) + r'$'
with open('IPs.txt', 'r') as infile:
    for ip in infile:
        ip = ip.strip('\n')
        # Drop everything from the first colon onwards (the port).
        IP_without_port_number = re.sub(r':.*$', "", ip)
        if re.search(regex, IP_without_port_number):
            print(IP_without_port_number)
Output:
77.55.211.77
79.127.57.42

I came up with this regex code; it works for me and it's easy.
import re
text = input("Input text: ")
pattern = re.findall(r'\d+\.\d+\.\d+\.\d+', text)
print(pattern)

Related

Extract Url From a file

I am trying to extract the URL from a file which has the following format.
[CertSpotter] wwwqa.xyz.abc.com,1.1.1.1
[CertSpotter] origin.xyz.abc.com,1.1.1.1
[CertSpotter] wwwqa.xyz.abc.com,1.1.1.1
[CertSpotter] wwwmg4.xyz.abc.com,1.1.1.1
I found a Python script, but it returns both the URL and the IP, and I need only the URL.
import re
file_path = input("Enter the File Path: ")
f = open(file_path, 'r')
raw_text= str(f.readlines())
f.close()
domain = r"\b((?:https?://)?(?:(?:www\.)?(?:[\da-z\.-]+)\.(?:[a-z]{2,6})|(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)|(?:(?:[0-9a-fA-F]{1,4}:){7,7}[0-9a-fA-F]{1,4}|(?:[0-9a-fA-F]{1,4}:){1,7}:|(?:[0-9a-fA-F]{1,4}:){1,6}:[0-9a-fA-F]{1,4}|(?:[0-9a-fA-F]{1,4}:){1,5}(?::[0-9a-fA-F]{1,4}){1,2}|(?:[0-9a-fA-F]{1,4}:){1,4}(?::[0-9a-fA-F]{1,4}){1,3}|(?:[0-9a-fA-F]{1,4}:){1,3}(?::[0-9a-fA-F]{1,4}){1,4}|(?:[0-9a-fA-F]{1,4}:){1,2}(?::[0-9a-fA-F]{1,4}){1,5}|[0-9a-fA-F]{1,4}:(?:(?::[0-9a-fA-F]{1,4}){1,6})|:(?:(?::[0-9a-fA-F]{1,4}){1,7}|:)|fe80:(?::[0-9a-fA-F]{0,4}){0,4}%[0-9a-zA-Z]{1,}|::(?:ffff(?::0{1,4}){0,1}:){0,1}(?:(?:25[0-5]|(?:2[0-4]|1{0,1}[0-9]){0,1}[0-9])\.){3,3}(?:25[0-5]|(?:2[0-4]|1{0,1}[0-9]){0,1}[0-9])|(?:[0-9a-fA-F]{1,4}:){1,4}:(?:(?:25[0-5]|(?:2[0-4]|1{0,1}[0-9]){0,1}[0-9])\.){3,3}(?:25[0-5]|(?:2[0-4]|1{0,1}[0-9]){0,1}[0-9])))(?::[0-9]{1,4}|[1-5][0-9]{4}|6[0-4][0-9]{3}|65[0-4][0-9]{2}|655[0-2][0-9]|6553[0-5])?(?:/[\w\.-]*)*/?)\b"
foundip = re.findall( domain, raw_text )
for ip in foundip:
    print(ip)
After running the script I get the following output:
wwwqa.xyz.abc.com
1.1.1.1
origin.xyz.abc.com
1.1.1.1
wwwmg4.xyz.abc.com
1.1.1.1
Desired output.
wwwqa.xyz.abc.com
origin.xyz.abc.com
wwwmg4.xyz.abc.com
Can anyone help me to figure this out?
Thanks

Without Regex. Using just str methods.
Ex:
with open(filename) as infile:
    for line in infile:
        val = line.strip().split()[-1].split(",")[0]
        print(val)
Output:
wwwqa.xyz.abc.com
origin.xyz.abc.com
wwwqa.xyz.abc.com
wwwmg4.xyz.abc.com

import re
with open('file.txt') as f:
    result = re.findall(r' +(.*),', f.read())
print(result)
Output:
['wwwqa.xyz.abc.com', 'origin.xyz.abc.com', 'wwwqa.xyz.abc.com', 'wwwmg4.xyz.abc.com']

import re
f = open('test.txt', 'r')
content = f.read()
pattern = r"^\[.*\]\s*(.*),.*"
matches = re.findall(pattern, content, re.MULTILINE|re.IGNORECASE)
print(matches)
Output:
['wwwqa.xyz.abc.com', 'origin.xyz.abc.com', 'wwwqa.xyz.abc.com', 'wwwmg4.xyz.abc.com']

Exact match word from text file and print line containing word

Text file:
InheritedFrom: abc#aol.com
InheritedAltFrom: abc#aol.com
From: CN=deepak sethi/O=MHI
INetFrom: xwy.com
Code I am using to extract only the line containing "From:":
import re
with open('abc.txt', 'r') as file:
    raw = file.readlines()
for line in raw :
    if re.search(r'/b' + "From:" + r'/b', line):
        print (line)
Expected output:
From: CN=deepak sethi/O=MHI
I don't understand what's going wrong.

A regex word boundary is written \b, not /b:
import re
with open('abc.txt', 'r') as f:
    for l in f.readlines():
        if re.search(r'\bFrom\b', l):
            print(l)
The output:
From: CN=deepak sethi/O=MHI
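To make the difference concrete, here is a small sketch that runs the patterns against the sample lines from the question (inlined, so abc.txt is not needed). It also shows why the answer drops the colon: a \b placed after "From:" cannot match, because the colon and the following space are both non-word characters.

```python
import re

lines = [
    "InheritedFrom: abc#aol.com",
    "InheritedAltFrom: abc#aol.com",
    "From: CN=deepak sethi/O=MHI",
    "INetFrom: xwy.com",
]

# '/b' is just a slash followed by the letter b, so nothing matches.
wrong = [l for l in lines if re.search(r"/bFrom:/b", l)]

# '\b' before 'From' rejects 'InheritedFrom' and 'INetFrom'.
right = [l for l in lines if re.search(r"\bFrom\b", l)]

# No boundary exists between ':' and ' ', so this matches nothing either.
colon_then_boundary = [l for l in lines if re.search(r"\bFrom:\b", l)]

print(wrong)
print(right)
print(colon_then_boundary)
```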

import re
with open('abc.txt', 'r') as file:
    raw = file.readlines()
for line in raw :
    if re.search(r'^From:', line):
        print(line)
This will solve your problem, because ^ anchors the match to the start of the line.

Obtaining unique list using set function with multiple elements

Two part question from someone quite new to python (and scripting in general):
I've worked out how to get a list of IP addresses from a file, then output a unique set of those IPs to a file as follows:
ip_list = []
with open('testfile', 'r') as file:
    for line in file:
        if line not in ip_list:
            ip_list.append(line)
with open('testoutput', 'w') as file:
    for line in ip_list:
        file.write("%s\n" % line)
I then saw that I could do this an alternate way, and I'm wondering if this is sane?
ip_list = []
with open('testfile', 'r') as file:
    for line in file:
        ip_list.append(line)
with open('testoutput', 'w') as file:
    for line in set(ip_list):
        file.write("%s\n" % line)
Next, I now want to get a list of IP addresses coupled with PERMIT/DENY strings, given that the opened file is something like:
1.1.1.1 PERMIT
2.2.2.2 PERMIT
3.3.3.3 DENY
1.1.1.1 PERMIT
I still want to output only the unique IPs, so I can do this with the first method:
ip_list = []
with open('testfile', 'r') as file:
    for line in file:
        elements = line.split(' ')
        if elements[0] not in ip_list:
            ip_list.append(elements)
with open('testoutput', 'w') as file:
    for line in ip_list:
        file.write("%s %s\n" % (line[0], line[1]))
But can I do something using the set command instead? Or can I do something better than the above snippet?
And for this example, assume that I don't want to compare entire lines for uniqueness (i.e. '1.1.1.1 PERMIT')

You can do something like:
ip_set = set()
ip_list = []
with open('testfile', 'r') as file:
    for line in file:
        elements = line.split()
        ip = elements[0]
        if ip not in ip_set:
            ip_set.add(ip)
            ip_list.append(elements)
with open('testoutput', 'w') as file:
    for line in ip_list:
        file.write("%s %s\n" % (line[0], line[1]))
Note that I removed the argument to split(), so that it will handle all whitespace, not just spaces.
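On the "is this sane?" part: writing out set(ip_list) does remove duplicates, but a set has no defined order, so the output lines may come out shuffled. One possible variation on the approach above, sketched with the sample data inlined, is a dict keyed by IP; it relies on dicts keeping insertion order, which is guaranteed from Python 3.7. The name first_seen is only illustrative.

```python
# Sample lines as they would come from iterating over 'testfile'.
lines = [
    "1.1.1.1 PERMIT\n",
    "2.2.2.2 PERMIT\n",
    "3.3.3.3 DENY\n",
    "1.1.1.1 PERMIT\n",
]

first_seen = {}
for line in lines:
    elements = line.split()
    if len(elements) < 2:
        continue  # skip blank or malformed lines
    ip, action = elements[0], elements[1]
    # setdefault stores the action only the first time an IP appears.
    first_seen.setdefault(ip, action)

output = ["%s %s" % (ip, action) for ip, action in first_seen.items()]
print("\n".join(output))
```

If the same IP can appear with both PERMIT and DENY, this keeps whichever came first; whether that is the right policy depends on the data.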

Extract IPs after specific delimiter

I'm trying to extract only the IPs from a file, organize them numerically and put the result in another file.
The data looks like this:
The Spammer (and all his/her info):
Username: user
User ID Number: 0
User Registration IP Address: 77.123.134.132
User IP Address for Selected Post: 177.43.168.35
User Email: email#address.com
Here is my code, which does not sort the IPs correctly (i.e. it lists 177.43.168.35 before 77.123.134.132):
import re
spammers = open('spammers.txt', "r")
ips = []
for text in spammers.readlines():
    text = text.rstrip()
    print text
    regex = re.findall(r'(?:[\d]{1,3})\.(?:[\d]{1,3})\.(?:[\d]{1,3})\.(?:[\d]{1,3})$',text)
    if regex is not None and regex not in ips:
        ips.append(regex)
for ip in ips:
    OrganizedIPs = open("Organized IPs.txt", "a")
    addy = "".join(ip)
    if addy is not '':
        print "IP: %s" % (addy)
        OrganizedIPs.write(addy)
        OrganizedIPs.write("\n")
spammers.close()
OrganizedIPs.close()
organize = open("Organized IPs.txt", "r")
ips = organize.readlines();
ips = list(set(ips))
print ips
for i in range(len(ips)):
    ips[i] = ips[i].replace('\n', '')
print ips
ips.sort()
finish = open('organized IPs.txt', 'w')
finish.write('\n'.join(ips))
finish.close()
clean = open('spammers.txt', 'w')
clean.close()
I tried using this IP sorter code, but it needs a string, whereas the regex returns a list.

Or this (saving you string formatting cost):
def ipsort(ip):
    return tuple(int(t) for t in ip.split('.'))
ips = ['1.2.3.4', '100.2.3.4', '62.1.2.3', '62.1.22.4']
print(sorted(ips, key=ipsort))

import re
LOG = "spammers.txt"
IPV4 = re.compile(r"(?:[\d]{1,3})\.(?:[\d]{1,3})\.(?:[\d]{1,3})\.(?:[\d]{1,3})")
RESULT = "organized_ips.txt"
def get_ips(fname):
    with open(fname) as inf:
        return IPV4.findall(inf.read())
def numeric_ip(ip):
    return [int(i) for i in ip.split(".")]
def write_to(fname, iterable, fmt):
    with open(fname, "w") as outf:
        for i in iterable:
            outf.write(fmt.format(i))
def main():
    ips = get_ips(LOG)
    ips = list(set(ips)) # uniquify
    ips.sort(key=numeric_ip)
    write_to(RESULT, ips, "IP: {}\n")
if __name__=="__main__":
    main()

Try this:
sorted_ips = sorted(ips, key=lambda x: '.'.join(["{:>03}".format(octet) for octet in x.split(".")]))
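To see why the plain ips.sort() in the question puts 177.43.168.35 before 77.123.134.132, and that the two kinds of key suggested in the answers agree, here is a small comparison. The padded key uses str.zfill rather than the format spec from the answer; that is a swapped-in equivalent for illustration, and the third address is made up for the example.

```python
ips = ["177.43.168.35", "77.123.134.132", "9.0.0.1"]


def numeric_key(ip):
    """Compare addresses octet by octet as integers."""
    return tuple(int(octet) for octet in ip.split("."))


def padded_key(ip):
    """Zero-pad each octet to three digits so string order equals numeric order."""
    return ".".join(octet.zfill(3) for octet in ip.split("."))


plain = sorted(ips)                       # character-by-character comparison
by_number = sorted(ips, key=numeric_key)
by_padding = sorted(ips, key=padded_key)

print(plain)
print(by_number)
print(by_padding)
```

The plain sort compares strings character by character, so "1" sorts before "7" regardless of how many digits follow.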

file.write() in python

import os
import sys
import re
import string
f=open('./iprange','r')
s=f.readline()
f.close()
pattern='inet addr:'+s
pattern=pattern.split('x')[0]
pattern='('+pattern+'...'+')'
os.system('ifconfig -a >> interfaces')
f=open('./interfaces','r')
s=f.readline()
while (len(s))!=0:
    i=re.search(pattern,s)
    if i!=None:
        sp=re.split(pattern,s)[1]
        ip=re.split('inet addr:',sp)[1]
        break
    s=f.readline()
f.close()
os.system('rm ./interfaces')
f=open('./userip','w')
f.write(ip)
f.close()
NameError: name 'ip' is not defined
I split pattern by s and store the result in sp, then I find the IP address and store the result in ip. But the error says ip is not defined - what's going on?

while (len(s))!=0:
    i=re.search(pattern,s)
    if i!=None:
        sp=re.split(pattern,s)[1]
        ip=re.split('inet addr:',sp)[1]
        break
    s=f.readline()
The ip assignment is inside the if block, which is apparently never executed because no line matches the pattern. The name ip is therefore never bound, and f.write(ip) raises the NameError.
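One way to make the "no match" case explicit is to have the search return None and check for it before writing. This is only a sketch: find_ip, the sample lines and the pattern are assumptions for illustration, not the asker's real ifconfig output or iprange file.

```python
import re


def find_ip(lines, pattern):
    """Return group 1 of the first line matching pattern, or None if none match."""
    for line in lines:
        match = re.search(pattern, line)
        if match:
            return match.group(1)
    return None


# Hypothetical input standing in for the lines of `ifconfig -a`.
sample = [
    "eth0      Link encap:Ethernet",
    "          inet addr:192.168.1.23  Bcast:192.168.1.255",
]
# Hypothetical pattern for a 192.168.1.x range.
pattern = r"inet addr:(192\.168\.1\.\d{1,3})"

ip = find_ip(sample, pattern)
if ip is None:
    print("No match found")
else:
    print(ip)
```

Because ip is always bound, a missing match shows up as a clear message rather than a NameError further down.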

I'd do something more like this:
import os
import sys
import re
from itertools import takewhile
with open('./iprange','r') as f:
    s = f.readline()
prefix = 'inet addr:'
pattern = prefix + s
pattern = pattern.split('x')[0]
pattern = '(%s...)' % pattern
os.system('ifconfig -a >> interfaces')
with open('interfaces', 'r') as f:
    # Put all lines up to the first empty line into a list
    # http://docs.python.org/2/library/itertools.html#itertools.takewhile
    # `[line.rstrip() for line in f]` could be a generator instead:
    # (line.rstrip() for line in f)
    lines = list(takewhile(lambda x: x, [line.rstrip() for line in f]))
os.remove('interfaces')
for s in lines:
    if re.search(pattern, s):
        sp = re.split(pattern, s)[1]
        ip = sp[len(prefix):]
        with open('./userip', 'w') as f:
            f.write(ip)
        break
else:
    print("No match found")
This way, you write to the file userip only if a match is found, and you get a message if none was.
