Program to extract valid IPs and valid FTP addresses - python

I'm still pretty new to programming. I'm trying to write code that finds all valid IP addresses and all FTP addresses in a text file and writes only those addresses to another text file. The information I'm trying to extract is in a fairly big, unsorted file. Here are a few lines of that file:
1331903561.260000 CptK3340W66OKHK3Rd 192.168.202.96 43740 192.168.28.103 21 <unknown>
1331905499.220000 Cup8D83JUM166udWb 192.168.202.102 4379 192.168.21.101 21 ftp password#example.com DELE ftp://192.168.21.101/.cache/ - 550 /.cache/.ftpduBnga4: Operation not permitted - - - - -
1331905499.220000 Cup8D83JUM166udWb 192.168.202.102 4379 192.168.21.101 21 ftp password#example.com PASV - - - 227 Entering Passive Mode (192,168,21,101,189,111). T 192.168.202.102 192.168.21.101 48495
The code I have gives me the information I need, but I'm wondering if I can make my output cleaner. Right now it writes every IP on one line, separated by commas. I would like one IP per line to make it easier to read.
import os
from os import chdir
import re
import socket
chdir("filepath")
x = open('filepath')
fichip = open('ip.txt', 'w', encoding='utf8')
fichftp = open('ftp.txt', 'w', encoding='utf8')
ipvalide = r"(?:2(?:5[0-5]|[0-4][0-9])|[0-1]?[0-9]{1,2})(?:\.(?:2(?:5[0-5]|[0-4][0-9])|[0-1]?[0-9]{1,2})){3}"
ftpvalide = r"ftp:\/\/(?:2(?:5[0-5]|[0-4][0-9])|[0-1]?[0-9]{1,2})(?:\.(?:2(?:5[0-5]|[0-4][0-9])|[0-1]?[0-9]{1,2})){3}" #(ftp:\/\/)
txt = x.readlines()
ipmatch = re.findall(ipvalide, str(txt))
ftpmatch = re.findall(ftpvalide, str(txt))
in_listip = set(ipmatch)    # remove duplicate IPs
in_listftp = set(ftpmatch)  # remove duplicate FTP addresses
fichip.write(str(in_listip))
fichftp.write(str(in_listftp))
fichip.close()
fichftp.close()
x.close()
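To get one address per line, you can join the set entries with newline characters (or write them in a loop) instead of writing the set's string representation. A minimal sketch, reusing the in_listip and in_listftp sets from the code above:

# one address per line; duplicates were already removed by the sets
with open('ip.txt', 'w', encoding='utf8') as fichip:
    fichip.write("\n".join(sorted(in_listip)))
with open('ftp.txt', 'w', encoding='utf8') as fichftp:
    fichftp.write("\n".join(sorted(in_listftp)))

sorted() is optional; it just makes the output order deterministic (lexicographic, not numeric).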

Related

How to optimize checking IP addresses and getting a country?

Using Python 3.9.2 on Win10, I'm trying to get the country for IP addresses in a log file (about 65,000 lines). I have a .csv containing IP ranges (22,000 lines) and their respective countries, looking like:
[...]
2.16.9.0,2.16.9.255,DE,Germany
2.16.11.0,2.16.11.255,FR,France
2.16.12.0,2.16.13.255,CH,Switzerland
2.16.23.0,2.16.23.255,DE,Germany
2.16.30.0,2.16.33.255,DE,Germany
2.16.34.0,2.16.34.255,FR,France
[...]
I'm using Python's ipaddress module and iterating through the list of ranges, checking whether the current IP falls within a range to get the country. Before that, I check that two conditions are true.
My goal is to count how many connections came from each of the three countries. An example:
import ipaddress
import csv

with open(PATH) as logfile:
    logfile_lines = [line.split('\t') for line in logfile]

with open(PATH, 'r') as ipdaten:
    ipdaten_lines = [line.split(',') for line in ipdaten]

streams_france = 0

for line in logfile_lines:
    line2 = int(line[9])
    stream = str(line[3])
    iplog = line[1]
    ipobj = ipaddress.ip_address(iplog)
    [...]
    if line2 > 60 and stream == "stream2":
        for ips in ipdaten_lines:
            if ipobj >= ipaddress.IPv4Address(ips[0]) and ipobj <= ipaddress.IPv4Address(ips[1]):
                land = ips[3]
                if land == "France\n":
                    streams_france += 1
                break
[...]
The code works, but it is very slow: after well over an hour it is still running. There are about 9,000 cases in which both line2 > 60 and stream == "stream2" are True.
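A common way to speed this up is to avoid scanning all 22,000 ranges for every IP. If the ranges do not overlap, you can convert them to integers once, sort them by range start, and use bisect to find the candidate range in O(log n) per lookup. A rough sketch under those assumptions (ip_ranges.csv is a placeholder name; its columns follow the start,end,code,country layout shown above):

import bisect
import csv
import ipaddress

# Load all ranges once as integers, sorted by range start (assumes non-overlapping ranges).
ranges = []
with open('ip_ranges.csv', newline='') as f:
    for start, end, code, country in csv.reader(f):
        ranges.append((int(ipaddress.IPv4Address(start)),
                       int(ipaddress.IPv4Address(end)),
                       country))
ranges.sort()
starts = [r[0] for r in ranges]

def country_for(ip_string):
    # Binary search for the last range starting at or before this IP.
    ip = int(ipaddress.IPv4Address(ip_string))
    i = bisect.bisect_right(starts, ip) - 1
    if i >= 0 and ip <= ranges[i][1]:
        return ranges[i][2]
    return None

With this, each of the roughly 9,000 lookups is one binary search instead of a scan over the whole range list.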

Python - PING a list of IP Address from database

I have a list of IP addresses for 200 locations; each location has 4 IP addresses that I need to ping test. I want to build a command where, when I type the name or code of a particular location, it directly pings the 4 IP addresses at that location. So far I have learned how to build a list of IP addresses entered through input(), like this:
import os
import socket

ip = []
y = ['IP 1 : ', 'IP 2 : ', 'IP 3 : ', 'IP 4 : ']

while True:
    for x in y:
        server_ip = input(x)
        ip.append(server_ip)
    break

for x in ip:
    print("\n")
    rep = os.system('ping ' + x + " -c 3")
Please give me some advice about the command I want to build so that I no longer need to enter the IP addresses one by one. What still confuses me is how to turn the entries in the database into the variable x that gets inserted into this command:
rep = os.system('ping ' + x + " -c 3")
EDIT: It now iterates over a CSV file rather than a hard-coded Python dictionary.
I believe you will be better off using Python dictionaries rather than Python lists. Assuming you are using Python 3.x, this is what you want to run:
import os
import csv

# Save the IPs you want to ping inside YOURFILE.csv
# Then iterate over the CSV rows using a for loop
# Ensure your IP addresses are under a column titled ip_address
with open('YOURFILE.csv', newline='') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        rep = os.system("ping " + row['ip_address'] + " -c 3")

python nesting loops

I am trying to perform a nested loop that combines data from two files into one line by matching MAC addresses that appear in both files.
The loop works fine without the regex; however, with the search regex below it only goes through MAC_Lines once, prints the correct result for the first entry in MAC_Lines, and stops. I'm unsure how to make it advance to the next line and repeat the process for all of the entries in MAC_Lines.
try:
    for mac in MAC_Lines:
        MAC_address = re.search(r'([a-fA-F0-9]{2}[:|\-]?){6}', mac, re.I)
        MAC_address_final = MAC_address.group()
        for arp in ARP_Lines:
            ARP_address = re.search(r'([a-fA-F0-9]{2}[:|\-]?){6}', arp, re.I)
            ARP_address_final = ARP_address.group()
            if MAC_address_final == ARP_address_final:
                print mac + arp
                continue
except Exception:
    print 'completed.'
Results:
13,64,00:0c:29:36:9f:02,giga-swx 0/213,172.20.13.70, 00:0c:29:36:9f:02, vlan 64
completed.
I learned that the issue was how I opened the files. I should have used with open(...) as ... when opening both files so that each file is properly closed and can be re-read on the next pass of the loop. Below is the code I was looking for:
with open('MAC_List.txt', 'r') as read0:
    for items0 in read0:
        MAC_address = re.search(r'([a-fA-F0-9]{2}[:|\-]?){6}', items0, re.I)
        if MAC_address:
            mac_addy = MAC_address.group().upper()
            with open('ARP_List.txt', 'r') as read1:
                for items1 in read1:
                    ARP_address = re.search(r'([a-fA-F0-9]{2}[:|\-]?){6}', items1, re.I)
                    if ARP_address:
                        arp_addy = ARP_address.group()
                        if mac_addy == arp_addy:
                            print(items0.strip() + ' ' + items1.strip())
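Re-opening ARP_List.txt for every line of MAC_List.txt works, but it re-reads the whole ARP file once per MAC address. A sketch of an alternative that reads each file only once by first indexing the ARP lines by MAC address (same regex as above, uppercased on both sides; if several ARP lines share a MAC, only the last one is kept):

import re

MAC_RE = re.compile(r'([a-fA-F0-9]{2}[:|\-]?){6}', re.I)

# Index every ARP line by its uppercased MAC address.
arp_by_mac = {}
with open('ARP_List.txt', 'r') as arp_file:
    for arp_line in arp_file:
        match = MAC_RE.search(arp_line)
        if match:
            arp_by_mac[match.group().upper()] = arp_line.strip()

# Single pass over the MAC list; a dictionary lookup replaces the inner loop.
with open('MAC_List.txt', 'r') as mac_file:
    for mac_line in mac_file:
        match = MAC_RE.search(mac_line)
        if match and match.group().upper() in arp_by_mac:
            print(mac_line.strip() + ' ' + arp_by_mac[match.group().upper()])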

Create multiple files based on user input in python

I am new to Python and I've written code that creates configuration files for my application. The code works for 2 IPs, but a user may enter more IPs, and for each additional IP another config file has to be created. There are authentication servers, and there can only be 1 or 2 of them.
I am passing input to the Python code through a file named "inputfile"; below is what it looks like:
EnterIp_list: ip_1 ip_2
authentication_server: as_1 as_2
Below is how the final configuration files are created:

configfile1:
App_ip: ip_1
app_number: 1
authen_server: as_1

configfile2:
App_ip: ip_2
app_number: 2
authen_server: as_2
Below is how the Python 3 code looks:

def createconfig(filename, app_ip, app_number, authen_server):
    with open(filename, 'w') as inf:
        inf.write("App_ip=" + app_ip + "\n")
        inf.write("app_number=" + str(app_number) + "\n")
        inf.write("authen_server=" + authen_server)

with open("inputfile") as f:
    for line in f:
        if "EnterIp_list" in line:
            a = line.split("=")
            b = a[1].split()
        if "authentication_server" in line:
            c = line.split("=")
            d = c[1].split()

createconfig("configfile1", b[0], 1, d[0])
createconfig("configfile2", b[1], 2, d[1])
Users are free to enter as many IPs as they wish. Can someone please suggest what needs to be done to make the code more generic and robust so that it works for any number of input IPs? The value of app_number also increases with each new IP added.
There will always be two authentication servers and they are assigned round robin, e.g. the third app IP will be associated with "as_1" again.
You just need to iterate over your IP list in b. Be aware that your current code only works for the last line of your "inputfile"; as long as there is only one line, that's OK.
with open("inputfile") as f:
for line in f:
a= line.split("=")
b = a[1].split()
app_count = 1
for ip in b:
createconfig("configfile%s" % app_count , ip, app_count)
app_count += 1
Edit: Solution updated regarding your code change.
with open("inputfile") as f:
for line in f:
if EnterIP_list in line:
ips = line.split("=")[1].split()
if authentiation_server in line:
auth_servers = line.split("=")[1].split()
app_count = 1
for ip, auth_server in zip(ips, auth_servers):
createconfig("configfile%s" % app_count , ip, app_count, auth_server)
app_count += 1
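Note that zip() stops at the shorter sequence, so with only two authentication servers just the first two IPs would get a config file. To satisfy the round-robin requirement (the third IP goes back to "as_1"), one option is to cycle over the auth servers; a sketch reusing the ips, auth_servers and createconfig names from the snippet above:

from itertools import cycle

# Pair every IP with an auth server, repeating as_1, as_2, as_1, ...
app_count = 1
for ip, auth_server in zip(ips, cycle(auth_servers)):
    createconfig("configfile%s" % app_count, ip, app_count, auth_server)
    app_count += 1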
A not so great way of doing it without modifying much of your code would be to remove the last two createconfig() calls and instead do it in a loop once you have b, for example:

with open("inputfile") as f:
    for line in f:
        a = line.split("=")
        b = a[1].split()

for app_number, ip in enumerate(b, start=1):
    createconfig("configfile{}".format(app_number), ip, app_number)

I am getting corrupt files after parsing through them

I have many output files from a molecular dynamics code (Lammps) and a script that parses through the files and calculates a number of parameters from them. Each file is between 4 and 17 MB, and the script parses through 768 of these files in around 3 minutes. The files are plain text, not binary.
Here is the problem:
The script stops after processing 4 to 5 of these folders (4-5 × 768 files), complaining about a reshape error. But the real reason is that a bunch of "special characters" get inserted into the text for no good reason. I am certain that the file was not corrupt before the script parsed through it, and the only thing that could have inserted these characters is running the script.
The script is written in Python and I use 'r' mode in the call below to make sure the script does not have write access to the file.
fid = open(file_path, 'r')
I rewrote the same script in Matlab and the same problem exists there too, which makes me believe it is an I/O issue related to my hardware rather than a coding issue.
I run this script on an Ubuntu workstation with ext4-formatted hard drives and 64 GB of RAM. Interestingly, when I run the same script on a cluster I am not able to replicate the problem; even the parallel version of the script runs perfectly fine on the cluster, but not on my local machine.
Here is the part of the script that does all the reading of the opened file; the rest of the script does the calculations. At the end I write the results to a separate, very small file (50 KB).
for file_name in list_files:
    if 'dump' in file_name:
        print file_name
        gb_name = dir_list[j]
        gb_num = int(file_name[file_name.find('.')+1:len(file_name)])
        i += 1
        file_path = read_path + '/' + file_name
        fid = open(file_path, 'r', 1)
        junk = fid.readline(); junk = fid.readline(); junk = fid.readline();
        num_lines = int(fid.readline())
        junk = fid.readline()
        if len(junk) > 26:
            x_lims = map(float, fid.readline().split())
            y_lims = map(float, fid.readline().split())
            z_lims = map(float, fid.readline().split())
            a_tri = abs(x_lims[1] - (x_lims[0]-y_lims[2]))
            c_tri = abs(z_lims[1] - z_lims[0])
            gb_area = a_tri * c_tri
            junk = fid.readline()
        else:
            x_lims = map(float, fid.readline().split())
            y_lims = map(float, fid.readline().split())
            z_lims = map(float, fid.readline().split())
            gb_area = abs(x_lims[1] - x_lims[0]) * abs(z_lims[1] - z_lims[0])
            junk = fid.readline()
        tmp = fid.readlines()
        fid.close()
        if len(tmp) == num_lines:
            coords = np.array(map(float, ''.join(tmp).split())).reshape(num_lines, 7)
            coords[::, 0:2].astype(int)
        else:
            raise Exception('Number of lines is not consistent with number of data points.')
Here is an example of the corrupt file. Note the special characters:
99463 2 51.5597 211.814 41.7614k4.26088e-13 -3.35999
99881 2 52.1696 212.526 39.0575 4.91923e-/ef3 -3.35998
Please leave a comment here if you have had similar experiences.
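One way to narrow down whether the bytes on disk are really changing (pointing at hardware or the filesystem) or only the in-memory read is corrupted is to checksum each file before and after parsing it, and to report any non-numeric tokens when the reshape fails. A small diagnostic sketch along those lines, reusing file_path and tmp from the snippet above:

import hashlib

def md5sum(path):
    # Checksum the raw bytes on disk.
    h = hashlib.md5()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(1 << 20), b''):
            h.update(chunk)
    return h.hexdigest()

before = md5sum(file_path)
# ... run the parsing shown above ...
after = md5sum(file_path)
if before != after:
    print('file changed on disk while parsing: ' + file_path)

# If the reshape fails, report which data lines contain non-numeric tokens.
for line_number, line in enumerate(tmp, 1):
    for token in line.split():
        try:
            float(token)
        except ValueError:
            print('bad token on data line %d: %r' % (line_number, token))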
