Searching for a given string IP-address in a logfile - python

I am working on a project to search for an IP address and see if it is in the logfile. I have made some good progress, but got stuck when dealing with searching for certain items in the logfile format.
Here is what I have:
IP = raw_input('Enter IP Address:')
with open('RoutingTable.txt', 'r') as searchIP:
    for line in searchIP:
        if IP in line:
            ipArray = line.split()
            print ipArray
            if IP == ipArray[0]:
                print "Success"
            else:
                print "Fail"
As you can see, this is very bad code, but I am new to Python and programming, so I used this to make sure I can at least open the file and compare the first item to the string I enter.
Here is an example of the file content (my actual file has thousands of entries):
https://pastebin.com/ff40sij5
I would like a way to store all IPs (just IP and not other junk) in an array and then a loop to go through all items in array and compare with user defined IP.
For example, for this line all I care about is 10.20.70.0/23:
D EX 10.20.70.0/23 [170/3072] via 10.10.10.2, 6d06h, Vlan111
[170/3072] via 10.10.10.2, 6d06h, Vlan111
[170/3072] via 10.10.10.2, 6d06h, Vlan111
[170/3072] via 10.10.10.2, 6d06h, Vlan111
Please help.
Thanks
Damon
Edit: I tried setting flags, but that only works in some cases, since not all lines start with D; there are some that start with O (for OSPF routes) and C (directly connected).
Here is what I am doing:
f = open("RoutingTable.txt")
Pr = False
for line in f.readlines():
    if Pr: print line
    if "EX" in line:
        Pr = True
        print line
    if "[" in line:
        Pr = False
f.close()
That gives me a slightly cleaner result, but still the whole line instead of just the IP.

Do you necessarily need to store all the IPs by themselves? You can do the following, where you grab all the data into a list and check if your input string resides inside the list:
your_file = 'RoutingTable.txt'
IP = raw_input('Enter IP Address:')  # raw_input, since this is Python 2 code

with open(your_file, 'r') as f:
    data = f.readlines()

for d in data:
    if IP in d:
        print 'success'
        break
else:
    print 'fail'
The else statement only triggers when you don't break, i.e. there is no success case.
If you cannot read everything into memory, you can iterate over each line like you did in your post, but thousands of lines should be easily doable.
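A minimal line-by-line sketch of that idea (the function name and file path are just placeholders):

```python
def ip_in_file(ip, path):
    """Return True if `ip` appears anywhere in the file, reading one line at a time."""
    with open(path) as f:
        for line in f:
            if ip in line:
                return True
    return False
```

This keeps memory usage constant no matter how large the routing table grows.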
Edit
import re

your_file = 'RoutingTable.txt'
ip_addresses = []
IP = raw_input('Enter IP Address:')  # raw_input, since this is Python 2 code

with open(your_file, 'r') as f:
    data = f.readlines()

for d in data:
    res = re.search(r'(\d+\.\d+\.\d+\.\d+/\d+)', d)
    if res:
        ip_addresses.append(res.group(1))

for ip_addy in ip_addresses:
    if IP == ip_addy:
        print 'success'
        break
else:
    print 'fail'

print ip_addresses

First up, I'd like to mention that your initial way of handling the file opening and closing (where you used a context manager, the "with open(..)" part) is better. It's cleaner and stops you from forgetting to close it again.
Second, I would personally approach this with a regular expression. If you know you'll be getting the same pattern of it beginning with D EX or O, etc. and then an address and then the bracketed section, a regular expression shouldn't be much work, and they're definitely worth understanding.
This is a good resource to learn generally about them: http://regular-expressions.mobi/index.html?wlr=1
Different languages have different ways to interpret the patterns. Here's a link for python specifics for it (remember to import re): https://docs.python.org/3/howto/regex.html
There is also a website called regexr (I don't have enough reputation for another link) that you can use to mess around with expressions on to get to grips with it.
To summarise, I'd personally keep the initial context manager for opening the file, then use the readlines method from your edit, and inside that, use a regular expression to get out the address from the line, and stick the address you get back into a list.

Related

How to search txt file external IP [duplicate]

I have a file with several IP addresses. There are about 900 IPs on 4 lines of txt. I would like the output to be 1 IP per line. How can I accomplish this? Based on other code, I have come up with this, but it fails because multiple IPs are on single lines:
import sys
import re

try:
    if sys.argv[1:]:
        print "File: %s" % (sys.argv[1])
        logfile = sys.argv[1]
    else:
        logfile = raw_input("Please enter a log file to parse, e.g. /var/log/secure: ")
    try:
        file = open(logfile, "r")
        ips = []
        for text in file.readlines():
            text = text.rstrip()
            regex = re.findall(r'(?:[\d]{1,3})\.(?:[\d]{1,3})\.(?:[\d]{1,3})\.(?:[\d]{1,3})$', text)
            if regex is not None and regex not in ips:
                ips.append(regex)
        for ip in ips:
            outfile = open("/tmp/list.txt", "a")
            addy = "".join(ip)
            if addy is not '':
                print "IP: %s" % (addy)
                outfile.write(addy)
                outfile.write("\n")
    finally:
        file.close()
        outfile.close()
except IOError, (errno, strerror):
    print "I/O Error(%s) : %s" % (errno, strerror)
The $ anchor in your expression is preventing you from finding anything but the last entry. Remove that, then use the list returned by .findall():
found = re.findall(r'(?:[\d]{1,3})\.(?:[\d]{1,3})\.(?:[\d]{1,3})\.(?:[\d]{1,3})',text)
ips.extend(found)
re.findall() will always return a list, which could be empty.
If you only want unique addresses, use a set instead of a list.
If you need to validate IP addresses (including ignoring private-use networks and local addresses), consider using the ipaddress.IPv4Address() class.
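As a rough sketch of that suggestion (Python 3 only; IPv4Address raises AddressValueError on malformed input, and is_private covers the RFC 1918 and loopback ranges):

```python
import ipaddress

def is_public_ipv4(text):
    """Validate `text` as an IPv4 address and reject private/loopback ranges."""
    try:
        addr = ipaddress.IPv4Address(text)
    except ipaddress.AddressValueError:
        return False  # malformed, e.g. 999.999.999.999
    return not addr.is_private
```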
The findall function returns an array of matches, you aren't iterating through each match.
found = re.findall(r'(?:[\d]{1,3})\.(?:[\d]{1,3})\.(?:[\d]{1,3})\.(?:[\d]{1,3})', text)
for match in found:
    if match not in ips:
        ips.append(match)
Extracting IP Addresses From File
I answered a similar question in this discussion. In short, it's a solution based on one of my ongoing projects for extracting Network and Host Based Indicators from different types of input data (e.g. string, file, blog posting, etc.): https://github.com/JohnnyWachter/intel
I would import the IPAddresses and Data classes, then use them to accomplish your task in the following manner:
#!/usr/bin/env python
"""Extract IPv4 Addresses From Input File."""
from Data import CleanData  # Format and clean the input data.
from IPAddresses import ExtractIPs  # Extract IPs from input data.

def get_ip_addresses(input_file_path):
    """
    Read contents of input file and extract IPv4 Addresses.

    :param input_file_path: fully qualified path to input file. Expecting str
    :returns: dictionary of IPv4 and IPv4-like Address lists
    :rtype: dict
    """
    input_data = []  # Empty list to house formatted input data.
    input_data.extend(CleanData(input_file_path).to_list())
    results = ExtractIPs(input_data).get_ipv4_results()
    return results
Now that you have a dictionary of lists, you can easily access the data you want and output it in whatever way you desire. The below example makes use of the above function; printing the results to console, and writing them to a specified output file:
# Extract the desired data using the aforementioned function.
ipv4_list = get_ip_addresses('/path/to/input/file')

# Open your output file in 'append' mode.
with open('/path/to/output/file', 'a') as outfile:
    # Ensure that the list of valid IPv4 Addresses is not empty.
    if ipv4_list['valid_ips']:
        for ip_address in ipv4_list['valid_ips']:
            # Print to console.
            print(ip_address)
            # Write to output file, one address per line.
            outfile.write(ip_address + '\n')
Without the re.MULTILINE flag, $ matches only at the end of the string.
To make debugging easier split the code into several parts that you could test independently.
def extract_ips(data):
    return re.findall(r"\d{1,3}(?:\.\d{1,3}){3}", data)
Note that the regex misses some valid IP representations, e.g., 2130706433 (the integer form) and "1::1" (IPv6). And in reverse, the regex matches invalid strings such as 999.999.999.999. You could validate an IP string using socket.inet_aton(), or the more general socket.inet_pton(). You could even split the input into pieces without searching for IPs, and use these functions to keep the valid ones.
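A sketch of that validation approach; socket.inet_pton with AF_INET accepts only strict dotted-quad addresses, so it rejects the 999.999.999.999 false positives (the function name is made up):

```python
import socket

def is_valid_ipv4(text):
    """True if the system parser accepts `text` as a dotted-quad IPv4 address."""
    try:
        socket.inet_pton(socket.AF_INET, text)
    except OSError:  # socket.error is an alias of OSError in Python 3
        return False
    return True
```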
If input file is small and you don't need to preserve original order of ips:
with open(filename) as infile, open(outfilename, "w") as outfile:
    outfile.write("\n".join(set(extract_ips(infile.read()))))
Otherwise:
with open(filename) as infile, open(outfilename, "w") as outfile:
    seen = set()
    for line in infile:
        for ip in extract_ips(line):
            if ip not in seen:
                seen.add(ip)
                print >>outfile, ip

Python conditional statement based on text file string

Noob question here. I'm scheduling a cron job for a Python script for every 2 hours, but I want the script to stop running after 48 hours, which is not a feature of cron. To work around this, I'm recording the number of executions at the end of the script in a text file using a tally mark x and opening the text file at the beginning of the script to only run if the count is less than n.
However, my script seems to always run regardless of the conditions. Here's an example of what I've tried:
with open("curl-output.txt", "a+") as myfile:
    data = myfile.read()
    finalrun = "xxxxx"
    if data != finalrun:
        [CURL CODE]
with open("curl-output.txt", "a") as text_file:
    text_file.write("x")
    text_file.close()
I think I'm missing something simple here. Please advise if there is a better way of achieving this. Thanks in advance.
The problem with your original code is that you're opening the file in a+ mode, which seems to set the seek position to the end of the file (try print(data) right after you read the file). If you use r instead, it works. (I'm not sure that's how it's supposed to be. This answer states it should write at the end, but read from the beginning. The documentation isn't terribly clear).
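A quick way to see this behaviour for yourself (using a throwaway file name):

```python
# Write two characters, then reopen in a+ mode.
with open("curl-output-demo.txt", "w") as f:
    f.write("xx")

with open("curl-output-demo.txt", "a+") as f:
    at_end = f.read()  # '' -- the position starts at the end in append mode
    f.seek(0)          # rewind before reading
    data = f.read()    # now we get 'xx'
```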
Some suggestions: Instead of comparing against the "xxxxx" string, you could just check the length of the data (if len(data) < 5). Or alternatively, as was suggested, use pickle to store a number, which might look like this:
import pickle

try:
    with open("curl-output.txt", "rb") as myfile:
        num = pickle.load(myfile)
except FileNotFoundError:
    num = 0

if num < 5:
    do_curl_stuff()
    num += 1
    with open("curl-output.txt", "wb") as myfile:
        pickle.dump(num, myfile)
Two more things concerning your original code: You're making the first with block bigger than it needs to be. Once you've read the string into data, you don't need the file object anymore, so you can remove one level of indentation from everything except data = myfile.read().
Also, you don't need to close text_file manually. with will do that for you (that's the point).
This sounds more like a job for scheduling with the at command?
See http://www.ibm.com/developerworks/library/l-job-scheduling/ for different job scheduling mechanisms.
The first bug that is immediately obvious to me is that you are appending to the file even if data == finalrun. So when data == finalrun, you don't run curl but you do append another 'x' to the file. On the next run, data will be not equal to finalrun again so it will continue to execute the curl code.
The solution is of course to nest the code that appends to the file under the if statement.
Well, there is probably an end-of-line \n character, which means your file will contain something like xx\n and not simply xx. That is probably why your condition does not work :)
EDIT
What happens if, through the Python command line, you type
open('filename.txt', 'r').read()  # where filename.txt is the name of your file
you will be able to see whether there is a \n or not.
Try using this condition along with the if clause instead:
if data.count('x') == 24:
The data string may contain extraneous data like newline characters. Check repr(data) to see if it is actually 24 x's.

Python: Check one element in csv, use another to remove from second file

I am trying to get a script working, where it will check the existence of an IP in a lookup CSV file, and then if it exists take the third element and remove that third element from another (second) file. Here is an extract of what I have:
for line in fileinput.input(hostsURLFileLoc, inplace=1):
    elements = open(hostsLookFileLoc, 'r').read().split(".").split("\n")
    first = elements[0].strip()
    third = elements[2].strip()
    if first == hostIP:
        if line != third:
            print line.strip()
This obviously doesn't work, I have tried playing with a few options, but here is my latest (crazy) attempt.
I think the problem is that there are two input files open at once.
Any thoughts welcome,
Cheers
All right, even though I haven't got any response to my comment on the question, here's my shot at a general answer. If I've got something wrong, just say so and I'll edit to try to address the errors.
First, here are my assumptions. You have two files, whose names are stored in the HostsLookFileLoc and HostsURLFileLoc variables.
The file at HostsLookFileLoc is a CSV file, with an IP address in the third column of each row. Something like this:
HostsLookFile.csv:
blah,blah,192.168.1.1,whatever,stuff
spam,spam,82.94.164.162,eggs,spam
me,myself,127.0.0.1,and,I
...
The file at HostsURLFileLoc is a flat text file with one IP address per line, like so:
HostsURLFile.txt:
10.1.1.2
10.1.1.3
10.1.2.253
127.0.0.1
8.8.8.8
192.168.1.22
82.94.164.162
64.34.119.12
...
Your goal is to read and then rewrite the HostsURLFile.txt file, excluding all of the IP addresses that are found in the third column of a row in the CSV file. In the example lists above, localhost (127.0.0.1) and python.org (82.94.164.162) would be excluded, but the rest of the IPs in the list would remain.
Here's how I'd do it, in three steps:
Read in the CSV file and parse it using the csv module to find the IP addresses. Stick them into a set.
Open the flat file and read the IP addresses into a list, closing the file afterwards.
Reopen the flat file and overwrite it with the loaded list of addresses, skipping any that are contained in the set from the first step.
Code:
import csv

def cleanURLFile(HostsLookFileLoc, HostsURLFileLoc):
    """
    Remove IP addresses from file at HostsURLFileLoc if they are in
    the third column of the file at HostsLookFileLoc.
    """
    with open(HostsLookFileLoc, "r") as hostsLookFile:
        reader = csv.reader(hostsLookFile)
        ipsToExclude = set(line[2].strip() for line in reader)
    with open(HostsURLFileLoc, "r") as hostsURLFile:
        ipList = [line.strip() for line in hostsURLFile]
    with open(HostsURLFileLoc, "w") as hostsURLFile:  # truncates the file!
        hostsURLFile.write("\n".join(ip for ip in ipList
                                     if ip not in ipsToExclude))
This code is deliberately simple. There are a few things that could be improved, if they are important to your use case:
If something crashes the program during the rewriting step, HostsURLFile.txt may be clobbered. A safer way of rewriting (at least, on Unix-style systems) is to write to a temp file, then after the writing has finished (and the file has been closed), rename the temp file over the top of the old file. That way if the program crashes, you'll still have the original version or a completely written replacement, but never anything in between.
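One possible sketch of that temp-file-and-rename pattern (the function name is made up; os.replace is atomic on POSIX systems and requires Python 3.3+):

```python
import os
import tempfile

def rewrite_atomically(path, lines):
    """Replace `path` with `lines`, never leaving a half-written file behind."""
    dirname = os.path.dirname(os.path.abspath(path))
    fd, tmppath = tempfile.mkstemp(dir=dirname)  # temp file on the same filesystem
    try:
        with os.fdopen(fd, "w") as tmp:
            tmp.write("\n".join(lines))
        os.replace(tmppath, path)  # atomic rename over the original
    except Exception:
        os.remove(tmppath)  # clean up the temp file on failure
        raise
```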
If the checking you needed to do was more complicated than set membership, I'd add an extra step between 2 and 3 to do the actual processing, then write the results out without further manipulation (other than adding newlines).
Speaking of newlines, if you have a trailing newline, it will be passed through as an empty string in the list of IP addresses, which should be OK for this scenario (it won't be in the set of IPs to exclude, unless your CSV file has a messed up line), but might cause trouble if you were doing something more complicated.
In test file test.csv (note there is an IP address in there):
'aajkwehfawe;fh192.168.0.1awefawrgaer'
(I am pretty much ignoring that it is CSV for now. I am just going to use regex matches.)
# Get the file data
with open('test.csv', 'r') as f:
    data = f.read()

# Look for the IP:
find_ip = '192.168.0.1'

import re
# re.escape stops the dots in the address being treated as regex wildcards
m = re.search('[^0-9]({})[^0-9]'.format(re.escape(find_ip)), data)
if m:  # found!
    # this is weird, because you already know the value in find_ip, but anyway...
    ip = m.group(1).split('.')
    print('Third ip = ' + ip[2])
else:
    print('Did not find a match for {}'.format(find_ip))
I do not understand the second part of your question, i.e. removing the third value from a second file. Are there numbers listed line by line, and you want to find the line that contains this number above and delete the line? If yes:
# Make a new list of lines that omits the matched one
new_lines = []
for line in open('iplist.txt', 'r'):
    if line.strip() != ip[2]:  # skip the matched line
        new_lines.append(line.strip())  # strip so the join below doesn't double newlines

# Replace the file with the new list of lines
with open('iplist.txt', 'w') as f:
    f.write('\n'.join(new_lines))
Once you have found values in the first file that need to be removed from the second file, I suggest something like this pseudocode:
Load first file into memory
Search the string representing the first file for matches using a regular expression
(in Python, use re.findall(regex, string), where regex = re.compile(r"[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}"); with a raw string you do not need the doubled backslashes)
Build up a list of all matches
Exit first file
Load second file into memory
Search string representing second file for the start index and end index of each match
For each match, use the expression string = string[:start_of_match] + string[end_of_match:]
Re-write the string representing the second (now trimmed) file to the second file
Essentially whenever you find a match, redefine the string to be the slices on either side of it, excluding it from the new string assignment. Then rewrite your string to a file.
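The same slice-out-the-matches idea can be written with re.sub, which handles the index bookkeeping for you (a sketch, with the octet quantifiers written as {1,3}; the function name is made up):

```python
import re

IP_RE = re.compile(r"[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}")

def remove_ips(text, ips_to_remove):
    """Delete every occurrence of the listed IPs from `text`."""
    return IP_RE.sub(
        lambda m: "" if m.group(0) in ips_to_remove else m.group(0),
        text)
```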

python working with files as they are written

So I'm trying to create a little script to deal with some logs. I'm just learning python, but know about loops and such in other languages. It seems that I don't understand quite how the loops work in python.
I have a raw log from which I'm trying to isolate just the external IP addresses. An example line:
05/09/2011 17:00:18 192.168.111.26 192.168.111.255 Broadcast packet dropped udp/netbios-ns 0 0 X0 0 0 N/A
And heres the code I have so far:
import os,glob,fileinput,re

def parseips():
    f = open("126logs.txt", 'rb')
    r = open("rawips.txt", 'r+', os.O_NONBLOCK)
    for line in f:
        rf = open("rawips.txt", 'r+', os.O_NONBLOCK)
        ip = line.split()[3]
        res = re.search('192.168.', ip)
        if not res:
            rf.flush()
            for line2 in rf:
                if ip not in line2:
                    r.write(ip + '\n')
                    print 'else write'
                else:
                    print "no"
    f.close()
    r.close()
    rf.close()

parseips()
I have it parsing out the external IPs just fine. But, thinking like a ninja, I thought: how cool would it be to handle dupes? The idea was to check the file the IPs are being written to against the current line for a match, and if there is a match, not write. But this produces many times more dupes than before :) I could probably use something else, but I'm liking Python and it makes me look busy.
Thanks for any insider info.
DISCLAIMER: Since you are new to python, I am going to try to show off a little, so you can lookup some interesting "python things".
I'm going to print all the IPs to console:
def parseips():
    with open("126logs.txt", 'r') as f:
        for line in f:
            ip = line.split()[3]
            if not ip.startswith('192.168.'):  # keep only the external IPs
                print "%s\n" % ip,
You might also want to look into:
f = open("126logs.txt", 'r')
IPs = [line.split()[3] for line in f if not line.split()[3].startswith('192.168.')]
Hope this helps,
Enjoy Python!
Something along the lines of this might do the trick:
import os

def parseips():
    prefix = '192.168.'
    # Preload partial IPs from the existing file.
    if os.path.exists('rawips.txt'):
        with open('rawips.txt', 'rt') as f:
            partial_ips = set([ip[len(prefix):] for ip in f.readlines()])
    else:
        partial_ips = set()
    with open('126logs.txt', 'rt') as input, open('rawips.txt', 'at') as output:
        for line in input:
            ip = line.split()[3]
            if ip.startswith(prefix) and not ip[len(prefix):] in partial_ips:
                partial_ips.add(ip[len(prefix):])
                output.write(ip + '\n')

parseips()
Rather than looping through the file you're writing, you might try just using a set. It might consume more memory, but your code will be much nicer, so it's probably worth it unless you run into an actual memory constraint.
Assuming you're just trying to avoid duplicate external IPs, consider creating an additional data structure in order to keep track of which IPs have already been written. Since they're in string format, a dictionary would be good for this.
externalIPDict = {}

# code to detect external IPs goes here - when you get one:
if externalIPString in externalIPDict:
    pass  # do nothing, you found a dupe
else:
    externalIPDict[externalIPString] = 1
    # your code to add the external IP to your file goes here

Finding new IP in a file

I have a file of IP addresses called "IPs". When I parse a new IP from my logs, I'd like to see if the new IP is already in file IPs, before I add it. I know how to add the new IP to the file, but I'm having trouble seeing if the new IP is already in the file.
#!/usr/bin/python
from IPy import IP

IP = IP('192.168.1.2')
#f=open(IP('IPs', 'r')) # This line doesn't work
f = open('IPs', 'r')    # this one doesn't work either
for line in f:
    if IP == line:
        print "Found " + IP + " before"
f.close()
In the file "IPs", each IP address is on its own line, like this:
222.111.222.111
222.111.222.112
Also tried to put the file IPs in to an array, but not having good luck with that either.
Any ideas?
Thank you,
Gary
iplist = []

# With takes care of all the fun file handling stuff (closing, etc.)
with open('ips.txt', 'r') as f:
    for line in f:
        iplist.append(line.strip())  # Gets rid of the newlines at the end

# Change the above to this for Python versions < 2.6:
# f = open('ips.txt', 'r')
# for line in f:
#     iplist.append(line.strip())
# f.close()

newip = '192.168.1.2'
if newip not in iplist:
    f = open('ips.txt', 'a')  # append mode, please
    f.write(newip + '\n')
    f.close()
Now you have your IPs in a list (iplist) and you can easily add your newip to it iplist.append(newip) or do anything else you please.
Edit:
Some excellent books for learning Python:
If you're worried about programming being difficult, there's a book that's geared towards kids, but I honestly found it both easy-to-digest and informative.
Snake Wrangling for Kids
Another great resource for learning Python is How to Think Like a Computer Scientist.
There's also the tutorial on the official Python website. It's a little dry compared to the previous ones.
Alan Gauld, one of the foremost contributors to the tutor@python.org mailing list, has this tutorial that's really good and is also adapted to Python 3. He also includes some other languages for comparison.
If you want a good dead-tree book, I've heard that Core Python Programming by Wesley Chun is a really good resource. He also contributes to the python tutor list every so often.
The tutor list is another good place to learn about python - reading, replying, and asking your own questions. I actually learned most of my python by trying to answer as many of the questions I could. I'd seriously recommend subscribing to the tutor list if you want to learn Python.
It's trivial code, but I think it is short and pretty in Python, so here is how I'd write it:
ip = '192.168.1.2'
lookFor = ip + '\n'
f = open('ips.txt', 'a+')
for line in f:
    if line == lookFor:
        print 'found', ip, 'before.'
        break
else:
    print ip, 'not found, adding to file.'
    print >>f, ip
f.close()
It opens the file in append mode, reads, and if the IP is not found (that's what an else on a for does: it executes if the loop exited normally and not via break), appends the new IP. Ta-da!
This will be inefficient when you have a lot of IPs, though. Here is another hack I thought of; it uses one file per IP as a flag:
import os

ip = '192.168.1.2'
fname = ip + '.ip'
if os.access(fname, os.F_OK):
    print 'found', ip, 'before.'
else:
    print ip, 'not found, registering.'
    open(fname, 'w').close()
Why is this fast? Because most file systems these days (except FAT on Windows but NTFS is ok) organize the list of files in a directory into a B-tree structure, so checking for a file existence is a fast operation O(log N) instead of enumerating whole list.
(I am not saying this is practical - depends on amount of IPs you expect to see and your sysadmin benevolence.)
Why do you need this IP thing? Use simple strings.
#!/usr/bin/env python
ip = "192.168.1.2" + "\n" ### Fixed -- see comments
f = open('IPs', 'r')
for line in f:
    if line.count(ip):
        print "Found " + ip
f.close()
Besides, this looks more like a task for grep and friends.
