Finding new IP in a file - python

I have a file of IP addresses called "IPs". When I parse a new IP from my logs, I'd like to see if the new IP is already in file IPs, before I add it. I know how to add the new IP to the file, but I'm having trouble seeing if the new IP is already in the file.
#!/usr/bin/python
from IPy import IP
IP = IP('192.168.1.2')
#f = open(IP('IPs', 'r'))  # This line doesn't work
f = open('IPs', 'r')  # this one doesn't work either
for line in f:
    if IP == line:
        print "Found " + IP + " before"
f.close()
In the file "IPs", each IP address is on its own line, like so:
222.111.222.111
222.111.222.112
I also tried to put the file IPs into a list, but I'm not having good luck with that either.
Any ideas?
Thank you,
Gary

iplist = []
# "with" takes care of all the fun file handling stuff (closing, etc.)
with open('ips.txt', 'r') as f:
    for line in f:
        iplist.append(line.strip())  # gets rid of the newline at the end

# For Python versions < 2.6, change the above to this:
# f = open('ips.txt', 'r')
# for line in f:
#     iplist.append(line.strip())
# f.close()

newip = '192.168.1.2'
if newip not in iplist:
    f = open('ips.txt', 'a')  # append mode, please
    f.write(newip + '\n')
    f.close()
Now you have your IPs in a list (iplist): you can easily add your new IP to it with iplist.append(newip), or do anything else you please.
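If the file grows large, a set gives constant-time membership tests instead of the list's linear scan. A minimal sketch of the same check-and-append flow (the seeded ips.txt contents here are just demo data):

```python
# seed a demo file with a couple of known addresses
with open('ips.txt', 'w') as f:
    f.write('222.111.222.111\n222.111.222.112\n')

# load the existing IPs into a set for fast membership tests
ips = set()
with open('ips.txt', 'r') as f:
    for line in f:
        ips.add(line.strip())

newip = '192.168.1.2'
if newip not in ips:
    with open('ips.txt', 'a') as f:  # append mode
        f.write(newip + '\n')
    ips.add(newip)
```

Running the same block twice is safe: the second time, newip is already in the set, so nothing is appended.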
Edit:
Some excellent books for learning Python:
If you're worried about programming being difficult, there's a book that's geared towards kids, but I honestly found it both easy-to-digest and informative.
Snake Wrangling for Kids
Another great resource for learning Python is How to Think Like a Computer Scientist.
There's also the tutorial on the official Python website. It's a little dry compared to the previous ones.
Alan Gauld, one of the foremost contributors to the tutor@python.org mailing list, has a tutorial that's really good and has been adapted to Python 3. He also includes some other languages for comparison.
If you want a good dead-tree book, I've heard that Core Python Programming by Wesley Chun is a really good resource. He also contributes to the python tutor list every so often.
The tutor list is another good place to learn about Python - reading, replying, and asking your own questions. I actually learned most of my Python by trying to answer as many of the questions as I could. I'd seriously recommend subscribing to the tutor list if you want to learn Python.

It's trivial code, but I think it's short and pretty in Python, so here is how I'd write it:
ip = '192.168.1.2'
lookFor = ip + '\n'
f = open('ips.txt', 'a+')
for line in f:
    if line == lookFor:
        print 'found', ip, 'before.'
        break
else:
    print ip, 'not found, adding to file.'
    print >>f, ip
f.close()
It opens the file in append mode, reads it, and if the IP is not found, appends it (that's what an else on a for does: it executes if the loop exited normally and not via break). Ta-da!
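The for/else construct is easy to misread, so here is a tiny standalone demonstration (the find function is purely illustrative):

```python
# else on a for runs only when the loop finishes without hitting break
def find(needle, haystack):
    for item in haystack:
        if item == needle:
            result = 'found'
            break
    else:
        result = 'not found'
    return result
```

find('b', ['a', 'b', 'c']) takes the break path, while find('z', ['a', 'b', 'c']) exhausts the loop and falls into the else branch.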
This will be inefficient when you have a lot of IPs. Here is another hack I thought of; it uses one file per IP as a flag:
import os
ip = '192.168.1.2'
fname = ip + '.ip'
if os.access(fname, os.F_OK):
    print 'found', ip, 'before.'
else:
    print ip, 'not found, registering.'
    open(fname, 'w').close()
Why is this fast? Because most file systems these days (except FAT on Windows; NTFS is fine) organize the list of files in a directory into a B-tree, so checking for a file's existence is a fast O(log N) operation instead of a scan of the whole list.
(I am not saying this is practical - it depends on the number of IPs you expect to see and on your sysadmin's benevolence.)

Why do you need this IP thing? Use simple strings.
#!/usr/bin/env python
ip = "192.168.1.2" + "\n"  # Fixed -- see comments
f = open('IPs', 'r')
for line in f:
    if line.count(ip):
        print "Found " + ip
f.close()
Besides, this looks more like a task for grep and friends.

Related

Searching for a given string IP-address in a logfile

I am working on a project to search an IP address and see if it is in the logfile. I made some good progress but got stuck when dealing with searching certain items in the logfile format.
Here is what I have:
IP = raw_input('Enter IP Address:')
with open('RoutingTable.txt', 'r') as searchIP:
    for line in searchIP:
        if IP in line:
            ipArray = line.split()
            print ipArray
            if IP == ipArray[0]:
                print "Success"
            else:
                print "Fail"
As you can see this is very bad code, but I am new to Python and programming, so I used it to make sure I could at least open the file and compare the first item to the string I enter.
Here is an example of the file contents (my actual file has thousands of entries):
https://pastebin.com/ff40sij5
I would like a way to store all the IPs (just the IP and not the other junk) in an array, and then a loop to go through all items in the array and compare them with the user-defined IP.
For example, for this line all we care about is 10.20.70.0/23:
D EX 10.20.70.0/23 [170/3072] via 10.10.10.2, 6d06h, Vlan111
[170/3072] via 10.10.10.2, 6d06h, Vlan111
[170/3072] via 10.10.10.2, 6d06h, Vlan111
[170/3072] via 10.10.10.2, 6d06h, Vlan111
Please help.
Thanks
Damon
Edit: I am digging into setting flags, but that only works in some cases; as you can see, not all lines start with D - some start with O (for OSPF routes) or C (directly connected).
Here is what I am doing:
f = open("RoutingTable.txt")
Pr = False
for line in f.readlines():
    if Pr: print line
    if "EX" in line:
        Pr = True
        print line
    if "[" in line:
        Pr = False
f.close()
That gives me a slightly cleaner result, but still the whole line instead of just the IP.
Do you necessarily need to store all the IPs by themselves? You can do the following, where you grab all the data into a list and check if your input string resides inside the list:
your_file = 'RoutingTable.txt'
IP = input('Enter IP Address:')
with open(your_file, 'r') as f:
    data = f.readlines()
for d in data:
    if IP in d:
        print 'success'
        break
else:
    print 'fail'
The else statement only triggers when you don't break, i.e. there is no success case.
If you cannot read everything into memory, you can iterate over each line like you did in your post, but thousands of lines should be easily doable.
Edit
import re

your_file = 'RoutingTable.txt'
ip_addresses = []
IP = input('Enter IP Address:')
with open(your_file, 'r') as f:
    data = f.readlines()
for d in data:
    res = re.search(r'(\d+\.\d+\.\d+\.\d+/\d+)', d)
    if res:
        ip_addresses.append(res.group(1))
for ip_addy in ip_addresses:
    if IP == ip_addy:
        print 'success'
        break
else:
    print 'fail'
print ip_addresses
First up, I'd like to mention that your initial way of handling the file opening and closing (where you used a context manager, the "with open(..)" part) is better. It's cleaner and stops you from forgetting to close it again.
Second, I would personally approach this with a regular expression. If you know you'll be getting the same pattern of it beginning with D EX or O, etc. and then an address and then the bracketed section, a regular expression shouldn't be much work, and they're definitely worth understanding.
This is a good resource to learn generally about them: http://regular-expressions.mobi/index.html?wlr=1
Different languages have different ways to interpret the patterns. Here's a link for python specifics for it (remember to import re): https://docs.python.org/3/howto/regex.html
There is also a website called regexr (I don't have enough reputation for another link) that you can use to mess around with expressions on to get to grips with it.
To summarise, I'd personally keep the initial context manager for opening the file, then use the readlines method from your edit, and inside that, use a regular expression to get out the address from the line, and stick the address you get back into a list.
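That plan can be sketched in a few lines. The pattern and the sample lines below are assumptions based on the routing-table excerpt quoted in the question:

```python
import re

# matches a dotted quad followed by a prefix length, e.g. 10.20.70.0/23
ROUTE_RE = re.compile(r'(\d{1,3}(?:\.\d{1,3}){3}/\d{1,2})')

def extract_routes(lines):
    """Collect the route prefix from each line that has one."""
    routes = []
    for line in lines:
        m = ROUTE_RE.search(line)
        if m:
            routes.append(m.group(1))
    return routes

sample = [
    'D EX 10.20.70.0/23 [170/3072] via 10.10.10.2, 6d06h, Vlan111',
    '     [170/3072] via 10.10.10.2, 6d06h, Vlan111',
]
```

Only the first sample line contains a prefix in the dotted-quad/length form, so extract_routes(sample) yields just ['10.20.70.0/23']; the continuation lines are skipped because "via 10.10.10.2," has no /length part.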

Python conditional statement based on text file string

Noob question here. I'm scheduling a cron job for a Python script for every 2 hours, but I want the script to stop running after 48 hours, which is not a feature of cron. To work around this, I'm recording the number of executions at the end of the script in a text file using a tally mark x and opening the text file at the beginning of the script to only run if the count is less than n.
However, my script seems to always run regardless of the conditions. Here's an example of what I've tried:
with open("curl-output.txt", "a+") as myfile:
    data = myfile.read()
finalrun = "xxxxx"
if data != finalrun:
    [CURL CODE]
    with open("curl-output.txt", "a") as text_file:
        text_file.write("x")
        text_file.close()
I think I'm missing something simple here. Please advise if there is a better way of achieving this. Thanks in advance.
The problem with your original code is that you're opening the file in a+ mode, which seems to set the seek position to the end of the file (try print(data) right after you read the file). If you use r instead, it works. (I'm not sure that's how it's supposed to be. This answer states it should write at the end, but read from the beginning. The documentation isn't terribly clear).
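A quick way to see this for yourself: in a+ mode the read position starts at the end of the file, so a read returns nothing until you seek back (demo.txt here is just a throwaway file name):

```python
# create a file with known contents
with open('demo.txt', 'w') as f:
    f.write('xxx')

with open('demo.txt', 'a+') as f:
    tail = f.read()   # position starts at end-of-file, so this is ''
    f.seek(0)         # rewind to the beginning
    data = f.read()   # now the whole file comes back
```

The first read comes back empty even though the file is not, which is exactly the symptom in the question.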
Some suggestions: Instead of comparing against the "xxxxx" string, you could just check the length of the data (if len(data) < 5). Or alternatively, as was suggested, use pickle to store a number, which might look like this:
import pickle

try:
    with open("curl-output.txt", "rb") as myfile:
        num = pickle.load(myfile)
except FileNotFoundError:
    num = 0

if num < 5:
    do_curl_stuff()
    num += 1
    with open("curl-output.txt", "wb") as myfile:
        pickle.dump(num, myfile)
Two more things concerning your original code: You're making the first with block bigger than it needs to be. Once you've read the string into data, you don't need the file object anymore, so you can remove one level of indentation from everything except data = myfile.read().
Also, you don't need to close text_file manually. with will do that for you (that's the point).
This sounds more like a job for scheduling with the at command.
See http://www.ibm.com/developerworks/library/l-job-scheduling/ for different job scheduling mechanisms.
The first bug that is immediately obvious to me is that you are appending to the file even if data == finalrun. So when data == finalrun, you don't run curl but you do append another 'x' to the file. On the next run, data will be not equal to finalrun again so it will continue to execute the curl code.
The solution is of course to nest the code that appends to the file under the if statement.
Well, there is probably an end-of-line \n character, which means your file will contain something like xx\n and not simply xx. That is probably why your condition does not work :)
EDIT
What happens if you type the following at the Python command line?
open('filename.txt', 'r').read()  # where filename is the name of your file
You will be able to see whether there is a \n or not.
Try using this condition along with your if clause instead:
if data.count('x') == 24:
The data string may contain extraneous characters like newlines. Check repr(data) to see whether it is actually 24 x's.
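To see why counting is more robust than a whole-string comparison, here is a small demo; tally.txt is a throwaway name and the stray newlines are simulated:

```python
# simulate a tally file where each run appended 'x' plus a stray newline
with open('tally.txt', 'w') as f:
    f.write('x\n' * 3)

with open('tally.txt', 'r') as f:
    data = f.read()

# repr(data) would show 'x\nx\nx\n'; comparing data against 'xxx' fails,
# but counting only the 'x' characters ignores the newlines
runs = data.count('x')
```

The equality test data == 'xxx' is False because of the hidden newlines, while data.count('x') still reports three runs.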

Parse Apache log with Python 2.7

I'm trying to read a log file from a GitHub URL, add some geographic info using the IP as a lookup key, and then write some log info and the geographic info to a file. I've got the reading from the log and writing to a file working, but I'm not sure what library to use for looking up coordinates from an IP address, nor how to really go about that part. I found the regex module, and by the time I started to understand it, I found out it's deprecated. Here's what I've got; any help would be great.
import urllib2

apacheLog = 'https://raw.githubusercontent.com/myAccessLog.log'
data = urllib2.urlopen(apacheLog)
for line in data:
    with open('C:\LogCopy.txt', 'a') as f:
        f.write(line)
The re module isn't deprecated, and is part of the standard library. Edit: here's the link for the 2.7 module
Your for loop is opening and closing the file at each iteration. Probably not a big deal but it might be faster for large files to open the file once and write what needs to be written. Just swap the locations of the for and with lines.
So
data = urllib2.urlopen(apacheLog)
for line in data:
    with open('C:\LogCopy.txt', 'a') as f:  # probably need a double backslash
        f.write(line)
becomes
data = urllib2.urlopen(apacheLog)
with open('C:\LogCopy.txt', 'a') as f:  # probably need a double backslash
    for line in data.read().splitlines():
        f.write(line)  # might need a newline character
        # f.write(line + '\n')
Similar question regarding geolocation Python library
Best of luck!
Edit: added the data.splitlines() call after reading Piotr Kempa's answer
Well, the first part is simple. Just use for line in data.read().split('\n'), assuming the lines end with a normal newline (they should).
Then you use the re module (import re) - I hope it is still in use in Python 2.7... You can extract the IP address with something like re.search(r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}", line); look up the re.search() function for details on how to use it.
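For example, with that exact pattern (the log line below is made up in the common Apache access-log format):

```python
import re

# a made-up Apache-style access log line
line = '203.0.113.7 - - [10/Oct/2000:13:55:36 -0700] "GET / HTTP/1.0" 200 2326'

# same pattern as suggested above: four groups of 1-3 digits
m = re.search(r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}", line)
ip = m.group(0) if m else None
```

re.search returns a match object (or None if nothing matched), and group(0) gives the matched text, here the client IP at the start of the line.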
As for locating the IP geographically, it was already asked I think, try this question: What python libraries can tell me approximate location and time zone given an IP address?

Using a Scanner Import

Say a file contains a set of records, and the first line of a record is:
# 2014 2 14 00:03:01 Matt "login" 0.01
I'm trying to print that entire first line and then come back and loop over the rest of the remaining file, which I can do perfectly fine. But I was recently informed that our teacher wants us to use a scanner import. Basically, a scanner is a reading subsystem that allows you to read whitespace-delimited tokens from a file. I'm pretty confused about how a scanner can be used to read single lines at a time... any help on scanners would be great.
There is a lexical scanner in python standard library which is called tokenize: http://docs.python.org/2/library/tokenize.html
You have to pass it a parameter which is a function the scanner uses to read a line, so it can interface with any kind of input (string, file, ...).
(read the first line)
from tokenize import generate_tokens
from StringIO import StringIO
with open("...", 'r') as f:
    g = generate_tokens(f.readline)
or (whole file)
with open("...", 'r') as f:
    g = generate_tokens(StringIO(f.read()).readline)
or (line by line)
with open("...", 'r') as f:
    for l in f:
        g = generate_tokens(StringIO(l).readline)
should do the trick.
You can go back to the beginning of the file using f.seek(0).
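As an aside: if the goal is just whitespace-delimited tokens with quoted strings kept together (as in the sample record above), the standard-library shlex module may be closer to a Java-style Scanner than tokenize, which is built for Python source code. A small sketch using the record from the question:

```python
import shlex

# the sample record from the question; shlex keeps "login" as one token
record = '# 2014 2 14 00:03:01 Matt "login" 0.01'
tokens = shlex.split(record)
```

shlex.split splits on whitespace, strips the quotes around "login", and treats the leading '#' as an ordinary token (comment handling is off by default in shlex.split).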

python working with files as they are written

So I'm trying to create a little script to deal with some logs. I'm just learning python, but know about loops and such in other languages. It seems that I don't understand quite how the loops work in python.
I have a raw log from which I'm trying to isolate just the external IP addresses. An example line:
05/09/2011 17:00:18 192.168.111.26 192.168.111.255 Broadcast packet dropped udp/netbios-ns 0 0 X0 0 0 N/A
And heres the code I have so far:
import os,glob,fileinput,re

def parseips():
    f = open("126logs.txt", 'rb')
    r = open("rawips.txt", 'r+', os.O_NONBLOCK)
    for line in f:
        rf = open("rawips.txt", 'r+', os.O_NONBLOCK)
        ip = line.split()[3]
        res = re.search('192.168.', ip)
        if not res:
            rf.flush()
            for line2 in rf:
                if ip not in line2:
                    r.write(ip+'\n')
                    print 'else write'
                else:
                    print "no"
    f.close()
    r.close()
    rf.close()

parseips()
I have it parsing out the external IPs just fine. But, thinking like a ninja, I thought: how cool would it be to handle dupes? The idea was to check the file the IPs are being written to against the current line for a match, and if there is a match, not write. But this produces many times more dupes than before :) I could probably use something else, but I like Python and it makes me look busy.
Thanks for any insider info.
DISCLAIMER: Since you are new to python, I am going to try to show off a little, so you can lookup some interesting "python things".
I'm going to print all the IPs to console:
def parseips():
    with open("126logs.txt", 'r') as f:
        for line in f:
            ip = line.split()[3]
            if ip.startswith('192.168.'):
                print "%s\n" % ip,
You might also want to look into:
f = open("126logs.txt",'r')
IPs = [line.split()[3] for line in f if line.split()[3].startswith('192.168.')]
Hope this helps,
Enjoy Python!
Something along the lines of this might do the trick:
import os

def parseips():
    prefix = '192.168.'
    # preload partial IPs from the existing file
    if os.path.exists('rawips.txt'):
        with open('rawips.txt', 'rt') as f:
            partial_ips = set([ip[len(prefix):] for ip in f.readlines()])
    else:
        partial_ips = set()
    with open('126logs.txt', 'rt') as input, open('rawips.txt', 'at') as output:
        for line in input:
            ip = line.split()[3]
            if ip.startswith(prefix) and ip[len(prefix):] not in partial_ips:
                partial_ips.add(ip[len(prefix):])
                output.write(ip + '\n')

parseips()
Rather than looping through the file you're writing, you might try just using a set. It might consume more memory, but your code will be much nicer, so it's probably worth it unless you run into an actual memory constraint.
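A minimal sketch of that set-based approach; the log lines are made-up demo data in the question's format, where the fourth whitespace-separated field is the destination IP:

```python
# demo log lines in the question's format
lines = [
    '05/09/2011 17:00:18 192.168.111.26 8.8.8.8 dropped udp 0',
    '05/09/2011 17:00:19 192.168.111.26 8.8.8.8 dropped udp 0',
    '05/09/2011 17:00:20 192.168.111.26 9.9.9.9 dropped udp 0',
]

seen = set()
external = []
for line in lines:
    ip = line.split()[3]
    if ip.startswith('192.168.'):
        continue              # internal address, skip it
    if ip not in seen:        # set membership test is O(1) on average
        seen.add(ip)
        external.append(ip)
```

Each external IP lands in the output list exactly once, no matter how many times it appears in the log, and without re-reading the output file on every line.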
Assuming you're just trying to avoid duplicate external IPs, consider creating an additional data structure in order to keep track of which IPs have already been written. Since they're in string format, a dictionary would be good for this.
externalIPDict = {}
# code to detect external IPs goes here - when you get one:
if externalIPString in externalIPDict:
    pass  # do nothing, you found a dupe
else:
    externalIPDict[externalIPString] = 1
    # your code to add the external IP to your file goes here
