So I'm trying to create a little script to deal with some logs. I'm just learning Python, but I know about loops and such in other languages. It seems I don't quite understand how loops work in Python.
I have a raw log from which I'm trying to isolate just the external IP addresses. An example line:
05/09/2011 17:00:18 192.168.111.26 192.168.111.255 Broadcast packet dropped udp/netbios-ns 0 0 X0 0 0 N/A
And here's the code I have so far:
import os, glob, fileinput, re

def parseips():
    f = open("126logs.txt", 'rb')
    r = open("rawips.txt", 'r+', os.O_NONBLOCK)
    for line in f:
        rf = open("rawips.txt", 'r+', os.O_NONBLOCK)
        ip = line.split()[3]
        res = re.search('192.168.', ip)
        if not res:
            rf.flush()
            for line2 in rf:
                if ip not in line2:
                    r.write(ip + '\n')
                    print 'else write'
                else:
                    print "no"
    f.close()
    r.close()
    rf.close()

parseips()
I have it parsing out the external IPs just fine. But, thinking like a ninja, I thought: how cool would it be to handle dupes? The idea was that I could check the file the IPs are being written to against the current line for a match, and if there is a match, not write. But this produces many times more dupes than before :) I could probably use something else, but I'm liking Python and it makes me look busy.
Thanks for any insider info.
DISCLAIMER: Since you are new to Python, I am going to try to show off a little, so you can look up some interesting "Python things".
I'm going to print the matching IPs to the console:
def parseips():
    with open("126logs.txt", 'r') as f:
        for line in f:
            ip = line.split()[3]
            if ip.startswith('192.168.'):
                print "%s\n" % ip,
You might also want to look into:
f = open("126logs.txt",'r')
IPs = [line.split()[3] for line in f if line.split()[3].startswith('192.168.')]
Hope this helps,
Enjoy Python!
Something along the lines of this might do the trick:
import os

def parseips():
    prefix = '192.168.'
    # Preload partial IPs from the existing file.
    if os.path.exists('rawips.txt'):
        with open('rawips.txt', 'rt') as f:
            partial_ips = set(line.strip()[len(prefix):] for line in f)
    else:
        partial_ips = set()
    with open('126logs.txt', 'rt') as infile, open('rawips.txt', 'at') as outfile:
        for line in infile:
            ip = line.split()[3]
            if ip.startswith(prefix) and ip[len(prefix):] not in partial_ips:
                partial_ips.add(ip[len(prefix):])
                outfile.write(ip + '\n')

parseips()
Rather than looping through the file you're writing, you might try just using a set. It might consume more memory, but your code will be much nicer, so it's probably worth it unless you run into an actual memory constraint.
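For example, here's a minimal sketch along those lines, reusing the file names from your post and keeping your original "not 192.168." test for external IPs:

def parseips():
    seen = set()
    with open("126logs.txt") as f, open("rawips.txt", "w") as out:
        for line in f:
            ip = line.split()[3]
            if not ip.startswith('192.168.') and ip not in seen:
                seen.add(ip)          # remember it, so later dupes get skipped
                out.write(ip + '\n')  # each external IP is written exactly once

Membership tests on a set are O(1) on average, so this stays fast no matter how many IPs you collect.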
Assuming you're just trying to avoid duplicate external IPs, consider creating an additional data structure in order to keep track of which IPs have already been written. Since they're in string format, a dictionary would be good for this.
externalIPDict = {}
# Code to detect external IPs goes here; when you get one:
if externalIPString in externalIPDict:
    pass # do nothing, you found a dupe
else:
    externalIPDict[externalIPString] = 1
    # your code to add the external IP to your file goes here
I am working on a project to search for an IP address and see if it is in the logfile. I made some good progress, but got stuck when trying to search for certain items in the logfile format.
Here is what I have:
IP = raw_input('Enter IP Address:')
with open('RoutingTable.txt', 'r') as searchIP:
    for line in searchIP:
        if IP in line:
            ipArray = line.split()
            print ipArray
            if IP == ipArray[0]:
                print "Success"
            else:
                print "Fail"
As you can see, this is very bad code, but I am new to Python and programming, so I used this to make sure I can at least open the file and compare the first item to the string I enter.
Here is an example of the file content (my actual file has thousands of entries):
https://pastebin.com/ff40sij5
I would like a way to store all the IPs (just the IP, not the other junk) in an array, and then a loop to go through all items in the array and compare them with the user-defined IP.
For example, for this line all I care about is 10.20.70.0/23:
D EX 10.20.70.0/23 [170/3072] via 10.10.10.2, 6d06h, Vlan111
[170/3072] via 10.10.10.2, 6d06h, Vlan111
[170/3072] via 10.10.10.2, 6d06h, Vlan111
[170/3072] via 10.10.10.2, 6d06h, Vlan111
Please help.
Thanks
Damon
Edit: I am digging into setting flags, but that only works in some cases, since, as you can see, not all lines start with D; some start with O (for OSPF routes) and C (directly connected).
Here is what I am doing:
f = open("RoutingTable.txt")
Pr = False
for line in f.readlines():
if Pr: print line
if "EX" in line:
Pr = True
print line
if "[" in line:
Pr = False
f.close()
That gives me a somewhat cleaner result, but still the whole line instead of just the IP.
Do you necessarily need to store all the IPs by themselves? You can do the following, where you grab all the data into a list and check whether your input string resides inside the list:
your_file = 'RoutingTable.txt'
IP = raw_input('Enter IP Address:')

with open(your_file, 'r') as f:
    data = f.readlines()

for d in data:
    if IP in d:
        print 'success'
        break
else:
    print 'fail'
The else statement only triggers when you don't break, i.e. when there is no success case.
If you cannot read everything into memory, you can iterate over each line like you did in your post, but thousands of lines should be easily doable.
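If you do go line by line, a minimal sketch (same names as above) would be:

with open(your_file, 'r') as f:
    for line in f:
        if IP in line:
            print 'success'
            break
    else:
        print 'fail'

The for/else behaves the same; the file object just yields one line at a time instead of loading everything with readlines().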
Edit
import re

your_file = 'RoutingTable.txt'
ip_addresses = []
IP = raw_input('Enter IP Address:')

with open(your_file, 'r') as f:
    data = f.readlines()

for d in data:
    res = re.search(r'(\d+\.\d+\.\d+\.\d+/\d+)', d)
    if res:
        ip_addresses.append(res.group(1))

for ip_addy in ip_addresses:
    if IP == ip_addy:
        print 'success'
        break
else:
    print 'fail'

print ip_addresses
First up, I'd like to mention that your initial way of handling the file opening and closing (where you used a context manager, the "with open(..)" part) is better. It's cleaner and stops you from forgetting to close it again.
Second, I would personally approach this with a regular expression. If you know you'll always get the same pattern (a line beginning with D EX or O, etc., then an address, then the bracketed section), a regular expression shouldn't be much work, and they're definitely worth understanding.
This is a good resource to learn generally about them: http://regular-expressions.mobi/index.html?wlr=1
Different languages have different ways to interpret the patterns. Here's a link for python specifics for it (remember to import re): https://docs.python.org/3/howto/regex.html
There is also a website called regexr (I don't have enough reputation for another link) that you can use to mess around with expressions and get to grips with them.
To summarise, I'd personally keep the initial context manager for opening the file, then use the readlines method from your edit, and inside that, use a regular expression to get out the address from the line, and stick the address you get back into a list.
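A rough sketch of that combination might look like the following; the pattern is my assumption based on your sample lines (an IPv4 address followed by a slash and prefix length), so adjust it if your table has other formats:

import re

ip_pattern = re.compile(r'(\d+\.\d+\.\d+\.\d+/\d+)')  # e.g. matches 10.20.70.0/23

addresses = []
with open('RoutingTable.txt') as f:
    for line in f:
        match = ip_pattern.search(line)
        if match:
            addresses.append(match.group(1))  # keep just the address, not the whole line

Once the addresses are in the list, comparing against the user's input is a plain membership test (IP in addresses).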
The solution to the problem below may seem pretty "basic" to some of you. I've tried tons of source code and done tons of reading to accomplish this task, but I constantly get output that's barely readable to me, code that simply doesn't execute, or code that doesn't let me out of the loop.
I have tried using: split(), splitlines(), import re - re.sub(), replace(), etc.
But I have only been able to make them succeed with basic strings, not with text files, which have delimiters and involve new lines. I'm not sure how to use for loops to iterate through text files, although I have used them in Python to create batch files that rely on increments. I am very confused by the current task.
=========================================================================
Problem:
I've created a text file (file.txt) that features the following info:
2847:784 3637354:
347263:9379 4648292:
63:38940 3547729:
I would like to use the first colon (:) as my delimiter and have my output print only the numbers that appear before it on each individual line. I want it to look like the following:
2847
347263
63
I've read several topics and have tried to play around with the coded solutions, but I have not gotten the output I desire, nor do I think I fully understand what many of these solutions are saying. I've read several books and websites on the topic to no avail, so what I am resorting to now is asking for code that may help me; then I will play around with it to form my own understanding. I hope that does not make anyone feel as though they are working too hard on my behalf. What I have tried so far is:
tt = open('file.txt', 'r').read()
[i for i in tt if ':' not in i]

vv = open('file.txt', 'r').read()
bb = vv.split(':')
print(bb)

vv = open('file.txt', 'r').read()
bb = vv.split(':')
for e in bb:
    print(e)

vv = open('file.txt', 'r').read()
bb = vv.split(':')
lines = [line.rstrip('\n') for line in bb]
print(lines)

io = open('file.txt', 'r').read()
for line in io.splitlines():
    print(line.split(" ", 1)[0])

with open('file.txt') as f:
    lines = f.readlines()
print(lines)
The output from each of these doesn't give me what I desire, and I'm not sure what I'm doing wrong. Is there a source I can consult for guidance? I have been reading the forum along with "Fluent Python," "Data Wrangling with Python," "Automate the Boring Stuff," and "Learn Python the Hard Way," and I have not been able to figure this problem out. Thanks in advance for the assistance.
Try this:
with open('file.txt') as myfile:
    for line in myfile:
        print(line.split(':')[0])
Noob question here. I'm scheduling a cron job for a Python script to run every 2 hours, but I want the script to stop running after 48 hours, which is not a feature of cron. To work around this, I record the number of executions at the end of the script as tally marks (x) in a text file, and open the text file at the beginning of the script so it only runs if the count is less than n.
However, my script seems to always run regardless of the conditions. Here's an example of what I've tried:
with open("curl-output.txt", "a+") as myfile:
data = myfile.read()
finalrun = "xxxxx"
if data != finalrun:
[CURL CODE]
with open("curl-output.txt", "a") as text_file:
text_file.write("x")
text_file.close()
I think I'm missing something simple here. Please advise if there is a better way of achieving this. Thanks in advance.
The problem with your original code is that you're opening the file in a+ mode, which seems to set the seek position to the end of the file (try print(data) right after you read the file). If you use r instead, it works. (I'm not sure that's how it's supposed to be. This answer states it should write at the end, but read from the beginning. The documentation isn't terribly clear).
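You can check this for yourself (assuming curl-output.txt already contains a few x's); on platforms where a+ really does start reading at the end, you'll see something like:

with open("curl-output.txt", "a+") as myfile:
    print(repr(myfile.read()))  # prints '' -- the seek position is at the end

with open("curl-output.txt", "r") as myfile:
    print(repr(myfile.read()))  # prints the actual contents, e.g. 'xxx'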
Some suggestions: Instead of comparing against the "xxxxx" string, you could just check the length of the data (if len(data) < 5). Or alternatively, as was suggested, use pickle to store a number, which might look like this:
import pickle

try:
    with open("curl-output.txt", "rb") as myfile:
        num = pickle.load(myfile)
except FileNotFoundError:
    num = 0

if num < 5:
    do_curl_stuff()
    num += 1
    with open("curl-output.txt", "wb") as myfile:
        pickle.dump(num, myfile)
Two more things concerning your original code: You're making the first with block bigger than it needs to be. Once you've read the string into data, you don't need the file object anymore, so you can remove one level of indentation from everything except data = myfile.read().
Also, you don't need to close text_file manually. with will do that for you (that's the point).
Sounds more like a job for scheduling with the at command?
See http://www.ibm.com/developerworks/library/l-job-scheduling/ for different job scheduling mechanisms.
The first bug that is immediately obvious to me is that you are appending to the file even if data == finalrun. So when data == finalrun, you don't run curl, but you do append another 'x' to the file. On the next run, data will again not be equal to finalrun, so it will continue to execute the curl code.
The solution is of course to nest the code that appends to the file under the if statement.
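In other words, a sketch with the append nested under the if, also using the r-mode read from the other answer so the comparison sees the file's contents (you'd still want to handle the file not existing on the very first run, as the pickle example above does):

with open("curl-output.txt", "r") as myfile:
    data = myfile.read()
if data != "xxxxx":
    # [CURL CODE] goes here
    with open("curl-output.txt", "a") as text_file:
        text_file.write("x")  # only tally a run that actually happened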
There is probably an end-of-line character (\n), which means your file will contain something like xx\n and not simply xx. That is probably why your condition does not work :)
EDIT
If you type the following through the Python command line, you will be able to see whether there is a \n or not:
open('filename.txt', 'r').read() # where filename is the name of your file
Try using this condition in your if clause instead:
if data.count('x') == 24:
The data string may contain extraneous data, like newline characters. Check repr(data) to see if it actually contains 24 x's.
All,
I am rather new and am looking for assistance. I need to perform a string search on a data set that is about 20 GB compressed. I have an eight-core Ubuntu box with 32 GB of RAM that I can use to crunch through this, but I am not able to implement or determine the best possible code for such a task. Would threading or multiprocessing be best? Please provide code samples. Thank you.
Please see my current code;
#!/usr/bin/python
import sys

logs = open(sys.argv[1], 'r').readlines()
iplist = open(sys.argv[2], 'r').readlines()

print "+Loaded {0} entries for {1}".format(len(logs), sys.argv[1])
print "+Loaded {0} entries for {1}".format(len(iplist), sys.argv[2])

for a in logs:
    for b in iplist:
        if a.lower().strip() in b.lower().strip():
            print "Match! --> {0}".format(a.lower().strip())
I'm not sure multithreading can help you, but your code has a problem that is bad for performance: reading the logs in one go consumes incredible amounts of RAM and thrashes your cache. Instead, open the file and read it sequentially; after all, you are making a sequential scan, aren't you? Then, don't repeat any operations on the same data. In particular, the iplist doesn't change, but for every log entry you are repeatedly calling b.lower().strip(). Do that once, after reading the file with the IP addresses.
In short, this looks like this:
with open(..) as f:
    iplist = [l.lower().strip() for l in f]

with open(..) as f:
    for l in f:
        l = l.lower().strip()
        if l in iplist:
            print('match!')
You can improve performance even more by using a set for iplist, because looking things up there will be faster when there are many elements. That said, I'm assuming that the second file is huge, while iplist will remain relatively small.
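For instance, building iplist as a set is a one-line change to the sketch above:

with open(..) as f:
    iplist = set(l.lower().strip() for l in f)

Membership tests (l in iplist) then take roughly constant time instead of scanning the whole list.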
BTW: You could improve performance with multiple CPUs by using one to read the file and the other to scan for matches, but I guess the above will already give you a sufficient performance boost.
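If you do want to try that split, a minimal sketch with multiprocessing might look like this (the file names are placeholders; one process reads and feeds lines through a queue, the other scans):

from multiprocessing import Process, Queue

def read_lines(path, queue):
    # Producer: push lines into the queue, then a sentinel to signal the end.
    with open(path) as f:
        for line in f:
            queue.put(line)
    queue.put(None)

def scan(queue, ips):
    # Consumer: normalize each line once and test membership in the set.
    while True:
        line = queue.get()
        if line is None:
            break
        if line.lower().strip() in ips:
            print('match!')

if __name__ == '__main__':
    with open('ips.txt') as f:
        ips = set(l.lower().strip() for l in f)
    q = Queue()
    reader = Process(target=read_lines, args=('logs.txt', q))
    reader.start()
    scan(q, ips)
    reader.join()

In practice the per-line queue overhead can eat the gains, so measure before committing to it.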
I have a file of IP addresses called "IPs". When I parse a new IP from my logs, I'd like to see if the new IP is already in the file IPs before I add it. I know how to add the new IP to the file, but I'm having trouble seeing if the new IP is already there.
#!/usr/bin/python
from IPy import IP

IP = IP('192.168.1.2')
#f = open(IP('IPs', 'r')) # This line doesn't work
f = open('IPs', 'r') # this one doesn't work
for line in f:
    if IP == line:
        print "Found " + IP + " before"
f.close()
In the file "IPs", each IP address is on it's own line. As such:
222.111.222.111
222.111.222.112
I also tried to put the file IPs into an array, but I'm not having good luck with that either.
Any ideas?
Thank you,
Gary
iplist = []

# 'with' takes care of all the fun file handling stuff (closing, etc.)
with open('ips.txt', 'r') as f:
    for line in f:
        iplist.append(line.strip()) # gets rid of the newlines at the end

# Change the above to this for Python versions < 2.6:
f = open('ips.txt', 'r')
for line in f:
    iplist.append(line.strip())
f.close()

newip = '192.168.1.2'
if newip not in iplist:
    f = open('ips.txt', 'a') # append mode, please
    f.write(newip + '\n')
Now you have your IPs in a list (iplist), and you can easily add your new IP to it with iplist.append(newip) or do anything else you please.
Edit:
Some excellent books for learning Python:
If you're worried about programming being difficult, there's a book that's geared towards kids, but I honestly found it both easy to digest and informative:
Snake Wrangling for Kids
Another great resource for learning Python is How to Think Like a Computer Scientist.
There's also the tutorial on the official Python website. It's a little dry compared to the previous ones.
Alan Gauld, one of the foremost contributors to the tutor@python.org mailing list, has this tutorial that's really good and is also adapted to Python 3. He also includes some other languages for comparison.
If you want a good dead-tree book, I've heard that Core Python Programming by Wesley Chun is a really good resource. He also contributes to the python tutor list every so often.
The tutor list is another good place to learn about Python: reading, replying, and asking your own questions. I actually learned most of my Python by trying to answer as many of the questions as I could. I'd seriously recommend subscribing to the tutor list if you want to learn Python.
It's trivial code, but I think it's short and pretty in Python, so here is how I'd write it:
ip = '192.168.1.2'
lookFor = ip + '\n'
f = open('ips.txt', 'a+')
for line in f:
    if line == lookFor:
        print 'found', ip, 'before.'
        break
else:
    print ip, 'not found, adding to file.'
    print >>f, ip
f.close()
It opens the file in append mode, reads it, and if the IP is not found (that's what an else on a for loop does: it executes if the loop exited normally and not via break), appends the new IP. Ta-da!
Now, this will be inefficient when you have a lot of IPs. Here is another hack I thought of; it uses one file per IP as a flag:
import os

ip = '192.168.1.2'
fname = ip + '.ip'
if os.access(fname, os.F_OK):
    print 'found', ip, 'before.'
else:
    print ip, 'not found, registering.'
    open(fname, 'w').close()
Why is this fast? Because most file systems these days (except FAT on Windows, but NTFS is OK) organize the list of files in a directory into a B-tree structure, so checking for a file's existence is a fast O(log N) operation instead of enumerating the whole list.
(I am not saying this is practical; it depends on the number of IPs you expect to see and your sysadmin's benevolence.)
Why do you need this IPy thing? Use simple strings.
#!/usr/bin/env python
ip = "192.168.1.2" + "\n" ### Fixed -- see comments
f = open('IPs', 'r')
for line in f:
    if line.count(ip):
        print "Found " + ip
f.close()
Besides, this looks more like a task for grep and friends.