Obtaining unique list using set function with multiple elements - python

Two part question from someone quite new to python (and scripting in general):
I've worked out how to get a list of IP addresses from a file, then output a unique set of those IPs to a file as follows:
ip_list = []
with open('testfile', 'r') as file:
    for line in file:
        if line not in ip_list:
            ip_list.append(line)
with open('testoutput', 'w') as file:
    for line in ip_list:
        file.write("%s\n" % line)
I then saw that I could do this an alternate way, and I'm wondering if this is sane?
ip_list = []
with open('testfile', 'r') as file:
    for line in file:
        ip_list.append(line)
with open('testoutput', 'w') as file:
    for line in set(ip_list):
        file.write("%s\n" % line)
Next, I now want to get a list of IP addresses coupled with PERMIT/DENY strings, given that the opened file is something like:
1.1.1.1 PERMIT
2.2.2.2 PERMIT
3.3.3.3 DENY
1.1.1.1 PERMIT
I still want to output only the unique IPs, so I can do this with the first method:
ip_list = []
with open('testfile', 'r') as file:
    for line in file:
        elements = line.split(' ')
        # compare against the stored IPs only, not the stored [ip, action] lists
        if elements[0] not in [entry[0] for entry in ip_list]:
            ip_list.append(elements)
with open('testoutput', 'w') as file:
    for line in ip_list:
        file.write("%s %s\n" % (line[0], line[1]))
But can I do something using the set command instead? Or can I do something better than the above snippet?
And for this example, assume that I don't want to compare entire lines for uniqueness (i.e. '1.1.1.1 PERMIT')

You can do something like:
ip_set = set()
ip_list = []
with open('testfile', 'r') as file:
    for line in file:
        elements = line.split()
        ip = elements[0]
        if ip not in ip_set:
            ip_set.add(ip)
            ip_list.append(elements)
with open('testoutput', 'w') as file:
    for line in ip_list:
        file.write("%s %s\n" % (line[0], line[1]))
Note that I removed the argument to split(), so that it will handle all whitespace, not just spaces.
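As a minimal sketch, the same idea can be wrapped in a function and tried on in-memory lines instead of a file. The function name unique_by_ip and the sample data are assumptions made for illustration; the function keeps the first line seen for each IP and skips blank lines.

```python
def unique_by_ip(lines):
    """Return [ip, action] pairs, keeping only the first line seen for each IP."""
    seen = set()      # IPs already emitted; membership tests on a set are fast
    unique = []       # first-seen rows, in input order
    for line in lines:
        elements = line.split()
        if not elements:          # skip blank lines
            continue
        ip = elements[0]
        if ip not in seen:
            seen.add(ip)
            unique.append(elements)
    return unique


sample = ["1.1.1.1 PERMIT\n", "2.2.2.2 PERMIT\n", "3.3.3.3 DENY\n", "1.1.1.1 PERMIT\n"]
rows = unique_by_ip(sample)
for row in rows:
    print(" ".join(row))
```

The set is used only for the membership test, while the list preserves the input order, which iterating over a bare set(ip_list) would not guarantee.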

Related

How to insert a file with a list into a script?

I am trying to feed a list stored in a file into a script.
I need the script to take the public key data from the "list.txt" file and save all the results to the "save.txt" file.
from bitcoinlib.keys import Address
master = Address("0341b40ab5b2972161f2ff3d5487e0fb8260f2d98221cc2eb4fa3f28b6ad10d81e", encoding='bech32', script_type='p2wpkh')
print(master.address)
At the moment I am getting one value:
bc1q7wdz5dcs553f2y6qgf38xdgqs2kqgkhn5ydn9l
How can I replace this single hard-coded value: 0341b40ab5b2972161f2ff3d5487e0fb8260f2d98221cc2eb4fa3f28b6ad10d81e
with the list from the file "list.txt":
02485a4e62913be3db116d1ab15f84110599ea8905cd7dbae7be6fa02033fdb54e
0315da5f8f47787f6e8294bd369a4dd81aea97429630ecae831a9f6362a6917106
023741e71ebddc5eca046c9b23ac7c5230160fe1335e655c9bbe0b8a20c8d89802
037782a3fcc6c0ca092658a513c9f051cc95d540d215f0c965176c664d49d3e732
029c6c7748107fc9584a838df6a2c8224ae2339e2a95b15b4cd8bcc67c2d149cd5
so that I get all the values and save them to the file "save.txt":
bc1q6jxrahx3rw6lt2nlv5fpsdtllyzaa03m4d98xv
bc1qct3fu8543tryapkq4kpgw5ph8cj74zhtrdp5sx
bc1q5a3h25vu4kn90sc70rkm65narezzw97khu4dhu
bc1qutzkrtc7tqqjgrzns3s3h92f8wfxvfhp99ppnn
bc1ql2slqxzp7c9hdxhlp0ehlzdg2qa94xh5lk2anw
Please help me with fixing the code!
As far as I understand, you want to process each line of the file separately.
First read all the lines into a list:
with open('list.txt', 'r') as f:
    lists = [i.replace('\n', '') for i in f.readlines()]
Then, for each line, create an Address instance and save its address to another list:
addresses = []
for l in lists:
    master = Address(l, encoding='bech32', script_type='p2wpkh')
    addresses.append(master.address)
The last part is to save everything to the file save.txt:
with open('save.txt', 'w+') as f:
    for a in addresses:
        f.write(a + '\n')
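The three steps above can also be combined into one pass over the file. This is only a sketch: convert_file and its convert parameter are hypothetical names, and the stand-in converter below (str.upper) exists purely so the example runs without bitcoinlib. In the real script you would pass a function that builds the Address as shown in the answer.

```python
import os
import tempfile


def convert_file(in_path, out_path, convert):
    """Apply convert() to every non-blank line of in_path and write the results to out_path."""
    with open(in_path, "r") as src, open(out_path, "w") as dst:
        for line in src:
            key = line.strip()
            if not key:           # ignore blank lines, e.g. an empty last line
                continue
            dst.write(convert(key) + "\n")


# Demonstration with a stand-in converter (str.upper) and temporary files.
tmp_dir = tempfile.mkdtemp()
list_path = os.path.join(tmp_dir, "list.txt")
save_path = os.path.join(tmp_dir, "save.txt")
with open(list_path, "w") as f:
    f.write("abc\n\ndef\n")

convert_file(list_path, save_path, str.upper)
with open(save_path) as f:
    result = f.read().splitlines()
print(result)
```

Writing each result as soon as it is computed avoids holding both lists in memory, which matters little for five keys but helps with long input files.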

Python regex removing port number from IP string

I have a text file that contains lines of text and IPs with port numbers, and I want to remove the port numbers and print just the IPs.
Example text file:
77.55.211.77:8080
NoIP
79.127.57.42:80
Desired output:
77.55.211.77
79.127.57.42
My code:
import re
with open('IPs.txt', 'r') as infile:
    for ip in infile:
        ip = ip.strip('\n')
        IP_without_port_number = re.sub(r'((?::))(?:[0-9]+)$', "", ip)
        re_for_IP = re.match(r'^\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}$',ip)
        print(IP_without_port_number)
I don't understand why I see all lines in the output when I print IP_without_port_number to the console.
All you need is the second match:
import re
with open('IPs.txt', 'r') as infile:
    for ip in infile:
        re_for_IP = re.match(r'(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})', ip)
        if re_for_IP:
            print(re_for_IP[0])
Output:
77.55.211.77
79.127.57.42
One-liner:
import re
ips = []
with open('IPs.txt', 'r') as infile:
    ips = [ip[0] for ip in [re.match(r'(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})', ip) for ip in infile] if ip]
print(ips)
You don't need regex; use the split function on the : character when reading the line. You are then left with a list of two elements, the first containing only the IP address and the second containing the port.
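A minimal sketch of the split-based approach, using the sample lines from the question. The helper name extract_ip is an assumption, and the validation step with the standard-library ipaddress module is an addition here so that lines such as "NoIP" are filtered out.

```python
import ipaddress


def extract_ip(line):
    """Return the IPv4 address at the start of line, or None if there is none."""
    candidate = line.strip().split(":", 1)[0]   # drop ":port" if present
    try:
        ipaddress.IPv4Address(candidate)        # raises ValueError for non-IPs
    except ValueError:
        return None
    return candidate


sample = ["77.55.211.77:8080\n", "NoIP\n", "79.127.57.42:80\n"]
ips = [ip for ip in map(extract_ip, sample) if ip is not None]
print(ips)
```

Unlike the \d{1,3} patterns, this check also rejects out-of-range octets such as 999.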
Try this:
import re
# Each octet must be in the range 0-255. Building the pattern from one
# piece avoids literal newlines ending up inside the regex.
octet = r'(25[0-5]|2[0-4][0-9]|[0-1]?[0-9][0-9]?)'
regex = r'^' + r'\.'.join([octet] * 4) + r'$'
with open('IPs.txt', 'r') as infile:
    for ip in infile:
        ip = ip.strip('\n')
        IP_without_port_number = re.sub(r':.*$', "", ip)
        if re.search(regex, IP_without_port_number):
            print(IP_without_port_number)
Output:
77.55.211.77
79.127.57.42
I came up with this regex code; it works for me and it's easy.
import re
text = input("Input text: ")
pattern = re.findall(r'\d+\.\d+\.\d+\.\d+', text)
print(pattern)

Python search a file for text using input from another file

I'm new to Python and programming, and I need some help with a Python script. There are two files, each containing email addresses (more than 5000 lines). The input file contains the email addresses that I want to search for in the data file (which also contains email addresses). Then I want to write the output to a file or display it on the console. I searched for scripts and was able to modify one, but I'm not getting the desired results. Can you please help me?
dfile1 (50K lines)
yyy#aaa.com
xxx#aaa.com
zzz#aaa.com
ifile1 (10K lines)
ccc#aaa.com
vvv#aaa.com
xxx#aaa.com
zzz#aaa.com
Output file
xxx#aaa.com
zzz#aaa.com
datafile = 'C:\\Python27\\scripts\\dfile1.txt'
inputfile = 'C:\\Python27\\scripts\\ifile1.txt'
with open(inputfile, 'r') as f:
    names = f.readlines()
outputlist = []
with open(datafile, 'r') as fd:
    for line in fd:
        name = fd.readline()
        if name[1:-1] in names:
            outputlist.append(line)
        else:
            print "Nothing found"
    print outputlist
New Code
with open(inputfile, 'r') as f:
    names = f.readlines()
outputlist = []
with open(datafile, 'r') as f:
    for line in f:
        name = f.readlines()
        if name in names:
            outputlist.append(line)
        else:
            print "Nothing found"
    print outputlist
Maybe I'm missing something, but why not use a pair of sets?
#!/usr/local/cpython-3.3/bin/python
data_filename = 'dfile1.txt'
input_filename = 'ifile1.txt'
with open(input_filename, 'r') as input_file:
    input_addresses = set(email_address.rstrip() for email_address in input_file.readlines())
with open(data_filename, 'r') as data_file:
    data_addresses = set(email_address.rstrip() for email_address in data_file.readlines())
print(input_addresses.intersection(data_addresses))
mitan8 gives the problem you have, but this is what I would do instead:
with open(inputfile, "r") as f:
    names = set(i.strip() for i in f)
output = []
with open(datafile, "r") as f:
    for name in f:
        if name.strip() in names:
            print name
This avoids reading the larger datafile into memory.
If you want to write to an output file, you could do this for the second with statement:
with open(datafile, "r") as i, open(outputfile, "w") as o:
    for name in i:
        if name.strip() in names:
            o.write(name)
Here's what I would do:
names = []
with open(inputfile) as f:
    for line in f:
        names.append(line.rstrip("\n"))
myEmails = set(names)
# read the data file and keep only the addresses that are also in the input file
with open(datafile) as fd, open("emails.txt", "w") as output:
    for line in fd:
        c = line.rstrip("\n")
        if c in myEmails:
            print(c)  # for console
            output.write(c + "\n")  # for writing to file
I think your issue stems from the following:
    name = fd.readline()
    if name[1:-1] in names:
name[1:-1] slices each email address so that you skip the first and last characters. While it might be good in general to skip the last character (a newline '\n'), when you load the names from the "ifile"
with open(inputfile, 'r') as f:
    names = f.readlines()
you are including newlines. So, don't slice the names from the "dfile" at all, i.e.
    if name in names:
I think you can remove name = fd.readline(), since the for loop already gives you the line. The extra call consumes another line on top of the one the for loop reads, so lines get skipped. Also, I think name[1:-1] should be name, since you don't want to strip the first and last characters when searching. The with statement automatically closes the files it opened.
PS: How I'd do it:
with open("dfile1") as dfile, open("ifile") as ifile:
    lines = "\n".join(set(dfile.read().splitlines()) & set(ifile.read().splitlines()))
print(lines)
with open("ofile", "w") as ofile:
    ofile.write(lines)
In the above solution, I'm basically taking the intersection (the elements that are part of both sets) of the lines of both files to find the common lines.
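To make the two set operations concrete, here is a small sketch using the sample addresses from the question: & keeps the elements present in both sets, while | keeps the elements present in at least one of them. The variable names are chosen for illustration only.

```python
data_lines = ["yyy#aaa.com", "xxx#aaa.com", "zzz#aaa.com"]
input_lines = ["ccc#aaa.com", "vvv#aaa.com", "xxx#aaa.com", "zzz#aaa.com"]

common = set(data_lines) & set(input_lines)   # intersection: in both files
either = set(data_lines) | set(input_lines)   # union: in at least one file

print(sorted(common))   # sets are unordered, so sort for stable output
print(len(either))
```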

Searching words from a file in another file

I have got 2 files:
access.log.13 : a simple access log from a web server.
bots.txt : contains the names of spiders and crawlers, each one on a separate line, for example: googlebot mj12bot baidu etc.
I would like to create a third file "hits.txt" with all the lines from "access.log.13" that contain any of the words from the file "bots.txt".
This is my little Frankenstein:
file_working = file("hits.txt", "wt")
file_1_logs = open("access.log.13", "r")
file_2_bots = open("bots.txt", "r")
file_3_hits = open("hits.txt", "a")
list_1 = file_1_logs.readlines()
list_2 = file_2_bots.readlines()
file_3_hits.write("Lines with bots: \n \n")
for i in list_2:
    for j in list_1:
        if i in j:
            file_3_hits.write(j)
file_1_logs.close()
file_2_bots.close()
It doesn't work as I would like, because I only get hits when the line in bots.txt is exactly the same as a line in access.log.13. Thanks.
You can do it in a more Pythonic way:
with open('bots.txt') as fh:
    words = set(fh.read().split())  # set of searched words; split() drops empty strings
with open('access.log.13') as file_in, \
        open('hits.txt', 'w') as file_out:
    for line in file_in:
        if any(word in line for word in words):  # look for any of the words
            file_out.write(line)
Or you can use an even nicer generator expression:
with open(...) as file_in, open(...) as file_out:  # same as previously
    good_lines = (line for line in file_in if any(word in line for word in words))
    for good_line in good_lines:
        file_out.write(good_line)
Replace the if with this:
if j.find(i) != -1:
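Whichever test is used, note that readlines() keeps the trailing newline on each bot name, so both i in j and j.find(i) only succeed when the name is immediately followed by a line break in the log line. A minimal sketch that strips the names first and compares case-insensitively follows; the function name lines_with_bots and the sample log lines are made up for illustration.

```python
def lines_with_bots(log_lines, bot_names):
    """Return the log lines that mention any of the bot names (case-insensitive)."""
    # Strip whitespace/newlines and drop empty entries, which would match every line.
    bots = [name.strip().lower() for name in bot_names if name.strip()]
    return [line for line in log_lines
            if any(bot in line.lower() for bot in bots)]


bot_names = ["googlebot\n", "mj12bot\n", "\n"]
log_lines = [
    '1.2.3.4 - - "GET / HTTP/1.1" 200 "Mozilla/5.0 (compatible; Googlebot/2.1)"\n',
    '5.6.7.8 - - "GET /a HTTP/1.1" 200 "Mozilla/5.0 (Windows NT 10.0)"\n',
]
hits = lines_with_bots(log_lines, bot_names)
print(hits)
```

Lower-casing both sides is a design choice made here because user-agent strings often capitalise bot names differently from a hand-written list.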

Python: re-formatting multiple lines in text file

I apologize if this post is long, but I am trying to be as detailed as possible. I have done a considerable amount of research on the topic, and would consider myself an "intermediate" skilled programmer.
My problem: I have a text file with multiple lines of data. I would like to remove certain parts of each line in an effort to get rid of some irrelevant information, and then save the file with the newly formatted lines.
Here is an example of what I am trying to accomplish. The original line is something like:
access-list inbound_outside1 line 165 extended permit tcp any host 209.143.156.200 eq www (hitcnt=10086645) 0x3eb90594
I am trying to have the code read the text file, and output:
permit tcp any 209.143.156.200 www
The following code works, but only if there is a single line in the text file:
input_file = open("ConfigInput.txt", "r")
output_file = open("ConfigOutput.txt", "w")
for line in input_file:
    line = line.split("extended ", 1)[1]
    line = line.split("(", 1)[0]
    line = line.replace(" host", "")
    line = line.replace(" eq", "")
    output_file.write(line)
output_file.close()
input_file.close()
However, when I attempt to run this with a full file of multiple lines of data, I receive an error:
  File "C:\Python27\asaReader", line 5, in <module>
    line = line.split("extended ", 1)[1]
IndexError: list index out of range
I suspect that it is not moving onto the next line of data in the text file, and therefore there isn't anything in [1] of the previous string. I would appreciate any help I can get on this.
Some possible causes:
You have blank lines in your file (blank lines obviously won't contain the word extended)
You have lines that aren't blank, but don't contain the word extended
You could try printing your lines individually to see where the problem occurs:
for line in input_file:
    print("Got line: %s" % (line))
    line = line.split("extended ", 1)[1]
Oh, and it's possible that the last line is blank and it's failing on that. It would be easy enough to miss.
Print something out when you hit a line that can't be processed:
for line in input_file:
    try:
        line = line.split("extended ", 1)[1]
        line = line.split("(", 1)[0]
        line = line.replace(" host", "")
        line = line.replace(" eq", "")
        output_file.write(line)
    except Exception as e:
        print("Choked on this line: %r" % line)
        print(e)
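Instead of catching the exception, the lines that lack "extended " can be skipped explicitly. The following is a sketch under that assumption, using str.partition so that a missing separator is detectable without indexing; the function name simplify_acl_line is hypothetical, and the sample input is the line from the question plus a blank line.

```python
def simplify_acl_line(line):
    """Return the shortened rule text, or None if the line has no 'extended ' part."""
    _, sep, rest = line.partition("extended ")
    if not sep:                       # blank line or a line without "extended "
        return None
    rest = rest.split("(", 1)[0]      # drop "(hitcnt=...)" and everything after it
    rest = rest.replace(" host", "").replace(" eq", "")
    return rest.strip()


sample = [
    "access-list inbound_outside1 line 165 extended permit tcp any host 209.143.156.200 eq www (hitcnt=10086645) 0x3eb90594\n",
    "\n",
]
results = [simplify_acl_line(line) for line in sample]
print(results)
```

When writing the results to a file, skip the None entries and add a newline after each rule, since strip() removes the original line ending.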
An alternate approach would be to cache all the lines (assuming the file is not humongous):
>>> import re
>>> with open('/tmp/ConfigInput.txt', 'r') as f:
...     lines = f.readlines()
...
>>> lines
['access-list inbound_outside1 line 165 extended permit tcp any host 209.143.156.200 eq www (hitcnt=10086645) 0x3eb90594\n']
>>> lines = [re.sub(r'(^.*extended |\(.*$)', '', line) for line in lines]
>>> lines
['permit tcp any host 209.143.156.200 eq www \n']
>>> with open('/tmp/ConfigOutput.txt', 'w') as f:
...     f.writelines(lines)
...
>>>
