I'm writing a script to search through multiple text files containing MAC addresses to find which port each address is associated with. I need to do this for several hundred MAC addresses. The function runs fine the first time through. After that, though, the new MAC address is never passed to the function; it stays the same as the one already used, and the function's for loop only seems to run once.
import re
import csv
f = open('all_switches.csv','U')
source_file = csv.reader(f)
m = open('macaddress.csv','wb')
macaddress = csv.writer(m)
s = open('test.txt','r')
source_mac = s.read().splitlines()
count = 0
countMac = 0
countFor = 0
def find_mac(sneaky):
    global count
    global countFor
    count = count + 1
    for switches in source_file:
        countFor = countFor + 1
        # print sneaky only goes through the loop once
        switch = switches[4]
        source_switch = open(switch + '.txt', 'r')
        switch_read = source_switch.readlines()
        for mac in switch_read:
            # print mac does search through all the switches
            found_mac = re.search(sneaky, mac)
            if found_mac is not None:
                interface = re.search("(Gi|Eth|Te)(\S+)", mac)
                if interface is not None:
                    port = interface.group()
                    macaddress.writerow([sneaky, switch, port])
                    print sneaky + ' ' + switch + ' ' + port
        source_switch.close()

for macs in source_mac:
    match = re.search(r'[a-fA-F0-9]{4}[.][a-fA-F0-9]{4}[.][a-fA-F0-9]{4}', macs)
    if match is not None:
        sneaky = match.group()
        find_mac(sneaky)
        countMac = countMac + 1

print count
print countMac
print countFor
I've added the count, countFor, and countMac counters to see how many times the loops and the function run. Here is the output:
549f.3507.7674 the name of the switch Eth100/1/11
677
677
353
Any insight would be appreciated.
source_file is opened globally only once, so the first time you call find_mac(), the for switches in source_file: loop exhausts the file. Since the file is never closed and reopened, on every subsequent call to find_mac() the file pointer is at the end of the file and reads nothing.
Moving the following to the beginning of find_mac should fix it:
f = open('all_switches.csv','U')
source_file = csv.reader(f)
Consider using with statements to ensure your files are closed as well.
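For illustration, here is a minimal sketch of the corrected function, keeping the original names and assuming the rest of the original script (the global macaddress writer and counters) is unchanged. The switch list is reopened with a with statement on every call, so the csv.reader starts from the top each time:

def find_mac(sneaky):
    global count, countFor
    count += 1
    with open('all_switches.csv', 'U') as f:  # fresh file object on every call
        for switches in csv.reader(f):
            countFor += 1
            switch = switches[4]
            with open(switch + '.txt', 'r') as source_switch:  # closed automatically
                for mac in source_switch:
                    if re.search(sneaky, mac):
                        interface = re.search(r"(Gi|Eth|Te)(\S+)", mac)
                        if interface is not None:
                            port = interface.group()
                            macaddress.writerow([sneaky, switch, port])
                            print sneaky + ' ' + switch + ' ' + port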
Related
Here is some code I'm working on that requires the string module and the opening and closing of files.
# Importing required Packages--------------------------------------------
import string

# Importing Datasets-----------------------------------------------------
allNames = open("allNames.csv", "r")
onlyNames = open("onlyNames.csv", "r")

#=========Tasks==========================================================
# [1] findName(name, outputFile)-----------------------------------------
# Works ####
def findName(name, outputFile):
    outfile = open(outputFile + ".csv", "w")  # Output file
    outfile.write("Artist \tSong \tYear\n")   # Initial title lines
    alreadyAdded = []  # List of lines already added, to remove duplicates
    for aline in allNames:  # Looping through allNames.csv
        fields = aline.split("\t")  # Splitting elements of a line into a list
        if fields[-1] == name + "\n":  # Selecting lines with only the specified name (last element)
            dataline = fields[0] + "\t" + fields[1] + "\t" + fields[3]  # Each line in the .csv file
            if dataline not in alreadyAdded:  # Removing duplicates
                outfile.write(dataline + "\n")  # Writing the file
                alreadyAdded.append(dataline)   # Adding lines already added
    outfile.close()

# findName("Mary Anne", "mary anne")
# findName("Jack", "jack")
# findName("Mary", "mary")
# findName("Peter", "peter")
The code serves its intended purpose, as I get an exported file. However, it only works for one call at a time: for example, if I try to run both findName("Mary Anne", "mary anne") and findName("Jack", "jack") in the same run, the second call does not work. Moreover, all subsequent functions in the project file do not work unless I comment out this code.
Let me know what the issue is, thank you!
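Given the "Related" answer above, the likely culprit is the same file exhaustion: allNames is opened once at module level, so the first findName() call reads it to the end, and every later call sees an already-exhausted file object. A minimal sketch of a fix (same names as the original), opening the file inside the function instead:

def findName(name, outputFile):
    outfile = open(outputFile + ".csv", "w")
    outfile.write("Artist \tSong \tYear\n")
    alreadyAdded = []
    with open("allNames.csv", "r") as allNames:  # fresh file object on each call
        for aline in allNames:
            fields = aline.split("\t")
            if fields[-1] == name + "\n":
                dataline = fields[0] + "\t" + fields[1] + "\t" + fields[3]
                if dataline not in alreadyAdded:
                    outfile.write(dataline + "\n")
                    alreadyAdded.append(dataline)
    outfile.close()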
I am still new to Python. I have been working on this for work, along with a few side projects automating my Plex media management tasks.
I am trying to write a Python script that takes a set list of domains from a csv file and matches them to their DNS name. Example: Plex.tv using 'NS' would return jeremy.ns.cloudflare.com.
My main goals are to:
read in the list of domains from a csv,
run my code to match those domains to a DNS resolver name,
and write those to a new CSV file, then zip the two together, which is what I have in my code.
I am having a few problems along the way.
Visual Studio Code doesn't allow import dns.resolver (not a huge issue, but if you know the fix, it would save me from having to run the script from the command line).
Matching domains to their DNS resolver is throwing the error "AttributeError: 'list' object has no attribute 'is_absolute'".
import csv
import socket
import dns.resolver
import os
from os.path import dirname, abspath

# Setting Variables
current_path = dirname(abspath(__file__))
domainFName = '{0}/domains.csv'.format(current_path)
outputFile = '{0}/output.csv'.format(current_path)
dnsList = '{0}/list2.csv'.format(current_path)
case_list = []
fields = ['Domains', 'DNS Resolvers']
caseList = []
dnsResolve = []

# Read in all domains from csv into list
with open(domainFName, 'r') as file:
    for line in csv.reader(file):
        case_list.append(line)
print(case_list)

# Match domains to the DNS Resolver Name
for domains in case_list:
    answer = dns.resolver.resolve(domains, 'NS')
    server = answer.target
    dnsResolve.append(server)

# Write the dns Resolver names into a new csv file
with open(dnsList, 'w', newline="") as r:
    writers = csv.writer(r)
    writers.writerows(caseList)

# Write the domains and dns resolvers to new output csv
with open(outputFile, 'w', newline="") as f:
    writer = csv.writer(f)
    writer.writerow(fields)
    writer.writerow(zip(case_list, caseList))
exit()
Thanks for any help
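For anyone hitting the same AttributeError before reading on: csv.reader yields each row as a list of fields, so the loop above passes a whole list to dns.resolver.resolve(), which expects a domain string (or a dns.name.Name). A minimal sketch of that one fix, assuming one domain per row:

for row in case_list:
    # row is a list like ['plex.tv']; pass the string, not the list
    answer = dns.resolver.resolve(row[0], 'NS')
    for record in answer:  # iterate the answer to reach each NS record
        dnsResolve.append(record.target)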
After a discussion with a co-worker, I was able to resolve my issue. Just for the sake of it, if anyone wants to use this code for a similar need (we use it for DMARC), I will post the whole thing:
import dns.resolver
import csv
import os
from os.path import dirname, abspath

# Setting Variables
current_path = dirname(abspath(__file__))
domainFName = '{0}/domains.csv'.format(current_path)
outputFile = '{0}/output.csv'.format(current_path)
dnsList = '{0}/dnslist.csv'.format(current_path)
backupCSV = '{0}/backup-output.csv'.format(current_path)
case_list = []
dns_list = []
fields = ['Domains', 'DNS Resolvers']
csv_output = zip(case_list, dns_list)
domainAmount = 0
rd = 0
dnresolve = 0
part = 0
percentL = []
percents = [10, 20, 30, 40, 50, 60, 70, 80, 90, 95, 96, 97, 98, 99]
percentList = []
floatingList = []
floatPart = []
x = 0
keyAzure = 'azure'
keyCSC = 'csc'

while x < .99:
    x += .01
    floatingList.append(x)

# THIS IS THE CODE FOR WRITING CSV FILES INTO LISTS - LABELED AS #1
print("FILES MUST BE CSV, WILL RETURN AN ERROR IF NOT. LEAVE OFF .CSV")
# Here we will gather the input of which csv file to use. If none is entered, it will use domains.csv
print("Enter your input file name (if blank will use default):")
UserFile = str(input("Enter your filename: ") or "domains")
fullFile = UserFile + '.csv'
domainFName = fullFile.format(current_path)

# Here we will specify the output file name. If the file is not created, it will create it.
# If the user enters no data, the default will be used, output.csv
print("Enter your output file name (if blank will use default):")
UserOutput = str(input("Enter your filename: ") or "output")
fullOutput = UserOutput + '.csv'
outputFile = fullOutput.format(current_path)

# Read in all domains from csv into list
with open(domainFName, 'r') as file:
    for line in csv.reader(file):
        case_list.append(line)
        domainAmount += 1
print("Starting the resolver:")
print("You have " + str(domainAmount) + " Domains to resolve:")
# THIS IS THE END OF THE CODE FOR WRITING CSV FILES INTO LISTS - LABELED AS #1

# THE CODE BELOW IS WORKING FOR FINDING THE DNS RESOLVERS - LABELED AS #2
# Function for matching domains to DNS resolvers
def dnsResolver(domain):
    try:
        answers = dns.resolver.resolve(domain, 'NS')
        for server in answers:
            dns_list.append(server.target)
    except:
        dns_list.append("Did Not Resolve")

print("Now resolving domains to their DNS name:")
print("This will take a few minutes. Check out the progress bar for your status:")
print("I have resolved 0% Domains:")

# This code finds the domain counts corresponding to each progress percentage
def percentageFinder(percent, whole):
    return (percent * whole) / 100

def percentGetter(part, whole):
    return (100 * int(part) / int(whole))

for x in percents:
    percentList.append(int(percentageFinder(x, domainAmount)))
    percentL = percentList
# End code for percentage finding

for firstdomain in case_list:
    for domain in firstdomain:
        dnsResolver(domain)
        if dns_list[-1] != "Did Not Resolve":  # check the result just appended
            rd += 1
        else:
            dnresolve += 1
        # Using w+ to overwrite all Domain Names
        with open(dnsList, 'w+', newline="") as r:
            writers = csv.writer(r)
            writers.writerows(dns_list)
        # This is used for showing the percentage of the matching you have done
        part += 1
        if part in percentL:
            total = int(percentGetter(part, domainAmount))
            print("I Have Resolved {}".format(total) + "%" + " Domains:")
        else:
            pass

print("Resolving has completed. Statistics Below:")
print("------------------------------------------")
print("You had " + str(rd) + " domains that resolved.")
print("You had " + str(dnresolve) + " domains that did NOT resolve")
# THIS IS THE END OF THE WORKING CODE - LABELED AS #2

# Write the dns Resolver names into a new csv file
print("Now writing your domains & their DNS Name to an Output File:")
with open(outputFile, 'w+', newline="\n") as f:
    writer = csv.writer(f, dialect='excel')
    writer.writerow(fields)
    for row in csv_output:
        writer.writerow(row)

print("Writing a backup CSV File")
# Using this to create a backup containing all domains and all resolvers.
# If someone runs the script with a small list of domains, we still want to
# keep a running list of everything in case any questions arise.
csv_output = zip(case_list, dns_list)  # recreate the zip; the iterator above is already exhausted
with open(backupCSV, 'w', newline="") as f:
    writer = csv.writer(f, dialect='excel')
    writer.writerow(fields)
    for row in csv_output:
        writer.writerow(row)
print("Your backup is now done processing. Exiting program")

# Sort the files by keyword, in this case the domain being azure or csc
for r in dns_list:
    if keyAzure in r:
        for x in keyAzure:
            FileName = x
            print(FileName)
exit()
I've been working on a python script that will scrape certain webpages.
The beginning of the script looks like this:
# -*- coding: UTF-8 -*-
import urllib2
import re

database = ''
contents = open('contents.html', 'r')
for line in contents:
    entry = ''
    f = re.search('(?<=a href=")(.+?)(?=\.htm)', line)
    if f:
        entry = f.group(0)
    page = urllib2.urlopen('https://indo-european.info/pokorny-etymological-dictionary/' + entry + '.htm').read()
    m = re.search('English meaning( )+\s+(.+?)</font>', page)
    if m:
        title = m.group(2)
    else:
        title = 'N/A'
This accesses each page and grabs a title from it. Then I have a number of blocks of code that test whether certain text is present in each page; here is an example of one:
abg = re.findall(r'\babg\b', page)
if len(abg) == 0:
    abg = 'N'
else:
    abg = 'Y'
Then, finally, still in the for loop, I add this information to the variable database:
database += '\n' + str('<F>') + str(entry) + '<TITLE="' + str(title) + '"><FQ="N"><SQ="N"><ABG="' + str(abg) + '"></F>'
Note that I have used str() for each variable because I was getting a "can't concatenate strings and lists" error for some reason.
Once the for loop is completed, I write the database variable to a file:
f = open('database.txt', 'wb')
f.write(database)
f.close()
When I run this in the command line, it times out or never completes running. Any ideas as to what might be causing the issue?
EDIT: I fixed it. The program was being slowed down by accumulating the result of every line's iteration in the database variable. All I had to do to fix the issue was move the write into the for loop.
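A minimal sketch of that fix, keeping the question's names and eliding the unchanged parsing; the point is that each record is written as soon as it is built, so no huge string accumulates in memory:

f = open('database.txt', 'wb')
for line in contents:
    # ... build entry, title, and abg for this line exactly as before ...
    f.write('\n<F>' + str(entry) + '<TITLE="' + str(title) +
            '"><FQ="N"><SQ="N"><ABG="' + str(abg) + '"></F>')
f.close()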
I am using python 2.4.4 (old machine, can't do anything about it) on a UNIX machine. I am extremely new to python/programming and have never used a UNIX machine before. This is what I am trying to do:
1. Extract a single sequence from a FASTA file (proteins + nucleotides) to a temporary text file.
2. Give this temporary file to a program called 'threader'.
3. Append the output from threader (called tempresult.out) to a file called results.out.
4. Remove the temporary file.
5. Remove the tempresult.out file.
6. Repeat using the next FASTA sequence.
Here is my code so far:
import os
from itertools import groupby

input_file = open('controls.txt', 'r')
output_file = open('results.out', 'a')

def fasta_parser(fasta_name):
    input = fasta_name
    parse = (x[1] for x in groupby(input, lambda line: line[0] == ">"))
    for header in parse:
        header = header.next()[0:].strip()
        seq = "\n".join(s.strip() for s in parse.next())
        yield (header, '\n', seq)

parsedfile = fasta_parser(input_file)
mylist = list(parsedfile)

index = 0
while index < len(mylist):
    temp_file = open('temp.txt', 'a+')
    temp_file.write(' '.join(mylist[index]))
    os.system('threader' + ' temp.txt' + ' tempresult.out' + ' structures.txt')
    os.remove('temp.txt')
    f = open('tempresult.out', 'r')
    data = str(f.read())
    output_file.write(data)
    os.remove('tempresult.out')
    index += 1

output_file.close()
temp_file.close()
input_file.close()
When I run this script I get the error 'Segmentation fault'. From what I gather, this means I am messing with memory I shouldn't be messing with (???). I assume it is something to do with the temporary files, but I have no idea how I would get around this.
Any help would be much appreciated!
Thanks!
Update 1:
Threader works fine when I give it the same sequence multiple times like this:
import os

input_file = open('control.txt', 'r')
output_file = open('results.out', 'a')

x = 0
while x < 3:
    os.system('threader' + ' control.txt' + ' tempresult.out' + ' structures.txt')
    f = open('tempresult.out', 'r')
    data = str(f.read())
    output_file.write(data)
    os.remove('result.out')
    x += 1

output_file.close()
input_file.close()
Update 2: In the event that someone else gets this error: I forgot to close temp.txt before invoking the threader program.
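A minimal sketch of the corrected loop, keeping the names from the question: the temporary file is closed, and therefore flushed to disk, before threader is invoked, and 'w' mode replaces 'a+' so each sequence starts from an empty temp file:

for record in mylist:
    temp_file = open('temp.txt', 'w')
    temp_file.write(' '.join(record))
    temp_file.close()  # close (flush) BEFORE the external program reads it
    os.system('threader temp.txt tempresult.out structures.txt')
    f = open('tempresult.out', 'r')
    output_file.write(f.read())
    f.close()
    os.remove('temp.txt')
    os.remove('tempresult.out')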
I am quite new in python and I need your help.
I have a file like this:
>chr14_Gap_2
ACCGCGATGAAAGAGTCGGTGGTGGGCTCGTTCCGACGCGCATCCCCTGGAAGTCCTGCTCAATCAGGTGCCGGATGAAGGTGGT
GCTCCTCCAGGGGGCAGCAGCTTCTGCGCGTACAGCTGCCACAGCCCCTAGGACACCGTCTGGAAGAGCTCCGGCTCCTTCTTG
acacccaggactgatctcctttaggatggactggctggatcttcttgcagtccaaggggctctcaagagt
………..
>chr14_Gap_3
ACCGCGATGAAAGAGTCGGTGGTGGGCTCGTTCCGACGCGCATCCCCTGGAAGTCCTGCTCAATCAGGTGCCGGATGAAGGTGGT
GCTCCTCCAGGGGGCAGCAGCTTCTGCGCGTACAGCTGCCACAGCCCCTAGGACACCGTCTGGAAGAGCTCCGGCTCCTTCTTG
acacccaggactgatctcctttaggatggactggctggatcttcttgcagtccaaggggctctcaagagt
………..
One line is a tag and the next line is the DNA sequence.
I want to count the number of N letters and the number of lowercase letters, and take their percentage.
I wrote the following script, which works, but I have a problem with the printing.
#!/usr/bin/python
import sys

if len(sys.argv) != 2:
    print "Usage: If you want to run this python script you have to put the fasta file that includes the desert area's sequences as argument"
    sys.exit(1)

fasta_file = sys.argv[1]
# This script reads the sequences of the desert areas (fasta files) and calculates the persentage of the Ns and the repeats.
fasta_file = sys.argv[1]
f = open(fasta_file, 'r')
content = f.readlines()
x = len(content)
#print x
for i in range(0, len(content)):
    if (i % 2 == 0):
        content[i].strip()
        name = content[i].split(">")[1]
        print name,  # the "," makes the print command avoid printing a new line
    else:
        content[i].strip()
        numberOfN = content[i].count('N')
        #print numberOfN
        allChar = len(content[i])
        lowerChars = sum(1 for c in content[i] if c.islower())
        Ns_persentage = 100 * (numberOfN / float(allChar))
        lower_persentage = 100 * (lowerChars / float(allChar))
        waste = Ns_persentage + lower_persentage
        print ("The waste persentage is: %s" % (round(waste)))
        #print ("The persentage of Ns is: %s and the persentage of repeats is: %s" % (Ns_persentage, lower_persentage))
        #print (name + waste)
The thing is that it prints the tag on the first line and the waste variable on the second line, like this:
chr10_Gap_18759
The waste persentage is: 52.0
How can I print them on the same line, tab separated?
eg
chr10_Gap_18759 52.0
chr10_Gap_19000 78.0
…….
Thank you very much.
If you are using Python 2.X, you can print it with:
print name, "\t", round(waste)
I would make some modifications to your code. Python has the argparse module to manage arguments from the command line. I would do something like this:
#!/usr/bin/python
import argparse

# To use the arguments
parser = argparse.ArgumentParser()
parser.add_argument("fasta_file", help="The fasta file to be processed", type=str)
args = parser.parse_args()

f = open(args.fasta_file, "r")
content = f.readlines()
f.close()
x = len(content)
for i in range(x):
    line = content[i].strip()
    if (i % 2 == 0):
        # The first time this will fail; on subsequent occasions it prints as you wish
        try:
            print name, "\t", round(waste)
        except:
            pass
        name = line.split(">")[1]
    else:
        numberOfN = line.count('N')
        allChar = len(line)
        lowerChars = sum(1 for c in line if c.islower())
        Ns_persentage = 100 * (numberOfN / float(allChar))
        lower_persentage = 100 * (lowerChars / float(allChar))
        waste = Ns_persentage + lower_persentage

# To print the last case you need to do it outside the loop
print name, "\t", round(waste)
You can also print it like the other answer with print("{}\t{}".format(name, round(waste)))
I am not sure about the use of i%2. Note that if a sequence spans an odd number of lines, you will not get the name of the next sequence until the same event occurs. I would instead check whether the line begins with ">", store the name if so, and otherwise sum the characters of the line.
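A minimal sketch of that approach (hypothetical file name, Python 2 like the rest of the thread), which also accumulates sequences that span several lines before computing the percentage:

def waste_percentage(seq):
    n = seq.count('N')
    low = sum(1 for c in seq if c.islower())
    return 100 * (n + low) / float(len(seq))

name, seq = None, ""
for line in open("fasta.txt"):
    line = line.strip()
    if line.startswith(">"):
        if name is not None and seq:  # flush the previous record
            print name, "\t", round(waste_percentage(seq))
        name, seq = line[1:], ""
    else:
        seq += line
if name is not None and seq:  # flush the last record
    print name, "\t", round(waste_percentage(seq))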
Don't print the name when (i%2 == 0); just save it in a variable and print it in the next iteration together with the percentage:
print("{0}\t{1}".format(name, round(waste)))
This method of string formatting (new in version 2.6) is the new standard in Python 3, and should be preferred to the % formatting described in String Formatting Operations in new code.
I've fixed the indentation and redundancy:
#!/usr/bin/python
"""
This script reads the sequences of the desert areas (fasta files) and calculates the percentage of the Ns and the repeats.

2014-10-05 v1.0 by Vasilis
2014-10-05 v1.1 by Llopis
2015-02-27 v1.2 by Cees Timmerman
"""
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("fasta_file", help="The fasta file to be processed.", type=str)
args = parser.parse_args()

with open(args.fasta_file, "r") as f:
    for line in f.readlines():
        line = line.strip()
        if line[0] == '>':
            name = line.split(">")[1]
            print name,
        else:
            numberOfN = line.count('N')
            allChar = len(line)
            lowerChars = sum(1 for c in line if c.islower())
            Ns_percentage = 100 * (numberOfN / float(allChar))
            lower_percentage = 100 * (lowerChars / float(allChar))
            waste = Ns_percentage + lower_percentage
            print "\t", round(waste)  # Note: https://docs.python.org/2/library/functions.html#round
Fed:
>chr14_Gap_2
ACCGCGATGAAAGAGTCGGTGGTGGGCTCGTTCCGACGCGCATCCCCTGGAAGTCCTGCTCAATCAGGTGCCGGATGAAGGTGGTGCTCCTCCAGGGGGCAGCAGCTTCTGCGCGTACAGCTGCCACAGCCCCTAGGACACCGTCTGGAAGAGCTCCGGCTCCTTCTTGacacccaggactgatctcctttaggatggactggctggatcttcttgcagtccaaggggctctcaagagt
>chr14_Gap_3
ACCGCGATGAAAGAGTCGGTGGTGGGCTCGTTCCGACGCGCATCCCCTGGAAGTCCTGCTCAATCAGGTGCCGGATGAAGGTGGTGCTCCTCCAGGGGGCAGCAGCTTCTGCGCGTACAGCTGCCACAGCCCCTAGGACACCGTCTGGAAGAGCTCCGGCTCCTTCTTGacacccaggactgatctcctttaggatggactggctggatcttcttgcagtccaaggggctctcaagagt
Gives:
C:\Python27\python.exe -u "dna.py" fasta.txt
Process started >>>
chr14_Gap_2 29.0
chr14_Gap_3 29.0
<<< Process finished. (Exit code 0)
Using my favorite Python IDE: Notepad++ with NppExec plugin.