work through a file in python


I am new to python and with this task I'm trying to learn it. I'd like to get information out of a DNS zone file. The DNS file has the following entries:
something 600 A 192.168.123.45
600 A 192.168.123.46
someelse CNAME something
anotherone A 192.168.123.47
nextone CNAME anotherone
anotherone TXT ( "te asd as as d" )
The goal is to grab the hostnames and, if there is a corresponding TXT entry, to get the information for that entry as well.
So I started to just work through the file. Looking at the entries, the record type is at either index [2] or index [1], and the record data (the IP for an A record) comes right after it. So I have something like this for now:
for line in data:
    word = line.split()
    if len(word) > 2:
        if "CNAME" == word[2]:
            rectype = "CNAME"
            arecord = word[3].replace('.domain.com.', '')
            print(rectype + " " + arecord)
        if "CNAME" == word[1]:
            rectype = "CNAME"
            arecord = word[2].replace('.domain.com.', '')
            print(rectype + " " + arecord)
        if "A" == word[2]:
            rectype = "A"
            print(rectype + " " + word[3])
        if "A" == word[1]:
            rectype = "A"
            print(rectype + " " + word[2])
Okay, so far so good. But now, if I want to get the corresponding TXT record, do I need to work through every line of the file again for each line, or is there an easier and more efficient way to do that?

Part of the beauty of Python is you can write code that reads like English and it uses intuitive operators.
You have a lot of duplicated code. In your example, we can see that any time a "keyword" such as A or CNAME pops up, you simply take the next token. Here, I used Python's in keyword, which checks whether an element is in a collection of some sort. The expression evaluates to a Boolean, so if it is true I take the next element, i.e. tokens[tokens.index(keyword) + 1].
Similarly, in can also be used for string and substring searches. I check whether "TXT" is in the current line, and if it is, I assume you want everything after it. I use a slice to specify the range.
line[line.index("TXT") + 3:] means that I want everything in line from index line.index("TXT") + 3 onwards, that is, everything after the three characters of "TXT".
KEYWORDS = ["CNAME", "A"]
for line in data:
    tokens = line.split()
    if len(tokens) > 2:
        record = ""
        for keyword in KEYWORDS:
            if keyword in tokens:
                record = keyword + " " + tokens[tokens.index(keyword) + 1]
        if "TXT" in line:
            txt_data = line[line.index("TXT") + 3:]
            record += "TXT: " + txt_data
        print(record)

I'd recommend you build a dictionary whose keys are identifiers ("anotherone" would be an example from your sample) and whose values are strings (or maybe a list of strings - I'm rusty on DNS and am not sure if multiples are possible).
As you encounter CNAME and A records, add the identifiers as keys into the dictionary and initialize their corresponding values as empty lists. Then, when you hit a line with "TXT" in it, look up that identifier in the dictionary and append the line to its list.
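A minimal sketch of that dictionary approach, written for Python 3. It assumes the layout shown in the question: an optional hostname, an optional numeric TTL, then the record type and its data, and it assumes a line without a hostname belongs to the hostname seen last. The names parse_zone, RECORD_TYPES and SAMPLE are made up for illustration. The file is read only once, so there is no need to rescan it for each TXT entry.

```python
from collections import defaultdict

RECORD_TYPES = ("A", "CNAME", "TXT")  # only the types that appear in the question

SAMPLE = """\
something 600 A 192.168.123.45
600 A 192.168.123.46
someelse CNAME something
anotherone A 192.168.123.47
nextone CNAME anotherone
anotherone TXT ( "te asd as as d" )
"""

def parse_zone(lines):
    """Group record data by hostname: {name: {record_type: [values]}}."""
    records = defaultdict(lambda: defaultdict(list))
    current = None  # hostname inherited by lines that omit it (assumption)
    for line in lines:
        tokens = line.split()
        # Position of the first token that is a known record type, if any.
        type_pos = next(
            (i for i, tok in enumerate(tokens) if tok in RECORD_TYPES), None
        )
        if type_pos is None:
            continue
        # Tokens before the type are the hostname and/or a numeric TTL.
        names = [tok for tok in tokens[:type_pos] if not tok.isdigit()]
        if names:
            current = names[0]
        if current is None:
            continue
        rtype = tokens[type_pos]
        value = " ".join(tokens[type_pos + 1:])
        if rtype == "TXT":
            # Drop the surrounding parentheses and quotes.
            value = value.strip("() ").strip('"')
        records[current][rtype].append(value)
    return records

zone = parse_zone(SAMPLE.splitlines())
for name, recs in zone.items():
    txt = "; ".join(recs.get("TXT", []))
    for rtype in ("A", "CNAME"):
        for value in recs.get(rtype, []):
            print(name, rtype, value, ("TXT: " + txt) if txt else "")
```

A real zone file has more record types and directives than this sketch handles, so treat it as a starting point rather than a zone-file parser.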

Related

Trying multiplying numbers on a line starting with the word "size" with a constant variable across 181 text files

I have a folder of 181 text files, each containing numbers, but I only need to multiply the numbers on lines that contain "size" by a constant such as 0.5. I did achieve this with the approach from: Search and replace math operations with the result in Notepad++
What I am after, though, is a way that works without adding expressions or quotation marks, so that the rest of my community can do the same without editing every file into the format that approach needs.
For Example:
farmers = {
    culture = armenian
    religion = coptic
    size = 11850
}
being multiplied by 0.5 to:
farmers = {
    culture = armenian
    religion = coptic
    size = 5925
}
I tried writing a Python script, but it did not work (I don't know much Python):
import operator
with open('*.txt', 'r') as file:
    data = file.readlines()
factor = 0.5
count = 0
for index, line in enumerate(data):
    try:
        first_word = line.split()[0]
    except IndexError:
        pass
    if first_word == 'size':
        split_line = line.split(' ')
        # print(' '.join(split_line))
        # print(split_line)
        new_line = split_line
        new_line[-1] = ("{0:.6f}".format(float(split_line[-1]) * factor))
        new_line = ' '.join(new_line) + '\n'
        # print(new_line)
        data[index] = new_line
        count += 1
    elif first_word == 'text_scale':
        split_line = line.split(' ')
        # print(split_line)
        # print(' '.join(split_line))
        new_line = split_line
        new_line[-1] = "{0:.2f}".format(float(split_line[-1]) * factor)
        new_line = ' '.join(new_line) + '\n'
        # print(new_line)
        data[index] = new_line
        count += 1
with open('*.txt', 'w') as file:
    file.writelines(data)
print("Lines changed:", count)
So, are there any solutions to this? I would rather not make people in my community reformat every single file to work with my solution. Anything could work; I just haven't found a simple solution that is quick and easy to understand for people who use Notepad++ or Sublime Text 3.

If you use EmEditor, you can use the Replace in Files feature of EmEditor. In EmEditor, select Replace in Files on the Search menu (or press Ctrl + Shift + H), and enter:
Find: (?<=\s)size(.*?)(\d+)
Replace with: \J "size\1" + \2 / 2
File Types: *.txt (or file extension you are looking for)
In Folder: (folder path you are searching in)
Set the Keep Modified Files Open and Regular Expressions options (and the Match Case option if size always appears in lower case), then click the Replace All button.
Alternatively, if you would like to use a macro, this is a macro for you (you need to edit the folder path):
editor.ReplaceInFiles("(?<=\\s)size(.*?)(\\d+)","\\J \x22size\\1\x22 + \\2 / 2","E:\\Test\\*.txt",eeFindReplaceCase | eeFindReplaceRegExp | eeReplaceKeepOpen,0,"","",0,0);
To run this, save this code as, for instance, Replace.jsee, and then select this file from Select... in the Macros menu. Finally, select Run Replace.jsee in the Macros menu.
Explanations:
\J in the replacement expression specifies JavaScript. In this example, \2 is the backreference from (\d+), thus \2 / 2 represents a matched number divided by two.
References: EmEditor How to: Replace Expression Syntax
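For readers who would rather stay in Python: the script in the question cannot work as written, because open('*.txt') treats the name literally and does not expand the wildcard. Below is a minimal sketch for Python 3 that uses the standard glob module instead. It assumes all files sit in one folder, that the lines look like size = 11850, and that results should be whole numbers as in the example. The names scale_line and scale_sizes are illustrative, not part of any library.

```python
import glob
import os
import re

# Matches lines such as "    size = 11850" and captures the text around the number.
SIZE_LINE = re.compile(r"^(\s*size\s*=\s*)(\d+(?:\.\d+)?)(\s*)$")

def scale_line(line, factor):
    """Return the line with its size value multiplied by factor, or unchanged."""
    match = SIZE_LINE.match(line)
    if not match:
        return line
    prefix, number, suffix = match.groups()
    # round() returns an int here; exact halves go to the nearest even integer.
    return prefix + str(round(float(number) * factor)) + suffix

def scale_sizes(folder, factor=0.5):
    """Rewrite every .txt file in folder in place; return the number of changed lines."""
    changed = 0
    for path in glob.glob(os.path.join(folder, "*.txt")):
        with open(path, "r") as handle:
            lines = handle.readlines()
        new_lines = [scale_line(line, factor) for line in lines]
        changed += sum(1 for old, new in zip(lines, new_lines) if old != new)
        with open(path, "w") as handle:
            handle.writelines(new_lines)
    return changed
```

Calling scale_sizes with the folder path (and optionally another factor) processes all files in one go. Because the files are overwritten in place, it is worth running it on a copy of the folder first.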

Can two identical strings not be equal to each other in Python?

Below is the code in question:
def search():
    name = enter.get()
    print(name)
    name = name.lower()
    data = open("moredata.txt")
    found = False
    results = str()
    for line in data:
        record = line.split("|")
        record[0] = str(record[0])
        print("'" + name + "'" "---" + "'" + record[0] + "'")
        if record[0] == name:
            found = True
            line = str(line)
            results += line
            results = str(results).lstrip(' ')
            continue
        else:
            found = False
            continue
    if not found:
        print("No results found")
    else:
        print("These items match your search query:\n")
        print(results)
        # print(results)
    print('\n---------------------------------\n')
In the text file I am using, pieces of information are separated by '|' and the function splits each piece into an array (I think?) and compares just the first value to what I put in the search bar.
I have used other text files with the same exact function, it works just fine: it verifies that the two strings are equal and then displays the entire line of the text file corresponding to what I wanted. However, when I search "a" which should register as equal to "a" from the line a|a|a|a|a|a in my text file, it doesn't. I know this is going to bite me later if I don't figure it out and move on because it works in some cases.
The line
print("'" + name + "'" "---" + "'" + record[0] + "'")
results in
'a'---'a'
'a'---'a'
'a'---'b'
when compared to lines
a|a|a|a|a|a
a|a|a|a|a|a
b|b|b|b|b|b
There are no empty lines between results, and both variable types are str().

The continue in your if statement in the loop is what's causing the problem as it should be a break.
Your program is finding the match, but found is then overwritten by a later iteration. Once your program has found a match, you should either never set found back to False or, preferably, just stop iterating altogether.
I would wager the other files you've tested with all end with the name that you were looking for, and this one is causing a problem because the name you're looking for isn't at the end.
Additionally, though technically not a problem, the other continue under the else in the for loop isn't necessary.

Your found variable is being overwritten on each iteration of the loop.
Therefore it will only be True if the last result matches.
You don't actually need your found variable at all. If there are results, your results variable will have data in it and you can test for that.
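A minimal sketch of that idea for Python 3: collect the matching lines in a list and test the list itself, so no found flag is needed. It assumes the same '|'-separated layout as in the question. The function name find_matches is made up, and it takes any iterable of lines so it can be tried without the GUI or the data file.

```python
def find_matches(lines, name):
    """Return every line whose first '|'-separated field equals name.lower()."""
    wanted = name.lower()
    return [line for line in lines if line.split("|")[0] == wanted]

lines = ["a|a|a|a|a|a\n", "a|a|a|a|a|a\n", "b|b|b|b|b|b\n"]
matches = find_matches(lines, "a")
if matches:  # an empty list is falsy, so this replaces the found flag
    print("These items match your search query:\n")
    print("".join(matches))
else:
    print("No results found")
```

Unlike a break after the first hit, this keeps every matching line, which seems to be what the original results string was meant to collect.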

python - matching string and replacing

I have a file in which I am trying to replace parts of a line with another word.
A line looks like bobkeiser:bob123#bobscarshop.com:0.0.0.0.0:23rh32o3hro2rh2:234212
I need to delete everything but bob123#bobscarshop.com, but I need to match 23rh32o3hro2rh2 with 23rh32o3hro2rh2:poniacvibe from a different text file and place poniacvibe in front of bob123#bobscarshop.com
so it would look like this: bob123#bobscarshop.com:poniacvibe
I've had a hard time working out how to do this. I think I would have to split bobkeiser:bob123#bobscarshop.com:0.0.0.0.0:23rh32o3hro2rh2:234212 with data.split(":"), but some of the lines have a colon (:) in a spot where I don't want the line to be split, if that makes any sense...
If anyone could help, I would really appreciate it.

ok, it looks to me like you are using a colon : to separate your strings.
in this case you can use .split(":") to break your strings into their component substrings
eg:
firststring = "bobkeiser:bob123#bobscarshop.com:0.0.0.0.0:23rh32o3hro2rh2:234212"
print(firststring.split(":"))
would give:
['bobkeiser', 'bob123#bobscarshop.com', '0.0.0.0.0', '23rh32o3hro2rh2', '234212']
and assuming your substrings will always be in the same order, and the same number of substrings in the main string you could then do:
firststring = "bobkeiser:bob123#bobscarshop.com:0.0.0.0.0:23rh32o3hro2rh2:234212"
firstdata = firststring.split(":")
secondstring = "23rh32o3hro2rh2:poniacvibe"
seconddata = secondstring.split(":")
if firstdata[3] == seconddata[0]:
    outputdata = firstdata
    outputdata.insert(1,seconddata[1])
    outputstring = ""
    for item in outputdata:
        if outputstring == "":
            outputstring = item
        else:
            outputstring = outputstring + ":" + item
what this does is:
extract the bits of the strings into lists
see if the "23rh32o3hro2rh2" string can be found in the second list
find the corresponding part of the second list
create a list to contain the output data and put the first list into it
insert the "poniacvibe" string before "bob123#bobscarshop.com"
stitch the outputdata list back into a string using the colon as the separator
the reason your strings need to be the same length is because the index is being used to find the relevant strings rather than trying to use some form of string type matching (which gets much more complex)
if you can keep your data in this form it gets much simpler.
to protect against malformed data (lists too short) you can explicitly test for them before you start using len(list) to see how many elements are in it.
or you could let it run and catch the exception, however in this case you could end up with unintended results, as it may try to match the wrong elements from the list.
hope this helps
James
EDIT:
ok so if you are trying to match up a long list of strings from files you would probably want something along the lines of:
firstfile = open("firstfile.txt", mode = "r")
secondfile= open("secondfile.txt",mode = "r")
first_raw_data = firstfile.readlines()
firstfile.close()
second_raw_data = secondfile.readlines()
secondfile.close()
first_data = []
for item in first_raw_data:
    first_data.append(item.replace("\n","").split(":"))
second_data = []
for item in second_raw_data:
    second_data.append(item.replace("\n","").split(":"))
output_strings = []
for item in first_data:
    searchstring = item[3]
    for entry in second_data:
        if searchstring == entry[0]:
            output_data = item
            output_string = ""
            output_data.insert(1,entry[1])
            for data in output_data:
                if output_string == "":
                    output_string = data
                else:
                    output_string = output_string + ":" + data
            output_strings.append(output_string)
            break
for entry in output_strings:
    print(entry)
this should achieve what you're after and, as a proof of concept, will print the resulting list of strings for you.
if you have any questions feel free to ask.
James
Second edit:
to make this output the results into a file change the last two lines to:
outputfile = open("outputfile.txt", mode = "w")
for entry in output_strings:
    outputfile.write(entry+"\n")
outputfile.close()
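As a variation on the nested loops above, the second file can be loaded into a dictionary once, so each line of the first file needs only a single lookup. This is a sketch for Python 3 under stated assumptions: the output format email:word is taken from the question, the key is read as the second-to-last field, and the e-mail as the second field. The names build_lookup and merge are invented for illustration.

```python
def build_lookup(second_lines):
    """Map each key (text before the first colon) to the word after it."""
    lookup = {}
    for line in second_lines:
        key, sep, word = line.strip().partition(":")
        if sep:  # skip lines that contain no colon at all
            lookup[key] = word
    return lookup

def merge(first_lines, lookup):
    """Yield 'email:word' for every first-file line whose key is in lookup."""
    for line in first_lines:
        fields = line.strip().split(":")
        if len(fields) < 5:
            continue  # malformed or short line
        # Counting the key from the right tolerates an extra colon in the
        # middle fields, but not one inside the user name or the e-mail.
        email, key = fields[1], fields[-2]
        if key in lookup:
            yield ":".join([email, lookup[key]])

first = ["bobkeiser:bob123#bobscarshop.com:0.0.0.0.0:23rh32o3hro2rh2:234212\n"]
second = ["23rh32o3hro2rh2:poniacvibe\n"]
for result in merge(first, build_lookup(second)):
    print(result)
```

With real files, the two lists can be replaced by open file objects, since both functions only iterate over lines.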

Print multiple variables in one line using python

I need some assistance with a Python script. I need to search a dhcpd file for host entries, with their MAC and IP, and print each one on a single line. I am able to locate the hostname and IP address but cannot figure out how to get the variables out of the if statements so I can put them on one line. Any suggestions? The code is below:
#!/usr/bin/python
import sys
import re
#check for arguments
if len(sys.argv) > 1:
    print("usage: no arguments required")
    sys.exit()
else:
    dhcp_file = open("/etc/dhcp/dhcpd.conf","r")
    for line in dhcp_file:
        if re.search(r'\bhost\b',line):
            split = re.split(r'\s+', line)
            print(split[1])
        if re.search(r'\bhardware ethernet\b',line):
            ip = re.split(r'\s+',line)
            print(ip[2])
    dhcp_file.close()

There are a number of ways that you could go about this. The simplest is probably to initialize an empty string before the if statements. Then, instead of printing split[1] and ip[2], concatenate them to the empty string and print that afterwards. So it would look something like this:
printstr = ""
if re.search...
    ...
    printstr += "Label for first item " + split[1] + ", "
if re.search...
    ...
    printstr += "Label for second item " + ip[2]
print(printstr)

In the general case, you can pass comma-separated values to print() to print them all on one line:
entries = ["192.168.1.1", "supercomputer"]
print("Host:", entries[0], "H/W:", entries[1])
In your particular case, how about adding the relevant entries to a list and then printing that list at the end?
entries = []
...
entries.append(split[1])
...
print(entries)
At this point you may want to join the entries you've collected into a single string. If so, you can use the join() method (as suggested by abarnert):
print(' '.join(entries))
Or, if you want to get fancier, you could use a dictionary that maps each key string (e.g. 'host', 'hardware', etc.) to a list and append to those lists depending on the key.

You can also use a flag, curhost, and populate a dictionary:
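import re

with open("dhcpd.conf","r") as dhcp_file:
    curhost,hosts=None,{}
    for line in dhcp_file:
        # a closing brace ends the current host block
        if curhost and '}' in line: curhost=None
        if not curhost and re.search(r'^\s*host\b',line):
            # line.split() ignores leading whitespace, so [1] is the host name
            curhost=line.split()[1]
            hosts[curhost] = dict()
        if curhost and 'hardware ethernet' in line:
            # drop the trailing ';' from the MAC address
            hosts[curhost]['ethernet'] = line.split()[-1].rstrip(';')
print(hosts)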
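Putting the pieces together, here is a self-contained sketch for Python 3 that prints host, MAC and IP on one line per host. It assumes each host block carries a hardware ethernet line and, for the IP, a fixed-address line, and that a closing brace ends the block. The sample text, the regular expressions and the name parse_hosts are illustrative only.

```python
import re

HOST_RE = re.compile(r"^\s*host\s+(\S+)")
MAC_RE = re.compile(r"hardware\s+ethernet\s+([0-9A-Fa-f:]+)")
IP_RE = re.compile(r"fixed-address\s+([^;\s]+)")

SAMPLE = """\
host supercomputer {
  hardware ethernet 00:11:22:33:44:55;
  fixed-address 192.168.1.1;
}
host printer {
  hardware ethernet 66:77:88:99:aa:bb;
  fixed-address 192.168.1.2;
}
"""

def parse_hosts(lines):
    """Return a list of (host, mac, ip) tuples, one per host block."""
    hosts = []
    current = None  # dict for the host block being read, if any
    for line in lines:
        host = HOST_RE.search(line)
        if host:
            current = {"host": host.group(1).rstrip("{"), "mac": "", "ip": ""}
        if current is None:
            continue
        mac = MAC_RE.search(line)
        if mac:
            current["mac"] = mac.group(1)
        ip = IP_RE.search(line)
        if ip:
            current["ip"] = ip.group(1)
        if "}" in line:  # end of the block: store it and reset
            hosts.append((current["host"], current["mac"], current["ip"]))
            current = None
    return hosts

for host, mac, ip in parse_hosts(SAMPLE.splitlines()):
    print(host, mac, ip)
```

The values are kept until the block closes, which is what makes it possible to print them together instead of at the moment each line is seen.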

Python: compare list items to dictionary keys twice in one for loop?

I'm stuck in a script I have to write and can't find a way out...
I have two files with partly overlapping information. Based on the information in one file I have to extract info from the other and save it into multiple new files.
The first is simply a table with IDs and group information (which is used for the splitting).
The other contains the same IDs, but each twice with slightly different information.
What I'm doing:
I create a list of lists with ID and group information, like this:
table = [[ID, group], [ID, group], [ID, group], ...]
Then, because the second file is huge and not sorted in the same way as the first, I want to create a dictionary as index. In this index, I would like to save the ID and where it can be found inside the file so I can quickly jump there later. The problem there, of course, is that every ID appears twice. My simple solution (but I'm in doubt about this) is adding an -a or -b to the ID:
index = {"ID-a": [FPos, length], "ID-b": [FPOS, length], "ID-a": [FPos, length], ...}
The code for this:
for line in file:
    read = (line.split("\t"))[0]
    if not (read+"-a") in indices:
        index = read + "-a"
        length = len(line)
        indices[index] = [FPos, length]
    else:
        index = read + "-b"
        length = len(line)
        indices[index] = [FPos, length]
    FPos += length
What I am wondering now is if the next step is actually valid (I don't get errors, but I have some doubts about the output files).
for name in table:
    head = name[0]
    ## first round
    (FPos,length) = indices[head+"-a"]
    file.seek(FPos)
    line = file.read(length)
    line = line.rstrip()
    items = line.split("\t")
    output = ["#" + head +" "+ "1:N:0:" +"\n"+ items[9] +"\n"+ "+" +"\n"+ items[10] +"\n"]
    name.append(output)
    ##second round
    (FPos,length) = indices[head+"-b"]
    file.seek(FPos)
    line = file.read(length)
    line = line.rstrip()
    items = line.split("\t")
    output = ["#" + head +" "+ "2:N:0:" +"\n"+ items[9] +"\n"+ "+" +"\n"+ items[10] +"\n"]
    name.append(output)
Is it ok to use a for loop like that?
Is there a better, cleaner way to do this?

Use a defaultdict(list) to save all your file offsets by ID:
from collections import defaultdict
index = defaultdict(list)
for line in file:
    # ...code that loops through file finding ID lines...
    index[id_value].append((fileposn,length))
The defaultdict will take care of initializing to an empty list on the first occurrence of a given id_value, and then the (fileposn,length) tuple will be appended to it.
This will accumulate all references to each id into the index, whether there are 1, 2, or 20 references. Then you can just search through the given fileposn's for the related data.
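A runnable sketch of this index for Python 3, using an in-memory file so it can be tried without real data. It assumes the ID is the first tab-separated field, as in the question. The file is handled in binary mode on purpose: there len(line) is a byte count that matches what seek() and read() expect, whereas text-mode positions are not guaranteed to be plain character counts. The names build_index and fetch are made up.

```python
import io
from collections import defaultdict

def build_index(handle):
    """Map each ID (first tab-separated field) to a list of (offset, length) pairs."""
    index = defaultdict(list)
    offset = 0
    for line in handle:  # handle must be opened in binary mode
        read_id = line.split(b"\t")[0].decode()
        index[read_id].append((offset, len(line)))
        offset += len(line)
    return index

def fetch(handle, index, read_id):
    """Return all lines recorded for read_id, in file order."""
    lines = []
    for offset, length in index.get(read_id, []):
        handle.seek(offset)
        lines.append(handle.read(length).decode().rstrip("\n"))
    return lines

data = io.BytesIO(b"r1\tfoo\nr2\tbar\nr1\tbaz\n")
index = build_index(data)
for line in fetch(data, index, "r1"):
    print(line)
```

The first and second occurrence of an ID are simply positions 0 and 1 of its list, so the "-a" and "-b" suffixes are no longer needed.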
