Find and replace list value based on other list values - python

I am working on an assignment for school and have hit a wall.
The challenge is:
You will be passed the filename P, firstname F, lastname L, and a new birthday B.
Load the fixed-length record file in P, search for F, L in the file and change the birthday to B.
Then save the file.
I have managed to read from the file and split the data into the desired entries in a list.
However, my issue arises when I am searching for the first and last name in the list.
The first and last name being passed is Adam Smith, but there is also an Adam Smithers in the list who is found by my code as well.
When I attempt to replace the desired element of the list, it is replacing it for both Smith and Smithers.
We cannot use regular expressions for the assignment, so I am at a bit of a loss: how do I find an exact match for the last name, while ignoring the other last name that merely contains Smith, without using regex?
Here's my code:
import sys
P = sys.argv[1]
F = sys.argv[2]
L = sys.argv[3]
B = sys.argv[4]
filePath = P
firstName = F
lastName = L
newBirthday = B
records = []
file1 = open(filePath, 'r')
fileContent = file1.read()
while len(fileContent) > 0:
    record = []
    fname = fileContent[0:16]
    lname = fileContent[16:32]
    bday = fileContent[32:40]
    record = [fname, lname, bday]
    records.append(record)
    fileContent = fileContent[40:]
for record in records:
    if firstName in record[0] and lastName in record[1]:
        record[2] = newBirthday
file1.close()
file2 = open(filePath, 'w')
for record in records:
    file2.write(record[0])
    file2.write(record[1])
    file2.write(record[2])
file2.close()
Any ideas or hints that someone would be able to provide will be most appreciated.
Thanks!
Edit:
Icewine was kind enough to suggest using the below instead:
if firstName == record[0] and lastName == record[1]:
However, when I try that, it does not find any matching records.
I believe this is because there are blank spaces after each name to pad it out to 16 characters, giving each name a fixed length. So when I use the == operator, it doesn't find an exact match, because the name in the file also contains those blank spaces.

Use == instead of in
if firstName == record[0] and lastName == record[1]:
EDIT: try this
Removes whitespace from end of string
if firstName.rstrip() == record[0].rstrip() and lastName.rstrip() == record[1].rstrip():
or
Removes whitespace from start and end of string
if firstName.strip() == record[0].strip() and lastName.strip() == record[1].strip()
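To see why == with rstrip() fixes the Smith/Smithers problem where in does not, here is a small check on fields padded to 16 characters, as the question describes (the sample values are made up):

```python
# Fixed-width fields as they come out of the file: each name padded to 16 chars.
fields = ["Adam".ljust(16), "Smithers".ljust(16)]

first_name = "Adam"
last_name = "Smith"

# Substring test: "Smith" occurs inside "Smithers        ", so this wrongly matches.
substring_match = first_name in fields[0] and last_name in fields[1]

# Exact test after removing the trailing padding: "Smithers" != "Smith".
exact_match = first_name == fields[0].rstrip() and last_name == fields[1].rstrip()

print(substring_match, exact_match)
```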

Trim the whitespace from the strings and then use == for matching. This will give the expected output. Sample code:
for record in records:
    if firstName == record[0].strip() and lastName == record[1].strip():
        record[2] = newBirthday

Either pad spaces onto the passed data to match what's in the file:
firstName = f'{F:<16}'
or strip the extra spaces from the file contents to match the passed data:
fname = fileContent[0:16].strip()
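then you can simply compare the names with ==. The in operator also works with the padded version (a 16-character string is only "in" another 16-character string if they are equal), but not with the stripped one, because "Smith" in "Smithers" is still true.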
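Both options can be sketched side by side. The 16-character width comes from the question; the function names and the sample birthday are hypothetical.

```python
FIELD_WIDTH = 16  # width of each name field in the record file, per the question

def names_match_padded(first, last, record, width=FIELD_WIDTH):
    """Option A: pad the passed-in names so they line up with the raw file fields."""
    padded_first = f"{first:<{width}}"
    padded_last = f"{last:<{width}}"
    return padded_first == record[0] and padded_last == record[1]

def names_match_stripped(first, last, record):
    """Option B: strip the padding from the file fields instead."""
    return first == record[0].strip() and last == record[1].strip()

# Hypothetical raw records, exactly as sliced out of the file.
record_smith = ["Adam".ljust(16), "Smith".ljust(16), "19800101"]
record_smithers = ["Adam".ljust(16), "Smithers".ljust(16), "19800101"]

print(names_match_padded("Adam", "Smith", record_smith))
print(names_match_padded("Adam", "Smith", record_smithers))
print(names_match_stripped("Adam", "Smith", record_smithers))
```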

An easy solution would be to sort the list of names and values (in this case birthdays; I suggest you use a dictionary for this purpose) and then perform a search that finds only the first occurrence of an element. That way only Adam Smith is selected.
You can then further deal with duplicates by checking if the next element is the same as the one you found.
i.e. if the first occurrence is at index i, check whether the element at i+1 equals the one at i, and update all of the duplicates. (This may be what you need to do for your exercise, though it doesn't make sense to update other people's birthdays like this.)
Hopefully this helps :)
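The sort-then-search idea can be sketched with the standard bisect module. This is one possible reading of the suggestion, not necessarily what the answerer had in mind: it swaps the dictionary for a sorted list of hypothetical (first, last, birthday) tuples, so ("Adam", "Smith") and ("Adam", "Smithers") compare as different keys, and then walks forward over exact duplicates.

```python
from bisect import bisect_left

def update_birthday_sorted(records, first, last, new_birthday):
    """Return a sorted copy of records with new_birthday set on every exact (first, last) match."""
    records = sorted(records)                     # sort by (first, last, birthday)
    keys = [(rec[0], rec[1]) for rec in records]  # name keys, in the same sorted order
    target = (first, last)
    i = bisect_left(keys, target)                 # index of the first occurrence, if any
    # Update the first occurrence, then any duplicates immediately after it.
    while i < len(keys) and keys[i] == target:
        records[i] = (first, last, new_birthday)
        i += 1
    return records

# Hypothetical sample data.
people = [
    ("Adam", "Smithers", "19790202"),
    ("Adam", "Smith", "19800101"),
    ("Beth", "Jones", "19900303"),
]
for rec in update_birthday_sorted(people, "Adam", "Smith", "20000101"):
    print(rec)
```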

Here are a few improvements I could suggest to your code:
make a function or at least put the code inside if __name__ == '__main__':. You can google why.
I suggest using with open(file_path, 'r') as f: to read and write files; it looks cleaner and you won't forget to close the file.
remove surrounding spaces before comparing to your input; I used line[:16].strip().
you read in specified widths, so don't forget to write back into the file with those same widths. This code ensures that each part of the record has the specified width:
f.write('{:16s}{:16s}{:8s}'.format(*record))
Here is my version of the code:
import sys
if __name__ == '__main__':
    file_path = sys.argv[1]
    first_name = sys.argv[2]
    last_name = sys.argv[3]
    new_birth_date = sys.argv[4]
    record_list = []
    with open(file_path, 'r') as f:
        while True:
            line = f.read(40)  # read 40 chars at a time
            if len(line) == 0:
                # the end of the file was reached
                break
            record = [
                line[:16].strip(),
                line[16:32].strip(),
                line[32:].strip(),
            ]
            if first_name == record[0] and last_name == record[1]:
                record[2] = new_birth_date
            record_list.append(record)
    with open(file_path, 'w') as f:
        for record in record_list:
            f.write('{:16s}{:16s}{:8s}'.format(*record))
Does this work for you?
Note: this code does not split the input on newlines; it reads 40 chars at a time, so a newline character could end up inside those 40 chars.
I did it this way because the code in the question does something similar.
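For reference, here is the same read-modify-write logic as a pure function on the file contents, which makes it easy to try without touching a file. It is a sketch that assumes, like the code above, 40-character records (16 + 16 + 8) with no newlines between them; update_records is a hypothetical helper name. In a script you would read the file into a string, call the function, and write the result back with open(file_path, 'w').

```python
RECORD_LEN = 40  # 16 (first name) + 16 (last name) + 8 (birthday), as in the question
FIRST, LAST, BDAY = slice(0, 16), slice(16, 32), slice(32, 40)

def update_records(content, first_name, last_name, new_birthday):
    """Return new file contents with the birthday replaced on exact name matches."""
    out = []
    for start in range(0, len(content), RECORD_LEN):
        chunk = content[start:start + RECORD_LEN]
        fname = chunk[FIRST].strip()
        lname = chunk[LAST].strip()
        bday = chunk[BDAY].strip()
        if fname == first_name and lname == last_name:
            bday = new_birthday
        # Re-pad every field so the output keeps the fixed widths.
        out.append(f"{fname:<16}{lname:<16}{bday:<8}")
    return "".join(out)

# Hypothetical two-record file body.
sample = ("Adam".ljust(16) + "Smith".ljust(16) + "19800101"
          + "Adam".ljust(16) + "Smithers".ljust(16) + "19790202")
print(repr(update_records(sample, "Adam", "Smith", "20000101")))
```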

Related

I am having problems understanding the difference between [word] and ["word"] in Python

teams = []
# TODO: Read teams into memory from file
with open(sys.argv[1], "r") as file:
    reader = csv.DictReader(file)
    for team in reader:
        team[rating] = int(team[rating])
        # It works if i put team["rating"] but why?
        teams.append(team)
I wanted to convert the teams' ratings from the file from strings to integers. In a previous problem I didn't need to put the "" inside the [], and I can't figure out why.
The previous problem is:
results = {}
for subsequence in subsequences:
    results[subsequence] = longest_match(DNA_sequence, subsequence)
# TODO: Check database for matching profiles
# Check if the value in subsequence i.e. longest nucleotide for each person is equal to that of the result
for person in database:
    match = 0
    for subsequence in subsequences:
        # The value of subsequence of person is stored as string so change it to integer.
        if int(person[subsequence]) == results[subsequence]:
            match += 1
Maybe this modification helps to clarify what is going on:
teams = []
with open(sys.argv[1], "r") as file:
    reader = csv.DictReader(file)
    for team in reader:
        key = "rating"  # choose a key
        team[key] = int(team[key])
        teams.append(team)
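The line key = "rating" mimics what your second piece of code does: assign a value to the variable key. Accessing the dict with team[key] is in this case equivalent to team["rating"]. In your DNA code, subsequence is a loop variable that already holds a string, so person[subsequence] looks up that string; in the teams code there is no variable called rating, so you need the quoted literal "rating".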
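A small self-contained demonstration of the difference, using a hypothetical row shaped like what csv.DictReader yields (all values are strings). The quoted "rating" is the key itself, the bare rating is a variable that must hold the key, and an undefined variable raises NameError before the dict is even consulted.

```python
# Hypothetical row, shaped like csv.DictReader output: every value is a string.
row = {"team": "Norway", "rating": "1200"}

# "rating" in quotes is a string literal: it is the key itself.
by_literal = row["rating"]

# rating without quotes is a variable name; the lookup uses whatever string it holds.
rating = "rating"
by_variable = row[rating]  # same lookup as row["rating"]

# A loop variable plays the same role (like subsequence in the DNA code).
looked_up = [row[key] for key in ["team", "rating"]]

# With no variable called rating, Python raises NameError before touching the dict.
del rating
try:
    row[rating]
    name_error = False
except NameError:
    name_error = True

# Convert the value in place, as the question wants.
row["rating"] = int(row["rating"])
print(by_literal, by_variable, looked_up, name_error, row["rating"] + 1)
```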

Reading a file and parsing it into sections

Okay, so I have a file that contains an ID number followed by a name, like this:
10 alex de souza
11 robin van persie
9 serhat akin
I need to read this file and break each record up into 2 fields the id, and the name. I need to store the entries in a dictionary where ID is the key and the name is the satellite data. Then I need to output, in 2 columns, one entry per line, all the entries in the dictionary, sorted (numerically) by ID. dict.keys and list.sort might be helpful (I guess). Finally the input filename needs to be the first command-line argument.
Thanks for your help!
I have this so far, but I can't get any further.
fin = open("ids", "r")  # Read the file
for line in fin:  # Split lines
    string = line.split()
    if len(string) > 1:  # Separate names and grades
        id = int(string[0])
        name = string[1:]
        print(id, name)  # Print results
We need sys.argv to get the command line argument (careful, the name of the script is always the 0th element of the returned list).
Next we open the file (no error handling; you should add that) and read in the lines individually. Now the list "lines" holds a 'number firstname secondname' string for each line.
Then create an empty dictionary out and loop over the strings in lines, splitting each one at whitespace and storing the result in the temporary variable tmp (now a list of strings: ['number', 'firstname', 'secondname']).
Following that, we fill the dictionary, using the number as the key and the space-joined rest of the names as the value.
To print the dictionary sorted, loop over the list of IDs returned by sorted(out), using the key=int option for numerical sorting. Then print the id (the number) and the corresponding value, looked up in the dictionary by the id string.
import sys
try:
    infile = sys.argv[1]
except IndexError:
    infile = raw_input('Enter file name: ')  # raw_input, because Python 2's input() evaluates what is typed
with open(infile, 'r') as file:
    lines = file.readlines()
out = {}
for fullstr in lines:
    tmp = fullstr.split()
    out[tmp[0]] = ' '.join(tmp[1:])
for id in sorted(out, key=int):
    print id, out[str(id)]
This works for Python 2.7 with ASCII strings. I'm fairly sure it should handle other encodings as well (German Umlaute work, at least), but I can't test that any further. You may also want to add plenty of error handling in case the input file is formatted differently.
Just a suggestion, this code is probably simpler than the other code posted:
import sys
with open(sys.argv[1], "r") as handle:
    lines = handle.readlines()
data = dict([i.strip().split(' ', 1) for i in lines])
for idx in sorted(data, key=int):
    print idx, data[idx]
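For Python 3, a minimal sketch along the same lines might look like this. parse_ids and sorted_rows are hypothetical helper names; parse_ids accepts any iterable of lines, so in a script you would call it as parse_ids(f) on a file opened from sys.argv[1] inside a with block. Splitting only once keeps multi-word names intact, and key=int makes "9" sort before "10".

```python
def parse_ids(lines):
    """Build {id_string: name} from lines like '10 alex de souza'."""
    data = {}
    for line in lines:
        parts = line.split(None, 1)  # split off the ID only; the name keeps its spaces
        if len(parts) == 2:          # skip blank or malformed lines
            data[parts[0]] = parts[1].strip()
    return data

def sorted_rows(data):
    """Return (id, name) pairs ordered numerically by ID."""
    return [(idx, data[idx]) for idx in sorted(data, key=int)]

# Usage with the sample data from the question.
sample = ["10 alex de souza\n", "11 robin van persie\n", "9 serhat akin\n"]
for idx, name in sorted_rows(parse_ids(sample)):
    print(f"{idx:>3}  {name}")
```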

Deleting a line in file containing exact string (Python)

import re
print "List of names:"
f=open('names.txt','r') #look below
lines = f.readlines()
for line in lines:
    info = line.split('|')
    names = info[0]
    print names
name = raw_input("Enter the name of the person you want to delete: ")
f.close()
f = open('names.txt','w')
for line in lines:
    if not re.match(name,line):
        f.write(line)
        break
print "That person doesn't exist!"
names.txt :
John|22|Nice
Johnny|55|Better than John
Peter|25|The worst
So, when you run the program, list of names is printed and then you have to enter the name of the person whose line you want to delete.
The problem is, if I enter John, it deletes the first and the second line, but I want only the first line to be deleted. My guess is that I'm not doing re.match() right. I tried re.match(name,names) but that doesn't work either.
So, the string you enter into name should be compared to the strings in names , and if there's an exact match, it should delete the line that has name as the first element.
I found a lot of similar problems but my function contains everything combined and I can't figure it out.
re.match matches only at the beginning of the string. You could add a word-boundary anchor to your pattern:
name + r'\b'
but in your case re is overkill; a simple comparison will do:
name == line.partition('|')[0]
BTW, if you only need to split once at the beginning (or end), the partition and rpartition methods are better options.
EDIT
Timing:
>>> timeit('line.startswith(name+"|")', 'line="John|22|Nice";name="John"')
0.33100164101452345
>>> timeit('line.partition("|")[0] == name', 'line="John|22|Nice";name="John"')
0.2520693876228961
>>> timeit('re.match(name+r"\b", line)', 'import re; line="John|22|Nice";name="John"')
1.8754496594662555
>>> timeit('line.split("|")[0] == name', 'line="John|22|Nice";name="Jonny"')
0.511219799415926
Especially for Padraick
>>> timeit('line.partition("|")[0] == name', 'line="John|22|Nice";name="John"')
0.27333073995099083
>>> timeit('line.split("|", 1)[0] == name', 'line="John|22|Nice";name="John"')
0.5120651608158937
Frankly - I am surprised myself
with open("in.txt") as f:
    lines = f.readlines()
name = raw_input("Enter the name of the person you want to delete: ").lower() + "|"
ln = len(name)
for ind, line in enumerate(lines):
    if name == line[:ln].lower():
        lines[ind:ind+1] = []
        break
with open("in.txt","w") as out:
    out.writelines(lines)
If you want to remove every "John" line, don't break; just keep looping. As it stands, the code erases only the first "John" it finds. Comparing a slice of the start of each line, as above, is the fastest way.
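A Python 3 sketch of the exact-match deletion using partition, as the first answer suggests. remove_person is a hypothetical helper: it works on the list returned by readlines(), removes only the first exact match unless remove_all=True, and reports how many lines it dropped so the caller can print "That person doesn't exist!". Unlike the second answer, it compares case-sensitively. To apply it to the file, read the lines from names.txt, call the function, and write the kept lines back with writelines().

```python
def remove_person(lines, name, remove_all=False):
    """Drop lines whose first '|'-separated field equals name exactly.

    Returns (kept_lines, removed_count). Only the first match is removed
    unless remove_all is True.
    """
    kept, removed = [], 0
    for line in lines:
        is_match = line.partition("|")[0] == name
        if is_match and (remove_all or removed == 0):
            removed += 1
            continue
        kept.append(line)
    return kept, removed

# Usage with the names.txt contents from the question.
lines = ["John|22|Nice\n", "Johnny|55|Better than John\n", "Peter|25|The worst\n"]
kept, removed = remove_person(lines, "John")
if removed == 0:
    print("That person doesn't exist!")
print("".join(kept), end="")
```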

Write dictionary values (list) to output file - Python

I am trying to print values (a list) from a dictionary to the third column of another file that contains the dictionary key in the first column. I would like the list of values to print in the third column of the output file, with a space separating each value. I know my problem lies somewhere in the fact that Python can only write strings to a file and that the list prints with "," separators, but I am new to programming and am not sure how to accomplish this. Any help is much appreciated, thanks!
The GtfFile.txt is a 10-column file (sep = '\t') from which I generate the dictionary, using the Term (functional category) as the key and the Gene names as the values. Several genes have more than one Term attributed to them and are repeated on new lines for each term. There are varying numbers of genes associated with each Term as well, and thus I generate a list of genes as the value for each Term. THIS PART OF MY SCRIPT APPEARS TO BE WORKING AS I WOULD LIKE IT TO!
The FuncEnr_terms.txt is a 2-column file (sep = '\t') which consists of a Term in the first column and a description of the term in the second column. My desired output file would duplicate this file with a third column that contains the Genes associated with the Term, separated by spaces. WRITING THIS TO THE OUTPUT FILE IS WHERE MY PROBLEM LIES.
Below is my code:
#!/usr/bin/env python
import sys
from collections import defaultdict
if len(sys.argv) != 4 :
    print("Usage: GeneSetFileGen.py <GtfFile.txt> <FuncEnr_terms.txt> <OutputFile.txt>")
    sys.exit(0)
OutFileName = sys.argv[3]
OutFile = open(OutFileName, 'w')
TermGeneDic = defaultdict(list)
with open(sys.argv[1], 'r') as f :
    for line in f :
        line = line.strip()
        line = line.split('\t')
        Term = line[8]
        Gene = line[0]
        TermGeneDic[Term].append(Gene)
#write output file
with open(sys.argv[2], 'r') as f :
    for line in f :
        line = line.strip()
        Term, Des = line.split('\t')
        OutFile.write(Term + '\t' + Des + '\t' + str(TermGeneDic[Term]) + '\n')
OutFile.close
If I understand what you require correctly then what you need is to replace this expression:
str(TermGeneDic[Term])
with something like:
" ".join(TermGeneDic[Term])
A couple of pointers on your code: your code will be hard for anyone else to read if you don't follow PEP 8 conventions fairly closely. This means no CamelCase except for class names.
Secondly, reusing variables is generally bad, and a sign that you should just chain those method calls. It's especially bad when you have a variable like line whose type you actually change.
Thirdly, brackets (parentheses) are mandatory for calling a method or function.
Fourthly, you join the elements of a list into a string with '\t'.join(termgenes[term])
Finally, use templating to generate long strings - it ends up being easier to work with.
Your code should look like:
import sys
from collections import defaultdict
if len(sys.argv) != 4:
    print("Usage: GeneSetFileGen.py <GtfFile.txt> <FuncEnr_terms.txt> <OutputFile.txt>")
    sys.exit(0)
progname, gtffilename, funcencrfilename, outfilename = sys.argv
termgenes = defaultdict(list)
with open(gtffilename, 'r') as gtf:
    for line in gtf:
        linefields = line.strip().split('\t')
        term, gene = linefields[8], linefields[0]
        termgenes[term].append(gene)
#write output file
with open(funcencrfilename, 'r') as funcencrfile, open(outfilename, 'w') as outfile:
    for line in funcencrfile:
        term, des = line.strip().split('\t')
        # % needs a tuple when formatting several values; one \t between each column
        outfile.write('%s\t%s\t%s\n' % (term, des, '\t'.join(termgenes[term])))
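Putting the two answers together, here is a sketch that keeps the grouping and writing logic in functions so it can be tried on in-memory lines. The helper names build_term_genes and annotate_terms are hypothetical; the column positions (gene in column 1, term in column 9) are taken from the question's code, and the genes are joined with a space as the question asks (pass sep="\t" to get the tab-joined variant from the second answer).

```python
from collections import defaultdict

def build_term_genes(gtf_lines):
    """Group gene names (column 1) by term (column 9) from tab-separated lines."""
    term_genes = defaultdict(list)
    for line in gtf_lines:
        fields = line.rstrip("\n").split("\t")
        term_genes[fields[8]].append(fields[0])
    return term_genes

def annotate_terms(term_lines, term_genes, sep=" "):
    """Yield each 'term<TAB>description' line with a third column of genes joined by sep."""
    for line in term_lines:
        term, description = line.rstrip("\n").split("\t")
        # .get() avoids adding empty entries to the defaultdict for unknown terms
        genes = sep.join(term_genes.get(term, []))
        yield f"{term}\t{description}\t{genes}\n"

# Hypothetical in-memory data: three 10-column GTF-style lines and two term lines.
gtf = ["\t".join([gene] + ["x"] * 7 + [term, "x"]) + "\n"
       for gene, term in [("geneA", "T1"), ("geneB", "T1"), ("geneC", "T2")]]
term_genes = build_term_genes(gtf)
for out_line in annotate_terms(["T1\tdesc one\n", "T3\tdesc three\n"], term_genes):
    print(out_line, end="")
```

With real files, the same functions can be fed open file objects, and the generator's output passed straight to outfile.writelines().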

How to read a file line by line?

I'm trying to learn Python and I'm working through a problem from a book, but I'm stuck on one question. It asks me to read a file in which each line contains an 'a' or an 's' followed by an amount, starting from a total of 500. If the line contains an 'a', the amount next to it is added (for example, "a 20" adds 20 to my total); for 's', that amount is subtracted. In the end I'm supposed to return the total after all the changes have been made. So far I have:
def NumFile(file):
    infile = open(file, 'r')
    content = infile.readlines()
    infile.close()
    add = ('a', 'A')
    subtract = ('s', 'S')
After that, I'm completely lost as to how to continue.
You need to iterate over the lines of the file. Here is a skeleton implementation:
# ...
with open(filename) as f:
    for line in f:
        tok = line.split()
        op = tok[0]
        qty = int(tok[1])
        # ...
# ...
This places every operation and quantity into op and qty respectively.
I leave it to you to fill in the blanks (# ...).
A variation might be
f = open('myfile.txt','r')
lines = f.readlines()
for i in lines:
    i = i.strip()  # removes new line characters
    i = i.split()  # splits a string by spaces and stores as a list
    key = i[0]  # an 'a' or an 's'
    val = int(i[1])  # an integer, which you can now add to some other variable
Try adding print statements to see what's going on. A nice thing about Python is that you can chain several method calls on a single line. Here is equivalent code:
for i in open('myfile.txt','r').readlines():
    i = i.strip().split()
    key = i[0]
    val = int(i[1])
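Filling in the blanks, one possible complete version looks like this. apply_transactions is a hypothetical name (the question's function was NumFile). It assumes every meaningful line is an operation letter and an integer separated by whitespace, starts from 500, and silently skips blank or malformed lines. Since it accepts any iterable of lines, you can pass it an open file object directly, e.g. total = apply_transactions(f) inside a with open(filename) as f: block.

```python
def apply_transactions(lines, start=500):
    """Add amounts on 'a' lines and subtract amounts on 's' lines (case-insensitive)."""
    total = start
    for line in lines:
        tokens = line.split()
        if len(tokens) != 2:  # skip blank or malformed lines
            continue
        op, qty = tokens[0].lower(), int(tokens[1])
        if op == "a":
            total += qty
        elif op == "s":
            total -= qty
    return total

# Usage with a few hypothetical lines.
print(apply_transactions(["a 20\n", "s 5\n", "A 10\n"]))
```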
