Having trouble with my python program


I'm having a bit of trouble. For my assignment, my teacher wants us to read in data and write it out to another file. The input has each student's name on one line and their grades on the next. The output should also use two lines per student: the name on line one and their average on line two. After that, we put the averages into a list and run the whole list through mean, median, and standard deviation. Here's an example of some data from the file.
Aiello,Joseph
88 75 80
Alexander,Charles
90 93 100 98
Cambell,Heather
100 100
Denniston,Nelson
56 70 65
So, as you can see, the name is last name first, then a comma, then the first name, and the grades are on the second line. He wants us to find the average of the grades and write it under the student's name. That's the part I'm having trouble with. I know how to find an average: add the grades up, then divide by the number of grades. But how do I put that into Python? Also, I already have a mean, median, and standard deviation program. How would I put the averages from the first part into a list and then run the whole list through that program? And back to my original question: is there anything wrong with what I have so far? Anything I need to add or change? Here's my code.
def main():
    input1 = open('StudentGrades.dat', 'r')
    output = open('StudentsAvg', 'w')
    for nextLine in input1:
        output.write(nextLine)
        list1 = nextLine.split()
        count = int(list1[3])
        for p in range(count):
            nextLine = input1.readlin()
            output.write(nextLine)
            list2 = nextLine.split()
            name = int(list2[1])
            grades = list2[2]
            pos = grades.index(grade)
            avg =

It seems like there are a few problems here. The first is that everything you read from the file is a string, not a number. Secondly, you should probably be doing all of this within the same for loop in which you read the lines. (One more point: use the with statement so that your file objects are closed automatically when you're done with them.) So, you could modify your code as follows:
def main():
    with open('StudentGrades.dat', 'r') as input1, open('StudentsAvg.txt', 'w') as output:
        counter = 0
        student_name = ''
        for nextLine in input1:
            if counter % 2 == 0:
                student_name = nextLine.strip()  # drop the trailing newline; print() adds one
            else:
                grades = [int(x) for x in nextLine.split()]
                avg = sum(grades) / len(grades)
                print(student_name, file=output)
                print(str(avg), file=output)
            counter += 1
Note that print(..., file=output) is a convenient way to write a line to a file in Python 3, since it appends the newline for you; output.write() works too.
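The question also asks how to collect the averages into a list and run that list through mean, median, and standard deviation, which the code above does not cover. Here is a minimal sketch of one way to do it, assuming the standard library's statistics module is acceptable in place of the asker's own program. The helper name student_averages and the in-memory sample list are made up for illustration; in the real program the lines would come from the open file.

```python
import statistics

def student_averages(lines):
    """Pair each name line with the mean of the grades on the line after it."""
    # Skip blank lines so a trailing newline at the end of the file is harmless.
    rows = [line.strip() for line in lines if line.strip()]
    result = []
    # rows[0::2] are the names, rows[1::2] are the matching grade lines.
    for name, grade_line in zip(rows[0::2], rows[1::2]):
        grades = [float(g) for g in grade_line.split()]
        result.append((name, sum(grades) / len(grades)))
    return result

# Sample data copied from the question (normally: lines read from the file).
sample = [
    "Aiello,Joseph\n", "88 75 80\n",
    "Alexander,Charles\n", "90 93 100 98\n",
    "Cambell,Heather\n", "100 100\n",
    "Denniston,Nelson\n", "56 70 65\n",
]

pairs = student_averages(sample)
averages = [avg for _, avg in pairs]  # the list to feed into the statistics step

print("mean:", statistics.mean(averages))
print("median:", statistics.median(averages))
print("stdev:", statistics.stdev(averages))  # sample standard deviation
```

If the assignment expects the population standard deviation instead, statistics.pstdev is the matching call.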

Some improvements made to the original code:
def averageMarksCalculator(): # Learn to name your functions "Meaningfully"
    # Using with clause - Learn to love it as much as you can!
    with open('StudentGrades.dat') as input1, open('StudentsAvg.txt', 'w') as output:
        for nextLine in input1:
            output.write(nextLine) # Write student's name (the line already ends with a newline)
            list1 = (input1.readline()).split() # split the grades line into individual strings
            list1 = [int(x) for x in list1] # convert the strings to numbers
            avg = sum(list1)/len(list1)
            output.write("Average marks........"+str(avg)+"\n") # avg marks of student
    # No close() calls needed: the with block closes both files
Note that the "\n" ends the line holding the average. If you want an empty line as a separator after each student, write "\n\n" instead. In text mode Python translates "\n" to the platform's line ending, so there is no need to write "\r\n" yourself.

Related

How to solve an error with improper alphabetical comparison

I have to write a program that first reads in the name of an input file and then reads the input file using the file.readlines() method. The input file contains an unsorted list of number of seasons followed by the corresponding TV show. Program puts the contents of the input file into a dictionary where the number of seasons are the keys, and a list of TV shows are the values (since multiple shows could have the same number of seasons). Sorts the dictionary by key (least to greatest) and output the results to a file named output_keys.txt, separating multiple TV shows associated with the same key with a semicolon (;). Sorts the dictionary by values (alphabetical order), and outputs the results to a file named output_titles.txt. So if my input file is "file1.txt" and the contents of that file are:
20
Gunsmoke
30
The Simpsons
10
Will & Grace
14
Dallas
20
Law & Order
12
Murder, She Wrote
The file output_keys.txt should contain:
10: Will & Grace
12: Murder, She Wrote
14: Dallas
20: Gunsmoke; Law & Order
30: The Simpsons
And the file output_titles.txt should contain:
Dallas
Gunsmoke
Law & Order
Murder, She Wrote
The Simpsons
Will & Grace
My code works perfectly fine and my assignment grades it fine except for the part with the "output_titles.txt" What I wrote in code doesn't put it in alphabetical order for it and I don't know where to go from here.
My code is:
inputFilename = input()
keysFilename = 'output_keys.txt'
titlesFilename = 'output_titles.txt'
shows = {}
with open(inputFilename) as inputFile:
    showData = inputFile.readlines()
record_count = int(len(showData) / 2)
for i in range(record_count):
    seasons = int(showData[2 * i].strip())
    showName = showData[2 * i + 1].strip()
    if seasons in shows:
        shows[seasons].append(showName)
    else:
        shows[seasons] = [showName]
with open(keysFilename, 'w') as keysFile:
    for season in sorted(shows):
        keysFile.write(str(season) + ': ')
        keysFile.write('; '.join(shows[season]) + '\n')
with open(titlesFilename, 'w') as titlesFile:
    for show_list in sorted(shows.values()):
        for show in show_list:
            titlesFile.write(show + "\n")
I've attached a picture of the problem I get notified of (picture not included here).
What should I do to solve this specifically?

The problem here is that shows.values() iterates over lists, not strings, so the sort doesn't quite work as you'd like. You could amalgamate these to a single list, but equally you could retain that list of show names in the first place as you read them in; so your initial interpretation loop would become:
allshows = [] # also collect just names
for i in range(record_count):
    seasons = int(showData[2 * i].strip())
    showName = showData[2 * i + 1].strip()
    allshows.append(showName) # collect for later output
    if seasons in shows:
        shows[seasons].append(showName)
    else:
        shows[seasons] = [showName]
allshows.sort() # ready for output
and then output would be a simple iteration over this extra list.

That's because you sorted a list of lists of strings. Each sublist corresponds to a different number of seasons, and sorting the outer list does not merge or sort the sublists. Just make one big flat list of show names and sort it. Try, for instance:
with open(titlesFilename, 'w') as titlesFile:
    for show in sorted(sum(shows.values(), [])):
        titlesFile.write(show + "\n")
I used sum since it is succinct and intuitive, yet it might be terribly slow considering the amount of TV programming today. For greatest efficiency use itertools.chain or a good old comprehension:
sorted(show for show_titles in shows.values() for show in show_titles)
Iterating over a list of lists has been discussed many times before, e.g. in "Concatenation of many lists in Python"; pick any method you like.
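For reference, here is a small sketch of the itertools.chain variant mentioned above, run against the sample data from the question. The shows dictionary is typed in by hand here for illustration; in the real program it is built from the input file.

```python
from itertools import chain

# Dictionary as the question's loading loop would build it from file1.txt.
shows = {
    20: ["Gunsmoke", "Law & Order"],
    30: ["The Simpsons"],
    10: ["Will & Grace"],
    14: ["Dallas"],
    12: ["Murder, She Wrote"],
}

# Flatten all the per-season lists into one stream of titles, then sort it.
titles = sorted(chain.from_iterable(shows.values()))

# The keys file is unaffected: one line per season count, titles joined by "; ".
keys_lines = [f"{seasons}: {'; '.join(shows[seasons])}" for seasons in sorted(shows)]

for title in titles:
    print(title)
```

chain.from_iterable avoids building the intermediate lists that sum(..., []) creates on every addition.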

This is the correct code for any future inquiries. I'm currently taking IT-140 and this passed all tests. If you follow the pseudocode line for line in the module videos, you'll easily get this.
file_name = input()
user_file = open(str(file_name))
output_list = user_file.readlines()
my_dict = {}
show_list = []
show_list_split = []
for i in range(len(output_list)):
    temp_list = []
    list_object = output_list[i].strip('\n')
    if (i + 1 < len(output_list) and (i % 2 == 0)):
        if int(list_object) in my_dict:
            my_dict[int(list_object)].append(output_list[i + 1].strip('\n'))
        else:
            temp_list.append(output_list[i + 1].strip('\n'))
            my_dict[int(list_object)] = temp_list
my_dict_sorted_by_keys = dict(sorted(my_dict.items()))
for x in my_dict.keys():
    show_list.append(my_dict[x])
for x in show_list:
    for i in x:
        show_list_split.append(i)
show_list_split = sorted(show_list_split)
f = open('output_keys.txt', 'w')
for key, value in my_dict_sorted_by_keys.items():
    f.write(str(key) + ': ')
    for item in value[:-1]:
        f.write(item + '; ')
    else:
        f.write(value[-1])
        f.write('\n')
f.close()
f = open('output_titles.txt', 'w')
for item in show_list_split:
    f.write(item + '\n')
f.close()

How to parse letter by letter and make a list with Python?

I have a text file I am attempting to parse. Fairly new to Python.
It contains an ID, a sequence, and frequency
SA1 GDNNN 12
SA2 TDGNNED 8
SA3 VGGNNN 3
Say the user wants to compare the frequency of the first two sequences. They would input the ID number. I'm having trouble figuring out how I would parse with python to make a list like
GD this occurs once in the two so it = 12
DN this also occurs once =12
NN occurs 3 times = 12 + 12 + 8 =32
TD occurs once in the second sequence = 8
DG ""
NE ""
ED ""
What do you recommend to parse letter by letter? In a sequence GD, then DN, then NN (without repeating it in the list), TD.. Etc.?
I currently have:
#Read File
def main():
    file = open("clonedata.txt", "r")
    lines = file.readlines()
    file.close()
class clone_data:
    def __init__(id, seq, freq):
        id.seq = seq
        id.freq = freq
    def myfunc(id)
        id = input ("Input ID number to see frequency: ")
        for line in infile:
            line = line.strip().upper()
            line.find(id)
            #print('y')

I'm not entirely sure from the example, but it sounds like you're trying to look at each line in the file and determine if the ID is in a given line. If so, you want to add the number at the end of that line to the current count.
This can be done in Python with something like this:
def get_total_from_lines_for_id(id_string, lines):
    total = 0 #record the total at the end of each line
    #now loop over the lines searching for the ID string
    for line in lines:
        if id_string in line: #this will be true if the id_string is in the line and will only match once
            split_line = line.split(" ") #split the line at each space character into an array
            number_string = split_line[-1] #get the last item in the array, the number
            number_int = int(number_string) #make the string a number so we can add it
            total = total + number_int #increase the total
    return total

I'm honestly not sure what part of that task seems difficult to you, in part because I'm not sure what exactly is the task you're trying to accomplish.
Unless you expect the datafile to be enormous, the simplest way to start would be to read it all into memory, recording the id, sequence and frequency in a dictionary indexed by id: [Note 1]
with open('clonedata.txt') as file:
    data = { id : (sequence, int(frequency))
             for id, sequence, frequency in (
                 line.split() for line in file)}
With the sample data provided, that gives you: (newlines added for legibility)
>>> data
{'SA1': ('GDNNN', 12),
'SA2': ('TDGNNED', 8),
'SA3': ('VGGNNN', 3)}
and you can get an individual sequence and frequency with something like:
seq, freq = data['SA2']
Apparently, you always want to count the number of digrams (instances of two consecutive letters) in a sequence of letters. You can do that easily with collections.Counter: [Note 2]
from collections import Counter
# ...
seq, freq = data['SA1']
Counter(zip(seq, seq[1:]))
which prints
Counter({('N', 'N'): 2, ('G', 'D'): 1, ('D', 'N'): 1})
It would probably be most convenient to make that into a function:
def count(seq):
    return Counter(zip(seq, seq[1:]))
Also apparently, you actually want to multiply the counted frequency by the frequency extracted from the file. Unfortunately, Counter does not support multiplication (although you can, conveniently, add two Counters to get the sum of frequencies for each key, so there's no obvious reason why they shouldn't support multiplication.) However, you can multiply the counts afterwards:
def count_freq(seq, freq):
    retval = count(seq)
    for digram in retval:
        retval[digram] *= freq
    return retval
If you find tuples of pairs of letters annoying, you can easily turn them back into strings using ''.join().
Notes:
1. That code is completely devoid of error checking; it assumes that your data file is perfect, and will throw an exception for any line with too few elements, including blank lines. You could handle the blank lines by changing for line in file to for line in file if line.strip() or some other similar test, but a fully bullet-proof version would require more work.
2. zip(a, a[1:]) is the idiomatic way of making an iterator out of overlapping pairs of elements of a list. If you want non-overlapping pairs, you can use something very similar, using the same list iterator twice:
def pairwise(a):
    it = iter(a)
    return zip(it, it)
(Or, javascript style: pairwise = lambda a: (lambda it:zip(it, it))(iter(a)).)
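To tie this back to the question's example, here is a sketch that combines the weighted digram counts for two IDs. It is one possible reading of what the asker wants, not a confirmed spec. The function names weighted_digrams and combined are made up for illustration, and the data dictionary is typed in by hand rather than read from clonedata.txt.

```python
from collections import Counter

# Same shape as the dictionary built from the file above: id -> (sequence, frequency).
data = {
    "SA1": ("GDNNN", 12),
    "SA2": ("TDGNNED", 8),
    "SA3": ("VGGNNN", 3),
}

def weighted_digrams(seq, freq):
    """Count each overlapping letter pair, adding freq for every occurrence."""
    counts = Counter()
    for first, second in zip(seq, seq[1:]):
        counts[first + second] += freq  # string keys such as "NN" instead of tuples
    return counts

def combined(ids):
    """Sum the weighted digram counts over all the requested ids."""
    total = Counter()
    for key in ids:
        seq, freq = data[key]
        total += weighted_digrams(seq, freq)
    return total

result = combined(["SA1", "SA2"])
for digram, weight in result.items():
    print(digram, weight)
```

For SA1 and SA2 this gives NN a weight of 32 (12 + 12 + 8), which matches the figure in the question.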

Dividing an array in a for loop?

I have been given a task by my teacher and one of the questions wants me to divide everything in the array by 26.22(A full marathon). I have been working on this all day and am totally stuck could someone please show me how to make this work?
this is what I have so far
import string
forename = []
surname = []
distance = []
farthest_walk = []
marathon = []
#Opening the text file and sorting variables
data = open("members.txt","r")
for line in data:
    value = line.split(',')
    forename.append(value[0])
    surname.append(value[1])
    distance.append(value[2])
#Closing the text file
data.close()
Results = open("Results.txt","w+")
Results.write("The number of whole marathons walked be each member is:\n")
for count in range(len(distance)):
    if float(distance[count])/ 26.22 = temp:
        marathon.append
    Results.write(forename[count]+":")
    Results.write(surname[count]+":")
    Results.write(marathon[count])
Results.close()
It is supposed to end up as Forename, Surname, WholeMarathonsRun, but I don't see how it could get there.

You almost got there.
For each name, you need to compute how many marathons he ran, which can be achieved with the following operation:
temp = float(distance[count])/ 26.22
This doesn't need to be in an if statement.
Then you need to write this value in the output file after the names:
Results.write(forename[count]+":")
Results.write(surname[count]+":")
Results.write(str(temp))  # write() needs a string, so convert the number
# line break so that each result stays on its own line
Results.write("\n")
All those lines go inside the last for loop that you already have.
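Since the task asks for whole marathons, the division result probably needs rounding down. Here is a minimal sketch of that step, assuming each line of members.txt looks like forename,surname,distance. The constant name, the helper names, and the three sample rows are hypothetical and only there to show the arithmetic.

```python
MARATHON_DISTANCE = 26.22  # the figure given in the question

def whole_marathons(distance):
    """Number of complete marathons contained in the given distance."""
    return int(float(distance) // MARATHON_DISTANCE)  # floor division drops the remainder

def format_results(rows):
    """Turn 'forename,surname,distance' rows into 'forename:surname:count' lines."""
    lines = []
    for row in rows:
        forename, surname, distance = [field.strip() for field in row.split(",")]
        lines.append(f"{forename}:{surname}:{whole_marathons(distance)}")
    return lines

# Made-up rows standing in for the contents of members.txt.
sample_rows = ["Ann,Lee,60.5\n", "Bob,Ray,26.22\n", "Cy,Fox,10\n"]

for line in format_results(sample_rows):
    print(line)
```

Each formatted line can then be written to Results.txt followed by "\n".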

how do I count the characters in a group of lines separated by another kind of line?

I am currently working with a text file that has a list of DNA extraction sequences (contigs), each with a header line followed by lines of nucleotides; the number of characters in those lines is the nucleotide length of that contig. There are 120 contigs, with each entry marked by a line that starts with ">" giving the sequence information. After this line come the nucleotides of that sequence.
example:
>gi|571136972|ref|XM_006625214.1| Plasmodium chabaudi chabaudi small subunit ribosomal protein 5 (Rps5) (rps5) mRNA, complete cds
ATGAGAAATATTTTATTAAAGAAAAAATTATATAATAGTAAAAATATTTATATTTTATATTATATTTTAATAATATTTAAAAGTATTTTTATTATTTTATTTAATAGTAAATATAATGTGAATTATTATTTATATAATAAAATTTATAATTTATTTATTATATATATAAAATTATATTATATTATAAATAATATATATTATAATAATAATTATTATTATATATATAATATGAATTATATA
TATTTTTATATTTATAAATATAATAGTTTAAATAATA
>gi|571136996|ref|XM_006625226.1| Plasmodium chabaudi chabaudi small subunit ribosomal protein 2 (Rps2) (rps2) mRNA, complete cds
ATGTTTATTACATTTAAAGATTTATTAAAATCTAAAATATATATAGGAAATAATTATAAAAATATTTATATTAATAATTATAAATTTATATATAAAATAAAATATAATTATTGTATTTTAAATTTTACATTAATTATATTATATTTATATAAATTATATTTATATATTTATAATATATCTATATTTAATAATAAAATTTTATTTATTATTAATAATAATTTAATTACAAATTTAATTATT
AATATATGTAATTTAACTAATAATTTTTATATTATTA
what I would like to do is make a list of every contig. My problem is, I do not know the syntax needed to tell Python to:
find the line after the line that starts with ">"
take a count of all of the characters in the lines of that sequence
return a value to a list of all contig values (a list that gives a list of length of every contig, ie 126, 300, 25...)
make sure the last contig (which has no ">" to denote its end) is counted.
I would like a list of integers, so that I can calculate things like the mean length of the contigs, standard deviation, cool gene equations etc.
I am relatively new to programming. if I am unclear or further information is needed, please let me know.

Don't reinvent the wheel, use biopython as Martin has suggested. Here's a start for you that will print the sequence ID and length to terminal. You can install biopython with pip, i.e. pip install biopython
from Bio import SeqIO
import sys
FileIn = sys.argv[1]
handle = open(FileIn)  # plain text mode; the old 'rU' flag was removed in Python 3.11
SeqRecords = SeqIO.parse(handle, 'fasta')
for record in SeqRecords: #loop through each fasta entry
    length = len(record.seq) #get sequence length
    print("%s: %i bp" % (record.id, length)) #print sequence ID: seq length
handle.close()
Or you could store the results in a dictionary:
handle = open(FileIn)
sequence_lengths = {}
SeqRecords = SeqIO.parse(handle, 'fasta')
for record in SeqRecords: #loop through each fasta entry
    length = len(record.seq) #get sequence length
    sequence_lengths[record.id] = length
handle.close()
#access dictionary outside of loop
print(sequence_lengths)

This might work for you: it prints the number of ACGT's in the lines that follow a line that includes >:
import re
with open("input.txt") as input_file:
    data = input_file.read()
data = re.split(r">.*", data)[1:]
data = [sum(1 for ch in datum if ch in 'ACGT') for datum in data]
print(data)
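If you would rather see the steps from the question spelled out without a regex or Biopython, here is a line-by-line sketch. Note one difference from the snippet above: it counts every character on a sequence line, not only A, C, G and T. The function name contig_lengths and the tiny sample list are made up for illustration; with a real file you would pass the open file object instead.

```python
import statistics

def contig_lengths(lines):
    """Return one length per contig, where each contig starts at a '>' header line."""
    lengths = []
    current = None  # None until the first header has been seen
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)  # close off the previous contig
            current = 0
        elif current is not None:
            current += len(line)  # add this sequence line to the running count
    if current is not None:
        lengths.append(current)  # the last contig has no '>' after it
    return lengths

# Tiny made-up sample in the same layout as a FASTA file.
sample = [">contig1 demo", "ATGAGA", "TAT", ">contig2 demo", "ATGT", ""]
lengths = contig_lengths(sample)

print(lengths)
print("mean length:", statistics.mean(lengths))
```

The final check after the loop is what makes sure the last contig is counted.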

thanks for all the help. I have looked at the biopython stuff and am excited to understand it and incorporate it. The overall goal of this assignment was to teach me how to understand python, rather than finding the solution outright, or at least if I find the solution, I have to be able to explain it in my own words.
Anyway, I have created a code incorporating that element as well as others. I have a few more things to do, and if I am confused, I will return to ask.
here is my first working code outside of working directly with my supervisor or tutorials that I made and understand (woo!):
import re
with open("COPYFORTESTINGplastid.1.rna.fna") as fasta:
    contigs = 0
    for line in fasta:
        if line.strip().startswith('>'):
            contigs = contigs + 1
with open("COPYFORTESTINGplastid.1.rna.fna") as fasta:
    data = fasta.read()
data = re.split(r">.*", data)[1:]
data = [sum(1 for ch in datum if ch in 'ACGT') for datum in data]
print("Total number of contigs: %s" % contigs)
total_contigs = sum(data)
N50 = sum(data)/2
print("number used to determine N50 = %s" % N50)
average = 0
total = 0
for n in data:
    total = total + n
mean = total / len(data)
print("mean length of contigs: %s" % mean)
print("total nucleotides in fasta = %s" % total_contigs)
#print "list of contigs by length: %s" %sorted([data])
l = data
l.sort(reverse = True)
print("list of contigs by length: %s" % l)
this does what I want it to do, but if you have any comments or advice, I would love to hear.
next up, determining N50 with this sweet sweet list. thanks again!

I created a function to calculate N50 and it seemed to work nicely. I can parse the command line and run any .fa file through the program
def calc_n50(array):
    array.sort(reverse = True)
    n50 = 0 #sums lengths
    n = 0 #n50 sequence
    half = sum(array)/2
    for val in array:
        n50 += val
        if n50 >= half:
            n = val
            break #breaks loop when condition is met
    print("N50 is %s" % n)
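A variant that returns the value instead of printing it can be easier to reuse and test. This is only a sketch: it uses the same rule as the function above (walk the lengths from longest to shortest until the running total reaches half of the overall total), the name n50 is made up, and returning 0 for an empty list is an assumption rather than part of any standard definition.

```python
def n50(lengths):
    """Length of the contig at which the running total first reaches half of all bases."""
    half = sum(lengths) / 2
    running = 0
    # sorted() returns a new list, so the caller's list is left untouched.
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half:
            return length
    return 0  # assumption: an empty input has no N50, so report 0

example = [2, 3, 4, 5, 6, 7, 8, 9, 10]
print("N50 is", n50(example))
```

For the example list the total is 54, so the loop stops once 10 + 9 + 8 reaches 27.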

Python: Writing peoples scores to individual lines

I have a task where I need to record peoples scores in a text file. My Idea was to set it out like this:
Jon: 4, 1, 3
Simon: 1, 3, 6
This has the name they inputted along with their 3 last scores (Only 3 should be recorded).
Now for my question: can anyone point me in the right direction to do this? I'm not asking for you to write my code for me, I'm simply asking for some tips.
Thanks.
Edit: I'm guessing it would look something like this, though I don't know how I'd add scores after the first one, like above.
def File():
    score = str(Name) + ": " + str(correct)
    File = open('Test.txt', 'w+')
    File.write(score)
    File.close()
Name = input("Name: ")
correct = input("Number: ")
File()

You could store your data in a dictionary and let pandas write it out (with to_string() as below, or to_csv()). It will be much easier than creating your own format.
from pandas import DataFrame, read_csv
import pandas as pd
def tfile(names):
    df = DataFrame(data = names, columns = list(names))
    with open('directory','w') as f:
        f.write(df.to_string(index=False, header=True))
names = {}
for i in range(num_people): # num_people: how many people to record, defined elsewhere
    name = input('Name: ')
    if name not in names:
        names[name] = []
    for j in range(3):
        score = input('Score: ')
        names[name].append(score)
tfile(names)
Simon Jon
1 4
3 1
6 3
This should meet your text requirement now. It converts the DataFrame to a string and then writes the string to the .txt file. If you need to read it back in you can use pandas read_table().

Since you are not asking for the exact code, here is an idea and some pointers
Collect the last three scores per person in a list variable called last_three
do something like:
",".join(last_three) #this gives you the format 4,1,3 etc
write to file an entry such as
name + ":" + ",".join(last_three)
You'll need to do this for each "line" you process
I'd recommend using with clause to open the file in write mode and process your data (as opposed to just an "open" clause) since with handles try/except/finally problems of opening/closing file handles...So...
with open(my_file_path, "w") as f:
    for x in my_formatted_data:
        #assuming x is a list of two elements name and last_three elems (example: ["Harry", [1,4,5]])
        name, last_three = x
        f.write(name + ":" + ",".join(str(score) for score in last_three)) # join() needs strings
        f.write("\n")# a new line
In this way you don't really need to open/close file as with clause takes care of it for you
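The question also asks how to add a new score later while keeping only the last three. Here is one possible sketch of that part, using the "Name: 4, 1, 3" layout from the question. The helper names parse_scores, add_score and format_scores are made up for illustration, and the lines are handled in memory here; in practice you would read them from the file, update, and write them back.

```python
def parse_scores(lines):
    """Read 'Name: 4, 1, 3' style lines into a dict of name -> list of ints."""
    scores = {}
    for line in lines:
        if ":" not in line:
            continue  # skip blank or malformed lines
        name, _, rest = line.partition(":")
        scores[name.strip()] = [int(s) for s in rest.split(",") if s.strip()]
    return scores

def add_score(scores, name, score, keep=3):
    """Append a score for name and keep only the most recent `keep` entries."""
    scores.setdefault(name, []).append(score)
    scores[name] = scores[name][-keep:]

def format_scores(scores):
    """Turn the dict back into 'Name: 4, 1, 3' lines, without trailing newlines."""
    return [f"{name}: {', '.join(str(s) for s in values)}"
            for name, values in scores.items()]

existing = ["Jon: 4, 1, 3\n", "Simon: 1, 3\n"]
scores = parse_scores(existing)
add_score(scores, "Jon", 7)     # Jon already has three scores, so the oldest is dropped
add_score(scores, "Simon", 6)
for line in format_scores(scores):
    print(line)
```

Slicing with [-keep:] is what drops the oldest score once a person has more than three.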
