Working with files: how to calculate an average - Python

Input: a text file.
I am working with a file that contains a lot of data. I have to extract the numbers that follow a specific string and then calculate the average of those numbers.
# Use the file name mbox-short.txt as the file name
count = 0
fname = raw_input("Enter file name: ")
fh = open(fname)
for lines in fh:
    if lines.startswith ("X-DSPAM-Confidence:"):
        lines = float (lines [20:50])
        count = count +1
        print lines
print count
What I get from this is:
0.8475
0.6178
0.6961
0.7565
0.7626
0.7556
0.7002
0.7615
0.7601
0.7605
0.6959
0.7606
0.7559
0.7605
0.6932
0.7558
0.6526
0.6948
0.6528
0.7002
0.7554
0.6956
0.6959
0.7556
0.9846
0.8509
0.9907
The loop finds the lines that start with the text "X-DSPAM-Confidence:" and slices each one from 20 to 50 (the end of the line).
That gives me two things: the list of numbers I need and the count, which will help later. Now I need to sum the numbers to calculate the average, that is, sum / count.
How can I do that? I am looking for the simplest way possible; it is not a problem if the code is long.
I have just tidied up the code and removed the unneeded parts, sorry about that.
The print statements are not important; they are only there so I can see what I am doing, as I am new to Python.

Using your own code, just keep track of the total and divide at the end:
count = 0
total = 0
fname = raw_input("Enter file name: ")
fh = open(fname)
for lines in fh:
    if lines.startswith ("X-DSPAM-Confidence:"):
        count += 1
        total += float (lines [20:50])
        print lines
print count
print(total/count)
If you need to store all the data, a list comprehension is the best approach: store all the floats, then sum them and divide by the length to get the average:
fname = raw_input("Enter file name: ")
with open(fname) as f:
    all_data = [float(line[20:50]) for line in f if line.startswith ("X-DSPAM-Confidence:")]
avg = sum(all_data) / len(all_data)
print(all_data)
print(avg)
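For readers on Python 3, the same idea can be sketched as a small function. This is only an illustration, not the answerer's code: the name `average_confidence` is made up here, and it cuts the line right after the prefix instead of slicing at a fixed column, on the assumption that each matching line looks like `X-DSPAM-Confidence: 0.8475`.

```python
PREFIX = "X-DSPAM-Confidence:"


def average_confidence(lines):
    """Return the mean of the numbers following PREFIX, or None if no line matches."""
    values = [
        float(line[len(PREFIX):])  # float() ignores the leading space and trailing newline
        for line in lines
        if line.startswith(PREFIX)
    ]
    if not values:
        return None  # avoid dividing by zero when nothing matched
    return sum(values) / len(values)


# Hypothetical sample lines standing in for the real file
sample = [
    "From: someone@example.com\n",
    "X-DSPAM-Confidence: 0.8475\n",
    "X-DSPAM-Confidence: 0.6178\n",
]
print(average_confidence(sample))
```

Because an open file yields its lines when iterated, the function should also accept a file handle directly, for example `with open(fname) as fh: print(average_confidence(fh))`.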

It would help if we saw a sample of your data, but you should be able to do this:
sum_lines = sum(lines)
avg_lines = sum_lines / count
sum() is a built in function which will sum an iterable.
I am also wondering why you are reassigning your value to lines when you do
lines = float (lines [20:50])
I would think if those are multiple comma separated floating point numbers, you would want to assign it to a list variable like float_list and then sum using the sum() function.
If you do not want to save the average, you could put a third print that says
print sum(float_list) / count
Updated to Reflect OP Update
Yes, you definitely want to create a list. Instead of lines = float (lines [20:50]), create the list before the loop and append to it inside the loop:
float_list = []
float_list.append(float(lines[20:50]))
A better way to do this would be with a list comprehension:
float_list = [float(lines[20:50]) for lines in fh if lines.startswith("X-DSPAM-Confidence:")]
Update...
I think that I misunderstood your original use of the slice [20:50] as representing multiple numbers per line.
If it is only one number per line, then it would be this, which is basically the answer that Padraic Cunningham posted:
# Use the file name mbox-short.txt as the file name
fname = raw_input("Enter file name: ")
fh = open(fname)
float_list = [float(lines[20:50]) for lines in fh if lines.startswith("X-DSPAM-Confidence:")]
list_sum = sum(float_list)
count = len(float_list)
list_avg = list_sum / count
For future reference, it is helpful to post an example of your input data along with your code and desired output in your original question.

That depends mostly on how your numbers are formatted: are they CSV, are they each on their own line, and so on. In any case, here is a general solution.
As some of the comments have pointed out, your first loop will exhaust your iterator, so I combined the two loops. The if/else is redundant, as if not works fine.
count = 0
total = 0
fname = raw_input("Enter file name: ")
fh = open(fname,'r')
lines = fh.readlines()
fh.close()
for line in lines:
    # skip every line that is not a confidence line
    if not line.startswith("X-DSPAM-Confidence:"):
        continue
    count = count + 1
    total += float(line[20:50])
avg = total/count
print avg

Related

Reading a numerical data file and assigning to each items at the same time

I am a beginner in Python, and I have a problem reading a numeric data file that contains many lines. Each row of the input file holds a counter, three float numbers, and finally a single letter, all separated by spaces. It looks like this:
1 12344567.143 12345678.154 1234.123 w
2 23456789.231 23413456.342 4321.321 f
I want to assign each item in the line to a specific parameter so that I can use them in other steps,
like this: "NO" = first item, "X" = second item, "Y" = third item, "code" = fourth item.
I am trying to write it as follows:
f1=open('t1.txt','r')
line: float
for line in f1:
print(line.split(', ',4))
f1=float
select(line.split('')
nob: object(1)=int(line[1, 1])
cnt = 0
print(nob)
cnt +=1
but I get more errors each time I run the program. Can anyone help me?
The error is probably due to the wrong indentation: in Python indentation is part of the syntax. It would be helpful if you also included the error message in your question.
How about this:
all_first_numbers = []
with open('t1.txt', 'r') as f:
    for line in f:
        values = line.split()
        first_number = int(values[0])
        second_number = float(values[1])
        letter_code = values[4]
        # If you want to save all the first numbers in one array:
        all_first_numbers.append(first_number)
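If every field of a row should be kept under a name, one possible sketch uses `collections.namedtuple`. The `Record` type and the `parse_record` helper are hypothetical names introduced for illustration, and the field name `z` for the third float is an assumption, since the question only labels NO, X, Y and code.

```python
from collections import namedtuple

# Field names follow the question's NO / X / Y / code labels; "z" is assumed.
Record = namedtuple("Record", ["no", "x", "y", "z", "code"])


def parse_record(line):
    """Split one whitespace-separated row into typed fields."""
    no, x, y, z, code = line.split()
    return Record(int(no), float(x), float(y), float(z), code)


# The two sample rows from the question
rows = [
    "1 12344567.143 12345678.154 1234.123 w",
    "2 23456789.231 23413456.342 4321.321 f",
]
records = [parse_record(row) for row in rows]
for record in records:
    print(record.no, record.x, record.y, record.z, record.code)
```

A row with a different number of fields would raise a `ValueError` at the unpacking step, which is probably what you want while debugging the input file.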

Can't read numbers from a text file in python

I have to read a list of 12 grades from a text file. Each grade is on a new line in the text file. I have to find the lowest grade and drop it (in the text file as well), then find the average of the remaining grades. I am having trouble turning the text values into integer values so I can find the average. I also can't figure out how to find the lowest grade and drop it.
Here is my code:
try:
    homeworkFile = open('homework.txt', 'r')
except:
    print("Error: invalid file")
else:
    lines = homeworkFile.readlines()
    homeworkFile.close()
    homeworkFile = open('homework.txt', 'w')
    for line in lines:
Thanks for any help you can give!
So this is just one way to take all of your values and calculate the average.
input_file = input("enter file name: ")
open_file = open(input_file, 'r')
Here I just put all of your values into notepad and read through it
grades_list = []
for grade in open_file:
    grade_format = grade.strip() #how to remove extra blank space
    grades_list.append(grade_format)
Then I used a for-loop to go through each line and put the grades into a list
grade_convert = [] #new list for all of the converted values
for grade in grades_list:
    convert = float(grade) #convert each value in the list into a float
    grade_convert.append(convert)
I used another list for the converted values
grade_convert = sorted(grade_convert) #very first element will be the lowest
grade_convert.pop(0) #permanently removes the very first item
grade_total = 0 #set a counter
for grade in grade_convert:
    grade_total += grade #add all of the values to the counter
grade_average = grade_total / len(grade_convert) #len is number of items
print(grade_average)
grades_list = []
for line in lines:
    grade = float(line)
    grades_list.append(grade)
grades_list.sort()
lowest_grade = grades_list[0] # this is the lowest grade
This is one way you can structure your logic. You need to convert the values to float before processing them. Using with is recommended instead of calling open and close explicitly.
import os
file = 'homework.txt'
# check if file exists
assert os.path.exists(file), "File does not exist: {0}".format(file)
# use with to open/close file implicitly
with open(file, 'r') as file_in:
    lines = file_in.readlines()
grades = sorted(float(line) for line in lines)
# drop lowest grade
del grades[0]
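None of the snippets above writes the remaining grades back to the file, which the question also asks for. A minimal sketch of that step follows, assuming one numeric grade per line; the function name `drop_lowest_and_average` is hypothetical, and the demonstration uses a temporary file rather than a real homework.txt.

```python
import os
import tempfile


def drop_lowest_and_average(path):
    """Remove one lowest grade from the file at `path`; return the average of the rest."""
    with open(path) as f:
        grades = [float(line) for line in f if line.strip()]
    if len(grades) < 2:
        raise ValueError("need at least two grades to drop one and average the rest")
    grades.remove(min(grades))  # removes only the first occurrence of the minimum
    with open(path, "w") as f:
        for grade in grades:
            f.write("{}\n".format(grade))  # note: grades are written back as floats
    return sum(grades) / len(grades)


# Demonstration on a throwaway file
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("90\n70\n80\n")
demo_path = tmp.name
demo_average = drop_lowest_and_average(demo_path)
with open(demo_path) as f:
    remaining = f.read().split()
os.remove(demo_path)
print(demo_average, remaining)
```

Using `remove(min(...))` instead of sorting keeps the remaining grades in their original order, which may matter if the file is meant to stay readable.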

Python ValueError: float: Argument: . is not number on line 12

OK, so I'm new to Python and I'm currently taking the Python for Everybody course (py4e).
Our lesson 7.2 assignment is to do the following:
7.2 Write a program that prompts for a file name, then opens that file and reads through the file, looking for lines of the form:
X-DSPAM-Confidence: 0.8475
Count these lines and extract the floating point values from each of the lines and compute the average of those values and produce an output as shown below. Do not use the sum() function or a variable named sum in your solution.
You can download the sample data at http://www.py4e.com/code3/mbox-short.txt when you are testing below enter mbox-short.txt as the file name.
I can't figure it out. I keep getting this error
ValueError: float: Argument: . is not number on line 12
when I run this code (see screenshot): https://gyazo.com/a61768894299970692155c819509db54
Line 12 which is num = float(balue) + float(num) keeps acting up. When I remove the float from balue then I get another which says
"TypeError: cannot concatenate 'str' and 'float' objects on line 12".
Can arguments be converted into floats or is it only a string? That might be the problem but I don't know if it's true and even if it is I don't know how to fix my code after that.
Your approach was not so bad, I would say. However, I do not understand what you intended with for balue in linez, as this iterates over the characters contained in linez. What you would rather have wanted is float(linez). I came up with a close solution that looks like this:
fname = raw_input("Enter file name: ")
print(fname)
count = 0
num = 0
with open(fname, "r") as ins:
    for line in ins:
        line = line.rstrip()
        if line.startswith("X-DSPAM-Confidence:"):
            num += float(line[20:])
            count += 1
print(num/count)
This is only intended to get you on the right track; I have NOT verified the answer or the correctness of the script, as that should remain part of your homework.
I realise this is answered - I am doing the same course and I had the same error message on that exact question. It was caused by the line being read as a non float and I had to read it as
number =float((line[19:26]))
By the way, the Python environment in the course is very sensitive to spaces in strings - I just got the right code and it rejected it since I had ": " and the correct answer was ":" - just spent half an hour on colons.
Just for the sake of it, here is the answer I reached, which has been accepted as the correct one. Hope you got there in the end.
# Use the file name mbox-short.txt as the file name
count = 0
average = 0
filename = input("Enter file name: ")
filehandle = open(filename)
for line in filehandle:
    if not line.startswith("X-DSPAM-Confidence:") : continue
    line = line.rstrip()
    number = float(line[19:26])
    count = count + 1
    average = average + number
average = (average / count)
average = float(average)
print("Average spam confidence:",average)
#Take input from user
fname = input("Please insert file name:")
#try and except
try:
    fhand = open(fname)
except:
    print("File can not be found", fname)
    quit()
#count and total
count= 0
total = 0
for line in fhand:
    line = line.rstrip()
    if not line.startswith("X-DSPAM-Confidence:"):
        continue
    else:
        line = line[20:]
        uline = line.upper()
        fline = float(uline)
        count = count+1
        total = total + fline
avr = total / count
print("Average spam confidence:" ,avr )
The short answer to your problem is to fix the slicing of the string that extracts the number.
If you are locating the colon character ":" to find the floating-point number, remember that you have to add 1 to the starting index of the slice. If you do not, you end up with ": 0.8475", which is not a number.
So slice the string with 1 added to the starting index, and you have your fix.
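The off-by-one described here can be illustrated with a short sketch. The helper name `value_after_colon` is made up for this example, and it assumes the number is everything after the first colon on the line.

```python
def value_after_colon(line):
    """Return the float that follows the first ':' in `line`."""
    colon = line.find(":")
    if colon == -1:
        raise ValueError("no colon in line: {!r}".format(line))
    return float(line[colon + 1:])  # +1 skips the colon itself


line = "X-DSPAM-Confidence: 0.8475\n"
# Without the +1 the slice still starts at the colon, so float() would fail on it
print(repr(line[line.find(":"):]))
print(value_after_colon(line))
```

Locating the colon at run time avoids hard-coding an index such as 19 or 20, which is where the answers above disagree.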

How to calculate each string length that belongs to a list of strings by python?

Suppose I have a file with n DNA sequences, each one in a line. I need to turn them into a list and then calculate each sequence's length and then total length of all of them together. I am not sure how to do that before they are into a list.
# open file and writing each sequences' length
f= open('seq.txt' , 'r')
for line in f:
    line= line.strip()
    print (line)
    print ('this is the length of the given sequence', len(line))
# turning into a list:
lines = [line.strip() for line in open('seq.txt')]
print (lines)
How can I do math calculations from the list? Ex. the total length of all sequences together? Standard deviation from their different lengths etc.
Try this to output the individual lengths and calculate the total length:
lines = [line.strip() for line in open('seq.txt')]
total = 0
for line in lines:
    print 'this is the length of the given sequence: {}'.format(len(line))
    total += len(line)
print 'this is the total length: {}'.format(total)
Look into the statistics module.
You'll find all kinds of measures of averages and spreads.
You'll get the length of any sequence using len.
In your case, you'll want to map the sequences to their lengths:
from statistics import stdev
with open("seq.txt") as f:
    lengths = [len(line.strip()) for line in f]
print("Number of sequences:", len(lengths))
print("Standard deviation:", stdev(lengths))
Edit: because it was asked in the comments, here is how to cluster the sequences into different files depending on their lengths:
from statistics import stdev, mean
with open("seq.txt") as f:
    sequences = [line.strip() for line in f]
lengths = [len(sequence) for sequence in sequences]
mean_ = mean(lengths)
stdev_ = stdev(lengths)
with open("below.txt", "w") as below, open("above.txt", "w") as above, open("normal.txt", "w") as normal:
    for sequence in sequences:
        if len(sequence) > mean_ + stdev_:
            above.write(sequence + "\n")
        elif len(sequence) > mean_ - stdev_:  # in between
            normal.write(sequence + "\n")
        else:
            below.write(sequence + "\n")
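The same grouping can be tried without touching any files. The sketch below is an assumption-laden variant rather than the answerer's code: `split_by_length` is a hypothetical helper that returns the three groups in a dictionary, and lengths exactly one standard deviation from the mean are counted as normal.

```python
from statistics import mean, stdev


def split_by_length(sequences):
    """Group sequences by whether their length is below, within, or above
    one sample standard deviation of the mean length."""
    lengths = [len(seq) for seq in sequences]
    centre = mean(lengths)
    spread = stdev(lengths)  # needs at least two sequences
    groups = {"below": [], "normal": [], "above": []}
    for seq in sequences:
        if len(seq) > centre + spread:
            groups["above"].append(seq)
        elif len(seq) < centre - spread:
            groups["below"].append(seq)
        else:
            groups["normal"].append(seq)
    return groups


# Made-up sequences with lengths 2, 5, 6, 7 and 14
groups = split_by_length(["AC", "ACGTA", "ACGTAC", "ACGTACG", "ACGTACGTACGTAC"])
print(groups)
```

Returning the groups makes the logic easy to check before each list is written to its own file.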
The map and reduce functions can be useful for working on collections.
import operator
from functools import reduce  # reduce is no longer a built-in in Python 3
f = open('seq.txt', 'r')
for line in f:
    line = line.strip()
    print(line)
    print('this is the length of the given sequence', len(line))
f.close()
# turning into a list:
lines = [line.strip() for line in open('seq.txt')]
print(lines)
print('The total length is', reduce(operator.add, map(len, lines)))
This will do what you require. To do additional calculations, you may want to save the results from the text file into a list or set so you won't need to read from the file again.
total_length = 0 # Create a variable that will save our total length of lines read
with open('filename.txt', 'r') as f:
    for line in f:
        line = line.strip()
        total_length += len(line) # Add the length to our total
        print("Line Length: {}".format(len(line)))
print("Total Length: {}".format(total_length))
Just a couple of remarks. Use with to handle files so you don't have to worry about closing them after you are done reading\writing, flushing, etc. Also, since you are looping through the file once, why not create the list too? You don't need to go through it again.
# open file and writing each sequences' length
with open('seq.txt', 'r') as f:
    sequences = []
    total_len = 0
    for line in f:
        new_seq = line.strip()
        sequences.append(new_seq)
        new_seq_len = len(new_seq)
        total_len += new_seq_len
print('number of sequences: {}'.format(len(sequences)))
print('total length: {}'.format(total_len))
print('biggest sequence: {}'.format(max(sequences, key=lambda x: len(x))))
print('\t with length {}'.format(len(sorted(sequences, key=lambda x: len(x))[-1])))
print('smallest sequence: {}'.format(min(sequences, key=lambda x: len(x))))
print('\t with length {}'.format(len(sorted(sequences, key=lambda x: len(x))[0])))
I have included some post-processing info to give you an idea of how to go about it.
If you have any questions just ask.
You have already seen how to get the list of sequences and a list of the lengths using append.
lines = [line.strip() for line in open('seq.txt')]
total = 0
sizes = []
for line in lines:
    mysize = len(line)
    total += mysize
    sizes.append(mysize)
Note that you can also use a for loop to read each line and append to the two lists rather than read every line into lists and then loop through lists. It is a matter of which you would prefer.
You can use the statistics library (as of Python 3.4) for the statistics on the list of lengths.
statistics — Mathematical statistics functions
mean(): Arithmetic mean (“average”) of data.
median(): Median (middle value) of data.
median_low(): Low median of data.
median_high(): High median of data.
median_grouped(): Median, or 50th percentile, of grouped data.
mode(): Mode (most common value) of discrete data.
pstdev(): Population standard deviation of data.
pvariance(): Population variance of data.
stdev(): Sample standard deviation of data.
variance(): Sample variance of data.
You can also use the answers at Standard deviation of a list
Note that there is an answer that actually shows the code that was added to Python 3.4 for the statistics module. If you have an older version, you can use that code or get the statistics module code for your own system.
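As a rough illustration of a few of these functions applied to sequence lengths, the sketch below uses a made-up list of sequences in place of the contents of seq.txt. It assumes Python 3.4 or later, where the statistics module is available.

```python
from statistics import mean, median, pstdev, stdev

# Made-up sequences standing in for the lines of seq.txt
sequences = ["ACGT", "ACGTAC", "AC", "ACGTACGT"]
lengths = [len(seq) for seq in sequences]

total_length = sum(lengths)
print("Total length:", total_length)
print("Mean length:", mean(lengths))
print("Median length:", median(lengths))
print("Sample standard deviation:", stdev(lengths))
print("Population standard deviation:", pstdev(lengths))
```

`stdev` treats the data as a sample and `pstdev` as the whole population, so the two values differ for the same list; which one applies depends on whether the file holds all the sequences of interest.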

How do I count and display the number of occurrences of words ending in "pizza" (only pizza) in an input text

So I want to be able to print out the number of occurrences.
For example if I have following in my text file:
cheeesepizza
chickenpizza
pepornisub
pizzaaregood
I want to be able to print "There are 2 pizza." ...Here is what I have so far
f = open(file_name)
total = 0
for line in f:
    if "pizza" in line:
        total += 1
f.close()
print total
You need to check whether "pizza" is at the end of the line. The in operator checks whether something appears anywhere in a list or string, not just at the end.
We can check whether a string ends with something using the endswith method of str. Each line read from a file still carries its trailing newline, so strip it first. Change your if statement to this:
if line.strip().endswith("pizza"):
Full code:
f = open(file_name)
total = 0
for line in f:
    if line.strip().endswith("pizza"):
        total += 1
f.close()
print total
If you want a more Pythonic way to do this, use a list comprehension and count the items, like this:
f = open(file_name)
total = len([line for line in f if line.strip().endswith("pizza")])
f.close()
print total
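A Python 3 version of the same count might look like the sketch below. The function name `count_ending_with` is hypothetical, and the sample list reuses the four lines from the question, each with the newline a file would supply.

```python
def count_ending_with(lines, suffix="pizza"):
    """Count lines whose text, ignoring trailing whitespace, ends with `suffix`."""
    return sum(1 for line in lines if line.rstrip().endswith(suffix))


# The question's example lines, with the trailing newline a file would include
sample = ["cheeesepizza\n", "chickenpizza\n", "pepornisub\n", "pizzaaregood\n"]
print("There are {} pizza.".format(count_ending_with(sample)))
```

Summing over a generator avoids building an intermediate list, and the function should accept an open file in place of the sample list.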
