I have a text file that contains numbers like this:
a: 0.8475
b: 0.6178
c: 0.6961
d: 0.7565
e: 0.7626
f: 0.7556
g: 0.7605
h: 0.6932
i: 0.7558
j: 0.6526
I want to extract only the floating-point numbers from this file and calculate their average. Here is my program so far:
fh = file.open('abc.txt')
for line in fh:
    line_pos = line.find(':')
    line = line[line_pos+1:]
    line = line.rstrip()
    sum = 0
    average = 0
    for ln in line:
        sum = sum + ln
    average = sum / len(line)
print average
Can anyone tell me what is wrong with this code? Thanks.
You have the sum addition in the wrong place, and you need to keep track of the number of lines, since you can't send a file object to len(). You will also have to cast the strings to floats. I'd recommend simply splitting on whitespace, as well. Finally, use the with construct to automatically close the file:
with open('abc.txt') as fh:
    sum = 0 # initialize here, outside the loop
    count = 0 # and a line counter
    for line in fh:
        count += 1 # increment the counter
        sum += float(line.split()[1]) # add here, not in a nested loop
    average = sum / count
    print average
Convert each value to float so that the addition is numeric.
Initialize sum once, before the loop begins, and calculate the average once, after the loop.
len(line) will give you the wrong number: it is the character count of the last line read, not the number of values.
Try to avoid str.find plus slicing; str.split is more readable.
with open('abc.txt') as fh:
    sum = 0
    numlines = 0
    for line in fh:
        n = line.split(':')[-1]
        sum += float(n)
        numlines += 1
    average = sum / numlines
    print average
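For reference, the points from both answers can be combined into a small function. This is only a sketch written for Python 3; the name average_of_labelled_values is made up for illustration, and it assumes every non-blank line has the form "label: number".

```python
def average_of_labelled_values(lines):
    """Average the numbers that follow the colon on each 'label: value' line."""
    total = 0.0
    count = 0
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        # Take the text after the last colon; float() ignores surrounding whitespace
        total += float(line.split(':')[-1])
        count += 1
    if count == 0:
        return float('nan')  # nothing to average
    return total / count


# Usage with a few lines shaped like the question's data (a file object works too)
sample = ["a: 0.8475\n", "b: 0.6178\n", "c: 0.6961\n"]
print(average_of_labelled_values(sample))
```

Passing an open file object instead of a list should work the same way, since both yield one line at a time.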
Related
I'm very new to coding so excuse any errors!
I need to modify the below script (GC_content.py) with another file as the input (SfPhi01.fasta) to calculate a skew for that input file in Python v2, editing in emacs. I then need to modify it to calculate the skew for 100-base pair segments (biological context) with a step size of 50 base pairs.
I've been given the tips to use a substring to extract sequence from position X to (X+100) and set the initial value of X as 0.
I also need to use a while loop that increases the value of X by 50 and keep calculating as long as X is smaller than the size of the genome minus 100 base pairs.
import re
import sys  # needed for sys.argv below

i=0
Input = open(sys.argv[1], 'r')
Output = open(sys.argv[1]+'GC_content', 'w')
for line in Input:
    if re.match('>', line):
        # header line: copy it, preceded by a blank line after the first record
        if i>0:
            Output.write('\n')
            Output.write(line)
        else:
            Output.write(line)
        i+=1
    else:
        length = float(len(line.replace('\n', '')))
        # subn returns (new_string, number_of_substitutions); the count is used
        G = re.subn('G', 'G', line)
        C = re.subn('C', 'C', line)
        G_count = float(G[1])
        C_count = float(C[1])
        GC_content = (G_count + C_count)/length
        Output.write('GC content: '+'\t'+str(GC_content)+'\n')
Input.close()
Output.close()
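The windowing tips above (a substring from X to X+100, and a while loop that advances X by 50) could be sketched roughly as follows. This is not the course's expected solution: it assumes "skew" means GC skew, (G - C) / (G + C), and that the genome has already been read into a single string without the FASTA header line. The names gc_skew and windowed_skew are hypothetical.

```python
def gc_skew(segment):
    """Return (G - C) / (G + C) for a sequence, or 0.0 if it has no G or C."""
    g = segment.count('G')
    c = segment.count('C')
    if g + c == 0:
        return 0.0
    return float(g - c) / (g + c)


def windowed_skew(genome, window=100, step=50):
    """Return (start, skew) pairs for windows of `window` bases, `step` apart."""
    results = []
    x = 0  # initial value of X, as in the tip
    # Keep going while X is smaller than the genome size minus the window size
    while x < len(genome) - window:
        results.append((x, gc_skew(genome[x:x + window])))
        x += step
    return results


# Usage on a toy sequence: 100 G bases followed by 100 C bases
for start, skew in windowed_skew('G' * 100 + 'C' * 100):
    print(start, skew)
```

Following the tip literally, the loop stops before the window that starts exactly at the genome size minus 100; changing < to <= would include it.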
I have a .txt file with 7000+ lines, each containing a description and the path to an image, in numeric order. Example:
abnormal /Users/alex/Documents/X-ray-classification/data/images/1.png
abnormal /Users/alex/Documents/X-ray-classification/data/images/2.png
normal /Users/alex/Documents/X-ray-classification/data/images/3.png
normal /Users/alex/Documents/X-ray-classification/data/images/4.png
Some lines are missing. I want to automate the search for the missing lines. Intuitively, I wrote:
f = open("data.txt", 'r')
lines = f.readlines()
num = 1
for line in lines:
    if num in line:
        continue
    else:
        print (line)
    num+=1
But of course it didn't work, since lines are strings.
Is there any elegant way to sort this out? Using regex maybe?
Thanks in advance!
The following should work: it grabs the number out of the filename, checks whether it is more than 1 higher than the previous number, and if so, works out all the in-between numbers and prints them. Printing the number (and reconstructing the filename from it) is needed because line will never contain the name of a missing file during iteration.
# Set this to the first number in the series -1
num = lastnum = 0
with open("data.txt", 'r') as f:
    for line in f:
        # Pick the digit out of the filename
        num = int(''.join(x for x in line if x.isdigit()))
        if num - lastnum > 1:
            for i in range(lastnum+1, num):
                print("Missing: {}.png".format(str(i)))
        lastnum = num
The main advantage of working this way is that as long as your files are sorted in the list, it can handle starting at numbers other than 1, and also reports more than one missing number in the sequence.
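Since the question asks about regex, here is a hedged sketch of the same idea using re. It assumes each line ends with a filename of the form <number>.png; the function name find_missing is made up for illustration. Unlike joining every digit on the line, the pattern only looks at the number directly before ".png", so digits elsewhere in the path should not interfere.

```python
import re


def find_missing(lines):
    """Return the numbers absent between the smallest and largest N.png seen."""
    numbers = []
    for line in lines:
        # Capture the digits immediately before ".png" at the end of the line
        match = re.search(r'(\d+)\.png\s*$', line)
        if match:
            numbers.append(int(match.group(1)))
    if not numbers:
        return []
    present = set(numbers)
    return [n for n in range(min(numbers), max(numbers) + 1) if n not in present]


# Usage with lines shaped like the question's data
sample = [
    "abnormal /Users/alex/Documents/X-ray-classification/data/images/1.png\n",
    "normal /Users/alex/Documents/X-ray-classification/data/images/3.png\n",
    "normal /Users/alex/Documents/X-ray-classification/data/images/4.png\n",
]
print(find_missing(sample))
```

Because it uses a set, this version does not depend on the lines being sorted, though it cannot detect numbers missing before the first or after the last line.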
You can try this:
lines = [
    "abnormal /Users/alex/Documents/X-ray-classification/data/images/1.png",
    "normal /Users/alex/Documents/X-ray-classification/data/images/3.png",
    "normal /Users/alex/Documents/X-ray-classification/data/images/4.png",
]
maxvalue = 4 # or any other maximum value
missing = []
i = 0
for num in range(1, maxvalue+1):
    # once past the end of the list, every remaining number is missing
    if i >= len(lines) or str(num) not in lines[i]:
        missing.append(num)
    else:
        i += 1
print(missing)
Or if you want to check for the line ending with XXX.png:
lines = [
    "abnormal /Users/alex/Documents/X-ray-classification/data/images/1.png",
    "normal /Users/alex/Documents/X-ray-classification/data/images/3.png",
    "normal /Users/alex/Documents/X-ray-classification/data/images/4.png",
]
maxvalue = 4 # or any other maximum value
missing = []
i = 0
for num in range(1, maxvalue+1):
    # the leading "/" stops 1.png from matching a line that ends in 11.png
    if i >= len(lines) or not lines[i].endswith("/" + str(num) + ".png"):
        missing.append(num)
    else:
        i += 1
print(missing)
I am trying to find averages from a text file. The text file has columns of numbers, and I want to find the average of each column. I get the following error:
IndexError: list index out of range
The code I am using is:
import os
os.chdir(r"path of my file")
file_open = open("name of my file", "r")
file_write = open ("average.txt", "w")
line = file_open.readlines()
list_of_lines = []
length = len(list_of_lines[0])
total = 0
for i in line:
    values = i.split('\t')
    list_of_lines.append(values)
count = 0
for j in list_of_lines:
    count +=1
for k in range(0,count):
    print k
    list_of_lines[k].remove('\n')
for o in range(0,count):
    for p in range(0,length):
        print list_of_lines[p][o]
        number = int(list_of_lines[p][o])
        total + number
    average = total/count
    print average
The error is in this line:
length = len(list_of_lines[0])
Please let me know if I can provide any more information.
The issue is that you are asking for the length of an element of the list (index 0), not of the list itself, and the list is still empty at that point.
Try this:
length = len(list_of_lines)
You wrote length = len(list_of_lines[0])
list_of_lines is defined right above this line as a list with 0 items in it. As a result, you cannot select the first item (index 0), because index 0 does not exist; it is out of range.
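To illustrate the ordering point, here is a rough sketch of per-column averaging in which the number of columns is measured only after the rows have been collected. It is written for Python 3 and assumes tab-separated numeric values with the same number of columns on every line; the function name column_averages is hypothetical.

```python
def column_averages(lines, sep='\t'):
    """Return a list with the average of each column of separated numbers."""
    rows = []
    for line in lines:
        line = line.strip()  # drops the trailing newline
        if not line:
            continue  # skip blank lines
        rows.append([float(value) for value in line.split(sep)])
    if not rows:
        return []
    # Only now is it safe to look at rows[0]: the list has been filled
    n_cols = len(rows[0])
    totals = [0.0] * n_cols
    for row in rows:
        for col in range(n_cols):
            totals[col] += row[col]  # accumulate, rather than "total + number"
    return [total / len(rows) for total in totals]


# Usage with two rows of three columns
print(column_averages(["1\t2\t3\n", "3\t4\t5\n"]))
```

Note that the question's loop also contains total + number, which computes a value and discards it; the sketch accumulates with += instead.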
I need a little help with the sum function. I'm trying to locate all the lines with the prefix "X-DSPAM-Confidence:" in a document. After I extract them, I want to call sum() on them and calculate the average. Thanks heaps!
for line in (fhand):
    line = line.rstrip()
    if not line.startswith("X-DSPAM-Confidence:"):
        continue
    else:
        n = float(line[line.find(":") + 1:])
        a = sum(n)
        count = count + 1
        print (n)
        print (a)
print (total / count)
I don't know if I understood this correctly, but as far as I can see, you only need to store the sum of the values in a variable, something like:
total = 0.0
count = 0
for line in (fhand):
    line = line.rstrip()
    if not line.startswith("X-DSPAM-Confidence:"):
        continue
    else:
        n = float(line[line.find(":") + 1:])
        total += n
        count = count + 1
print (total / count)
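If you do want to call sum(), note that it expects an iterable such as a list, not a single float, which is why sum(n) fails. A minimal sketch, assuming the values are collected into a list first (the function name average_confidence is made up for illustration):

```python
def average_confidence(lines, prefix="X-DSPAM-Confidence:"):
    """Average the numbers on lines that start with the given prefix."""
    values = []
    for line in lines:
        line = line.rstrip()
        if line.startswith(prefix):
            # Everything after the prefix is the number
            values.append(float(line[len(prefix):]))
    if not values:
        return None  # no matching lines, so there is no average
    return sum(values) / len(values)


# Usage with a few made-up lines
sample = [
    "X-DSPAM-Confidence: 0.5\n",
    "Subject: hello\n",
    "X-DSPAM-Confidence: 0.25\n",
]
print(average_confidence(sample))
```

An open file handle such as fhand could be passed in place of the list.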
I need to print out the average height from a .txt file. How do I write it in an easy way? The .txt file has these numbers:
12
14
59
48
45
12
47
65
152
This is what I've got so far:
import math
text = open(r'stuff.txt').read()
data = []
with open(r'stuff.txt') as f:
    for line in f:
        fields = line.split()
        rowdata = map(float, fields)
        data.extend(rowdata)
biggest = min(data)
smallest = max(data)
print(biggest - smallest)
To compute the average of some numbers, you should sum them up and then divide by the number of numbers:
data = []
with open(r'stuff.txt') as f:
    for line in f:
        fields = line.split()
        rowdata = map(float, fields)
        data.extend(rowdata)
print(sum(data)/len(data))
# import math -- you don't need this
# text = open(r'stuff.txt').read() not needed.
# data = [] not needed
with open(r'stuff.txt') as f:
    data = [float(line.rstrip()) for line in f]
biggest = max(data)
smallest = min(data)
print(biggest - smallest)
print(sum(data)/len(data))
data = [float(ln.rstrip()) for ln in f.readlines()] # Within 'with' statement.
mean_average = float(sum(data))/len(data) if len(data) > 0 else float('nan')
That is how to calculate the arithmetic mean, if that is what you meant. Sadly, math does not have a function for this. FYI, the mean_average line is written that way to avoid the ZeroDivisionError that would occur if the list had a length of 0, just in case.
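While math has no mean function, the standard library's statistics module (Python 3.4 and later) provides statistics.mean. A small sketch, assuming Python 3; the wrapper name safe_mean is hypothetical and only adds the same empty-list guard as above.

```python
import statistics


def safe_mean(values):
    """Return the arithmetic mean, or NaN when there are no values."""
    values = list(values)
    if not values:
        # statistics.mean raises StatisticsError on empty input, so guard first
        return float('nan')
    return statistics.mean(values)


# Usage with the heights from the question
heights = [12, 14, 59, 48, 45, 12, 47, 65, 152]
print(safe_mean(heights))
```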
Array average can be computed like this:
print(sum(data) / len(data))
A simple program for finding the average would be the following. It assumes, if I understand correctly, that your file has one value on each line; otherwise it has to change accordingly:
import sys
f = open('stuff.txt', 'rU')
lines = f.readlines()
f.close()
size = len(lines)
sum=0
for line in lines:
    sum = sum + float(line.rstrip())
avg = sum / float(size)
print avg,
Not the most idiomatic Python, but it is quite straightforward, I think.
A full, almost-loopless solution combining elements of other answers here:
with open('stuff.txt','r') as f:
    # the with block closes the file, so no explicit f.close() is needed
    data = [float(line.rstrip()) for line in f.readlines()]
mean = float(sum(data))/len(data) if len(data) > 0 else float('nan')
and you don't need to prepend, append, enclose or import anything else.