I need to print out the average height from a .txt file. How do I write it in an easy way? The .txt file has these numbers:
12
14
59
48
45
12
47
65
152
This is what I've got so far:
import math
text = open(r'stuff.txt').read()
data = []
with open(r'stuff.txt') as f:
    for line in f:
        fields = line.split()
        rowdata = map(float, fields)
        data.extend(rowdata)
biggest = max(data)
smallest = min(data)
print(biggest - smallest)
To compute the average of some numbers, sum them up and then divide by how many there are:
data = []
with open(r'stuff.txt') as f:
    for line in f:
        fields = line.split()
        rowdata = map(float, fields)
        data.extend(rowdata)
print(sum(data)/len(data))
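With the nine numbers above, this prints 454/9 ≈ 50.44.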
# import math -- you don't need this
# text = open(r'stuff.txt').read() -- not needed
# data = [] -- not needed
with open(r'stuff.txt') as f:
    data = [float(line.rstrip()) for line in f]
biggest = max(data)
smallest = min(data)
print(biggest - smallest)
print(sum(data)/len(data))
data = [float(ln.rstrip()) for ln in f.readlines()]  # within the 'with' statement
mean_average = float(sum(data))/len(data) if len(data) > 0 else float('nan')
That is the way to calculate the mean average, if that is what you meant. Sadly, the math module does not have a function for this. FYI, the mean_average line is written this way to avoid the ZeroDivisionError that would occur if the list had a length of 0, just in case.
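For what it's worth, the standard library does ship a mean function since Python 3.4, in the statistics module rather than math. A minimal sketch, assuming the same one-number-per-line stuff.txt:

import statistics

with open('stuff.txt') as f:
    data = [float(line) for line in f]

print(statistics.mean(data))  # raises statistics.StatisticsError on an empty list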
Array average can be computed like this:
print(sum(data) / len(data))
A simple program for finding the average would be the following (if I understand correctly, your file has one value on each line; if so, it should look similar to this, otherwise adjust accordingly):
f = open('stuff.txt', 'r')
lines = f.readlines()
f.close()
size = len(lines)
total = 0  # don't call this "sum": that would shadow the built-in
for line in lines:
    total = total + float(line.rstrip())
avg = total / size
print(avg)
Not the best that can be done in Python, but it's quite straightforward, I think...
A full, almost-loopless solution combining elements of other answers here:
with open('stuff.txt','r') as f:
    data = [float(line.rstrip()) for line in f.readlines()]
    # no f.close() needed -- the with statement closes the file
mean = float(sum(data))/len(data) if len(data) > 0 else float('nan')
and you don't need to prepend, append, enclose or import anything else.
I have a 7000+ line .txt file, containing a description and an ordered path to an image. Example:
abnormal /Users/alex/Documents/X-ray-classification/data/images/1.png
abnormal /Users/alex/Documents/X-ray-classification/data/images/2.png
normal /Users/alex/Documents/X-ray-classification/data/images/3.png
normal /Users/alex/Documents/X-ray-classification/data/images/4.png
Some lines are missing. I want to somehow automate the search for the missing lines. Intuitively I wrote:
f = open("data.txt", 'r')
lines = f.readlines()
num = 1
for line in lines:
    if num in line:
        continue
    else:
        print(line)
    num += 1
But of course it didn't work, since lines are strings.
Is there any elegant way to sort this out? Using regex maybe?
Thanks in advance!
The following should hopefully work. It grabs the number out of the filename, checks whether it is more than 1 higher than the previous number, and if so works out all the 'in-between' numbers and prints them. Printing the number (and then reconstructing the filename from it) is necessary because line will never contain the names of missing files during iteration.
# Set this to one less than the first number in the series
num = lastnum = 0
with open("data.txt", 'r') as f:
    for line in f:
        # Pick the digits out of the filename
        num = int(''.join(x for x in line if x.isdigit()))
        if num - lastnum > 1:
            for i in range(lastnum+1, num):
                print("Missing: {}.png".format(i))
        lastnum = num
The main advantage of working this way is that, as long as the filenames appear in sorted order, it can handle sequences starting at numbers other than 1, and it also reports more than one missing number in the sequence.
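An alternative sketch that doesn't rely on the ordering at all: collect every number that does appear and take a set difference. The data.txt name and the filename pattern come from the question; the regex is an assumption about where the number sits:

import re

with open("data.txt") as f:
    # pull the number out of each path, e.g. ".../123.png" -> 123
    seen = {int(m.group(1)) for m in (re.search(r"(\d+)\.png$", line.rstrip()) for line in f) if m}

missing = sorted(set(range(1, max(seen) + 1)) - seen)
for n in missing:
    print("Missing: {}.png".format(n))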
You can try this:
lines = ["abnormal /Users/alex/Documents/X-ray-classification/data/images/1.png","normal /Users/alex/Documents/X-ray-classification/data/images/3.png","normal /Users/alex/Documents/X-ray-classification/data/images/4.png"]
maxvalue = 4  # or any other maximum value
missing = []
i = 0
for num in range(1, maxvalue+1):
    if str(num) not in lines[i]:
        missing.append(num)
    else:
        i += 1
print(missing)
Or if you want to check for the line ending with XXX.png:
lines = ["abnormal /Users/alex/Documents/X-ray-classification/data/images/1.png","normal /Users/alex/Documents/X-ray-classification/data/images/3.png","normal /Users/alex/Documents/X-ray-classification/data/images/4.png"]
maxvalue = 4  # or any other maximum value
missing = []
i = 0
for num in range(1, maxvalue+1):
    # the "/" prevents e.g. 11.png from matching when looking for 1.png
    if not lines[i].endswith("/" + str(num) + ".png"):
        missing.append(num)
    else:
        i += 1
print(missing)
With the three example lines above, both snippets print [2].
I have a file which contains blocks of lines that I would like to separate. Each block contains a number identifier in the block's header: "Block X" is the header line for the X-th block of lines. Like this:
Block X
#L E C A F X M N
11.2145 15 27 29.444444 7.6025229 1539742 29.419783
11.21451 13 28 24.607143 6.8247935 1596787 24.586264
...
Block Y
#L E C A F X M N
11.2145 15 27 29.444444 7.6025229 1539742 29.419783
11.21451 13 28 24.607143 6.8247935 1596787 24.586264
...
I can use "enumerate" to find the header line of the block as follows:
with open(filename,'r') as indata:
    for num, line in enumerate(indata):
        if 'Block X' in line:
            startblock = num
print(startblock)
This will yield the line number of the first line of block #X.
However, my problem is identifying the last line of the block. To do that, I could find the next occurrence of a header line (i.e., the next block) and subtract a few numbers.
My question: how can I find the line number of the next occurrence of a condition (i.e., right after a certain condition was met)?
I tried using enumerate again, this time indicating the starting value, like this:
with open(filename,'r') as indata:
    for num, line in enumerate(indata, startblock):
        if 'Block Y' in line:
            endscan = num
            break
print(endscan)
That doesn't work, because it still begins reading the file from line 0, NOT from line number "startblock". Starting the "enumerate" counter at a different number only shifts the counter itself, so the resulting value, in this case "endscan", is offset by the amount "startblock".
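A quick illustration of that behaviour, with a small list standing in for the file:

>>> list(enumerate(['a', 'b', 'c'], 5))
[(5, 'a'), (6, 'b'), (7, 'c')]  # same items, only the counter is shifted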
Please help! How can I tell Python to disregard the lines before "startblock"?
If you want the groups using Block as the delimiter for each section, you can use itertools.groupby:
from itertools import groupby

with open('test.txt') as f:
    grp = groupby(f, key=lambda x: x.startswith("Block "))
    for k, v in grp:
        if k:
            print(list(v) + list(next(grp, ("", ""))[1]))
Output:
['Block X\n', '#L E C A F X M N \n', '11.2145 15 27 29.444444 7.6025229 1539742 29.419783\n', '11.21451 13 28 24.607143 6.8247935 1596787 24.586264\n']
['Block Y\n', '#L E C A F X M N \n', '11.2145 15 27 29.444444 7.6025229 1539742 29.419783\n', '11.21451 13 28 24.607143 6.8247935 1596787 24.586264']
If Block can appear elsewhere but you want it only when followed by a space and a single char:
import re

with open('test.txt') as f:
    r = re.compile(r"^Block \w$")
    grp = groupby(f, key=lambda x: r.search(x))
    for k, v in grp:
        if k:
            print(list(v) + list(next(grp, ("", ""))[1]))
(r.search returns a match object for a header line and None otherwise, so if k: again picks out just the header groups.)
You can use the .tell() and .seek() methods of file objects to move around. So for example:
with open(filename, 'r') as infile:
    start = infile.tell()
    # readline() is used rather than "for line in infile" because mixing
    # tell()/seek() with direct file iteration raises an OSError in Python 3
    line = infile.readline()
    while line:
        if line.startswith('Block'):
            end = infile.tell()
            infile.seek(start)
            # print all the bytes in the block
            # (the position arithmetic assumes a plain ASCII file)
            print(infile.read(end - start))
            # now go back to where we were so we keep reading correctly
            infile.seek(end)
            # we finished a block, mark the start
            start = end
        line = infile.readline()
If the difference between the header lines is uniform throughout the file, just use the distance to increase the indexing variable accordingly.
file1 = open('file_name','r')
lines = file1.readlines()
file1.close()
numlines = len(lines)
i = 0
for line in lines:
    if line.rstrip() == 'specific header 1':
        line_num1 = i
    if line.rstrip() == 'specific header 2':
        line_num2 = i
    i += 1
diff = line_num2 - line_num1
Now that we know the difference between the line numbers, we use for loops to acquire the data.
import numpy as np

k = 0
array = np.zeros([numlines, diff])
for i in range(numlines):
    if k % diff == 0:
        for j in range(diff):
            array[i][j] = float(lines[i+j])  # assumes one numeric value per line
    k += 1
% is the mod operator, which returns 0 only when k is a multiple of the difference in line numbers between the two header lines, which happens exactly when the line corresponds to a header line. Once that line is fixed, we go on to the second for loop, which fills the array so that we have a matrix with numlines rows and diff columns. The nonzero rows will contain the data between the header lines.
I have not tried this out; I am just writing off the top of my head. Hopefully it helps!
I have a text file that contains numbers like this:
a: 0.8475
b: 0.6178
c: 0.6961
d: 0.7565
e: 0.7626
f: 0.7556
g: 0.7605
h: 0.6932
i: 0.7558
j: 0.6526
I want to extract only the floating-point numbers from this file and calculate their average. Here is my program so far:
fh = file.open('abc.txt')
for line in fh:
    line_pos = line.find(':')
    line = line[line_pos+1:]
    line = line.rstrip()
    sum = 0
    average = 0
    for ln in line:
        sum = sum + ln
    average = sum / len(line)
print(average)
Can anyone tell me what is wrong with this code? Thanks.
You have the sum addition in the wrong place, and you need to keep track of the number of lines, since you can't pass a file object to len(). You will also have to cast the strings to floats. I'd recommend simply splitting on whitespace as well. Finally, use the with construct to close the file automatically:
with open('abc.txt') as fh:
    sum = 0    # initialize here, outside the loop
    count = 0  # and a line counter
    for line in fh:
        count += 1  # increment the counter
        sum += float(line.split()[1])  # add here, not in a nested loop
    average = sum / count
print(average)
Convert each value to float to do numeric addition.
Initialize sum once, before the loop begins. Calculate the average once, after the loop.
len(line) will give you the wrong number: it counts characters (the digits plus the newline), not values.
Try to avoid using str.find plus slicing; str.split is more readable.
with open('abc.txt') as fh:
    sum = 0
    numlines = 0
    for line in fh:
        n = line.split(':')[-1]
        sum += float(n)
        numlines += 1
    average = sum / numlines
print(average)
I have a .txt file with this (the names should be random, though):
My Name 4 8 7
Your Name 5 8 7
You U 5 9 7
My My 4 8 5
Y Y 8 7 9
I need to put the information into a text file results.txt with the names plus the average of the numbers. How do I do that? So far I have:
with open(r'stuff.txt') as f:
    mylist = list(f)
i = 0
sk = len(mylist)
while i < sk - 4:
    print(mylist[i], mylist[i+1], mylist[i+2], mylist[i+3])
    i = i + 3
Firstly, open both the input and output files:
with open("stuff.txt") as in_file:
with open("results.txt", "w") as out_file:
Since the problem only needs to work on each line independently, a simple loop over each line would suffice:
for line in in_file:
Split each line at the whitespaces into list of strings (row):
row = line.split()
The numbers occur after the first two fields:
str_nums = row[2:]
However, these are still strings, so they must be converted to floating-point numbers to allow arithmetic to be performed on them. In Python 3, map does this lazily, so nums is an iterator of floats:
nums = map(float, str_nums)
Now calculate the average:
avg = sum(nums) / len(str_nums)
Finally, write the names and the average into the output file.
out_file.write("{} {} {}\n".format(row[0], row[1], avg))
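Putting the steps together, the whole sketch (with the file names from the question) looks like this:

with open("stuff.txt") as in_file:
    with open("results.txt", "w") as out_file:
        for line in in_file:
            row = line.split()
            str_nums = row[2:]
            nums = map(float, str_nums)
            avg = sum(nums) / len(str_nums)
            out_file.write("{} {} {}\n".format(row[0], row[1], avg))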
What about this?
with open(fname) as f:
    new_lines = []
    lines = f.readlines()
    for each in lines:
        col = each.split()
        # the last three fields on each line are the numbers
        average = (int(col[-1]) + int(col[-2]) + int(col[-3])) / 3
        new_lines.append(col[0] + ' ' + col[1] + ' ' + str(average) + '\n')

with open('results.txt', 'w') as f:  # rewriting new lines into the results file
    for each in new_lines:
        f.write(each)
I tried, and this worked:
inputtxt = open("stuff.txt", "r")
outputtxt = open("output.txt", "w")
output = ""
for i in inputtxt.readlines():
    nums = []
    name = ""
    for k in i.split():        # walk the whitespace-separated fields
        try:
            nums.append(int(k))
        except ValueError:     # not a number, so it is part of the name
            name += k + " "
    avrg = 0
    for j in nums:
        avrg += j
    avrg /= len(nums)
    line = name + str(avrg) + "\n"
    output += line
outputtxt.write(output)
inputtxt.close()
outputtxt.close()
I am doing some filtering on a csv file where for every title there are many duplicate IDs with different prediction values, so column 2 (pythoniac) differs. I would like to keep only the 30 lowest values, but with unique IDs. I came to this code, but I don't know how to keep only the lowest 30 entries.
Can you please help with suggestions on how to obtain 30 entries that are unique by ID?
import csv

# title1 id1 100 7.78E-25   # example of a line
with open("test.txt") as fi:
    cmp = {}
    for R in csv.reader(fi, delimiter='\t'):
        for L in ligands:  # ligands is defined elsewhere in my script
            newR = R[0], R[1]
            if R[0] == L:
                if int(R[2]) <= 1000 and int(R[2]) != 0 and float(R[3]) < 1.0e-10:
                    if newR in cmp:
                        if float(cmp[newR][3]) > float(R[3]):
                            cmp[newR] = R[:-2]
                    else:
                        cmp[newR] = R[:-2]
Maybe try something along these lines...
from bisect import insort

nth_lowest = [very_high_value] * 30  # e.g. float('inf')
for x in my_loop:
    do_stuff()
    ...
    if x < nth_lowest[-1]:
        insort(nth_lowest, x)
        nth_lowest.pop()  # remove the highest element
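For the concrete case in the question (the 30 lowest prediction values, one row per ID), heapq.nsmallest is another option. A sketch, assuming the four-column tab-separated layout shown above:

import csv
import heapq

best = {}  # lowest-scoring row seen so far for each ID
with open("test.txt") as fi:
    for row in csv.reader(fi, delimiter='\t'):
        score = float(row[3])
        if row[1] not in best or score < float(best[row[1]][3]):
            best[row[1]] = row

# the 30 rows with the lowest scores, one per unique ID
lowest30 = heapq.nsmallest(30, best.values(), key=lambda r: float(r[3]))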