I am having a problem calculating the average value of numbers in a file.
So far I have written a function that reads in a file and counts the number of lines.
The file consists of many columns of numbers, but column 8 is the one I need to calculate from.
def file_read():
    fname = input("Input filename: ")
    print("opens", fname, "...")
    with open(fname, 'r') as infile:
        txt = infile.readlines()
    num_lines = len(txt)
    # The first line in the file is only text, so I subtract 1
    print("Number of days:", num_lines - 1)
The numbers are also decimals, so I use float.
This is my attempt at calculating the sum of the numbers,
which will then be divided by the number of lines, but I get an error because the first line is text.
with open(fname) as txt:
    return sum(float(line.split()[8])
               for line in txt)
Is there a way I can get Python to ignore the first line and just concentrate on the numbers below it?
You could use txt.readline() to read the first line, but to stick with the iterator approach, just skip the first line by calling next on the file:
with open(fname) as txt:
    next(txt)  # returns the first line; we just ignore the return value
    # the iterator is now on the second line, where the numbers are
    for line in txt:
        ...
Side note: this is also very useful for skipping the header row of files opened with the csv module. That's where next is better than readline, since in a CSV file a single header row can span multiple physical lines (a quoted field may contain newlines).
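Putting the pieces together, here is a minimal sketch of the average the question asks for. It assumes whitespace-separated columns, exactly one header line, and that `column=8` is a 0-based index (so it picks the ninth field, the same one `line.split()[8]` picks); the name `column_average` is hypothetical.

```python
def column_average(lines, column=8):
    """Average one whitespace-separated column, skipping a single header line.

    `lines` can be an open file or any iterable of strings. `column` is a
    0-based index (assumption: 8 means the ninth field; adjust as needed).
    """
    it = iter(lines)   # a file object is already its own iterator
    next(it, None)     # skip the text header (no error if the input is empty)
    total = 0.0
    count = 0
    for line in it:
        fields = line.split()
        if not fields:  # ignore blank lines, e.g. at the end of the file
            continue
        total += float(fields[column])
        count += 1
    # return NaN rather than dividing by zero when there are no data rows
    return total / count if count else float("nan")

# Usage (hypothetical file name):
# with open("data.txt") as f:
#     print(column_average(f))
```

Passing the open file directly keeps the streaming behaviour of `next(txt)`: only one line is in memory at a time.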
Try this
import re
# regular expression for decimals and integers
digits_reg = re.compile(r"\d+\.\d+|\d+")
with open("file name", "r") as file:  # replace "file name" with your file's path
    allNum = []
    # find numbers in each line and add them to the list
    for line in file:
        allNum.extend(digits_reg.findall(line))
# should be a list that contains all numbers in the file (as strings)
print(allNum)
Related
I'm trying to use Python to open and read a file in which one particular line repeats throughout the output. The line is:
"AVE. CELL LNTHS[bohr] = 0.4938371E+02 0.4938371E+02 0.4938371E+02"
The values change on each line (with every step), but all lines start with AVE. CELL LNTHS[bohr]. I want to take the first of the three values from every line and make a list. The image is a snippet of the output file showing the repeating line of interest.
You can use float to convert a string to a number. Also, use split to split the line first on the '=' and then on whitespace. Lastly, use a list comprehension to build a list from the parts of the string.
path_to_file = r"C:\Documents\whatever.csv"
with open(path_to_file, "r") as file:
    for line in file:
        if line.startswith("AVE. CELL LNTHS[bohr]"):
            values = [float(x) for x in line.split("=")[1].split()]
            # Do something with the values
            print(values)
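The question asks only for the first of the three values on each matching line. A minimal sketch that collects those into a list, reusing the same split logic; `first_cell_lengths` and `PREFIX` are hypothetical names, and the `lstrip()` is an assumption in case the lines in the real output file are indented.

```python
PREFIX = "AVE. CELL LNTHS[bohr]"

def first_cell_lengths(lines):
    """Collect the first of the three values after '=' on every matching line."""
    firsts = []
    for line in lines:
        # lstrip() tolerates leading indentation (assumption about the file)
        if line.lstrip().startswith(PREFIX):
            values = [float(x) for x in line.split("=")[1].split()]
            firsts.append(values[0])
    return firsts

# Usage:
# with open(path_to_file) as f:
#     cell_lengths = first_cell_lengths(f)
```

float() understands the Fortran-style exponent directly, so "0.4938371E+02" becomes 49.38371.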
I have two files. One file contains lines of numbers. The other file contains lines of text. I want to use the numbers to look up specific lines in the text file. Currently my code looks like this:
a_file = open("numbers.txt")
b_file = open("keywords.txt")
for position, line in enumerate(b_file):
    lines_to_read = [a_file]
    if position in lines_to_read:
        print(line)
The values in numbers.txt look like this:
26
13
122
234
41
The lines in keywords.txt look like this (example):
this is an apple
this is a pear
this is a banana
this is a pineapple
...
...
...
I can manually write out the values like this:
lines_to_read = [26,13,122,234,41]
but that defeats the point of using a_file to look up the values in b_file. I have tried using strings and other variables but nothing seems to work.
[a_file] is a list with a single element: the file object a_file itself. What you want is a list of the file's lines, which you can get with a_file.readlines() or list(a_file). But you want the integer values of those lines rather than their text, and since you search the container on every iteration, a set is a better fit than a list. In the end, I would write:
lines_to_read = set(int(line) for line in a_file)
This is now fine:
for position, line in enumerate(b_file):
    if position in lines_to_read:
        print(line)
You need to read the contents of a_file to get the numbers out.
Something like this should work:
lines_to_read = [int(num.strip()) for num in a_file.readlines()]
This will give you a list of the numbers in the file - assuming each line contains a single line number to lookup.
Also, this doesn't belong inside the loop. It should go before the loop: the numbers are fixed once read from the file, so there's no need to process them again on each iteration.
socal_nerdtastic helped me find this solution. Thanks so much!
# first, read the numbers file into a list of numbers
with open("numbers.txt") as f:
    lines_to_read = [int(line) for line in f]
# next, read the keywords file into a list of lines
with open("keywords.txt") as f:
    keyword_lines = f.read().splitlines()
# last, use one to print the other
for num in lines_to_read:
    print(keyword_lines[num])
I would just do this...
a_file = open("numbers.txt")
b_file = open("keywords.txt")
keywords_file = b_file.readlines()
for x in a_file:
    print(keywords_file[int(x)-1])
This reads all lines of the keywords file into a list, then iterates through the numbers file to get the line numbers, and uses each one as an index into that list. The -1 assumes the numbers in numbers.txt count lines starting from 1.
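The answers above differ on whether the numbers are 0-based (`keyword_lines[num]`) or 1-based (`keywords_file[int(x)-1]`). A sketch that makes the choice explicit, assuming numbers.txt holds one integer per line; `lookup_lines` and its `one_based` parameter are hypothetical names. Unlike plain indexing, it rejects out-of-range numbers instead of letting a negative index silently wrap around to the end of the list.

```python
def lookup_lines(numbers_path, keywords_path, one_based=False):
    """Return the keyword lines whose positions are listed in numbers_path.

    one_based=False treats the numbers as 0-based indices;
    one_based=True subtracts 1 first. Which applies depends on your data.
    """
    with open(numbers_path) as f:
        wanted = [int(line) for line in f if line.strip()]  # skip blank lines
    with open(keywords_path) as f:
        keyword_lines = f.read().splitlines()  # drops the trailing newlines
    offset = 1 if one_based else 0
    result = []
    for n in wanted:
        i = n - offset
        # guard against negative indices, which Python would otherwise accept
        if not 0 <= i < len(keyword_lines):
            raise IndexError(f"line number {n} is out of range")
        result.append(keyword_lines[i])
    return result

# Usage:
# for text in lookup_lines("numbers.txt", "keywords.txt", one_based=True):
#     print(text)
```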
I am trying to count entries in a text file but having difficulty. The key is that each line is one entry and if the term "ADALIMUMAB" shows up in the line, it counts as one. If it shows up twice, it still should only count as one. Here is an example of lines in the text file.
101700392$10170039$3$I$BUDESONIDE.$BUDESONIDE$1$Oral$9 MG, DAILY$$$$$$$$9$MG$$
101700392$10170039$4$C$ADALIMUMAB$ADALIMUMAB$1$$UNK$$$$$$$$$$$
102117144$10211714$1$PS$HUMIRA$ADALIMUMAB$1$Subcutaneous$$$$$N$ NOT AVAILABLE,NOT
I currently have this working:
fDRUG14Q3 = open("DRUG14Q3.txt")
data = fDRUG14Q3.read()
occurencesDRUG14Q3 = data.count("ADALIMUMAB")
But it will count line 2 in the example above as 2 entries rather than one.
You can pass a generator expression to sum(). Each line evaluates to either True (1) or False (0), and sum() adds them up. In other words, you are counting how many lines satisfy 'ADALIMUMAB' in line:
with open(path, 'r') as f:
    total = sum('ADALIMUMAB' in line for line in f)
print(total)
# 2
This also has the benefit of not requiring you to read the whole file into memory first.
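`'ADALIMUMAB' in line` is a substring test, so it would also count a line that only contains a longer name with ADALIMUMAB inside it. If that matters, here is a sketch that instead checks whether the term is one of the `$`-separated fields, assuming every record uses `$` as its delimiter as in the sample lines; `count_entries` is a hypothetical name.

```python
def count_entries(lines, term="ADALIMUMAB", sep="$"):
    """Count lines in which `term` is an exact sep-delimited field.

    Each line contributes at most 1, however many times the term appears.
    """
    return sum(term in line.rstrip("\n").split(sep) for line in lines)

# Usage:
# with open("DRUG14Q3.txt") as f:
#     print(count_entries(f))
```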
I have the following text file:
abstract 233:1 253:1 329:2 1087:2 1272:1
game 64:1 99:1 206:1 595:1
direct 50:1 69:1 1100:1 1765:1 2147:1 3160:1
Each pair has the form [docID]:[stringFq], i.e. how many times the string appears in that document.
How could you calculate the number of key pairs in this text file?
Your regex approach works fine. Here is an iterative approach. If you uncomment the print statements, you will see some intermediate results.
Given
%%file foo.txt
abstract 233:1 253:1 329:2 1087:2 1272:1
game 64:1 99:1 206:1 595:1
direct 50:1 69:1 1100:1 1765:1 2147:1 3160:1
Code
import itertools as it
with open("foo.txt") as f:
    lines = f.readlines()
#print(lines)
pred = lambda x: x.isalpha()
count = 0
for line in lines:
    line = line.strip("\n")
    line = "".join(it.dropwhile(pred, line))
    pairs = line.strip().split()
    #print(pairs)
    count += len(pairs)
count
# 15
Details
First we use a with statement, which is an idiom for safely opening and closing files. We then split the file into lines via readlines(). We define a conditional function (a predicate) that we will use later. The lambda expression is used for convenience and is equivalent to the following function:
def pred(x):
    return x.isalpha()
We initialize a count variable and start iterating over each line. Every line may have a trailing newline character \n, so we first strip() it away before feeding the line to dropwhile.
dropwhile is a special itertools iterator. As it iterates over a line, it discards leading characters as long as they satisfy the predicate and stops discarding at the first character that fails it. In other words, all letters at the start are dropped until the first non-letter is found (which happens to be a space). We clean the resulting string again, stripping the leading space, and the remaining string is split() into a list of pairs.
Finally, the number of pairs on each line is added to count, so the final count is the total number of pairs in the file.
Summary
The code above shows how to tackle basic file handling with simple, iterative steps:
open the file
split the file into lines
while iterating each line, clean and process data
output a result
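Because each line is one leading word followed only by pairs, the same count can be sketched without dropwhile: split each line on whitespace and count everything after the first field. This assumes the leading word never contains spaces; `count_pairs_by_fields` is a hypothetical name.

```python
def count_pairs_by_fields(lines):
    """Count docID:stringFq pairs, assuming each line is one word followed
    only by whitespace-separated pairs."""
    total = 0
    for line in lines:
        fields = line.split()
        if fields:                    # skip blank lines
            total += len(fields) - 1  # everything after the leading word is a pair
    return total

# Usage:
# with open("foo.txt") as f:
#     print(count_pairs_by_fields(f))
```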
import re
with open('input.txt', 'r') as f:
    text = f.read()
numbers = re.findall(r"[-+]?\d*\.\d+|\d+", text)
# finds all numbers in the text file (each pair contributes two)
numLen = len(numbers) // 2
# I needed to count pairs, not numbers, so I divide by 2 (integer division)
print(numLen)
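Dividing by 2 relies on every number belonging to exactly one pair. A sketch that matches the `docID:stringFq` pairs directly instead, assuming a pair is always two runs of digits joined by a colon; `count_pairs` and `PAIR_RE` are hypothetical names.

```python
import re

# a pair is digits, a colon, digits, bounded so it isn't part of a longer word
PAIR_RE = re.compile(r"\b\d+:\d+\b")

def count_pairs(text):
    """Count docID:stringFq pairs anywhere in the text."""
    return len(PAIR_RE.findall(text))

# Usage:
# with open('input.txt') as f:
#     print(count_pairs(f.read()))
```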
Basically, I want to be able to count the number of characters in a txt file (with user input of file name). I can get it to display how many lines are in the file, but not how many characters. I am not using the len function and this is what I have:
def length(n):
    value = 0
    for char in n:
        value += 1
    return value
filename = input('Enter the name of the file: ')
f = open(filename)
for data in f:
    data = length(f)
    print(data)
All you need to do is sum the number of characters in each line (data):
total = 0
for line in f:
    data = length(line)
    total += data
print(total)
There are two problems.
First, for each line in the file, you're passing f itself (that is, a sequence of lines) to length. That's why it's printing the number of lines in the file: the length of that sequence of lines is the number of lines in the file (strictly, the number remaining after the line the for loop has already consumed).
To fix this, pass each line, data (that is, a sequence of characters), instead:
for data in f:
    print(length(data))
Next, while that will properly calculate the length of each line, you have to add them all up to get the length of the whole file. So:
total_length = 0
for data in f:
    total_length += length(data)
print(total_length)
However, there's another way to tackle this that's a lot simpler. If you read() the file, you will get one giant string, instead of a sequence of separate lines. So you can just call length once:
data = f.read()
print(length(data))
The problem with this is that you have to have enough memory to store the whole file at once. Sometimes that's not appropriate. But sometimes it is.
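A middle ground between the two approaches is to read the file in fixed-size chunks, so memory use stays bounded without going line by line. A sketch reusing the question's `length` function (since `len` is off-limits); `count_chars` and the `chunk_size` default are illustrative assumptions. In text mode Python decodes the file and translates `\r\n` to `\n`, so this counts decoded characters rather than bytes.

```python
def length(n):
    """The question's helper: count the items in an iterable without len()."""
    value = 0
    for _ in n:
        value += 1
    return value

def count_chars(path, chunk_size=65536):
    """Count the characters in a text file, reading at most chunk_size at a time."""
    total = 0
    with open(path) as f:
        while True:
            chunk = f.read(chunk_size)  # returns '' at end of file
            if not chunk:
                break
            total += length(chunk)
    return total

# Usage:
# filename = input('Enter the name of the file: ')
# print(count_chars(filename))
```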
When you iterate over a file (opened in text mode) you are iterating over its lines.
for data in f: could be rewritten as for line in f:, which makes it easier to see what the loop is doing.
Your length function looks like it should work but you are sending the open file to it instead of each line.