reading a specific line from one file using a value from another - python

I have two files. One file contains lines of numbers. The other file contains lines of text. I want to look up specific lines of text from the list of numbers. Currently my code looks like this.
a_file = open("numbers.txt")
b_file = open("keywords.txt")
for position, line in enumerate(b_file):
    lines_to_read = [a_file]
    if position in lines_to_read:
        print(line)
The values in numbers look like this..
26
13
122
234
41
The values in keywords look like this (example):
this is an apple
this is a pear
this is a banana
this is a pineapple
...
...
...
I can manually write out the values like this
lines_to_read = [26,13,122,234,41]
but that defeats the point of using a_file to look up the values in b_file. I have tried using strings and other variables but nothing seems to work.

[a_file] is a list with a single element, the file object a_file itself. What you want is a list of the file's lines, which you can get with a_file.readlines() or list(a_file). You also need the integer value of each line rather than its text, and because you test membership on every iteration, a set is a better container than a list. In the end, I would write:
lines_to_read = set(int(line) for line in a_file)
This is now fine:
for position, line in enumerate(b_file):
    if position in lines_to_read:
        print(line)
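A minimal, self-contained sketch of the set-based approach above. The helper name select_lines and the in-memory sample data are assumptions for illustration. The numbers are treated as 0-based positions, as enumerate produces them, and the output follows the order of the keywords file rather than the order of the numbers.

```python
import io

def select_lines(numbers_file, keywords_file):
    """Yield the keyword lines whose 0-based position is listed in numbers_file."""
    # Blank lines in the numbers file are skipped; int() ignores the trailing newline.
    wanted = {int(line) for line in numbers_file if line.strip()}
    for position, line in enumerate(keywords_file):
        if position in wanted:
            yield line.rstrip("\n")

# In-memory stand-ins for numbers.txt and keywords.txt (made-up sample data).
numbers = io.StringIO("2\n0\n")
keywords = io.StringIO("this is an apple\nthis is a pear\nthis is a banana\n")
print(list(select_lines(numbers, keywords)))
# ['this is an apple', 'this is a banana']
```

Because the keywords file is streamed line by line, this works even when that file is too large to hold in memory.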

You need to read the contents of the a_file to get the numbers out.
Something like this should work:
lines_to_read = [int(num.strip()) for num in a_file.readlines()]
This will give you a list of the numbers in the file - assuming each line contains a single line number to lookup.
Also, you wouldn't need to put this inside the loop. It should go outside the loop - i.e. before it -- these numbers are fixed once read in from the file, so there's no need to process them again in each iteration.

socal_nerdtastic helped me find this solution. Thanks so much!
# first, read the numbers file into a list of numbers
with open("numbers.txt") as f:
    lines_to_read = [int(line) for line in f]
# next, read the keywords file into a list of lines
with open("keywords.txt") as f:
    keyword_lines = f.read().splitlines()
# last, use one to print the other
for num in lines_to_read:
    print(keyword_lines[num])

I would just do this...
a_file = open("numbers.txt")
b_file = open("keywords.txt")
keywords_file = b_file.readlines()
for x in a_file:
    print(keywords_file[int(x)-1])
This reads all lines of the keywords file into a list, then iterates through the numbers file to get the line numbers and uses each one (minus 1, because list indexes start at 0) as an index into that list.
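The answers above differ in whether a number counts lines from 0 or from 1. The sketch below makes that choice explicit and keeps the output in the order of the numbers. The name lookup_lines and its first_line parameter are hypothetical, and which convention applies depends on how numbers.txt was produced.

```python
def lookup_lines(numbers, keyword_lines, first_line=1):
    """Return keyword lines in the order the numbers are listed.

    first_line is the number that refers to the first keyword line:
    1 for human-style line numbers, 0 for Python-style indexes.
    """
    result = []
    for number in numbers:
        index = number - first_line
        # Guard explicitly: a negative index would silently count from the end.
        if 0 <= index < len(keyword_lines):
            result.append(keyword_lines[index])
    return result

keyword_lines = ["this is an apple", "this is a pear", "this is a banana"]
print(lookup_lines([3, 1, 7], keyword_lines))
# ['this is a banana', 'this is an apple']
print(lookup_lines([3, 1, 7], keyword_lines, first_line=0))
# ['this is a pear']
```

Out-of-range numbers are skipped here; raising an error instead may be preferable if a bad number indicates corrupt input.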

Related

reading line in output file that repeats but has different associated values

I'm trying to use python to open and read a file with a line that repeats in my output. The line is:
"AVE. CELL LNTHS[bohr] = 0.4938371E+02 0.4938371E+02 0.4938371E+02"
The values change in each line (with every step), but all lines start with AVE. CELL LNTHS[bohr]. I want to take the first of the three values from every line and make a list. The image is a snip of the output file showing the repeating line of interest.
You can use float() to convert a string to a number. Use split to split the line first on the '=' and then on whitespace. Lastly, use a list comprehension to build a list from the parts of the string.
path_to_file = r"C:\Documents\whatever.csv"
with open(path_to_file, "r") as file:
    for line in file:
        if line.startswith("AVE. CELL LNTHS[bohr]"):
            values = [float(x) for x in line.split("=")[1].split()]
            # Do something with the values
            print(values)
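The question asks for a list of only the first value from each matching line. A minimal sketch of that step follows; the function name first_cell_lengths and the sample lines are assumptions, and it expects the prefix to start at the first column, as in the answer above.

```python
def first_cell_lengths(lines, prefix="AVE. CELL LNTHS[bohr]"):
    """Collect the first number after '=' from every line that starts with prefix."""
    values = []
    for line in lines:
        if line.startswith(prefix):
            numbers = line.split("=", 1)[1].split()
            values.append(float(numbers[0]))
    return values

# Made-up sample standing in for the output file.
sample = [
    "AVE. CELL LNTHS[bohr] = 0.4938371E+02 0.4938371E+02 0.4938371E+02",
    "some other output line",
    "AVE. CELL LNTHS[bohr] = 0.5000000E+02 0.5000000E+02 0.5000000E+02",
]
print(first_cell_lengths(sample))
# [49.38371, 50.0]
```

An open file can be passed in place of the list, since both yield lines when iterated.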

Printing characters from a given sequence till a certain range only. How to do this in Python?

I have a file in which I have a sequence of characters. I want to read the second line of that file and want to read the characters of that line to a certain range only.
I tried this code, however, it is only printing specific characters from both lines. And not printing the range.
with open("irumfas.fas", "r") as file:
    first_chars = [line[1] for line in file if not line.isspace()]
    print(first_chars)
Can anyone help in this regard? How can I give a range?
Below is the sequence that I want to print. I want to start printing from the second line of the file and print only a certain range of its characters.
IRUMSEQ
ATTATAAAATTAAAATTATATCCAATGAATTCAATTAAATTAAATTAAAGAATTCAATAATATACCCCGGGGGGATCCAATTAAAAGCTAAAAAAAAAAAAAAAAAA
The following approach can be used.
Consider the file contains
RANDOMTEXTSAMPLE
SAMPLERANDOMTEXT
RANDOMSAMPLETEXT
with open('sampleText.txt') as sampleText:
    content = sampleText.read()
    content = content.split("\n")[1]
    content = content[:6]
    print(content)
Output will be
SAMPLE
I think you want something like this:
with open("irumfas.fas", "r") as file:
    second_line = file.readlines()[1]
    print(second_line[0:9])
readlines() will give you a list of the lines -- which we index to get only the 2nd line. Your existing code will iterate over all the lines (which is not what you want).
As for extracting a certain range, you can use slicing to select the characters you want from that line. In the example above, [0:9] selects the first 9 characters (indexes 0 through 8).
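For large files, readlines() loads every line even though only one is needed. A possible alternative, sketched below with itertools.islice, stops reading once the requested line is reached. The helper name line_slice and the shortened sample sequence are assumptions.

```python
import io
from itertools import islice

def line_slice(file, line_index, start, stop):
    """Return characters start..stop-1 of the line at 0-based line_index."""
    # islice skips ahead without building a list of all lines;
    # the "" default covers files that have too few lines.
    line = next(islice(file, line_index, line_index + 1), "")
    return line.rstrip("\n")[start:stop]

# In-memory stand-in for irumfas.fas (sequence shortened for the example).
sample = io.StringIO("IRUMSEQ\nATTATAAAATTAAAATTATATCC\n")
print(line_slice(sample, 1, 0, 9))
# ATTATAAAA
```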
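You can slice the second line (index 1) of the file as you would slice a list. Note that line[1] inside the comprehension is the second character of each line, not the second line, so collect the lines first and then index.
You were very close:
end = 6  # number of characters
with open("irumfas.fas", "r") as file:
    lines = [line for line in file if not line.isspace()]
first_chars = lines[1][:end]  # second non-blank line, first `end` characters
print(first_chars)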

Calculate the average value of the numbers in a file

I am having a problem calculating the average value of numbers in a file.
So far I have made a function that reads in a file and calculates the number of lines.
The file consists of many columns of numbers, but column 8 is the one I need to calculate from.
def file_read():
    fname = input("Input filename: ")
    infile = open(fname,'r')
    txt = infile.readlines()
    print("opens",fname,"...")
    num_lines = sum(1 for line in open(fname))
    #The first line in the file is only text, so i subtract 1
    print("Number of days:",(num_lines-1))
The numbers are also decimals, so I use float.
This is my attempt at calculating the sum of the numbers,
which should then be divided by the number of lines, but I get an error because the first line is text.
with open(fname) as txt:
    return sum(float(x)
               for line in txt
               for x in line.split()[8])
Is there a way I can get Python to ignore the first line and just concentrate on the numbers below it?
You could use txt.readline() to read the first line, but to stick with the iterator approach, just drop the first line by calling next on the file:
with open(fname) as txt:
    next(txt) # it returns the first line, we just ignore the return value
    # your iterator is now on the second line, where the numbers are
    for line in txt:
        ...
Side note: this is also very useful for skipping the header row of files opened with the csv module. There next is better than readline, because a CSV header can span multiple lines.
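Putting the pieces together, a minimal sketch of the average calculation might look like the following. It assumes whitespace-separated columns and reads "column 8" as index 8, matching line.split()[8] in the question; the function name column_average and the sample data are made up for illustration.

```python
import io

def column_average(file, column=8):
    """Average the values in one whitespace-separated column, skipping the header line."""
    next(file, None)  # discard the header; the None default avoids an error on empty files
    total = 0.0
    count = 0
    for line in file:
        fields = line.split()
        if len(fields) > column:  # ignore blank or short lines
            total += float(fields[column])
            count += 1
    return total / count if count else None

# In-memory stand-in for the data file (made-up sample data).
sample = io.StringIO(
    "c0 c1 c2 c3 c4 c5 c6 c7 c8\n"
    "0 0 0 0 0 0 0 0 1.5\n"
    "0 0 0 0 0 0 0 0 2.5\n"
)
print(column_average(sample))
# 2.0
```

Note that line.split()[8] is a single string, so it should be converted with float() directly rather than iterated over, which would yield one character at a time.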
Try this
import re
#regular expression for decimals
digits_reg = re.compile(r"\d+\.\d+|\d+")
with open('''file name''', "r") as file:
    allNum = []
    #find numbers in each line and add them to the list
    for line in file:
        allNum.extend(digits_reg.findall(line))
#should be a list that contains all numbers in the file (as strings)
print(allNum)

How to count the number of characters in a file (not using the len function)?

Basically, I want to be able to count the number of characters in a txt file (with user input of file name). I can get it to display how many lines are in the file, but not how many characters. I am not using the len function and this is what I have:
def length(n):
    value = 0
    for char in n:
        value += 1
    return value
filename = input('Enter the name of the file: ')
f = open(filename)
for data in f:
    data = length(f)
    print(data)
All you need to do is sum the number of characters in each line (data):
total = 0
for line in f:
    data = length(line)
    total += data
print(total)
There are two problems.
First, for each line in the file, you're passing f itself—that is, a sequence of lines—to length. That's why it's printing the number of lines in the file. The length of that sequence of lines is the number of lines in the file.
To fix this, you want to pass each line, data—that is, a sequence of characters. So:
for data in f:
    print(length(data))
Next, while that will properly calculate the length of each line, you have to add them all up to get the length of the whole file. So:
total_length = 0
for data in f:
    total_length += length(data)
print(total_length)
However, there's another way to tackle this that's a lot simpler. If you read() the file, you will get one giant string, instead of a sequence of separate lines. So you can just call length once:
data = f.read()
print(length(data))
The problem with this is that you have to have enough memory to store the whole file at once. Sometimes that's not appropriate. But sometimes it is.
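If memory is a concern, one middle ground is to read the file in fixed-size chunks and count the characters of each chunk, still without calling len. This is only a sketch: the name count_characters and the chunk_size default are assumptions, and newline characters are included in the count.

```python
import io

def count_characters(file, chunk_size=4096):
    """Count characters in a text file without len(), reading chunk_size characters at a time."""
    total = 0
    while True:
        chunk = file.read(chunk_size)
        if not chunk:  # an empty string means end of file
            break
        for _ in chunk:
            total += 1
    return total

# In-memory stand-in for the user's file.
print(count_characters(io.StringIO("ab\ncd\n")))
# 6
```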
When you iterate over a file (opened in text mode) you are iterating over its lines.
for data in f: could be rewritten as for line in f: and it is easier to see what it is doing.
Your length function looks like it should work but you are sending the open file to it instead of each line.

Converting .txt file to list AND be able to index and print list line by line

I want to be able to read the file line by line and then when prompted (say user inputs 'background'), it returns lines 0:24 because those are the lines in the .txt that relate to his/her background.
def anaximander_background():
    f = open('Anaximander.txt', 'r')
    fList = []
    fList = f.readlines()
    fList = [item.strip('\n') for item in fList]
    print(fList[:20])
This code prints me the list like:
['ANAXIMANDER', '', 'Anaximander was born in Miletus in 611 or 610 BCE.', ...]
I've tried a lot of different ways (for, if, and while loops) and tried the csv import.
The closest I've gotten was being able to have a print out akin to:
[ANAXIMANDER]
[]
[info]
and so on, depending on how many objects I retrieve from fList.
I really want it to print like the example I just showed but without the list brackets ([ ]).
Definitely can clarify if necessary.
Either loop over the list, or use str.join():
for line in fList[:20]:
    print(line)
or
print('\n'.join(fList[:20]))
The first prints each element of the fList slice separately; the second joins the lines into a new string, with \n newline characters between them, before printing.
To print the first 20 lines from a file:
import sys
from itertools import islice
with open('Anaximander.txt') as file:
    sys.stdout.writelines(islice(file, 20))
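To return a range of lines when the user types a keyword such as 'background', one option is a dictionary that maps each keyword to a slice. The sketch below is only illustrative: the SECTIONS table and the function name section_text are hypothetical, and the 0:24 range is the one mentioned in the question.

```python
# Hypothetical mapping from a keyword to the line range that covers it.
SECTIONS = {"background": slice(0, 24)}

def section_text(lines, name, sections=SECTIONS):
    """Join the lines belonging to the named section into one printable string."""
    return "\n".join(lines[sections[name]])

# Made-up stand-in for the stripped lines of Anaximander.txt.
lines = ["ANAXIMANDER", "", "Anaximander was born in Miletus in 611 or 610 BCE."]
print(section_text(lines, "background"))
```

A slice that runs past the end of the list is not an error in Python, so short files simply yield fewer lines.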
