Reading specific data from a text file - python

I have a .dat file that contains over 40,000 lines containing text and data. I want to extract specific data from this file according to the following:
I need a line counter, obviously, so I know when I reach the end of the file.
I want to open the file for reading and another for writing, and read the first line. If the line 2 positions from the first line begins with "Model", I want to print a blank line to the file open for writing and then skip two lines ahead in the file. If the line two positions from the opening line does not start with "Model", then I wish to select the text that is 8 positions from this first line and print that to the file opened for writing. I will then move 11 positions from the first line and so on.
infile = open("ratios.dat","r")
outfile = open("corr_ratios.txt","w")
for aline in infile:
    items = (aline+2).split()
    if items[0] = "Model"
        outfile.write("\n")
        aline = aline+2
    else
        items = aline+8
        outfile.write(items)

Files in python are their own iterators and can be worked with / advanced a line at a time like so:
with open('path-to-file.txt') as infile:
    for line in infile:
        # code here to deal with line.
Additionally, because the file handle is an iterator, it can be advanced explicitly as well:
with open('path-to-file.txt') as infile:
    for line in infile:
        if condition:
            # skip a line
            next(infile)
Combining the two, you should be able to use lines, skip lines, etc.
Having reviewed your posted code more closely, you're attempting to add an integer to a string (aline + 2), which raises a TypeError. To come closer to your attempted approach, you'd actually do something like this:
lines = infile.readlines()
for lineno, line in enumerate(lines):
    if lineno + 2 >= len(lines):
        break  # there is no line two positions ahead
    targetline = lines[lineno + 2]
This approach loads the entire file into memory, which may or may not be suitable depending on your file size.
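Putting the pieces together, here is a minimal sketch of the stepping logic described in the question. It assumes one reading of the requirements: a "Model" line two positions ahead produces a blank output line and advances the position by 2, otherwise the line 8 positions ahead is written and the position advances by 11. The function name extract_ratios is hypothetical and not part of the original post.

```python
def extract_ratios(lines):
    """Return the output lines for a list of input lines.

    Assumed rules (one interpretation of the question):
    - if the line 2 positions ahead starts with "Model", emit a blank
      line and move 2 positions forward;
    - otherwise emit the line 8 positions ahead and move 11 forward.
    """
    out = []
    i = 0
    while i + 2 < len(lines):
        if lines[i + 2].startswith("Model"):
            out.append("\n")
            i += 2
        elif i + 8 < len(lines):
            out.append(lines[i + 8])
            i += 11
        else:
            break  # not enough lines left for a full block
    return out
```

To use it on the files from the question, read ratios.dat with readlines(), pass the list to the function, and write the result to corr_ratios.txt with writelines(). The explicit index also replaces the separate line counter, since the loop stops once it runs past the end of the list.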

Related

Trying to append quotes to each item in list.

I am trying to create a script that will take each line in my text file, each of which contains one rule name, and wrap it in quotes. The first script I created ran to completion but deleted everything in the file. I have been googling for the past hour or so, trying to take examples and apply them on my own, but I keep failing. My current attempts are as follows.
with open('TDAppendlist.txt', 'w') as file:
    for line in file:
        s = ('""')
        seq = (file)
        s.join(seq)
with open('TDAppendlist.txt') as file:
    line = file.readlines()
    for line in file:
        line.join('"' + line + '"')
Neither of them are working. Could someone please point me in the correct direction? Thank you all for reading.
First, we'll read all the lines of the file into a list, then we can change them, and finally write them back to the file.
with open('TDAppendlist.txt') as file:
    lines = list(file)
with open('TDAppendlist.txt', 'w') as file:
    file.write('\n'.join(['"{}"'.format(line.rstrip('\n')) for line in lines]))
That last line can be written out step by step to make it clearer:
lines = (line.rstrip('\n') for line in lines)
lines = ('"{}"'.format(line) for line in lines)
lines = '\n'.join(lines)
file.write(lines)
This produces an output file TDAppendlist_out.txt that is just like the input, but with quotes surrounding the lines:
with open('TDAppendlist.txt', 'r') as f:
    with open('TDAppendlist_out.txt', 'w') as f_out:
        for line in f:
            # strip the newline first so the closing quote stays on the same line
            f_out.write('"{}"\n'.format(line.rstrip('\n')))
This keeps the input file intact as is should you need it later, and avoids putting everything in the input file into memory all at once.
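If the file really must be rewritten under its own name, the standard library's fileinput module offers a third option. The sketch below is only an illustration under that assumption; the function name quote_lines_in_place is hypothetical. With inplace=True, fileinput redirects print() output into a replacement for the input file while reading it line by line.

```python
import fileinput


def quote_lines_in_place(path):
    """Wrap every line of the file at `path` in double quotes, in place."""
    # inplace=True moves the original aside and sends print() output
    # to a new file with the same name.
    with fileinput.input(files=[path], inplace=True) as stream:
        for line in stream:
            print('"{}"'.format(line.rstrip("\n")))
```

For example, quote_lines_in_place('TDAppendlist.txt') would rewrite that file line by line. Unlike the two-file version, the unquoted original is not kept.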

How do i check for a keyword in a specific line of a text file? python [duplicate]

I want to go to line 34 in a .txt file and read it. How would you do that in Python?
Use the Python standard library's linecache module:
import linecache
line = linecache.getline(thefilename, 34)
should do exactly what you want (getline counts lines starting from 1). You don't even need to open the file -- linecache does it all for you!
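Two details of linecache are easy to miss: line numbers start at 1, and a line number past the end of the file yields an empty string rather than an exception. The small wrapper below is a sketch illustrating both; the name get_line is hypothetical.

```python
import linecache


def get_line(path, line_number):
    """Return line `line_number` (1-based) of `path`, or '' if it is missing."""
    # Drop any stale cached copy in case the file changed on disk.
    linecache.checkcache(path)
    return linecache.getline(path, line_number)
```

The returned line keeps its trailing newline, so call rstrip('\n') on it before comparing it with a keyword or printing it.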
This code will open the file, read the line and print it.
# Open the file and read all of its lines into a list
with open(filename, "r") as f:
    lines = f.readlines()
# List indexes start at 0, so line 34 is at index 33
x = lines[33]
print(x)
A solution that will not read more of the file than necessary is
from itertools import islice
line_number = 34
with open(filename) as f:
    # Adjust index since Python/islice indexes from 0 and the first
    # line of a file is line 1
    line = next(islice(f, line_number - 1, line_number))
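One caveat with the islice approach: next() raises StopIteration when the file has fewer lines than requested. A minimal sketch that returns None instead, using the two-argument form of next(), might look like this (the function name read_line is hypothetical):

```python
from itertools import islice


def read_line(path, line_number):
    """Return line `line_number` (1-based) of `path`, or None if it is missing."""
    with open(path) as f:
        # islice skips the first line_number - 1 lines without storing them;
        # the second argument to next() is the fallback for short files.
        return next(islice(f, line_number - 1, line_number), None)
```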
A very straightforward solution is
line_number = 34
with open(filename) as f:
    line = f.readlines()[line_number - 1]
There are two ways:
1. Read the file line by line and stop when you've reached the line you want.
2. Use f.readlines(), which reads the entire file into memory and returns it as a list of lines, then extract the 34th item from that list.
Solution 1
Benefit: You only keep, in memory, the specific line you want.
Code:
with open(filename) as f:
    for i in range(34):
        line = f.readline()
# When you get here, line will be the 34th line, or an empty string if
# there weren't enough lines in the file
Solution 2
Benefit: Much less code
Downside: Reads the entire file into memory
Problem: Raises IndexError if the file has fewer than 34 lines, so it needs error handling
line = f.readlines()[33]
You could just read all the lines and index the line you're after.
line = open('filename').readlines()[33]
for linenum, line in enumerate(open("file")):
    if linenum + 1 == 34:
        print(line.rstrip())
I made a thread about this and didn't receive help, so I took matters into my own hands.
Nothing complicated here.
import linecache
# Import the linecache module so we can read the line of our choosing
number = int(input("Enter a number from 1-10 for a random quote "))
# Ask the user which line number they would like to read (not necessary)
lines = linecache.getline("Quotes.txt", number)
# Grab the specific line; the variable number can be replaced by any
# integer of your choosing.
print(lines)
# This will print the line of your choosing.
If you are doing this in Python, make sure you have both files (.py and .txt) in the same location; otherwise Python will not be able to find the text file unless you specify its location, e.g.
linecache.getline("C:/Directory/Folder/Quotes.txt
This is used when the file is in another folder than the .py file you are using.
Hope this helps!
Option that always closes the file and doesn't load the whole file into memory:
with open('file.txt') as f:
    for i, line in enumerate(f, start=1):
        if i == 34:
            print(line.rstrip())
            break

how to save changes after modifying content in file using Python

I want to insert a line into the file "original.txt" (the file contains about 200 lines). The line needs to be inserted two lines after a string is found in one of the existing lines. This is my code. I am using a couple of print calls that show me the line is being added to the list in the spot I need, but the file "original.txt" is not being edited.
with open("original.txt", "r+") as file:
    lines = file.readlines() # makes file into a list of lines
    print(lines) #test
    for number, item in enumerate(lines):
        if testStr in item:
            i = number +2
            print(i) #test
            lines.insert(i, newLine)
            print(lines) #test
            break
    file.close()
I am turning the lines of the file into a list, then I enumerate the lines as I look for the string, assigning the index of the matching line plus 2 to i so that the new line is inserted two lines after. The print() function shows the line was added in the correct spot, but the file "original.txt" is not modified.
You seem to misunderstand what your code is doing. Let's go line by line:
with open("original.txt", "r+") as file: # open the file for reading and writing
    lines = file.readlines() # read the contents into a list of lines
    print(lines) # print the whole file
    for number, item in enumerate(lines): # iterate over lines
        if testStr in item:
            i = number +2
            print(i) #test
            lines.insert(i, newLine) # insert the new line into the list
            print(lines) #test
            break # get out of the loop
    file.close() # not needed, with statement takes care of closing
You are not modifying the file. You read the file into a list of strings and modify the list. To modify the actual file you need to open it for writing and write the list back into it. Something like this at the end of the code might work
with open("modified.txt", "w") as f:
    for line in lines:
        f.write(line)
You never modified the original text. Your code reads all the lines into a list in memory. When you identify your trigger, you count two lines ahead and insert newLine (which the posted snippet never defines) into that in-memory copy. At no point does your code write anything back to the original file.
One way is to close the file and then rewrite it from your final value of lines. Do not modify the file while you're reading it -- it's good that you read it all in and then start processing.
Another way is to write to a new file as you go, then use a system command to replace the original file with your new version.
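The second approach can also be done without a system command. The sketch below is one possible way, assuming Python 3: it writes the modified lines to a temporary file in the same directory and then swaps it in with os.replace. The function name insert_after_match and its parameters are hypothetical. The offset follows the question's own arithmetic (the new line goes in at list index number + 2).

```python
import os
import tempfile


def insert_after_match(path, test_str, new_line, offset=2):
    """Insert new_line at index (match + offset) and rewrite the file.

    Returns True if test_str was found, False otherwise.
    """
    with open(path) as src:
        lines = src.readlines()

    for number, item in enumerate(lines):
        if test_str in item:
            lines.insert(number + offset, new_line)
            break
    else:
        return False  # no match: leave the file untouched

    # Write the new version next to the original, then swap it in so the
    # original is never left half-written.
    directory = os.path.dirname(os.path.abspath(path))
    with tempfile.NamedTemporaryFile("w", dir=directory, delete=False) as tmp:
        tmp.writelines(lines)
    os.replace(tmp.name, path)
    return True
```

Note that new_line should end with "\n" itself, because writelines() does not add line terminators.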

From excel to txt - Separate lines

I'm writing a program where I export an Excel file to .txt and then have to import this .txt file into my program. The main goal is to extract the same part from each line, but the problem is that in the .txt file the rows of the Excel sheet end up as one huge string with no \n. Do you know if there is a way to separate them within the program, and if so, how can I do it?
The file I'm working with can be downloaded in http://we.tl/YtixI1ck6l
and so far I was trying something like
ppi = []
for line in read_text:
    prot_interaction = line[0:14]
    ppi.append(prot_interaction)
result_ppi = []
for line in read_text:
    result = line[-1]
    result_ppi.append(result)
But since it's not formatted in lines but just in a single one I'm not getting any good results.
Using that file as an example, use the csv module to parse it.
Example:
import csv
# newline='' leaves line endings untouched so the csv module can handle them
with open('/tmp/Model_Oralome.txt', newline='') as f:
    reader = csv.reader(f, delimiter="\t")
    for row in reader:
        print(row[0])
Prints:
ppi
C4FQL5;Q08426
C8PB60;D2NP19
P40189;Q05655
P22712;Q9NR31
...
P05783;P02751
B5E709;D2NPK7
Q8N7J2;Q9UKZ4
(BTW, the issue you may be having with this particular file is that the line terminators are CR only, as on classic Mac OS. You can fix that in Python by opening the file in universal newlines mode, which is the default for text mode in Python 3...)
Excel is exporting the text file with carriage returns (\r) instead of newlines (\n).
ppi = []
with open("Model_Oralome.txt", 'r') as f:
    # splitlines() breaks on \r, \n and \r\n alike
    lines = f.read().splitlines()
From here you can iterate through each line of lines. Since it looks like you want the value of the first column:
lines = lines[1:]
for line in lines:
    content = line.split('\t')
    ppi.append(content[0])
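Since the question wants both the first field and the last field of each row, the two answers can be combined into one pass. This is a sketch under assumptions: Python 3, a tab-delimited file with a header row, and the result in the last column. The function name read_columns is hypothetical.

```python
import csv


def read_columns(path):
    """Return (first_column, last_column) for a tab-delimited file.

    Works for CR-only, LF-only and CRLF line endings.
    """
    ppi, results = [], []
    # newline='' passes line endings through unchanged; the csv reader
    # treats a bare \r as the end of a record.
    with open(path, newline="") as f:
        reader = csv.reader(f, delimiter="\t")
        next(reader, None)  # skip the header row
        for row in reader:
            if not row:
                continue  # ignore empty records
            ppi.append(row[0])
            results.append(row[-1])
    return ppi, results
```

Splitting on the tab delimiter avoids depending on fixed character positions such as line[0:14], which would break if an identifier had a different length.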
