Python, select a specific line

I want to read a text file using Python and print out specific lines. Specifically, I want to print any line which starts with the word "nominal" (and I know how to do that) together with the line following it, which is not identifiable by any specific string. Could you point me to some lines of code that can do that?

In good faith and under the assumption that this will help you start coding and showing some effort, here you go:
file_to_read = r'myfile.txt'

with open(file_to_read, 'r') as f_in:
    flag = False
    for line in f_in:
        if line.startswith('nominal'):
            print(line)
            flag = True
        elif flag:
            print(line)
            flag = False
It might work out of the box, but please spend some time going through it and you will definitely get the logic behind it. Note that text comparison in Python is case-sensitive.
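The case-sensitivity note matters if your file mixes "nominal" and "Nominal". A minimal sketch of a case-insensitive variant, using an in-memory file so it is self-contained (the sample text is made up):

```python
import io

# Same flag technique as above, but the comparison is done on a
# lower-cased copy of the line, so "Nominal" matches too.
sample = io.StringIO("Nominal value: 3\nfollowing line\nother\n")

matched = []
flag = False
for line in sample:
    if line.lower().startswith('nominal'):  # case-insensitive comparison
        matched.append(line)
        flag = True
    elif flag:
        matched.append(line)
        flag = False

print(''.join(matched))
```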

If the file isn't too large, you can put it all in a list:
def printLines(fname):
    with open(fname) as f:
        lines = f.read().split('\n')
    if not lines:
        return None
    if lines[0].startswith('nominal'):
        print(lines[0])
    for i, line in enumerate(lines[1:]):
        # lines[i] is the line preceding this one
        if lines[i].startswith('nominal') or line.startswith('nominal'):
            print(line)
Then e.g. printLines('test.txt') will do what you want.

remove specific endline breaks in Python

I have a long FASTA file and I need to reformat the lines. I tried many things, but since I'm not very familiar with Python I couldn't solve it.
>seq1
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
>seq2
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
I want them to look like:
>seq1
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
>seq2
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
I've tried this:
a_file = open("file.fasta", "r")

string_without_line_breaks = ""
for line in a_file:
    if line[0:1] == ">":
        continue
    else:
        stripped_line = line.rstrip()
        string_without_line_breaks += stripped_line

a_file.close()
print(string_without_line_breaks)
But the result doesn't show the ">" lines, and it also merges all the other lines together. I hope you can help me with this. Thank you!
A common arrangement is to remove the newline, and then add it back when you see the next record.
# Use a context manager (with statement)
with open("file.fasta", "r") as a_file:
    # Keep track of whether we have written something without a newline
    written_lines = False
    for line in a_file:
        # Use standard .startswith()
        if line.startswith(">"):
            if written_lines:
                print()
                written_lines = False
            print(line, end='')
        else:
            print(line.rstrip('\n'), end='')
            written_lines = True
    if written_lines:
        print()
A common beginner bug is forgetting to add the final newline after falling off the end of the loop.
This simply prints one line at a time and doesn't return anything. A better design would probably be to collect and yield one FASTA record (header + sequence) at a time, probably as an object, and have the caller decide what to do with it; but then, you probably want to use an existing library which does that: BioPython seems to be the go-to solution for bioinformatics.
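A sketch of that record-at-a-time design (the function name and sample data are my own, not from BioPython): yield (header, sequence) tuples and let the caller decide what to do with them.

```python
def fasta_records(lines):
    """Yield (header, joined_sequence) tuples from an iterable of lines."""
    header, chunks = None, []
    for line in lines:
        if line.startswith('>'):
            # A new header closes the previous record, if any.
            if header is not None:
                yield header, ''.join(chunks)
            header, chunks = line.rstrip('\n'), []
        else:
            chunks.append(line.strip())
    # Don't forget the final record at end of file.
    if header is not None:
        yield header, ''.join(chunks)

data = [">seq1\n", "XXXX\n", "XXXX\n", ">seq2\n", "YY\n"]
for header, seq in fasta_records(data):
    print(header)
    print(seq)
```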
Since you’re working with FASTA data, another solution would be to use a dedicated library, in which case what you want is a one-liner:
import sys
from Bio import SeqIO

SeqIO.write(SeqIO.parse('file.fasta', 'fasta'), sys.stdout, 'fasta-2line')
Using the 'fasta-2line' format description tells SeqIO.write to omit line breaks inside sequences.
First the usual disclaimer: operate on files using a with block when at all possible. Otherwise they won't be closed on error.
Observe that you want to remove the newline on every line not starting with >, except the last one of every block. You can achieve the same effect by stripping the newline after every line that doesn't start with >, and prepending a newline to each line starting with > except the first.
import sys

out = sys.stdout
with open(..., 'r') as file:
    first = True
    hasline = False
    for line in file:
        if line.startswith('>'):
            if not first:
                out.write('\n')
            out.write(line)
            first = False
        else:
            out.write(line.rstrip())
            hasline = True
    if hasline:
        out.write('\n')
Printing as you go is much simpler than accumulating the strings in this case. Printing to a file using the write method is simpler than using print when you're just transcribing lines.
I have fixed some mistakes in your code.
a_file = open("file.fasta", "r")

string_without_line_breaks = ""
needed_lines = []
for line in a_file:
    if line.strip().startswith(">") or line.strip() == "":
        # If any lines were appended before, commit them.
        if string_without_line_breaks != "":
            needed_lines.append(string_without_line_breaks)
            string_without_line_breaks = ""
        needed_lines.append(line.rstrip("\n"))
        continue
    else:
        stripped_line = line.strip()
        string_without_line_breaks += stripped_line

# Commit the final sequence, which has no header after it.
if string_without_line_breaks != "":
    needed_lines.append(string_without_line_breaks)

a_file.close()
print("\n".join(needed_lines))
Please make sure to also add the lines containing the greater-than sign (>) to your string.
a_file = open("file.fasta", "r")

string_without_line_breaks = ""
for line in a_file:
    if line[0:1] == ">":
        # Close the previous sequence (if any) before the header;
        # the header line itself still ends with its own newline.
        if string_without_line_breaks:
            string_without_line_breaks += "\n"
        string_without_line_breaks += line
        continue
    else:
        stripped_line = line.rstrip()
        string_without_line_breaks += stripped_line

a_file.close()
print(string_without_line_breaks)
By the way, you can turn this into a single regular-expression substitution:

import re

with open("file.fasta", 'r') as f:
    data = f.read()
result = re.sub(r"^(?!>)(.*)$\n(?!>)", r"\1", data, flags=re.MULTILINE)
print(result)
The regex contains a negative lookahead to avoid trimming lines that start with >, and a second lookahead to keep the newline when the following line starts with >.

Include surrounding lines of text file match in output using Python 2.7.3

I've been working on a program which assists in log analysis. It finds error or fail messages using a regex and prints them to a new .txt file. However, it would be much more useful if the program also included the four lines above and below each match. I can't figure out how to do this! Here is part of the existing program:
def error_finder(filepath):
    source = open(filepath, "r").readlines()
    error_logs = set()
    my_data = []
    for line in source:
        line = line.strip()
        if re.search(exp, line):
            error_logs.add(line)
I'm assuming something needs to be added to the very last line, but I've been working on this for a bit and either am not applying myself fully or just can't figure it out.
Any advice or help on this is appreciated.
Thank you!
Why python?
grep -C4 '^your_regex$' logfile > outfile.txt
Some comments:
I'm not sure why error_logs is a set instead of a list.
Using readlines() will read the entire file in memory, which will be inefficient for large files. You should be able to just iterate over the file a line at a time.
exp (which you're using for re.search) isn't defined anywhere, but I assume that's elsewhere in your code.
Anyway, here's complete code that should do what you want without reading the whole file in memory. It will also preserve the order of input lines.
import re
from collections import deque

exp = r'\d'  # matches digits; change to what you need

def error_finder(filepath, context_lines=4):
    source = open(filepath, 'r')
    error_logs = []
    buffer = deque(maxlen=context_lines)
    lines_after = 0
    for line in source:
        line = line.strip()
        if re.search(exp, line):
            # add previous lines first
            for prev_line in buffer:
                error_logs.append(prev_line)
            # clear the buffer
            buffer.clear()
            # add current line
            error_logs.append(line)
            # schedule lines that follow to be added too
            lines_after = context_lines
        elif lines_after > 0:
            # a line that matched the regex came not so long ago
            lines_after -= 1
            error_logs.append(line)
        else:
            buffer.append(line)
    # maybe do something with error_logs? I'll just return it
    return error_logs
I suggest using an index loop instead of a for-each loop; try this:
error_logs = list()
for i in range(len(source)):
    line = source[i].strip()
    if re.search(exp, line):
        error_logs.append((line, i - 4, i + 4))
In this case your error log will contain ('line of error', line index - 4, line index + 4), so you can get these lines later from "source".
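A sketch of that later lookup (the sample data is made up): slice the surrounding lines out of `source`, clamping the indices so the slice stays inside the file even when the match is near the top or bottom.

```python
# Hypothetical stand-ins for the variables in the answer above.
source = ["line %d" % n for n in range(10)]
error_logs = [("line 5", 1, 9)]  # (matched line, index - 4, index + 4)

for line, start, end in error_logs:
    # Clamp so a match in the first or last four lines doesn't over-run.
    context = source[max(0, start):min(len(source), end + 1)]
    print("\n".join(context))
```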

Use Python to remove lines in a files that start with an octothorpe?

This seems like a straight-forward question but I can't seem to pinpoint my problem. I am trying to delete all lines in a file that start with an octothorpe (#) except the first line. Here is the loop I am working with:
for i, line in enumerate(input_file):
    if i > 1:
        if not line.startswith('#'):
            output.write(line)
The above code doesn't seem to work. Does anyone known what my problem is? Thanks!
You aren't writing out the first line:
for i, line in enumerate(input_file):
    if i == 0:
        output.write(line)
    else:
        if not line.startswith('#'):
            output.write(line)
Keep in mind also that enumerate (like most things) starts at zero.
A little more concisely (and not repeating the output line):
for i, line in enumerate(input_file):
    if i == 0 or not line.startswith('#'):
        output.write(line)
I wouldn't bother with enumerate here. You only need it to decide which line is the first, and which isn't. That should be easy enough to deal with by simply writing the first line out and then using a for loop to conditionally write additional lines that do not start with a '#'.
def removeComments(inputFileName, outputFileName):
    input = open(inputFileName, "r")
    output = open(outputFileName, "w")
    output.write(input.readline())
    for line in input:
        if not line.lstrip().startswith("#"):
            output.write(line)
    input.close()
    output.close()
Thanks to twopoint718 for pointing out the advantage of using lstrip.
Maybe you want to omit lines from the output where the first non-whitespace character is an octothorpe:
for i, line in enumerate(input_file):
    if i == 0 or not line.lstrip().startswith('#'):
        output.write(line)
(note the call to lstrip)

Python 2.5.2: remove what found between two lines that contain two concrete strings

Is there any way to remove whatever is found between two lines that contain two specific strings?
I mean: I want to remove anything found between 'heaven' and 'hell' in a text file with this text:
I'm in heaven
foobar
I'm in hell
After executing the script/function I'm asking for, the text file will be empty.
Use a flag to indicate whether you're writing or not.
from __future__ import with_statement
import os

writing = True
with open('myfile.txt') as f:
    with open('output.txt', 'w') as out:
        for line in f:
            if writing:
                if "heaven" in line:
                    writing = False
                else:
                    out.write(line)
            elif "hell" in line:
                writing = True

os.remove('myfile.txt')
os.rename('output.txt', 'myfile.txt')
EDIT
As extraneon pointed out in the comments, the requirement is to remove the lines between two specific strings. That means that if the second (closing) string is never found, nothing should be removed. That can be achieved by keeping a buffer of lines. The buffer gets discarded if the closing string "I'm in hell" is found, but if the end of the file is reached without finding it, the whole contents must be written to the file.
Example:
I'm in heaven
foo
bar
Should keep the whole contents since there's no closing tag and the question says between two lines.
Here's an example to do that, for completion:
from __future__ import with_statement
import os

writing = True
with open('myfile.txt') as f:
    with open('output.txt', 'w') as out:
        for line in f:
            if writing:
                if "heaven" in line:
                    writing = False
                    buffer = [line]
                else:
                    out.write(line)
            elif "hell" in line:
                writing = True
            else:
                buffer.append(line)
        else:
            if not writing:
                # There wasn't a closing "I'm in hell", so write buffer contents
                out.writelines(buffer)

os.remove('myfile.txt')
os.rename('output.txt', 'myfile.txt')
Looks like by "remove" you mean "rewrite the input file in-place" (or make it look like you're so doing;-), in which case fileinput.input helps:
import fileinput

writing = True
for line in fileinput.input(['thefile.txt'], inplace=True):
    if writing:
        if 'heaven' in line: writing = False
        else: print line,
    else:
        if 'hell' in line: writing = True
You could do something like the following with regular expressions. There are probably more efficient ways to do it since I'm still learning a lot of python, but this should work.
import re

f = open('hh_remove.txt')
lines = f.readlines()
f.close()

pattern1 = re.compile("heaven", re.I)
pattern2 = re.compile("hell", re.I)

# Collect the ranges to delete first, then delete them afterwards in
# reverse order, so we never mutate the list while iterating over it
# and earlier indices stay valid.
ranges = []
mark1 = False
for i, line in enumerate(lines):
    if not mark1 and pattern1.search(line) is not None:
        mark1 = True
        set1 = i
    if mark1 and pattern2.search(line) is not None:
        set2 = i + 1
        ranges.append((set1, set2))
        mark1 = False

for set1, set2 in reversed(ranges):
    del lines[set1:set2]

out = open('hh_remove.txt', 'w')
out.write("".join(lines))
out.close()
I apologize but this sounds like a homework problem. We have a policy on these: https://meta.stackexchange.com/questions/10811/homework-on-stackoverflow
However, what I can say is that the feature #nosklo wrote about is available in any Python 2.5.x (or newer), but you need to learn enough Python to enable it. :-)
My solution would involve creating a new string with the undesired stuff stripped out, using str.find() or str.index() (or some relative of those two).
Best of luck!
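A minimal sketch of that str.find() approach (the helper name and its behavior when a marker is missing are my own choices): if the closing marker is never found, the text is returned unchanged, which matches the concern raised in the earlier answer about a missing closing string.

```python
def remove_between(text, start_marker, end_marker):
    """Remove the span from start_marker through end_marker, inclusive."""
    i = text.find(start_marker)
    if i == -1:
        return text  # no opening marker: nothing to remove
    j = text.find(end_marker, i + len(start_marker))
    if j == -1:
        return text  # no closing marker: leave the text untouched
    return text[:i] + text[j + len(end_marker):]

print(remove_between("I'm in heaven\nfoobar\nI'm in hell\n",
                     "I'm in heaven", "I'm in hell"))
```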
See below. I don't know if it's OK, but it seems to be working.
import re, fileinput, os

for path, dirs, files in os.walk(path):
    for filename in files:
        fullpath = os.path.join(path, filename)
        f = open(fullpath, 'r')
        data = f.read()
        patter = re.compile('Im in heaven.*?Im in hell', re.I | re.S)
        data = patter.sub("", data)
        f.close()
        f = open(fullpath, 'w')
        f.write(data)
        f.close()
Anyway, when I execute it, it leaves a blank line. I mean, if I have this function:
public function preFetchAll(Doctrine_Event $event){
    //Im in heaven
    $a = sfContext::getInstance()->getUser()->getAttribute("passw.formulario");
    var_dump($a);
    //Im in hell
    foreach ($this->_listeners as $listener) {
        $listener->preFetchAll($event);
    }
}
and I execute my script, I get this:
public function preFetchAll(Doctrine_Event $event){
    foreach ($this->_listeners as $listener) {
        $listener->preFetchAll($event);
    }
}
As you can see there is an empty line between "public..." and "foreach...".
Why?
Javi
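A plausible explanation (an assumption on my part, since it depends on the exact input): the pattern matches only from "Im in heaven" through "Im in hell", so the leading indentation and "//" of the first marker line, and the trailing newline of the last one, are not part of the match and survive the substitution; together they show up as a near-empty line. A sketch of a pattern that consumes the whole marker lines instead:

```python
import re

# Hypothetical fix: also consume the leading whitespace and comment marker,
# plus the rest of the closing line including its newline.
pattern = re.compile(r'[ \t]*//Im in heaven.*?//Im in hell[^\n]*\n', re.I | re.S)

code = ("public function preFetchAll(Doctrine_Event $event){\n"
        "    //Im in heaven\n"
        "    var_dump($a);\n"
        "    //Im in hell\n"
        "    foreach ($this->_listeners as $listener) {\n"
        "        $listener->preFetchAll($event);\n"
        "    }\n"
        "}\n")

print(pattern.sub("", code))
```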

python style question around reading small files

What is the most pythonic way to read in a named file, strip lines that are either empty, contain only spaces, or have # as a first character, and then process remaining lines? Assume it all fits easily in memory.
Note: it's not tough to do this -- what I'm asking is for the most pythonic way. I've been writing a lot of Ruby and Java and have lost my feel.
Here's a strawman:
file_lines = [line.strip() for line in open(config_file, 'r').readlines()
              if len(line.strip()) > 0]
for line in file_lines:
    if line[0] == '#':
        continue
    # Do whatever with line here.
I'm interested in concision, but not at the cost of becoming hard to read.
Generators are perfect for tasks like this. They are readable, maintain perfect separation of concerns, and efficient in memory-use and time.
def RemoveComments(lines):
    for line in lines:
        if not line.strip().startswith('#'):
            yield line

def RemoveBlankLines(lines):
    for line in lines:
        if line.strip():
            yield line

Now applying these to your file:

filehandle = open('myfile', 'r')
for line in RemoveComments(RemoveBlankLines(filehandle)):
    Process(line)
In this case, it's pretty clear that the two generators can be merged into a single one, but I left them separate to demonstrate their composability.
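For completeness, the merged generator might look like this (the function name is made up):

```python
def remove_comments_and_blanks(lines):
    # Combines both filters: skip blank/whitespace-only lines and
    # lines whose first non-blank character is '#'.
    for line in lines:
        stripped = line.strip()
        if stripped and not stripped.startswith('#'):
            yield line

lines = ["# comment\n", "\n", "   \n", "data\n"]
print(list(remove_comments_and_blanks(lines)))  # → ['data\n']
```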
lines = [r for r in open(thefile) if not r.isspace() and r[0] != '#']
The .isspace() method of strings is by far the best way to test if a string is entirely whitespace -- no need for contortions such as len(r.strip()) == 0 (ech;-).
for line in open("file"):
    sline = line.strip()
    if sline and not sline[0] == "#":
        print line.strip()
output
$ cat file
one
#
#
two
three
$ ./python.py
one
two
three
I would use this:
processed = [process(line.strip())
             for line in open(config_file, 'r')
             if line.strip() and not line.strip().startswith('#')]
The only ugliness I see here is all the repeated stripping. Getting rid of it complicates the function a bit:
processed = [process(line)
             for line in (line.strip() for line in open(config_file, 'r'))
             if line and not line.startswith('#')]
This matches the description, i.e.

    strip lines that are either empty, contain only spaces, or have # as a
    first character, and then process remaining lines
So lines that start or end in spaces are passed through unfettered.
with open("config_file", "r") as fp:
    data = (line for line in fp if line.strip() and not line.startswith("#"))
    for item in data:
        print repr(item)
I like Paul Hankin's thinking, but I'd do it differently:
from itertools import ifilter, ifilterfalse, imap

with open(r'c:\temp\testfile.txt', 'rb') as f:
    s1 = ifilterfalse(str.isspace, f)
    s2 = ifilter(lambda x: not x.startswith('#'), s1)
    s3 = imap(str.rstrip, s2)
    print "\n".join(s3)
I'd probably only do it this way instead of using some of the more obvious approaches suggested here if I were concerned about memory usage. And I might define an iscomment function to eliminate the lambda.
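The iscomment helper mentioned above might look like this (a sketch; the name and sample data are my own), shown here with Python 3's filterfalse in place of the Python 2 ifilter-with-lambda:

```python
from itertools import filterfalse

# A named predicate instead of the lambda.
def iscomment(line):
    return line.startswith('#')

lines = ["# a comment\n", "data\n"]
kept = list(filterfalse(iscomment, lines))
```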
The file is small, so performance is not really an issue. I would go for clarity over conciseness:
fp = open('file.txt')
for line in fp:
    line = line.strip()
    if line and not line.startswith('#'):
        pass  # process
fp.close()
If you want, you can wrap this in a function.
Using slightly newer idioms (or, with Python 2.5, from __future__ import with_statement) you could do this, which has the advantage of cleaning up safely yet is quite concise.
with file('file.txt') as fp:
    for line in fp:
        line = line.strip()
        if not line or line[0] == '#':
            continue
        # rest of processing here
Note that stripping the line first means the check for "#" will actually reject lines with that as the first non-blank, not merely "as first character". Easy enough to modify if you're strict about that.
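If you are strict about "# as the first character" (i.e. before any whitespace stripping), check the raw line first; a sketch with made-up sample lines:

```python
raw_lines = ["#comment\n", "   # indented, kept\n", "   \n", "data\n"]

kept = []
for line in raw_lines:
    if line.startswith('#'):     # reject only a literal first-column '#'
        continue
    stripped = line.strip()
    if not stripped:             # still drop empty / whitespace-only lines
        continue
    kept.append(stripped)
```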
