startswith() can't find '//' at the front of my string - python

I've got a pretty simple Python script that reads in a file and parses it line by line.
It doesn't seem to recognize the '//' at the start of my lines. If I change it to look for '#' at the start of my lines, it doesn't find those lines either. Am I just misunderstanding this?
line = fIn.readline()
while line:
    print "line is", line
    line = line.strip()
    if line.startswith('//'):
        print "winner"
        line = fIn.readline()
The file I'm reading in looks like this:
// Feedback
"Feedback" = "Feedback";
// New strings
"File URL not reachable." = "File URL not reachable.";
And the debug line looks appropriate when it prints out:
line is // Feedback
line is "Feedback" = "Feedback";
line is
line is // New strings
line is "File URL not reachable." = "File URL not reachable.";
line is

Better version:
with open("abc") as f:
    for line in f:
        line = line.strip()
        if line and line.startswith("//"):
            print "line is", line
            print "winner"
            print next(f)
output:
line is // Feedback
winner
"Feedback" = "Feedback";
line is // New strings
winner
"File URL not reachable." = "File URL not reachable.";

Your loop only ever reads one line of the text file. Apart from the wrong indentation on the last line, the code works. Make sure line = fIn.readline() is executed on every iteration (move it one block to the left), then run the program again.
Here is what I get after fixing that one line. Is this the desired output?
line is // Feedback
winner
line is "Feedback" = "Feedback";
line is
line is // New strings
winner
line is "File URL not reachable." = "File URL not reachable.";
Edit: does this work for you?
for line in open("yourfile.txt").readlines():
    print "line is", line
    line = line.strip()
    if line.startswith('//'):
        print "winner"

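Try this (note that calling fIn.readline() inside a for loop over fIn would skip lines, so the loop does the reading on its own):
for line in fIn:
    print "line is", line
    line = line.strip()
    if line[0:2] == '//':
        print "winner"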

Related

Not printing file contents

I'm trying to read and print the contents of a text file, but nothing shows up:
coffee = open('coffeeInventory.txt', 'r')
coffee.seek(0)
line = coffee.readline()
while line != '':
    print(line)
coffee.close()
Thank you for any advice.
Try this:
with open('coffeeInventory.txt') as inf:
    for line in inf:
        print(line, end='')
Each line read from the file keeps its trailing newline, so pass end='' to stop print from appending a second one.
Try this code, which reads all the lines first and then prints each one:
file = open('coffeeInventory.txt')
lines = file.readlines()
for line in lines:
    print(line)
file.close()
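Neither answer says why the original loop misbehaves: the question's code calls readline() once before the loop and never again, so line never changes. A minimal sketch of the readline-based loop with that step restored follows. The helper name read_lines and the in-memory sample text are made up for illustration; io.StringIO merely stands in for open('coffeeInventory.txt').

```python
import io


def read_lines(handle):
    """Collect every line from an open text file using readline()."""
    collected = []
    line = handle.readline()
    while line != '':
        collected.append(line)
        # Advance to the next line; without this the loop never ends.
        line = handle.readline()
    return collected


# Hypothetical sample data standing in for the real inventory file.
sample = io.StringIO("first line\nsecond line\n")
for line in read_lines(sample):
    print(line, end='')
```

In practice, iterating over the file object directly, as in the first answer, is usually simpler than managing readline() by hand.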

How can we write a text file from variable using python?

I am working on an NLP project and have extracted the text from a PDF using PyPDF2, then removed the blank lines. The output is shown on the console, but I want to write the same data, which is stored in my variable (file), to a text file.
Below is the code that removes the blank lines from the text file.
for line in open('resume1.txt'):
    line = line.rstrip()
    if line != '':
        file = line
        print(file)
Output on Console:
Eclipse,
Visual Studio 2012,
Arduino IDE,
Java
,
HTML,
CSS
2013
Excel
.
Now I want the same data in my text file (resume1.txt). I have tried three methods, but each of them writes only a single dot to resume1.txt. It is the dot that appears at the very end of the text.
Method 1:
with open("resume1.txt", "w") as out_file:
    out_file.write(file)
Method 2:
print(file, file=open("resume1.txt", 'w'))
Method 3:
pathlib.Path('resume1.txt').write_text(file)
Could you please help me populate the text file? Thank you.
First of all, note that you are writing to the same file you read from, so its old contents are lost; I don't know whether that is what you want. Beyond that, each of those methods opens the file in write mode, which overwrites whatever was written to the output file before. So if you want to use these methods, you must write exactly once, with all the data.
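To see why only the final dot survives, here is a small sketch of what the question's loop does to the variable, assuming the write happens once after the loop has finished. The list below is invented sample data standing in for the lines of resume1.txt; the names lines and kept are hypothetical.

```python
# Made-up sample standing in for the lines of resume1.txt.
lines = ["Eclipse,\n", "\n", "Java\n", ".\n"]

file = ""
for line in lines:
    line = line.rstrip()
    if line != "":
        # Rebinding the name discards the previous value each time.
        file = line

# After the loop only the last non-blank line is left.
print(repr(file))

# Collecting the lines instead keeps all of them.
kept = [line.rstrip() for line in lines if line.rstrip() != ""]
print(kept)
```

Whatever is written after the loop can therefore only contain the last non-blank line, regardless of which write method is used.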
SOLUTIONS
Using method 1:
to_file = []
for line in open('resume1.txt'):
    line = line.rstrip()
    if line != '':
        file = line
        print(file)
        to_file.append(file)
to_save = '\n'.join(to_file)
with open("resume1.txt", "w") as out_file:
    out_file.write(to_save)
Using method 2:
to_file = []
for line in open('resume1.txt'):
    line = line.rstrip()
    if line != '':
        file = line
        print(file)
        to_file.append(file)
to_save = '\n'.join(to_file)
print(to_save, file=open("resume1.txt", 'w'))
Using method 3:
import pathlib
to_file = []
for line in open('resume1.txt'):
    line = line.rstrip()
    if line != '':
        file = line
        print(file)
        to_file.append(file)
to_save = '\n'.join(to_file)
pathlib.Path('resume1.txt').write_text(to_save)
In all three methods I used to_save = '\n'.join(to_file) because I assume you want each line separated from the next by an EOL. If that is wrong, use ''.join(to_file) for no separator at all, or ' '.join(to_file) to put everything on a single line separated by spaces.
Other method
You can also write to a different file, let's say 'output.txt'.
out_file = open('output.txt', 'w')
for line in open('resume1.txt'):
    line = line.rstrip()
    if line != '':
        file = line
        print(file)
        out_file.write(file)
        out_file.write('\n') # EOL
out_file.close()
Also, you can do this (I prefer this):
with open('output.txt', 'w') as out_file:
    for line in open('resume1.txt'):
        line = line.rstrip()
        if line != '':
            file = line
            print(file)
            out_file.write(file)
            out_file.write('\n') # EOL
First post on stack, so excuse the format
new_line = ""
for line in open('resume1.txt', "r"):
    for char in line:
        if char != " ":
            new_line += char
print(new_line)
with open('resume1.txt', "w") as f:
    f.write(new_line)

How to seek file pointer line by line in python

I am trying to move the file pointer line by line. I found the following code:
fo = open("temp.tmp", "r")
print "Name of the file: ", fo.name
"""Assuming file has following 5 lines:
This is 1st line
This is 2nd line
This is 3rd line
This is 4th line
This is 5th line
"""
line = fo.readline()
print "Read Line: %s" % (line)
# Again set the pointer to the beginning
fo.seek(3, 0)
line = fo.readline()
print "Read Line: %s" % (line)
fo.close()
But it moves the file pointer character by character. Is there any way to seek the pointer line by line?
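seek() always works with offsets rather than line numbers, so one common workaround is to record the offset at which each line starts and then seek to the recorded offset. The sketch below illustrates that idea under a few assumptions: the file is opened in binary mode so that tell() returns plain byte offsets, the helper names index_line_offsets and seek_to_line are invented for this example, and io.BytesIO stands in for open("temp.tmp", "rb").

```python
import io


def index_line_offsets(f):
    """Return the byte offset at which each line of f starts."""
    offsets = []
    f.seek(0)
    while True:
        pos = f.tell()
        if not f.readline():
            break  # end of file: the last tell() was not a line start
        offsets.append(pos)
    return offsets


def seek_to_line(f, offsets, n):
    """Move the file pointer to the start of line n (0-based)."""
    f.seek(offsets[n])


# In-memory stand-in for the five-line file from the question.
fo = io.BytesIO(b"This is 1st line\nThis is 2nd line\nThis is 3rd line\n")
offsets = index_line_offsets(fo)
seek_to_line(fo, offsets, 2)
third = fo.readline().decode("ascii")
print("Read Line: %s" % third)
```

Building the index costs one pass over the file, so this pays off mainly when several lines are revisited; for a single forward pass, plain iteration over the file is enough.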

parsing .xml blast output with re

I'm trying to parse BLAST output in XML format using re. I have never done this before; my code is below.
However, since some hits have more than one Hsp_num, I get more results for query_from and query_to than for query_len. How can I print query_len again for each additional Hsp_num? Thank you.
import re
output = open('result.txt','w')
n = 0
with open('file.xml','r') as xml:
    for line in xml:
        if re.search('<Hsp_query-from>', line) != None:
            line = line.strip()
            line = line.rstrip()
            line = line.strip('<Hsp_query-from>')
            line = line.rstrip('</')
            query_from = line
        if re.search('<Hsp_query-to>', line) != None:
            line = line.strip()
            line = line.rstrip()
            line = line.strip('<Hsp_query-to>')
            line = line.rstrip('</')
            query_to = line
        if re.search('<Hsp_num>', line) != None:
            line = line.strip()
            line = line.rstrip()
            line = line.strip('<Hsp_num>')
            line = line.rstrip('</')
            Hsp_num = line
            print >> output, Hsp_num+'\t'+query_from+'\t'+query_to
output.close()
I handled query_len in a separate file, since it didn't work otherwise:
with open('file.xml','r') as xml:
    for line in xml:
        if re.search('<Iteration_query-len>', line) != None:
            line = line.strip()
            line = line.rstrip()
            line = line.strip('<Iteration_query-len>')
            line = line.rstrip('</')
            query_len = line
Are you familiar with Biopython? Its Bio.Blast.NCBIXML module may be just what you need. Chapter 7 of the Tutorial and Cookbook is all about BLAST, and section 7.3 deals with parsing. You'll get an idea of how it works, and it will be a lot easier than using regex to parse XML, which will only lead to tears and mental breakdowns.
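If installing Biopython is not an option, the same idea (use a real XML parser instead of regex) can be sketched with the standard library's xml.etree.ElementTree. This is not the Biopython approach recommended above, only an illustration. It assumes that every Hsp element is nested somewhere inside its Iteration element and that the tag names are the ones quoted in the question; the function name hsp_rows and the tiny sample document are invented for the example.

```python
import xml.etree.ElementTree as ET


def hsp_rows(xml_text):
    """Return (Hsp_num, query_from, query_to, query_len) for every HSP."""
    root = ET.fromstring(xml_text)
    rows = []
    for iteration in root.iter("Iteration"):
        # The query length is stored once per iteration ...
        query_len = iteration.findtext("Iteration_query-len")
        # ... and is repeated here for each HSP found inside it.
        for hsp in iteration.iter("Hsp"):
            rows.append((
                hsp.findtext("Hsp_num"),
                hsp.findtext("Hsp_query-from"),
                hsp.findtext("Hsp_query-to"),
                query_len,
            ))
    return rows


# Hypothetical, heavily trimmed stand-in for a BLAST XML file.
sample = (
    "<BlastOutput><Iteration>"
    "<Iteration_query-len>120</Iteration_query-len>"
    "<Hit><Hsp><Hsp_num>1</Hsp_num>"
    "<Hsp_query-from>5</Hsp_query-from><Hsp_query-to>60</Hsp_query-to></Hsp>"
    "<Hsp><Hsp_num>2</Hsp_num>"
    "<Hsp_query-from>70</Hsp_query-from><Hsp_query-to>110</Hsp_query-to></Hsp>"
    "</Hit></Iteration></BlastOutput>"
)
for row in hsp_rows(sample):
    print("\t".join(row))
```

Because the query length is looked up per Iteration and attached to every Hsp below it, hits with several HSPs get the length repeated, which is what the question asks for.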

Python: re-formatting multiple lines in text file

I apologize if this post is long, but I am trying to be as detailed as possible. I have done a considerable amount of research on the topic, and would consider myself an "intermediate" skilled programmer.
My problem: I have a text file with multiple lines of data. I would like to remove certain parts of each line in an effort to get rid of some irrelevant information, and then save the file with the newly formatted lines.
Here is an example of what I am trying to accomplish. The original line is something like:
access-list inbound_outside1 line 165 extended permit tcp any host 209.143.156.200 eq www (hitcnt=10086645) 0x3eb90594
I am trying to have the code read the text file, and output:
permit tcp any 209.143.156.200 www
The following code works, but only if there is a single line in the text file:
input_file = open("ConfigInput.txt", "r")
output_file = open("ConfigOutput.txt", "w")
for line in input_file:
    line = line.split("extended ", 1)[1]
    line = line.split("(", 1)[0]
    line = line.replace(" host", "")
    line = line.replace(" eq", "")
    output_file.write(line)
output_file.close()
input_file.close()
However, when I attempt to run this with a full file of multiple lines of data, I receive an error:
File "C:\Python27\asaReader", line 5, in <module>
line = line.split("extended ", 1)[1]
IndexError: list index out of range
I suspect that it is not moving onto the next line of data in the text file, and therefore there isn't anything in [1] of the previous string. I would appreciate any help I can get on this.
Some possible causes:
You have blank lines in your file (blank lines obviously won't contain the word extended)
You have lines that aren't blank, but don't contain the word extended
You could try printing your lines individually to see where the problem occurs:
for line in input_file:
    print("Got line: %s" % (line))
    line = line.split("extended ", 1)[1]
Oh, and it's possible that the last line is blank and it's failing on that. It would be easy enough to miss.
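Both causes come down to split("extended ", 1) returning a one-element list when the marker is missing, so index [1] does not exist. A minimal sketch of guarding against that is shown below; the function name reformat_acl_line is invented for the example, and returning None for lines without the marker is just one possible policy.

```python
def reformat_acl_line(line):
    """Return the shortened rule, or None if the line has no 'extended '."""
    if "extended " not in line:
        return None  # blank line or a line in another format: skip it
    line = line.split("extended ", 1)[1]
    line = line.split("(", 1)[0]
    line = line.replace(" host", "")
    line = line.replace(" eq", "")
    return line.strip()


sample = ("access-list inbound_outside1 line 165 extended permit tcp any "
          "host 209.143.156.200 eq www (hitcnt=10086645) 0x3eb90594\n")
print(reformat_acl_line(sample))
print(reformat_acl_line("\n"))
```

When writing the results to the output file, a "\n" has to be appended to each one, because cutting the line at "(" also removes its original newline.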
Print something out when you hit a line that can't be processed:
for line in input_file:
    try:
        line = line.split("extended ", 1)[1]
        line = line.split("(", 1)[0]
        line = line.replace(" host", "")
        line = line.replace(" eq", "")
        output_file.write(line)
    except Exception, e:
        print "Choked on this line: %r" % line
        print e
An alternate approach would be to cache all the lines (assuming the file is not humongous).
>>> import re
>>> with open('/tmp/ConfigInput.txt', 'rU') as f:
...     lines = f.readlines()
...
>>> lines
['access-list inbound_outside1 line 165 extended permit tcp any host 209.143.156.200 eq www (hitcnt=10086645) 0x3eb90594\n']
>>> lines = [re.sub(r'(^.*extended |\(.*$)', '', line) for line in lines]
>>> lines
['permit tcp any host 209.143.156.200 eq www \n']
>>> with open('/tmp/ConfigOutput.txt', 'w') as f:
...     f.writelines(lines)
...
