If statements not catching empty lines in files


I am currently writing a static function where an open file object is passed in as a parameter. It then reads the file, and if the line is empty, it returns False. If the line is not empty, it uses the line in question plus the next three to create a new object of Person class (the class being designed in my module). For some reason, my if statement is not catching newlines, no matter what method I have tried, and I keep getting errors because of it. What am I doing wrong?
@staticmethod
def read_person(fobj):
    p_list = []
    for line in fobj:
        if line.isspace() or line == "\n":
            return False
        else:
            p_list.append(line)
    return Person(p_list[0],p_list[1],p_list[2],p_list[3])
Thanks for your help!

The magic you want is:
if line.strip() == "":
You can get caught up in all the little cases possible in blank line processing. Is it space-newline? space-space-newline? tab-newline? space-tab-newline? Etc.
So, don't check all those cases. Use strip() to remove all left and right whitespace. If you have an empty string remaining, it's a blank line, and Bob's your uncle.
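A minimal sketch of how that check could fit into the reader from the question, written for Python 3. The Person class below is a hypothetical stand-in with made-up field names, and stripping each field before storing it is an assumption rather than something the question specifies.

```python
import io


class Person:
    """Hypothetical stand-in for the asker's class; field names are assumed."""

    def __init__(self, name, age, city, email):
        self.name, self.age, self.city, self.email = name, age, city, email


def read_person(fobj):
    """Read four non-blank lines from fobj and build a Person.

    Returns False on a blank line, or if the file ends before four lines.
    """
    fields = []
    for line in fobj:
        if line.strip() == "":       # catches "\n", " \n", "\t\n", ...
            return False
        fields.append(line.strip())
        if len(fields) == 4:         # stop as soon as one record is complete
            return Person(*fields)
    return False


# io.StringIO stands in for an open file object.
person = read_person(io.StringIO("Ann\n30\nOslo\nann@example.com\n"))
blank = read_person(io.StringIO(" \t\nAnn\n"))
```

Returning as soon as four lines have been collected leaves the file object positioned at the next record, so the function can be called again on the same file.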

Related

using readline() in a function to read through a log file will not iterate

In the code below, readline() does not advance through the file. I've tried passing a value, no value, and a variable to readline(). When not passing a value I don't close the file, so that it can keep iterating, but neither that nor the other attempts have worked.
What happens is that just the first byte is displayed over and over again.
If I don't use a function and just place the code in the while loop (without the 'line' variable in readline()), it works as expected: it goes through the log file and prints out the different hex numbers.
i=0
x=1
def mFinder(line):
    rgps=open('c:/code/gps.log', 'r')
    varr=rgps.readline(line)
    varr=varr[12:14].rstrip()
    rgps.close()
    return varr
while x<900:
    val=mFinder(i)
    i+=1
    x+=1
    print val
    print 'this should change'

It appears you have misunderstood what file.readline() does. Passing in an argument does not tell the method to read a specific numbered line.
The documentation tells you what happens instead:
file.readline([size])
Read one entire line from the file. A trailing newline character is kept in the string (but may be absent when a file ends with an incomplete line). If the size argument is present and non-negative, it is a maximum byte count (including the trailing newline) and an incomplete line may be returned.
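Note the last sentence: you are passing in a maximum byte count, so rgps.readline(1) reads at most a single byte, not the first line. Because the function also reopens the file on every call, each call starts again from the beginning of the file.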
You need to keep a reference to the file object around until you are done with it, and repeatedly call readline() on it to get successive lines. You can pass the file object to a function call:
def finder(fileobj):
    line = fileobj.readline()
    return line[12:14].rstrip()
with open('c:/code/gps.log') as rgps:
    x = 0
    while x < 900:
        section = finder(rgps)
        print section
        # do stuff
        x += 1
You can also loop over files directly, because they are iterators:
for line in openfileobject:
or use the next() function to get the next line, as long as you don't mix .readline() calls and iteration (including next()). If you combine this with a generator function, you can leave the file object entirely to a separate function that will read lines and produce sections until you are done:
def read_sections():
    with open('c:/code/gps.log') as rgps:
        for line in rgps:
            yield line[12:14].rstrip()
for section in read_sections():
    # do something with `section`.
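To see the effect of the size argument in isolation, here is a small Python 3 sketch. It uses io.StringIO with made-up contents as a stand-in for the real log file; in Python 3 text mode the size counts characters rather than bytes.

```python
import io

# Stand-in for the real log file; the contents are invented for illustration.
log = io.StringIO("first line\nsecond line\n")

one_char = log.readline(1)   # size=1: at most one character is returned
rest = log.readline()        # no size: the remainder of the current line
second = log.readline()      # the next call moves on to the next line
at_eof = log.readline()      # at EOF an empty string is returned

print(repr(one_char), repr(rest), repr(second), repr(at_eof))
```

The position only advances because the same file object is reused between calls, which is why the object has to be kept open and passed around.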

Python If == true statement only working on last line of readline

My function reports only the last word in a file of words as an anagram (is_anagram is the first helper function). But every word in the file is an anagram of the word I tested, and each one returns True when checked independently with the helper function outside of the main function. I am not sure if it has something to do with \n being part of the string and being counted in the comparison, but I tried adding an if statement to delete it when present, and that did not work either. I also tested that the loop runs through each word in the .txt file, and it does.
def is_anagram(string1, string2):
    """Returns True if the two strings are anagrams of each other.
    str, str -> bool"""
    if sorted(string1)==sorted(string2):
        return True
    else:
        return False
def find_anagrams(word):
    final = []
    content = open("small_list.txt")
    content.close
    while True:
        line = content.readline()
        print(line)
        if is_anagram(word, line) == True:
            print("bruh")
            final.append(line)
        elif line == '':
            break
    return final

This is expected, based on the method you use to read a line (file.readline). From the documentation:
f.readline() reads a single line from the file; a newline character
(\n) is left at the end of the string, and is only omitted on the last
line of the file if the file doesn’t end in a newline.
Your line has a trailing newline, but word certainly does not. So, in the end, all you'd need to change is:
line = content.readline().rstrip()
Well, that's all you'd need to change to get it working. Additionally, I'd also recommend using the with...as context manager to handle file I/O. It's good practice, and you'll thank yourself for it.
with open("small_list.txt") as f:
    for line in f:
        if is_anagram(word, line.rstrip()):
            ... # do something here
It's better to use a for loop to iterate over the lines of a file (rather than a while, it's cleaner). Also, there's no need to explicitly call f.close() when you use a context manager (you're not currently doing it, you're only referencing the method without actually calling it).
Incorporating @Christian Dean's suggestion in this answer, you can simplify your anagram function as well - call sorted and return the result in a single line:
def is_anagram(a, b):
    return sorted(a) == sorted(b)
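Putting the pieces above together, a possible Python 3 version of the whole function might look like the sketch below. Taking an iterable of lines instead of a file name is an assumption made so the example runs without small_list.txt, and the word list is invented.

```python
import io


def is_anagram(a, b):
    return sorted(a) == sorted(b)


def find_anagrams(word, lines):
    """Return every stripped line in `lines` that is an anagram of `word`."""
    final = []
    for line in lines:
        candidate = line.strip()     # drop the trailing "\n" before comparing
        if candidate and is_anagram(word, candidate):
            final.append(candidate)
    return final


# io.StringIO stands in for open("small_list.txt"); the last line has no newline.
words = io.StringIO("silent\nenlist\ngoogle\ntinsel")
found = find_anagrams("listen", words)
print(found)
```

Iterating with a for loop also avoids a subtle trap in the while loop: once lines are stripped, a blank line in the middle of the file would look like the empty string that signals EOF.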

find common elements in the strings python

I'm trying to find the common elements in strings read from a file. This is what I wrote:
file = open ("words.txt", 'r')
while 1:
    line = file.readlines()
    if len(line) == 0:
        break
    print line
file.close
def com_Letters(*strings):
    return set.intersection(*map(set,strings))
The printed result is: ['out\n', 'dog\n', 'pingo\n', 'coconut']
I then called com_Letters(line), but the result is empty.

There are two problems, but neither one is with com_Letters.
First, this code guarantees that line will always be an empty list:
while 1:
    line = file.readlines()
    if len(line) == 0:
        break
    print line
The first time through the loop, you call readlines(), which, per the documentation, will "read until EOF using readline() and return a list containing the lines thus read."
If the file is empty, that's an empty list, so you'll break.
Otherwise, you'll print out the list, and go back into the loop. At which point readlines() is going to have nothing left to read, since you already read until EOF, so it's guaranteed to be an empty list. Which means you'll break.
Either way, line ends up empty.
It's not clear what you're trying to do with that loop. There's never any good reason to call readlines() repeatedly on the same file. But, even if there were, you'd probably want to accumulate all of the results, rather than just keeping the last (guaranteed-empty) result. Something like this:
line = []
while 1:
    new_line = file.readlines()
    if len(new_line) == 0:
        break
    print new_line
    line += new_line
Anyway, if you fix that problem (e.g., by scrapping the whole loop and just using line = file.readlines()), you're calling com_Letters with a single list of strings. That's not particularly useful; it's just a very convoluted way of calling set. If it's not clear why:
Since there's only one argument (a list of strings), *strings ends up as a one-element tuple of that argument.
map(set, strings) on a single-element tuple just calls set on that element and returns a single-element list.
*map(set, strings) explodes that into one argument, the set.
set.intersection(s) is the same thing as s.intersection(), which just returns s itself.
All of this would be easier to see if you broke up some of those complex expressions and printed the intermediate values. Then you'd know exactly where it first goes wrong, instead of just knowing it's somewhere in a long chain of events.
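As a sketch of what breaking up the expression could look like in Python 3, the helper below (a hypothetical name, not part of the question) returns each intermediate value for the single-list call described above. Note that in Python 3 map returns an iterator, so it is wrapped in list() to make it printable.

```python
def com_letters_traced(*strings):
    """Same logic as com_Letters, but exposing the intermediate values."""
    as_sets = list(map(set, strings))       # one set per positional argument
    result = set.intersection(*as_sets)     # intersection of those sets
    return strings, as_sets, result


lines = ['out\n', 'dog\n', 'pingo\n', 'coconut']

# Called with ONE argument (the list), exactly as in the question.
strings, as_sets, result = com_letters_traced(lines)
print(strings)    # a one-element tuple holding the whole list
print(as_sets)    # a one-element list holding set(lines)
print(result)     # just the set of whole lines, not of letters
```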
A few side notes:
You forgot the () on the file.close, which means you're not actually closing the file. One of the many reasons that with is better is that it means you can't make that mistake.
Use plural names for collections. line sounds like a variable that should have a single line in it, not a variable that should have all of your lines.
The readlines function with no sizehint argument is basically useless. If you're just going to iterate over the lines, you can do that to the file itself. If you really need the lines in a list instead of reading them lazily, list(file) makes your intention clearer—and doesn't mislead you into thinking it might be useful to do repeatedly.
The Pythonic way to check for an empty collection is just if not line:, rather than if len(line) == 0:.
while True is clearer than while 1.

I suggest modifying the function as follows:
def com_Letters(strings):
    return set.intersection(*map(set,strings))
I think the original function treats the argument strings as a tuple containing one list of strings (only one argument, a single list, is passed in this case) and therefore does not find the intersection of the individual strings.
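A short Python 3 check of that modified signature, using the list printed in the question. Stripping the lines first is an added assumption, so that the trailing "\n" is not treated as a letter.

```python
def com_letters(strings):
    """Return the set of characters common to every string in `strings`."""
    return set.intersection(*map(set, strings))


lines = ['out\n', 'dog\n', 'pingo\n', 'coconut']
words = [line.strip() for line in lines]   # remove trailing newlines

common = com_letters(words)
print(common)
```

Keeping the original *strings signature and calling com_Letters(*words) would give the same result; the two approaches differ only in where the unpacking happens.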

Incorporating multiple lists in one text file

I am new to coding and I ran into trouble while trying to make my own fastq masker. The first module is supposed to trim away the line with the +, change the sequence header (begins with >) to the line number, and keep the sequence and quality lines (A,G,C,T line and Unicode score, respectively).
class Import_file(object):
    def trim_fastq (self, fastq_file):
        f = open('path_to_file_a', 'a' )
        sanger = []
        sequence = []
        identifier = []
        plus = []
        f2 = open('path_to_file_b')
        for line in f2.readlines():
            line = line.strip()
            if line[0]=='@':
                identifier.append(line)
                identifier.replace('@%s','>[i]' %(line))
            elif line[0]==('A' or 'G'or 'T' or 'U' or 'C'):
                seq = ','.join(line)
                sequence.append(seq)
            elif line[0]=='+'and line[1]=='' :
                plus.append(line)
                remove_line = file.writelines()
            elif line[0]!='@' or line[0]!=('A' or 'G'or 'T' or 'U' or 'C') or line[0]!='+' and line[1]!='':
                sanger.append(line)
            else:
                print("Danger Will Robinson, Danger!")
        f.write("'%s'\n '%s'\n '%s'" %(identifier, sequence, sanger))
        f.close()
        return (sanger,sequence,identifier,plus)
Now for my question. I have run this and no error appears, but the target file is empty. I am wondering what I am doing wrong... Is it the way I handle the lists, or the lack of .join? I am sorry if this is a duplicate; I simply do not know what the mistake is. Also, an important note: this is not homework, I just need a masker for work. Any help is greatly appreciated, and all suggestions for improving the code are welcome. Thanks.
Note (fastq format):
@SRR566546.970 HWUSI-EAS1673_11067_FC7070M:4:1:2299:1109 length=50
TTGCCTGCCTATCATTTTAGTGCCTGTGAGGTGGAGATGTGAGGATCAGT
+
hhhhhhhhhhghhghhhhhfhhhhhfffffe`ee[`X]b[d[ed`[Y[^Y
Edit: Still unable to get anything, but working at it.

Your problem is with your understanding of the return statement. return x means stop executing the current function and give x back to whoever called it. In your code you have:
return sanger
return sequence
return identifier
return plus
When the first one executes (return sanger) execution of the function stops and sanger is returned. The second through fourth return statements never get evaluated and neither does your I/O stuff at the end. If you're really interested in returning all of these values, move this after the file I/O and return the four of them packed up as a tuple.
f.write("'%s'\n '%s'\n '%s'" %(identifier, sequence, sanger))
f.close()
return (sanger,sequence,identifier,plus)
This should get you at least some output in the file. Whether or not that output is in the format you want, I can't really say.
Edit:
Just noticed you were using /n and probably want \n so I made the change in my answer here.
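A tiny Python 3 illustration of the point about return, with placeholder values in place of the real lists: only the first return in a function body ever runs, while a single return of a tuple hands back all four values at once.

```python
def only_first():
    return "sanger"
    return "sequence"    # never reached: the function has already returned


def all_four():
    # Placeholder lists standing in for the real parsed data.
    sanger, sequence, identifier, plus = ["q"], ["s"], ["i"], ["+"]
    return (sanger, sequence, identifier, plus)   # one return, four values


first = only_first()
sanger, sequence, identifier, plus = all_four()   # tuple unpacking at the call site
print(first, sanger, sequence, identifier, plus)
```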

You have all sorts of errors beyond what @Brian addressed. I'm guessing that your if and else tests are trying to check the first character of line? You'd do that with
if line[0] == '@':
etc.
You'll probably need to write more scripts soon, so I suggest you work through the Python Tutorial so you can get on top of the basics. It'll be worth your while.
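As one possible direction, here is a Python 3 sketch that reads the input in groups of four lines instead of testing each line's first character. It assumes well-formed FASTQ records (a header starting with "@", the sequence, a "+" line, the quality line) and uses the record number as the new header, which is only a guess at what the asker wants. The function name and the sample data are hypothetical.

```python
import io

# Why the chained test in the question misfires: `or` returns its first
# truthy operand, so the whole parenthesised expression is just 'A'.
only_a = ('A' or 'G' or 'T' or 'U' or 'C')


def trim_fastq(lines):
    """Yield (new_header, sequence, quality) per record, dropping the '+' line."""
    stripped = (line.rstrip("\n") for line in lines)
    # zip over the same iterator four times consumes the input four lines at a time.
    records = zip(stripped, stripped, stripped, stripped)
    for number, (header, sequence, plus, quality) in enumerate(records, start=1):
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError("record %d is not a 4-line FASTQ record" % number)
        yield ">%d" % number, sequence, quality


sample = io.StringIO("@read1 length=4\nACGT\n+\nhhhh\n@read2 length=4\nTTGA\n+\nffff\n")
records = list(trim_fastq(sample))
print(records)
```

To test a single character against several options, `line[0] in "AGTUC"` does what the chained `or` was meant to do.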

Python help - Parsing Packet Logs

I'm writing a simple program that's going to parse a logfile of a packet dump from wireshark into a more readable form. I'm doing this with python.
Currently I'm stuck on this part:
for i in range(len(linelist)):
    if '### SERVER' in linelist[i]:
        #do server parsing stuff
        packet = linelist[i:find("\n\n", i, len(linelist))]
linelist is a list created using the readlines() method, so every line in the file is an element in the list. I'm iterating through it for all occurrences of "### SERVER", then grabbing all lines after it until the next empty line (which signifies the end of the packet). I must be doing something wrong, because not only is find() not working, but I have a feeling there's a better way to grab everything between ### SERVER and the next occurrence of a blank line.
Any ideas?

Looking at the file.readlines() doc:
file.readlines([sizehint])
Read until EOF using readline() and return a list containing the lines thus read. If the optional sizehint argument is present, instead of reading up to EOF, whole lines totalling approximately sizehint bytes (possibly after rounding up to an internal buffer size) are read. Objects implementing a file-like interface may choose to ignore sizehint if it cannot be implemented, or cannot be implemented efficiently.
and the file.readline() doc:
file.readline([size])
Read one entire line from the file. A trailing newline character is kept in the string (but may be absent when a file ends with an incomplete line). If the size argument is present and non-negative, it is a maximum byte count (including the trailing newline) and an incomplete line may be returned. An empty string is returned only when EOF is encountered immediately.
"A trailing newline character is kept in the string" means that each line in linelist will contain at most one newline. That is why you cannot find a "\n\n" substring in any of the lines. Instead, look for a whole blank line (or an empty string at EOF):
if myline in ("\n", ""):
    handle_empty_line()
Note: I tried to explain find behavior, but a pythonic solution looks very different from your code snippet.

General idea is:
inpacket = False
packets = []
for line in open("logfile"):
    if inpacket:
        content += line
        if line in ("\n", ""): # empty line
            inpacket = False
            packets.append(content)
    elif '### SERVER' in line:
        inpacket = True
        content = line
# put here packets.append on eof if needed
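A runnable Python 3 variant of this idea, as a sketch only: it wraps the loop in a function (the name collect_packets and the sample log are invented), handles the end-of-file case mentioned in the last comment, and, unlike the outline above, leaves the terminating blank line out of the packet.

```python
import io


def collect_packets(lines, marker="### SERVER"):
    """Return one string per packet: the marker line plus lines up to a blank line."""
    packets = []
    content = None                    # None means "not inside a packet"
    for line in lines:
        if content is not None:
            if line.strip() == "":    # blank line ends the current packet
                packets.append(content)
                content = None
            else:
                content += line
        elif marker in line:
            content = line            # marker line starts a new packet
    if content is not None:           # packet still open when the file ended
        packets.append(content)
    return packets


log = io.StringIO("junk\n### SERVER 1\na\nb\n\nnoise\n### SERVER 2\nc\n")
packets = collect_packets(log)
print(packets)
```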

This works well with an explicit iterator, also. That way, nested loops can update the iterator's state by consuming lines.
fileIter= iter(theFile)
for x in fileIter:
    if "### SERVER" in x:
        block = [x]
        for y in fileIter:
            if len(y.strip()) == 0: # empty line
                break
            block.append(y)
        print block # Or whatever
    # elif some other pattern:
This has the pleasant property of finding blocks that are at the tail end of the file, and don't have a blank line terminating them.
Also, this is quite easy to generalize: since there are no explicit state-change variables, you just go into another loop to soak up lines in other kinds of blocks.

The best way is to use generators.
Read the presentation "Generator Tricks for Systems Programmers".
It is the best material I have seen on parsing logs ;)
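For this particular log format, a generator could look like the Python 3 sketch below. The function name server_blocks and the sample input are assumptions made for illustration; the approach is simply a shared iterator wrapped in a generator function, not code taken from the presentation.

```python
import io


def server_blocks(lines, marker="### SERVER"):
    """Lazily yield each packet as a list of lines, starting at a marker line."""
    it = iter(lines)                  # one shared iterator for both loops
    for line in it:
        if marker in line:
            block = [line]
            for follow in it:         # inner loop consumes from the same iterator
                if not follow.strip():
                    break             # blank line ends the packet
                block.append(follow)
            yield block               # also reached when the file ends mid-packet


log = io.StringIO("x\n### SERVER 1\na\n\n### SERVER 2\nb\n")
blocks = list(server_blocks(log))
print(blocks)
```

Because blocks are yielded one at a time, the caller can process a large log without holding every packet in memory.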
