How does python read lines from file - python

Consider the following simple python code:
f=open('raw1', 'r')
i=1
for line in f:
    line1=line.split()
    for word in line1:
        print word,
    print '\n'
In the first for loop, i.e. "for line in f:", how does Python know that I want to read a line and not a word or a character?
The second loop is clearer, as line1 is a list, so the second loop will iterate over the list elements.

Python has a notion of what are called "iterables". They're things that know how to let you traverse some data they hold. Some common iterables are lists, sets, dicts, pretty much every data structure. Files are no exception to this.
The way things become iterable is by defining a method (__iter__) that returns an object with a next method (spelled __next__ in Python 3). This next method is meant to be called repeatedly and return the next piece of data each time. The for foo in bar loops are actually just calling the next method repeatedly behind the scenes.
For files, the next method returns lines, and that's it. Python doesn't "know" that you want lines; a file iterator is just always going to return lines. The reason for this is that a large share of file traversal is done line by line, and if you want words,
for word in (word for line in f for word in line.split()):
    ...
works just fine.
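As a rough illustration of what the for loop does behind the scenes, the sketch below drives the iterator by hand. It assumes Python 3, where the method is called through the built-in next(), and it uses io.StringIO as a stand-in for a real file; the sample text is made up.

```python
import io

# io.StringIO stands in for an open text file (an assumption made for this sketch).
f = io.StringIO("first line\nsecond line\n")

it = iter(f)            # a file-like object is its own iterator
same_object = it is f

lines = []
while True:
    try:
        line = next(it)    # what `for line in f` calls on each pass
    except StopIteration:  # raised at end of file; a for loop catches it silently
        break
    lines.append(line)

# Flatten the lines into words, as the generator expression above does.
words = [word for line in lines for word in line.split()]
print(lines)
print(words)
```

The loop never asks for "a line"; it only asks for "the next item", and the file object decides that an item is a line.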

In Python the for..in syntax is used over iterables (objects that can be iterated upon). For a file object, the iterator is the file itself.
The Python 2 documentation of the file object's next() method explains this; an excerpt is pasted below:
A file object is its own iterator, for example iter(f) returns f
(unless f is closed). When a file is used as an iterator, typically in
a for loop (for example, for line in f: print line), the next() method
is called repeatedly. This method returns the next input line, or
raises StopIteration when EOF is hit when the file is open for reading
(behavior is undefined when the file is open for writing). In order to
make a for loop the most efficient way of looping over the lines of a
file (a very common operation), the next() method uses a hidden
read-ahead buffer. As a consequence of using a read-ahead buffer,
combining next() with other file methods (like readline()) does not
work right. However, using seek() to reposition the file to an
absolute position will flush the read-ahead buffer. New in version
2.3.

Related

Replace words in list that later will be used in variable

I have a file which currently stores the string eeb39d3e-dd4f-11e8-acf7-a6389e8e7978,
which I am trying to pass as a variable to my subprocess command.
My current code looks like this:
with open(logfilnavn, 'r') as t:
    test = t.readlines()
    print(test)
But this prints ['eeb39d3e-dd4f-11e8-acf7-a6389e8e7978\n'] and I don't want the part with ['\n'] to be passed into my command, so I'm trying to remove it by using replace.
with open(logfilnavn, 'r') as t:
    test = t.readlines()
    removestrings = test.replace('[', '').replace('[', '').replace('\\', '').replace("'", '').replace('n', '')
    print(removestrings)
I get an exception saying this:
'list' object has no attribute 'replace'
So how can I replace these with nothing and store the result as a string for my subprocess command?
readlines() returns a list. Try print(test[0].strip())
You can read the whole file and split lines using str.splitlines:
test = t.read().splitlines()
Your test variable is a list, because readlines() returns a list of all lines read.
Since you said the file only contains this one line, you probably wish to work on only the first line that you read. The brackets and quotes are just how Python displays a list, so the only thing that actually needs removing is the trailing newline:
removestrings = test[0].rstrip('\n')
Where you went wrong...
file.readlines() in Python returns a list of the lines in the file (what other languages would call an array). Here you are treating the list as a string. You must first target the string inside it, then apply the string-only method to that.
In this case, however, chaining replace calls would still not work, because you are trying to change the way the Python interpreter displays the list. The brackets and quotes are not part of the string, and \n is displayed as two characters but is really a single newline character.
Further information...
What print(test) shows is only a readable representation of the list, not the string itself. The example below works for any number of lines, but it only prints the first element, so you will need to change that to suit your needs.
This may be useful...
You could perhaps make the variables globally available, so that other parts of the program can read them before they go out of scope.
More on scope
Scope is the region of the program in which a variable is visible; once nothing can reach a variable any more, the interpreter can remove it from memory. Scopes also keep names local: without them, a name such as i, which is used as a loop counter in many functions, would need to be different in every place it appears in the whole project. An easy way to picture this is to think of scopes as the branches of a tree: the tips of separate branches don't touch, and neither do their variables.
solution?
e.g:
with open(logfilnavn, 'r') as file:
    lines = file.readlines()  # creates a list
# a list comprehension that goes through each line and takes off the last character, assumed to be \n (the newline character)
# this works with any number of lines, as long as every line ends with \n
strippedLines = [line[:-1] for line in lines]
# or, without that assumption:
strippedLines = [line.replace('\n', '') for line in lines]
# you can now print the string stored within the list
print(strippedLines[0])  # this prints the first element in the list
I hope this helped!
You get the error because readlines returns a list object. Since you mentioned in the comment that there is just one line in the file, it's better to use readline() instead:
with open(logfilnavn, 'r') as t:
    line = t.readline().rstrip('\n')  # readline() keeps the trailing newline, so strip it
print(line)  # line is still available after the with block
# output:
# eeb39d3e-dd4f-11e8-acf7-a6389e8e7978
readlines will return a list of lines, and you can't use replace with a list.
If you really want to use readlines, you should know that it doesn't remove the newline character from the end, you'll have to do it yourself.
lines = [line.rstrip('\n') for line in t.readlines()]
But even after removing the newline character from the end of each line yourself, you'll still have a list of lines. From the question it looks like you only have one line, so you can just access the first line with lines[0].
Or you can leave out readlines and just use read, which reads all of the contents of the file, and then call rstrip on the result.
contents = t.read().rstrip('\n')
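Putting these answers together, here is a minimal sketch, assuming Python 3 and a file that holds a single line. The temporary file and the "some-command" name are placeholders invented for the example; they are not the asker's real log file or subprocess command.

```python
import os
import tempfile

# Create a throwaway file holding one line, standing in for the real log file.
with tempfile.NamedTemporaryFile("w", suffix=".log", delete=False) as tmp:
    tmp.write("eeb39d3e-dd4f-11e8-acf7-a6389e8e7978\n")
    logfilnavn = tmp.name

with open(logfilnavn, "r") as t:
    lines = t.readlines()      # a list: one string per line, newlines kept

token = lines[0].strip()       # the string itself, without the trailing "\n"

# "some-command" is a hypothetical placeholder for the real subprocess command.
command = ["some-command", token]

os.remove(logfilnavn)          # clean up the throwaway file
print(repr(lines), repr(token))
```

The brackets and quotes only ever appear when the list is printed; token is a plain string and can be handed to subprocess as one element of the argument list.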

using readline() in a function to read through a log file will not iterate

In the code below readline() will not increment. I've tried using a value, no value and variable in readline(). When not using a value I don't close the file so that it will iterate but that and the other attempts have not worked.
What happens is just the first byte is displayed over and over again.
If I don't use a function and just place the code in the while loop (without 'line' variable in readline()) it works as expected. It will go through the log file and print out the different hex numbers.
i=0
x=1
def mFinder(line):
    rgps=open('c:/code/gps.log', 'r')
    varr=rgps.readline(line)
    varr=varr[12:14].rstrip()
    rgps.close()
    return varr
while x<900:
    val=mFinder(i)
    i+=1
    x+=1
    print val
    print 'this should change'
It appears you have misunderstood what file.readline() does. Passing in an argument does not tell the method to read a specific numbered line.
The documentation tells you what happens instead:
file.readline([size])
Read one entire line from the file. A trailing newline character is kept in the string (but may be absent when a file ends with an incomplete line). If the size argument is present and non-negative, it is a maximum byte count (including the trailing newline) and an incomplete line may be returned.
Note the part about the size argument: you are passing in a maximum byte count, so rgps.readline(1) reads a single byte, not the first line.
You need to keep a reference to the file object around until you are done with it, and repeatedly call readline() on it to get successive lines. You can pass the file object to a function call:
def finder(fileobj):
    line = fileobj.readline()
    return line[12:14].rstrip()
with open('c:/code/gps.log') as rgps:
    x = 0
    while x < 900:
        section = finder(rgps)
        print section
        # do stuff
        x += 1
You can also loop over files directly, because they are iterators:
for line in openfileobject:
or use the next() function to get the next line, as long as you don't mix .readline() calls and iteration (including next()). If you combine this with a generator function, you can leave the file object entirely to a separate function that will read lines and produce sections until you are done:
def read_sections():
    with open('c:/code/gps.log') as rgps:
        for line in rgps:
            yield line[12:14].rstrip()
for section in read_sections():
    print section  # do something with `section`.
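The difference between readline(size) and repeated readline() calls can be seen in a small sketch. It assumes Python 3, where size counts characters for text files, and uses io.StringIO with made-up log content in place of the real gps.log. The read_sections variant here takes the open file and the slice bounds as parameters, which is a choice made for the example.

```python
import io

# Made-up log content; io.StringIO stands in for the real gps.log file.
log = io.StringIO("$GPGGA,123519,A1\n$GPGGA,123520,B2\n")

first_chunk = log.readline(3)   # size limits the read: at most 3 characters
rest = log.readline()           # continues from the current position to the newline
second = log.readline()         # the next full line


def read_sections(fileobj, start, stop):
    """Yield the slice [start:stop] of every line in an already open file."""
    for line in fileobj:
        yield line[start:stop].rstrip()


log.seek(0)                     # rewind before iterating from the top
sections = list(read_sections(log, 14, 16))
print(first_chunk, rest, second, sections)
```

Because the same file object is kept open throughout, each call picks up where the previous one stopped, which is exactly what reopening the file inside mFinder prevented.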

find common elements in the strings python

I'm trying to find common elements in the strings reading from a file. And this is what I wrote:
file = open ("words.txt", 'r')
while 1:
    line = file.readlines()
    if len(line) == 0:
        break
    print line
file.close
def com_Letters(*strings):
    return set.intersection(*map(set,strings))
and the result turns out: ['out\n', 'dog\n', 'pingo\n', 'coconut']
I put com_Letters(line), but the result is empty.
There are two problems, but neither one is with com_Letters.
First, this code guarantees that line will always be an empty list:
while 1:
    line = file.readlines()
    if len(line) == 0:
        break
    print line
The first time through the loop, you call readlines(), which will, in the words of the documentation, "read until EOF using readline() and return a list containing the lines thus read."
If the file is empty, that's an empty list, so you'll break.
Otherwise, you'll print out the list, and go back into the loop. At which point readlines() is going to have nothing left to read, since you already read until EOF, so it's guaranteed to be an empty list. Which means you'll break.
Either way, line ends up empty.
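This is easy to check with a short sketch. It assumes Python 3 and uses io.StringIO holding the four words from the question in place of words.txt.

```python
import io

f = io.StringIO("out\ndog\npingo\ncoconut")  # stand-in for words.txt

first = f.readlines()    # reads everything up to EOF
second = f.readlines()   # nothing is left to read, so this is an empty list
print(first, second)
```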
It's not clear what you're trying to do with that loop. There's never any good reason to call readlines() repeatedly on the same file. But, even if there were, you'd probably want to accumulate all of the results, rather than just keeping the last (guaranteed-empty) result. Something like this:
line = []
while 1:
    new_line = file.readlines()
    if len(new_line) == 0:
        break
    print new_line
    line += new_line
Anyway, if you fix that problem (e.g., by scrapping the whole loop and just using line = file.readlines()), you're calling com_Letters with a single list of strings. That's not particularly useful; it's just a very convoluted way of calling set. If it's not clear why:
Since there's only one argument (a list of strings), *strings ends up as a one-element tuple of that argument.
map(set, strings) on a single-element tuple just calls set on that element and returns a single-element list.
*map(set, strings) explodes that into one argument, the set.
set.intersection(s) is the same thing as s.intersection(), which just returns s itself.
All of this would be easier to see if you broke up some of those complex expressions and printed the intermediate values. Then you'd know exactly where it first goes wrong, instead of just knowing it's somewhere in a long chain of events.
A few side notes:
You forgot the () on the file.close, which means you're not actually closing the file. One of the many reasons that with is better is that it means you can't make that mistake.
Use plural names for collections. line sounds like a variable that should have a single line in it, not a variable that should have all of your lines.
The readlines function with no sizehint argument is basically useless. If you're just going to iterate over the lines, you can do that to the file itself. If you really need the lines in a list instead of reading them lazily, list(file) makes your intention clearer—and doesn't mislead you into thinking it might be useful to do repeatedly.
The Pythonic way to check for an empty collection is just if not line:, rather than if len(line) == 0:.
while True is clearer than while 1.
I suggest modifying the function as follows:
def com_Letters(strings):
    return set.intersection(*map(set,strings))
I think the original function treats its argument as a list containing one list of strings (because only a single argument, the list itself, is passed in), and therefore does not find the intersection of the individual strings.
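A sketch of both behaviours side by side, assuming Python 3 and using io.StringIO with the question's four words in place of words.txt. The function is spelled com_letters here, and stripping the newlines first is an addition made for the example.

```python
import io


def com_letters(strings):
    """Return the set of characters shared by every string in `strings`."""
    return set.intersection(*map(set, strings))


f = io.StringIO("out\ndog\npingo\ncoconut")   # stand-in for words.txt
words = [line.strip() for line in f]          # drop the trailing newlines

# Passing the list as the only argument to a *strings function wraps it in a
# one-element tuple, so the "intersection" is just the set of whole words.
wrapped = set.intersection(*map(set, (words,)))

common = com_letters(words)
print(wrapped, common)
```

With the list passed as a single iterable, each word becomes its own set of letters, and the intersection is taken across the words rather than across one list.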

Is it good form to iterate through a file using only a for loop? [duplicate]

I've come across some code that iterates through lines in a file like so:
for line in open(filename, 'r'):
    do_all_the_things()
Is that a more Pythonic version of something like:
with open(filename, 'r') as f:
    for line in f:
        do_all_the_things()
It uses fewer indentation levels, so it looks nicer, but is it the same? From what I know, with basically adds a finally: f.close() or something to that effect, to ensure that the object is cleaned up after leaving the block. When the first for loop ends (or is cut short with a break perhaps) and the variable goes out of scope, does the same thing happen? Can I take my cue from the first bit of code and save myself some keystrokes, or rather, should I fix it?
You're not creating any reference to the file object other than within the iterator used by the for loop.
That means as soon as the for loop ends, that iterator, then the file object, will have their reference counts go to zero and they will be deleted.
When a file object is deleted, it is closed.
So you're not leaving open files by using the bare for loop.
That said, in a larger program, it's good to be explicit -- and the with statement says clearly that the file is used only within that context.
Personally, if I'm opening a file for writing / appending, then I'll use with even if I'm only using it in one place. If I'm just opening it for reading and not creating an explicit reference, I just use the object directly.
Definitely use the context manager. It is the only way to guarantee that your file object is handled properly.
While the first version will likely not give you any problems in CPython (currently), the language specification never says when (or even if) __del__ is called. Because of this, you can't know for sure that your file object ever gets properly finalized using the first version.
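The guarantee that with gives can be checked directly. The sketch below assumes Python 3 and writes a small throwaway file so that it is self-contained; the file contents are made up.

```python
import os
import tempfile

# Throwaway file created just for this sketch.
with tempfile.NamedTemporaryFile("w", delete=False) as tmp:
    tmp.write("a\nb\nc\n")
    filename = tmp.name

seen = []
with open(filename, "r") as f:
    for line in f:
        seen.append(line)
        if line == "b\n":
            break               # leaving the block early still closes the file

# Closed by the with statement itself, not by the garbage collector.
closed_after_with = f.closed
os.remove(filename)
print(seen, closed_after_with)
```

With the bare for line in open(filename) form there is no name left to inspect afterwards, and closing depends on when the interpreter finalizes the object.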

write() versus writelines() and concatenated strings

So I'm learning Python. I am going through the lessons and ran into a problem where I had to condense a great many target.write() calls into a single write(), while having a "\n" between each user input variable (the arguments to write()).
I came up with:
nl = "\n"
lines = line1, nl, line2, nl, line3, nl
textdoc.writelines(lines)
If I try to do:
textdoc.write(lines)
I get an error. But if I type:
textdoc.write(line1 + "\n" + line2 + ....)
Then it works fine. Why am I unable to use a string for a newline in write() but I can use it in writelines()?
Python 2.7
writelines expects an iterable of strings
write expects a single string.
line1 + "\n" + line2 merges those strings together into a single string before passing it to write.
Note that if you have many lines, you may want to use "\n".join(list_of_lines).
Why am I unable to use a string for a newline in write() but I can use it in writelines()?
The idea is the following: if you want to write a single string you can do this with write(). If you have a sequence of strings you can write them all using writelines().
write(arg) expects a string as argument and writes it to the file. If you provide a list of strings, it will raise an exception (by the way, show errors to us!).
writelines(arg) expects an iterable as argument (an iterable object can be a tuple, a list, a string, or an iterator in the most general sense). Each item contained in the iterator is expected to be a string. A tuple of strings is what you provided, so things worked.
The nature of the string(s) does not matter to both of the functions, i.e. they just write to the file whatever you provide them. The interesting part is that writelines() does not add newline characters on its own, so the method name can actually be quite confusing. It actually behaves like an imaginary method called write_all_of_these_strings(sequence).
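The difference can be seen in a small sketch, assuming Python 3 and using io.StringIO in place of a real file opened for writing. The three line values are made up to stand in for the user input.

```python
import io

line1, line2, line3 = "alpha", "beta", "gamma"   # made-up user input
nl = "\n"
lines = line1, nl, line2, nl, line3, nl          # a tuple of strings

buf = io.StringIO()
buf.writelines(lines)            # writes each item as-is; adds no newlines itself
via_writelines = buf.getvalue()

buf = io.StringIO()
try:
    buf.write(lines)             # a tuple is not a string
    error = None
except TypeError as exc:
    error = exc

buf.write(nl.join([line1, line2, line3]) + nl)   # one string, one write() call
via_write = buf.getvalue()
print(repr(via_writelines), repr(via_write), type(error).__name__)
```

Both routes produce the same text; the only difference is whether the strings are joined before the call (write) or handed over one by one (writelines).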
What follows is an idiomatic way in Python to write a list of strings to a file while keeping each string in its own line:
lines = ['line1', 'line2']
with open('filename.txt', 'w') as f:
    f.write('\n'.join(lines))
This takes care of closing the file for you. The construct '\n'.join(lines) concatenates (connects) the strings in the list lines and uses the character '\n' as glue. It is more efficient than using the + operator.
Starting from the same lines sequence, ending up with the same output, but using writelines():
lines = ['line1', 'line2']
with open('filename.txt', 'w') as f:
    f.writelines("%s\n" % l for l in lines)
This makes use of a generator expression and dynamically creates newline-terminated strings. writelines() iterates over this sequence of strings and writes every item.
Edit: Another point you should be aware of:
write() and readlines() existed before writelines() was introduced. writelines() was introduced later as a counterpart of readlines(), so that one could easily write the file content that was just read via readlines():
outfile.writelines(infile.readlines())
Really, this is the main reason why writelines has such a confusing name. Also, today, we do not really want to use this method anymore. readlines() reads the entire file to the memory of your machine before writelines() starts to write the data. First of all, this may waste time. Why not start writing parts of data while reading other parts? But, most importantly, this approach can be very memory consuming. In an extreme scenario, where the input file is larger than the memory of your machine, this approach won't even work. The solution to this problem is to use iterators only. A working example:
with open('inputfile') as infile:
    with open('outputfile', 'w') as outfile:
        for line in infile:
            outfile.write(line)
This reads the input file line by line. As soon as one line is read, this line is written to the output file. Schematically speaking, there is only ever a single line in memory (compared to the entire file content being in memory in the case of the readlines/writelines approach).
Actually, I think the problem is that your variable "lines" has the wrong type. You defined lines as a tuple, but write() requires a string. All you have to change is your commas into pluses (+), so that lines becomes a single string:
nl = "\n"
lines = line1+nl+line2+nl+line3+nl
textdoc.write(lines)
should work.
If you just want to save and load a list, try pickle.
Pickle saving:
import pickle
with open("yourFile", "wb") as file:
    pickle.dump(YourList, file)
and loading:
with open("yourFile", "rb") as file:
    YourList = pickle.load(file)
Exercise 16 from Zed Shaw's book? You can use escape characters as follows:
paragraph1 = "%s \n %s \n %s \n" % (line1, line2, line3)
target.write(paragraph1)
target.close()
