How to read first line of a file twice? - python

I have a big file with many lines and want to read the first line first and then loop through all lines, starting with the first line again.
I first thought that something like that would do it:
file = open("fileName", 'r')
first_line = file.readline()
DoStuff_1(first_line)
for line in file:
    DoStuff_2(line)
file.close()
But the issue with this script is that the first line passed to DoStuff_2 is the second line and not the first one. I don't have a good intuition of what kind of object file is. I think it is an iterator and don't really know how to deal with it. The bad solution I found is
file = open("fileName", 'r')
first_line = file.readline()
count = 0
for line in file:
    if count == 0:
        count = 1
        DoStuff_1(first_line)
    DoStuff_2(line)
file.close()
But it is pretty dumb and is computationally a bit costly as it runs an if statement at each iteration.

You could do this:
with open('fileName', 'r') as file:
    first_line = file.readline()
    DoStuff_1(first_line)
    DoStuff_2(first_line)
    # remaining lines
    for line in file:
        DoStuff_2(line)
Note that I changed your code to use with so file is automatically closed.

I'd suggest using a generator to abstract your general control flow. Something like:
def first_and_file(file_obj):
    """
    :type file_obj: file
    :rtype: (str, __generator[str])
    """
    first_line = next(file_obj)
    def gen_rest():
        yield first_line
        yield from file_obj
    return first_line, gen_rest()
In Python 2.7, swap out the yield from for:
for line in file_obj:
    yield line
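As a variation on the same idea, itertools.chain can put the first line back in front of the remaining lines without a nested generator. This is only a sketch: first_and_all is a hypothetical name, and io.StringIO stands in for a real file so the snippet is self-contained.

```python
import io
from itertools import chain


def first_and_all(file_obj):
    """Return the first line and an iterator over all lines, first line included."""
    first_line = next(file_obj)  # raises StopIteration on an empty file
    return first_line, chain([first_line], file_obj)


# io.StringIO is a stand-in for an open text file.
demo_file = io.StringIO("header\nrow 1\nrow 2\n")
first, all_lines = first_and_all(demo_file)
collected = list(all_lines)
print(repr(first))
print(collected)
```

The iterator is lazy, so the rest of the file is still read one line at a time.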

Another answer is to just open the file twice.
with open("file.txt", "r") as r:
    Do_Stuff1(r.readline())
with open("file.txt", "r") as r:
    for line in r:
        Do_Stuff2(line)

A solution for the general case of this question is to keep track of the line number you are on. After an operation that requires going back to an earlier line, call file.seek(0) to rewind to the start of the file and then call file.readline() the required number of times to reach that line.
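For the original question (read the first line, then loop from the top again), the rewind step on its own is enough. A minimal sketch, assuming a seekable text file; the helper name process_first_then_all and its two callbacks are hypothetical, and a temporary file is used only to make the snippet runnable:

```python
import os
import tempfile


def process_first_then_all(path, handle_first, handle_line):
    """Call handle_first on line 1, then handle_line on every line from the top."""
    with open(path, "r") as f:
        handle_first(f.readline())
        f.seek(0)  # rewind to the start of the file
        for line in f:
            handle_line(line)


# Demo on a throwaway file.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("first\nsecond\nthird\n")

firsts, everything = [], []
process_first_then_all(tmp.name, firsts.append, everything.append)
os.remove(tmp.name)
print(firsts)
print(everything)
```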

Related

reading .txt file in python

I have a problem with a code in python. I want to read a .txt file. I use the code:
f = open('test.txt', 'r') # We need to re-open the file
data = f.read()
print(data)
I would like to read ONLY the first line from this .txt file. I use
f = open('test.txt', 'r') # We need to re-open the file
data = f.readline(1)
print(data)
But on screen I see only the first letter of the line.
Could you help me read all the letters of the line? (I mean, read the whole line of the .txt file.)
with open("file.txt") as f:
    print(f.readline())
This will open the file using a with block (which closes the file automatically when we are done with it) and read the first line. It is the same as:
f = open("file.txt")
print(f.readline())
f.close()
Your attempt with f.readline(1) won't work because the argument is the maximum number of characters to read from the line, so it returns only the first character.
Second method:
with open("file.txt") as f:
    print(f.readlines()[0])
Or you could also do the above which will get a list of lines and print only the first line.
To read the fifth line, use
with open("file.txt") as f:
    print(f.readlines()[4])
Or:
with open("file.txt") as f:
    lines = []
    lines.append(f.readline())
    lines.append(f.readline())
    lines.append(f.readline())
    lines.append(f.readline())
    lines.append(f.readline())
    print(lines[-1])
The -1 represents the last item of the list
Learn more:
with statement
files in python
readline method
Your first try is almost there, you should have done the following:
f = open('my_file.txt', 'r')
line = f.readline()
print(line)
f.close()
A safer approach to reading a file is:
with open('my_file.txt', 'r') as f:
    print(f.readline())
Both ways will print only the first line.
Your error was that you passed 1 to readline, which means you want to read at most 1 character. Please refer to https://www.w3schools.com/python/ref_file_readline.asp
I tried this and it works, after your suggestions:
f = open('test.txt', 'r')
data = f.readlines()[1]
print(data)
Use with open(...) instead:
with open("test.txt") as file:
    line = file.readline()
    print(line)
Keep f.readline() without parameters.
It will return the first line as a string and move the cursor to the second line.
The next time you call f.readline(), it will return the second line and move the cursor to the next one, and so on.
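The difference between readline(1) and readline(), and the way the cursor advances, can be seen in a small sketch. Here io.StringIO stands in for the opened test.txt, and the sample content is made up for illustration:

```python
import io

# io.StringIO stands in for an open text file.
f = io.StringIO("alpha\nbeta\ngamma\n")

one_char = f.readline(1)  # size=1: at most one character of the current line
rest = f.readline()       # the remainder of that same line
second = f.readline()     # the cursor is now on the second line

print(repr(one_char), repr(rest), repr(second))
```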

Python: Issue with Writing over Lines?

So, this is the code I'm using in Python to remove lines, hence the name "cleanse." I have a list of a few thousand words and their parts-of-speech:
NN by
PP at
PP at
... This is the issue. For whatever reason (one I can't figure out and have been trying to for a few hours), the program I'm using to go through the word inputs isn't clearing duplicates, so the next best thing I can do is the former! Y'know, cycle through the file and delete the duplicates on run. However, whenever I do, this code instead takes the last line of the list and duplicates that hundreds of thousands of times.
Thoughts, please? :(
EDIT: The idea is that cleanseArchive() goes through a file called words.txt, takes any duplicate lines and deletes them. Since Python isn't able to delete lines, though, and I haven't had luck with any other methods, I've turned to essentially saving the non-duplicate data in a list (saveList) and then writing each object from that list into a new file (deleting the old). However, as of the moment as I said, it just repeats the final object of the original list thousands upon thousands of times.
EDIT2: This is what I have so far, taking suggestions from the replies:
def cleanseArchive():
    f = open("words.txt", "r+")
    given_line = f.readlines()
    f.seek(0)
    saveList = set(given_line)
    f.close()
    os.remove("words.txt")
    f = open("words.txt", "a")
    f.write(saveList)
but ATM it's giving me this error:
Traceback (most recent call last):
  File "C:\Python33\Scripts\AI\prototypal_intelligence.py", line 154, in <module>
    initialize()
  File "C:\Python33\Scripts\AI\prototypal_intelligence.py", line 100, in initialize
    cleanseArchive()
  File "C:\Python33\Scripts\AI\prototypal_intelligence.py", line 29, in cleanseArchive
    f.write(saveList)
TypeError: must be str, not set
for i in saveList:
    f.write(n+"\n")
You basically print the value of n over and over.
Try this:
for i in saveList:
    f.write(i+"\n")
If you just want to delete "duplicated lines", I've modified your reading code:
saveList = []
duplicates = []
with open("words.txt", "r") as ins:
    for line in ins:
        if line not in duplicates:
            duplicates.append(line)
            saveList.append(line)
Additionally take the correction above!
def cleanseArchive():
    f = open("words.txt", "r+")
    f.seek(0)
    given_line = f.readlines()
    saveList = set()
    for x,y in enumerate(given_line):
        t=(y)
        saveList.add(t)
    f.close()
    os.remove("words.txt")
    f = open("words.txt", "a")
    for i in saveList: f.write(i)
Finished product! I ended up digging into enumerate and essentially just using that to get the strings. Man, Python has some bumpy roads when you get into sets/lists, holy shit. So much stuff not working for very ambiguous reasons! Whatever the case, fixed it up.
Let's clean up this code you gave us in your update:
def cleanseArchive():
    f = open("words.txt", "r+")
    given_line = f.readlines()
    f.seek(0)
    saveList = set(given_line)
    f.close()
    os.remove("words.txt")
    f = open("words.txt", "a")
    f.write(saveList)
We have bad names that don't respect the Style Guide for Python Code, we have superfluous code parts, we don't use the full power of Python and part of it is not working.
Let us start with dropping unneeded code while at the same time using meaningful names.
def cleanse_archive():
    infile = open("words.txt", "r")
    given_lines = infile.readlines()
    words = set(given_lines)
    infile.close()
    outfile = open("words.txt", "w")
    outfile.write(words)
The seek was not needed, the mode for opening a file to read is now just r, the mode for writing is now w, and we dropped the removal of the file because it will be overwritten anyway. Looking at this clearer code, we see that we forgot to close the file after writing. If we open the file with the with statement, Python will take care of that for us.
def cleanse_archive():
    with open("words.txt", "r") as infile:
        words = set(infile.readlines())
    with open("words.txt", "w") as outfile:
        outfile.write(words)
Now that we have clear code we'll deal with the error message that occurs when outfile.write is called: TypeError: must be str, not set. This message is clear: You can't write a set directly to the file. Obviously you'll have to loop over the content of the set.
def cleanse_archive():
    with open("words.txt", "r") as infile:
        words = set(infile.readlines())
    with open("words.txt", "w") as outfile:
        for word in words:
            outfile.write(word)
That's it.
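One caveat with a set: it does not keep the lines in their original order. If order matters, dict.fromkeys is one way to drop duplicates while keeping the first occurrence of each line (dictionaries preserve insertion order in Python 3.7+). A sketch, with dedupe_lines as a hypothetical helper name and sample data modeled on the question:

```python
def dedupe_lines(lines):
    """Return lines with duplicates removed, keeping first occurrences in order."""
    return list(dict.fromkeys(lines))


sample = ["NN by\n", "PP at\n", "PP at\n", "NN by\n"]
unique = dedupe_lines(sample)
print(unique)
```

The result can be written back with outfile.writelines(unique) inside the same with block used above.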

How to only read lines in a text file after a certain string?

I'd like to read to a dictionary all of the lines in a text file that come after a particular string. I'd like to do this over thousands of text files.
I'm able to identify and print out the particular string ('Abstract') using the following code (gotten from this answer):
for files in filepath:
    with open(files, 'r') as f:
        for line in f:
            if 'Abstract' in line:
                print(line)
But how do I tell Python to start reading the lines that only come after the string?
Just start another loop when you reach the line you want to start from:
for files in filepath:
    with open(files, 'r') as f:
        for line in f:
            if 'Abstract' in line:
                for line in f: # now you are at the lines you want
                    pass # do work
A file object is its own iterator, so when we reach the line with 'Abstract' in it we continue our iteration from that line until we have consumed the iterator.
A simple example:
gen = (n for n in range(8))
for x in gen:
    if x == 3:
        print('Starting second loop')
        for x in gen:
            print('In second loop', x)
    else:
        print('In first loop', x)
Produces:
In first loop 0
In first loop 1
In first loop 2
Starting second loop
In second loop 4
In second loop 5
In second loop 6
In second loop 7
You can also use itertools.dropwhile to consume the lines up to the point you want:
from itertools import dropwhile
for files in filepath:
    with open(files, 'r') as f:
        dropped = dropwhile(lambda _line: 'Abstract' not in _line, f)
        next(dropped, '')
        for line in dropped:
            print(line)
Use a boolean to ignore lines up to that point:
for files in filepath:
    found_abstract = False # reset the flag for every file
    with open(files, 'r') as f:
        for line in f:
            if 'Abstract' in line:
                found_abstract = True
            if found_abstract:
                pass # do whatever you want
You can use itertools.dropwhile and itertools.islice here, a pseudo-example:
from itertools import dropwhile, islice
for fname in filepaths:
    with open(fname) as fin:
        start_at = dropwhile(lambda L: 'Abstract' not in L.split(), fin)
        for line in islice(start_at, 1, None): # ignore the line still with Abstract in
            print(line)
To me, the following code is easier to understand.
with open(file_name, 'r') as f:
    while 'Abstract' not in next(f):
        pass
    for line in f:
        # line is now the next line after the one that contains 'Abstract'
        pass
Just to clarify, your code already "reads" all the lines. To start "paying attention" to lines after a certain point, you can just set a boolean flag to indicate whether or not lines should be ignored, and check it at each line.
pay_attention = False
for line in f:
    if pay_attention:
        print(line)
    else: # We haven't found our trigger yet; see if it's in this line
        if 'Abstract' in line:
            pay_attention = True
If you don't mind a little more rearranging of your code, you can also use two partial loops instead: one loop that terminates once you've found your trigger phrase ('Abstract'), and one that reads all following lines. This approach is a little cleaner (and a very tiny bit faster).
for skippable_line in f: # First skim over all lines until we find 'Abstract'.
    if 'Abstract' in skippable_line:
        break
for line in f: # The file's iterator starts up again right where we left it.
    print(line)
The reason this works is that the file object returned by open behaves like a generator, rather than, say, a list: it only produces values as they are requested. So when the first loop stops, the file is left with its internal position set at the beginning of the first "unread" line. This means that when you enter the second loop, the first line you see is the first line after the one that triggered the break.
Making a guess as to how the dictionary is involved, I'd write it this way:
lines = dict()
for filename in filepath:
    with open(filename, 'r') as f:
        for line in f:
            if 'Abstract' in line:
                break
        lines[filename] = tuple(f)
So for each file, your dictionary contains a tuple of lines.
This works because the loop reads up to and including the line you identify, leaving the remaining lines in the file ready to be read from f.
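The break-then-continue pattern can be wrapped in a small function and tried without touching the disk. This is a sketch under stated assumptions: lines_after is a hypothetical helper name, and io.StringIO stands in for an opened file.

```python
import io


def lines_after(file_obj, marker):
    """Return the lines that follow the first line containing marker."""
    for line in file_obj:
        if marker in line:
            break
    # If marker was never found, the file is already exhausted
    # and the list below is simply empty.
    return list(file_obj)


demo = io.StringIO("Title\nAbstract\nline one\nline two\n")
result = lines_after(demo, "Abstract")
missing = lines_after(io.StringIO("Title only\n"), "Abstract")
print(result)
print(missing)
```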

how to start reading at line X in python?

Currently I read a file like this:
f = open("myfile.txt")
for line in f:
    pass # do something with the line
What do I need to do to start reading not at the first line, but at line X (e.g. the 5th)?
Using itertools.islice, you can specify start, stop, and step if need be and apply that to your input file:
from itertools import islice
with open('yourfile') as fin:
    for line in islice(fin, 4, None): # start is zero-based: index 4 is the 5th line
        pass
An opened file object f is an iterator. Read (and throw away) the first four lines and then go on with regular reading:
with open("myfile.txt", 'r') as f:
    for i in range(4):
        next(f, None)
    for line in f:
        pass # do something with the line
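Because islice counts from zero while line numbers usually count from one, it is easy to be off by one here. A small sketch that makes the conversion explicit; lines_from is a hypothetical helper name, and io.StringIO stands in for the opened file:

```python
import io
from itertools import islice


def lines_from(file_obj, start_line):
    """Return an iterator over lines, beginning at the 1-based line start_line."""
    # islice counts from zero, so line N lives at index N - 1.
    return islice(file_obj, start_line - 1, None)


demo = io.StringIO("".join("line %d\n" % n for n in range(1, 8)))
tail = list(lines_from(demo, 5))
print(tail)
```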

Python: Print next x lines from text file when hitting string

The situation is as follows:
I have a .txt file with results of several nslookups.
I want to loop through the file, and every time it hits the string "Non-authoritative answer:" the script has to print the following 8 lines from that position. If it works I should get all the positive results on my screen :).
First I had the following code:
#!/usr/bin/python
file = open('/tmp/results_nslookup.txt', 'r')
f = file.readlines()
for positives in f:
    if 'Authoritative answers can be found from:' in positives:
        print positives
file.close()
But that only printed "Authoritative answers can be found from:" the times it was in the .txt.
The code I have now:
#!/usr/bin/python
file = open('/tmp/results_nslookup.txt', 'r')
lines = file.readlines()
i = lines.index('Non-authoritative answer:\n')
for line in lines[i-0:i+9]:
    print line,
file.close()
But when I run it, it prints the first result nicely to my screen but does not print the other positive results.
p.s. I am aware of socket.gethostbyname("foobar.baz") but first I want to solve this basic problem.
Thank you in advance!
You can use the file as an iterator, then print the next 8 lines every time you find your sentence:
with open('/tmp/results_nslookup.txt', 'r') as f:
    for line in f:
        if line == 'Non-authoritative answer:\n':
            for i in range(8):
                print(next(f).strip())
Each time you use the next() function on the file object (or loop over it in a for loop), it'll return the next line in that file, until you've read the last line.
Instead of the range(8) for loop, I'd actually use itertools.islice:
from itertools import islice
with open('/tmp/results_nslookup.txt', 'r') as f:
    for line in f:
        if line == 'Non-authoritative answer:\n':
            print(''.join(islice(f, 8)))
file = open('/tmp/results_nslookup.txt', 'r')
for line in file:
    if line=='Non-authoritative answer:\n':
        for _ in range(8):
            print file.next()
By the way: don't use the name file for a variable, because in Python 2 it is the name of a built-in type.
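The islice approach above can be tried on in-memory data before pointing it at the real results file. A minimal sketch: blocks_after is a hypothetical helper name, io.StringIO stands in for the opened file, and the sample text is invented for illustration.

```python
import io
from itertools import islice


def blocks_after(file_obj, marker, count):
    """Yield a list of up to count lines following each line equal to marker."""
    for line in file_obj:
        if line.rstrip("\n") == marker:
            # islice pulls from the same iterator, so the outer loop
            # resumes after the lines consumed here.
            yield list(islice(file_obj, count))


demo = io.StringIO(
    "skip\nNon-authoritative answer:\na\nb\nskip\nNon-authoritative answer:\nc\n"
)
blocks = list(blocks_after(demo, "Non-authoritative answer:", 2))
print(blocks)
```

Unlike calling next() in a fixed range loop, islice simply stops early if the file ends before count lines are available.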
