I'm reading a large text file and I need to read a number from a specific line. The file looks like this:
....
unknown number of lines
....
ABCD
some random stuff
a number I want to read
....
....
I want to read the number from the line that is 2 lines after a "signature" line that's ABCD, which is unique. Right now what I'm doing is:
with open(filename,'r') as f:
    for line in f:
        if line.rstrip('\n') == 'ABCD':
            continue
But continue only advances the for loop by one iteration. How can I make it advance one more iteration to reach the line I actually need?
You could explicitly call next on f* (which the for loop usually does for you) and advance the iterator and then call continue:
for line in f:
    if line.rstrip('\n') == 'ABCD':
        next(f)
        continue
    print(line)
This will now print:
....
unknown number of lines
....
a number I want to read
....
....
Thereby skipping 'ABCD' and 'some random stuff'.
In the general case where you are certain ABCD is not the final element, this should not cause issues. If you want to be on the safe side, though, you could wrap it in a try - except to catch the StopIteration exception.
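As a rough sketch of the try/except idea, the loop could be wrapped like this. The function name and the in-memory sample are made up for illustration, and io.StringIO only stands in for a real open file:

```python
import io

def lines_without_signature_pair(f, signature='ABCD'):
    """Collect every line except the signature line and the line right after it."""
    kept = []
    for line in f:
        if line.rstrip('\n') == signature:
            try:
                next(f)  # skip the line that follows the signature
            except StopIteration:
                break  # the signature was the last line, nothing left to skip
            continue
        kept.append(line.rstrip('\n'))
    return kept

# Hypothetical sample data in place of a file on disk.
sample = io.StringIO("....\nABCD\nsome random stuff\n42\n....\n")
result = lines_without_signature_pair(sample)
# result == ['....', '42', '....']
```

Without the try/except, a StopIteration raised by the explicit next(f) would escape the for loop as an error, because the loop only handles the StopIteration from its own call to next.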
* In this case, this works because f is its own iterator, i.e. iter(f) is f. In general this is not the case: for a list, the iterator is a distinct list_iterator object, so calling next on the list itself will not work.
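To make the footnote concrete, here is a small sketch. The variable names are illustrative, and io.StringIO is used as a stand-in for an open file:

```python
import io

f = io.StringIO("ABCD\nsome random stuff\n42\n")  # stands in for an open file
file_is_own_iterator = iter(f) is f               # a file is its own iterator

rows = ["ABCD", "some random stuff", "42"]
list_is_own_iterator = iter(rows) is rows         # a list is not

# For a list, keep a handle on the iterator the for loop consumes
# and advance that same object.
it = iter(rows)
seen = []
for row in it:
    if row == "ABCD":
        next(it)  # skips "some random stuff"
        continue
    seen.append(row)
# seen == ['42']
```

Calling next(rows) directly on the list would raise a TypeError, since a list is iterable but is not itself an iterator.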
If you want to stick with this approach then do this:
with open(filename, 'r') as f:
    # Advance the file position to just past the ABCD line.
    # Note: this loops forever if ABCD never appears, because readline() returns '' at EOF.
    while f.readline().rstrip('\n') != 'ABCD':
        continue
    f.readline()  # skip over the unnecessary line
    desiredNumber = f.readline()  # desired line
I think regex would look a lot better, but if you want something to get the work done, here it is.
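For what it's worth, the regex version might look roughly like the sketch below. It assumes the whole file fits in memory as one string, and the pattern and function name are my own, not something from the question:

```python
import re

# A line that is exactly ABCD, then one line to skip, then the line to capture.
PATTERN = re.compile(r'^ABCD\n[^\n]*\n([^\n]*)', re.MULTILINE)

def find_after_signature(text):
    """Return the line two below the ABCD line, or None if there is no match."""
    match = PATTERN.search(text)
    return match.group(1) if match else None

# Hypothetical file contents; in practice this would come from f.read().
text = "....\nABCD\nsome random stuff\n42\n....\n"
found = find_after_signature(text)
# found == '42'
```

For a genuinely large file, the line-by-line approaches avoid reading everything at once, so this is mostly a matter of taste.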
If you don't need any information at all from the skipped line, you can advance the file manually by a line before continuing:
with open(filename,'r') as f:
    for line in f:
        if line.rstrip('\n') == 'ABCD':
            next(f) # The next iteration of the for loop will skip a line
            continue
If the only thing you need from this file is that one line, there's no need to continue at all. Just skip a line, grab the next line, do whatever you need to do with it, and break out of the for loop, all from within that if block.
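A minimal sketch of that "skip, grab, break" idea follows. The function name is invented, and next(f, None) is used so a signature near the end of the file gives None instead of raising:

```python
import io

def number_after_signature(f, signature='ABCD'):
    """Return the line two below the signature line, or None if unavailable."""
    wanted = None
    for line in f:
        if line.rstrip('\n') == signature:
            next(f, None)           # skip the line right after the signature
            wanted = next(f, None)  # the line we are after, or None at EOF
            break                   # nothing else in the file is needed
    return None if wanted is None else wanted.strip()

# Hypothetical sample in place of a real file.
sample = io.StringIO("....\nABCD\nsome random stuff\n42\n....\n")
value = number_after_signature(sample)
# value == '42'
```

The result is left as a string here. Converting it with int() or float() depends on what the number in the file actually looks like.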
I prefer @Jim's use of next(), but another option is to just use a flag:
with open(filename,'r') as f:
    skip_line = False
    for line in f:
        if line.rstrip('\n') == 'ABCD':
            skip_line = True
            continue
        if skip_line:
            skip_line = False
        else:
            print(line)
I have a csv file which looks like the following:
CCC;reserved;reserved;pIndex;wedgeWT;NA;NA;NA;NA;NA;xOffset;yOffset;zOffset
0.10089,0,0,1,0,0,0,0,0,0,-1.8,-0.7,1999998
0.1124,0,0,3,0,0,0,0,0,0,-1.2,1.8,-3.9
I am using the fileinput method to do some operation in the file, but I want to skip the operation on the first (header) line, though still keeping it there. I have tried using next(f) and f.isfirstline(), but they delete the header line. I want to keep the header line intact, though not doing any operation on it.
with fileinput.input(inplace=True) as f:
    #skip line
    for line in f:
        .
        .
You can use enumerate to easily keep track of the line numbers:
for linenum, line in enumerate(f):
    if linenum == 0:
        # 'line' is the header line
        continue
    # 'line' is a data line
    # ...
You can use iter and skip it with next:
with fileinput.input(inplace=True) as f:
    iterF = iter(f)
    print(next(iterF), end='')  # skip the computation but still write the header back
    for line in iterF:
        #...
This way you only pay for creating the iterator once, and you neither build an index nor evaluate an if on every iteration, as @JonathonReinhart's solution does (which is also valid).
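The reason next(f) on its own appears to delete the header is that, with inplace=True, standard output is redirected into the file, so only what gets printed survives. Here is a sketch of keeping the header with isfirstline(). The function name and the "multiply the first field" operation are placeholders for whatever the real operation is:

```python
import fileinput

def scale_first_column(path, factor):
    """Rewrite a comma-separated file in place, leaving the header row untouched."""
    with fileinput.input(files=path, inplace=True) as f:
        for line in f:
            if f.isfirstline():
                print(line, end='')  # re-emit the header unchanged
                continue
            fields = line.rstrip('\n').split(',')
            fields[0] = str(float(fields[0]) * factor)  # hypothetical operation
            print(','.join(fields))  # printed text is written back to the file
```

Calling scale_first_column('data.csv', 2) on a file with a header row would rewrite only the data rows; 'data.csv' is just an example path.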
I'd like to read to a dictionary all of the lines in a text file that come after a particular string. I'd like to do this over thousands of text files.
I'm able to identify and print out the particular string ('Abstract') using the following code (gotten from this answer):
for files in filepath:
    with open(files, 'r') as f:
        for line in f:
            if 'Abstract' in line:
                print(line)
But how do I tell Python to start reading the lines that only come after the string?
Just start another loop when you reach the line you want to start from:
for files in filepath:
    with open(files, 'r') as f:
        for line in f:
            if 'Abstract' in line:
                for line in f: # now you are at the lines you want
                    # do work
A file object is its own iterator, so once we reach the line containing 'Abstract', the inner loop picks up from the following line and runs until the iterator is consumed.
A simple example:
gen = (n for n in range(8))
for x in gen:
    if x == 3:
        print('Starting second loop')
        for x in gen:
            print('In second loop', x)
    else:
        print('In first loop', x)
Produces:
In first loop 0
In first loop 1
In first loop 2
Starting second loop
In second loop 4
In second loop 5
In second loop 6
In second loop 7
You can also use itertools.dropwhile to consume the lines up to the point you want:
from itertools import dropwhile
for files in filepath:
    with open(files, 'r') as f:
        dropped = dropwhile(lambda _line: 'Abstract' not in _line, f)
        next(dropped, '')
        for line in dropped:
            print(line)
Use a boolean to ignore lines up to that point:
for files in filepath:
    found_abstract = False  # reset the flag for every file
    with open(files, 'r') as f:
        for line in f:
            if 'Abstract' in line:
                found_abstract = True
            if found_abstract:
                #do whatever you want
You can use itertools.dropwhile and itertools.islice here, a pseudo-example:
from itertools import dropwhile, islice
for fname in filepaths:
    with open(fname) as fin:
        start_at = dropwhile(lambda L: 'Abstract' not in L.split(), fin)
        for line in islice(start_at, 1, None): # ignore the line still with Abstract in
            print(line)
To me, the following code is easier to understand.
with open(file_name, 'r') as f:
    while 'Abstract' not in next(f):
        pass
    for line in f:
        #line will be now the next line after the one that contains 'Abstract'
Just to clarify, your code already "reads" all the lines. To start "paying attention" to lines after a certain point, you can just set a boolean flag to indicate whether or not lines should be ignored, and check it at each line.
pay_attention = False
for line in f:
    if pay_attention:
        print(line)
    else: # We haven't found our trigger yet; see if it's in this line
        if 'Abstract' in line:
            pay_attention = True
If you don't mind a little more rearranging of your code, you can also use two partial loops instead: one loop that terminates once you've found your trigger phrase ('Abstract'), and one that reads all following lines. This approach is a little cleaner (and a very tiny bit faster).
for skippable_line in f: # First skim over all lines until we find 'Abstract'.
    if 'Abstract' in skippable_line:
        break
for line in f: # The file's iterator starts up again right where we left it.
    print(line)
The reason this works is that the file object returned by open behaves like a generator, rather than, say, a list: it only produces values as they are requested. So when the first loop stops, the file is left with its internal position set at the beginning of the first "unread" line. This means that when you enter the second loop, the first line you see is the first line after the one that triggered the break.
Making a guess as to how the dictionary is involved, I'd write it this way:
lines = dict()
for filename in filepath:
    with open(filename, 'r') as f:
        for line in f:
            if 'Abstract' in line:
                break
        lines[filename] = tuple(f)
So for each file, your dictionary contains a tuple of lines.
This works because the loop reads up to and including the line you identify, leaving the remaining lines in the file ready to be read from f.
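The same idea can be pulled into a small helper, which also makes it easy to see what happens when a file has no 'Abstract' line: you just get an empty tuple. This is only a sketch, and the function names and the marker parameter are my own additions:

```python
import io

def lines_after(f, marker='Abstract'):
    """Return the lines that follow the first line containing marker."""
    for line in f:
        if marker in line:
            return tuple(f)  # everything the iterator has not yielded yet
    return ()                # marker never found, so nothing comes after it

def collect(paths, marker='Abstract'):
    """Map each path to the lines after its marker line (hypothetical helper)."""
    result = {}
    for path in paths:
        with open(path, 'r') as f:
            result[path] = lines_after(f, marker)
    return result

# In-memory stand-in for one of the text files.
after = lines_after(io.StringIO("Title\nAbstract\nfirst\nsecond\n"))
# after == ('first\n', 'second\n')
```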
I was wondering if it was possible to make python disregard the first 4 lines of my text file. Like if I had a text file which looked like this:
aaa
aaa
aaa
aaa
123412
1232134
Can I make it so python starts working from the numbers?
Use next and a loop:
with open("/path/to/myfile.txt") as myfile:
    for _ in range(4): # Use xrange here if you are on Python 2.x
        next(myfile)
    for line in myfile:
        print(line) # Just to demonstrate
Because file objects are iterators in Python, a line will be skipped each time you do next(myfile).
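An equivalent sketch uses itertools.islice, which the answer above does not mention. The helper name is made up, and io.StringIO stands in for the real file:

```python
import io
from itertools import islice

def skip_lines(f, n):
    """Iterate over f, starting after its first n lines."""
    return islice(f, n, None)

# The sample from the question, held in memory instead of on disk.
sample = io.StringIO("aaa\naaa\naaa\naaa\n123412\n1232134\n")
remaining = list(skip_lines(sample, 4))
# remaining == ['123412\n', '1232134\n']
```

One difference from the explicit next() loop: if the file has fewer than n lines, islice simply yields nothing, whereas next(myfile) would raise StopIteration.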
This should do the trick
f = open('file.txt')
for index,line in enumerate(f):
    if index>3:
        print(line)
Assuming you know the number of lines to discard, you can use this method:
for i, line in enumerate(open('myfile')):
    if i < number_of_lines_to_discard:
        continue
    # do your stuff here
Or, if you just want to disregard non-numeric lines:
import re
for line in open('myfile'):
    if not re.match(r'[0-9]+$', line):
        continue
    # do your stuff here
A more robust solution, not relying on the exact number of lines:
with open(filename) as f:
    for line in f:
        try:
            line = int(line)
        except ValueError:
            continue
        # process line
What I'm trying to do is take 4 lines from a file that look like this:
#blablabla
blablabla #this string needs to match the amount of characters in line 4
!blablabla
blablabla #there is a string here
This goes on for a few hundred times.
I read the entire thing line by line, make a change to the fourth line, then want to match the second line's character count to the amount in the fourth line.
I can't figure out how to "backtrack" and change the second line after making changes to the fourth.
with fileC as inputA:
    for line1 in inputA:
        line2 = next(inputA)
        line3 = next(inputA)
        line4 = next(inputA)
is what I'm currently using, because it lets me handle 4 lines at the same time, but there has to be a better way, as it causes all sorts of problems when writing the file back out. What could I use as an alternative?
You could do:
with open(filec , 'r') as f:
    lines = f.readlines() # readlines creates a list of the lines
To access line 4 and do something with it, you would use:
lines[3] # as lines is a list
and for line 2:
lines[1] # etc.
You could then write your lines back into a file if you wish.
EDIT:
Regarding your comment, perhaps something like this:
def change_lines(fileC):
    with open(fileC , 'r') as f:
        while True:
            lines = []
            for i in range(4):
                try:
                    lines.append(next(f)) # next(f) returns the next line in the file
                except StopIteration: # this will happen if you reach end of file before finding 4 more lines.
                    #decide what you want to do here
                    return
            # otherwise this will happen
            lines[1] = lines[3] # or whatever you want to do here (indices run 0 to 3)
            # maybe write them to a new file
            # remember you're still within the while loop here
EDIT:
Since your file divides into fours evenly, this works:
def change_lines(fileC):
    with open(fileC , 'r') as f:
        while True:
            lines = []
            for i in range(4):
                try:
                    lines.append(next(f))
                except StopIteration:
                    return
            ... # do something with lines here
            # and write to new file etc.
Another way to do it:
import sys
from itertools import islice
def read_in_chunks(file_path, n):
    with open(file_path) as fh:
        while True:
            lines = list(islice(fh, n))
            if lines: yield lines
            else: break
for lines in read_in_chunks(sys.argv[1], 4):
    print(lines)
Also relevant is the grouper() recipe in the itertools module. In that case, you would need to filter out the None values before yielding them to the caller.
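A sketch of that grouper() route, with the None padding filtered out before each chunk is handed back. The wrapper name chunks_without_padding is invented here:

```python
from itertools import zip_longest

def grouper(iterable, n, fillvalue=None):
    """Collect data into fixed-length groups, padding the last one (itertools recipe)."""
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)

def chunks_without_padding(iterable, n):
    """Yield lists of up to n items, dropping the None padding from the last group."""
    for group in grouper(iterable, n):
        # Lines read from a file are never None, so this filter is safe for them.
        yield [item for item in group if item is not None]

chunks = list(chunks_without_padding("abcdefg", 3))
# chunks == [['a', 'b', 'c'], ['d', 'e', 'f'], ['g']]
```

If the data itself could contain None, a dedicated sentinel object as the fill value would be the safer choice.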
You could read the file with .readlines(), then index whichever line you want to change and write the result back to a file:
rf = open('/path/to/file')
file_lines = rf.readlines()
rf.close()
file_lines[1] = file_lines[3] # trim/edit however you'd like
wf = open('/path/to/write', 'w')
wf.writelines(file_lines)
wf.close()
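Putting the pieces together for the four-line records, something along these lines might work. It is only a sketch: how the second line should be made to "match" the fourth is not spelled out in the question, so truncating or padding with a filler character is an assumption, and the names fix_records, edit_fourth and pad_char are made up:

```python
from itertools import islice

def fix_records(lines, edit_fourth, pad_char='x'):
    """Edit line 4 of every 4-line record, then resize line 2 to the same length."""
    it = iter(lines)
    result = []
    while True:
        record = [line.rstrip('\n') for line in islice(it, 4)]
        if len(record) < 4:
            result.extend(record)  # copy any incomplete trailing record unchanged
            return result
        record[3] = edit_fourth(record[3])
        target = len(record[3])
        # Assumption: "matching" means truncating or padding line 2 to that length.
        record[1] = record[1][:target].ljust(target, pad_char)
        result.extend(record)

fixed = fix_records(['#a\n', 'bbbbbb\n', '!c\n', 'dddd\n'], lambda s: s[:3])
# fixed == ['#a', 'bbb', '!c', 'ddd']
```

Because each record is held in a small list, going "back" to the second line after changing the fourth is just a matter of indexing, and the result can be written to a new file in one go.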
I have this simple code which is really just to help me understand how Python I/O works:
inFile = open("inFile.txt",'r')
outFile = open("outFile.txt",'w')
lines = inFile.readlines()
first = True
for line in lines:
    if first == True:
        outFile.write(line) #always print the header
        first = False
        continue
    nums = line.split()
    outFile.write(nums[3] + "\n") #print the 4th column of each row
outFile.close()
My input file is something like this:
#header
34.2 3.42 64.56 54.43 3.45
4.53 65.6 5.743 34.52 56.4
4.53 90.8 53.45 134.5 4.58
5.76 53.9 89.43 54.33 3.45
The output prints out into the file just as it should but I also get the error:
outFile.write(nums[3] + "\n")
IndexError: list index out of range
I'm assuming this is because it has continued to read the next line although there is no longer any data?
Others have already answered your question. Here is a better way to "always print out the file header", avoiding testing for first at every iteration:
with open('inFile.txt', 'r') as inFile, open('outFile.txt', 'w') as outFile:
    outFile.write(inFile.readline()) #always print the header
    for line in inFile:
        nums = line.split()
        if len(nums) >= 4: #Checks to make sure a fourth column exists.
            outFile.write(nums[3] + "\n") #print the 4th column of each row
A couple things are going on here:
with open('inFile.txt', 'r') as inFile, open('outFile.txt', 'w') as outFile:
The with expression is a convenient way to open files because it automatically closes the files even if an exception occurs and the with block exits early.
Note: In Python 2.6, you will need to use two with statements, as support for multiple contexts was not added until 2.7. e.g:
with open(somefile, 'r') as f:
    with open(someotherfile, 'w') as g:
        #code here.
outFile.write(inFile.readline()) #always print the header
The file object is an iterator that gets consumed. When readline() is called, the buffer position advances forwards and the first line is returned.
for line in inFile:
As mentioned before, the file object is an iterator, so you can use it directly in a for loop.
The error shows that in your source code you have the following line:
outFile.write(nums[6] + "\n")
Note that the 6 there is different from the 3 you show in your question. You may have two different versions of the file.
It fails because nums is the result of splitting a line and in your case it contains only 5 elements:
for line in lines:
    # ...
    # line is for example "34.2 3.42 64.56 54.43 3.45"
    nums = line.split()
    print(len(nums))
You can't index past the end of a list.
You also may have an error in your code. You write the header, then split it and write one element from it. You probably want an if/else.
for line in lines:
    if first == 1:
        # do something with the header
    else:
        # do something with the other lines
Or you could just handle the header separately before you enter the loop.
The problem is that you are processing the header line just like the rest of the data. Even though you identify the header line, you don't skip its processing: it still gets split() further down in the loop, which causes the run-time error.
To fix your problem simply insert a continue as shown:
first = True
for line in lines:
    if first == True:
        outFile.write(line) #always print the header
        first = False
        continue ## skip the rest of the loop and start from the top
    nums = line.split()
    ...
That will bypass the rest of the loop and all will work as it should.
The output file outFile.txt will contain:
#header
54.43
34.52
134.5
54.33
The second problem turned out to be blank lines at the end of the input file (see the discussion in the comments below).
Notes: You could restructure your code, but if you are not interested in doing that, the simple fix above lets you keep all of your present code, and only requires the addition of the one line. As mentioned in other posts, it's worth looking into using with to manage your open files as it will also close them for you when you are done or an exception is encountered.
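To illustrate the blank-line problem: splitting an empty line gives an empty list, so nums[3] raises IndexError. A hedged sketch that guards against it, working on a list of lines rather than real files (the function name is invented):

```python
def fourth_column(lines):
    """Return the header followed by the 4th column of every row that has one."""
    it = iter(lines)
    out = [next(it, '').rstrip('\n')]  # the header is passed through as is
    for line in it:
        nums = line.split()            # a blank line splits into []
        if len(nums) < 4:
            continue                   # skip blank or short rows instead of crashing
        out.append(nums[3])
    return out

rows = [
    "#header\n",
    "34.2 3.42 64.56 54.43 3.45\n",
    "4.53 65.6 5.743 34.52 56.4\n",
    "\n",  # trailing blank lines like the ones that caused the error
    "\n",
]
columns = fourth_column(rows)
# columns == ['#header', '54.43', '34.52']
```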