I have a CSV file which looks like the following:
CCC;reserved;reserved;pIndex;wedgeWT;NA;NA;NA;NA;NA;xOffset;yOffset;zOffset
0.10089,0,0,1,0,0,0,0,0,0,-1.8,-0.7,1999998
0.1124,0,0,3,0,0,0,0,0,0,-1.2,1.8,-3.9
I am using the fileinput module to do some operation on the file, but I want to skip the operation on the first (header) line while still keeping it there. I have tried using next(f) and f.isfirstline(), but they delete the header line. I want to keep the header line intact without doing any operation on it.
with fileinput.input(inplace=True) as f:
    # skip line
    for line in f:
        .
        .
You can use enumerate to easily keep track of the line numbers:
for linenum, line in enumerate(f):
    if linenum == 0:
        # 'line' is the header line
        print(line, end='')  # with inplace=True, write it back or it is lost
        continue
    # 'line' is a data line
    # ...
You can use iter and skip it with next:
with fileinput.input(inplace=True) as f:
    iterF = iter(f)
    print(next(iterF), end='')  # skipping computation but printing data
    for line in iterF:
        # ...
This way you only pay the overhead of creating the iterator once, and you neither create the indexes nor evaluate an if on every iteration, as in @JonathonReinhart's solution (which is also valid).
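For completeness, here is a self-contained sketch of the same idea that can be run end to end. The function name scale_first_column and the "multiply the first field" operation are made up for illustration; only the header handling is the point. It relies on the fact that with inplace=True everything printed goes into the file, so the header has to be printed back unchanged.

```python
import fileinput
import os
import tempfile


def scale_first_column(path, factor):
    """Multiply the first comma-separated field of each data line by factor.

    Hypothetical operation; the header line is written back untouched.
    """
    with fileinput.input(files=[path], inplace=True) as f:
        for line in f:
            if f.isfirstline():
                # stdout is redirected into the file, so print the header
                # back as-is (end='' because it already ends with a newline).
                print(line, end='')
                continue
            fields = line.rstrip('\n').split(',')
            fields[0] = str(float(fields[0]) * factor)
            print(','.join(fields))


# Small demonstration on a temporary file.
with tempfile.NamedTemporaryFile('w', suffix='.csv', delete=False) as tmp:
    tmp.write('CCC;reserved;pIndex\n0.5,0,1\n1.5,0,3\n')
    demo_path = tmp.name

scale_first_column(demo_path, 2)
with open(demo_path) as fh:
    demo_result = fh.read()
os.remove(demo_path)
```

After the call, the header is still the first line of the file and only the data lines have changed.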
I have a file that contains text like Hello:World
#!/usr/bin/python
f = open('m.txt')
while True:
    line = f.readline()
    if not line:
        break
    first = line.split(':')[0]
    second = line.split(':')[1]
f.close()
I want to put the string after splitting it into 2 variables
On the second iteration I get the error
List index out of range
It doesn't break when the line is empty. I searched for the answer in related topics and the solution was
if not line:
print break
But it does not work
If there are lines after an empty line (or your text editor inserted an empty line at the end of the file), that line is not actually empty: it contains a newline character and/or a carriage return.
You need to strip it off:
with open('m.txt') as f:
    for line in f:
        if not line.strip():
            break
        first, second = line.split(':')
You can do this relatively easily by utilizing an optional feature of the built-in iter() function by passing it a second argument (called sentinel in the docs) that will cause it to stop if the value is encountered while iterating.
Here's how to use it to make the line-processing loop terminate when readline() returns an empty string, which happens at the end of the file:
with open('m.txt') as fp:
    for line in iter(fp.readline, ''):
        first, second = line.rstrip().split(':')
        print(first, second)
Note the rstrip() which removes the newline at the end of each line read.
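One detail worth checking with a small experiment: which sentinel stops where. The sketch below uses io.StringIO as a stand-in for the real file (an assumption, purely so the example is self-contained). readline() only returns '' at end of file, while a blank line in the middle of the file comes back as '\n'.

```python
import io

text = 'Hello:World\nfoo:bar\n\nafter:blank\n'

# Sentinel '' matches only what readline() returns at end of file,
# so the blank line ('\n') does not stop the iteration.
stops_at_eof = list(iter(io.StringIO(text).readline, ''))

# Sentinel '\n' stops at the first completely blank line instead.
stops_at_blank = list(iter(io.StringIO(text).readline, '\n'))
```

Here stops_at_eof holds all four lines (including the blank one), while stops_at_blank holds only the two lines before it. A line containing just spaces would not equal '\n', so the strip()-based check is the more forgiving test for "empty" lines.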
Your code is fine, I can't put a picture in a comment. It all works, here:
I'm reading a large text file and I need to read a number from a specific line. The file looks like this:
....
unknown number of lines
....
ABCD
some random stuff
a number I want to read
....
....
I want to read the number from the line that is 2 lines after a "signature" line that's ABCD, which is unique. Right now what I'm doing is:
with open(filename,'r') as f:
    for line in f:
        if line.rstrip('\n') == 'ABCD':
            continue
But the continue only advances the for loop by 1 iteration. So, how can I make it advance one more iteration to get the line I actually need?
You could explicitly call next on f* (which the for loop usually does for you) and advance the iterator and then call continue:
for line in f:
    if line.rstrip('\n') == 'ABCD':
        next(f)
        continue
    print(line)
This will now print:
....
unknown number of lines
....
a number I want to read
....
....
Thereby skipping 'ABCD' and 'some random stuff'.
In the general case where you are certain ABCD is not the final element, this should not cause issues. If you want to be on the safe side, though, you could wrap it in a try/except to catch the StopIteration exception.
* In this case, this works because f is its own iterator, i.e. iter(f) is f. In general, this is not the case: for lists, the iterator is its own distinct list_iterator object, so advancing the list like this will not work.
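The footnote can be verified directly. This is a minimal sketch that uses io.StringIO in place of a real file (an assumption to keep it self-contained; objects returned by open() behave the same way here). For a list, the workaround is to create the iterator once and share it between the for loop and next().

```python
import io

f = io.StringIO('ABCD\nsome random stuff\n42\n')
file_is_own_iterator = iter(f) is f          # file-like objects return themselves

items = ['ABCD', 'some random stuff', '42']
list_is_own_iterator = iter(items) is items  # a list hands out a separate iterator

# For a list, loop over one explicit iterator and advance that same object.
it = iter(items)
kept = []
for item in it:
    if item == 'ABCD':
        next(it, None)  # skip the element after the signature; None avoids StopIteration
        continue
    kept.append(item)
```

Passing a default to next(), as above, is an alternative to the try/except mentioned earlier.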
If you want to stick with this approach then do this:
f = open(filename, 'r')
while f.readline().rstrip('\n') != 'ABCD':  # this advances the pointer to the ABCD line
    continue
f.readline()  # to skip over the unnecessary stuff
desiredNumber = f.readline()  # desired line
f.close()
I think regex would look a lot better, but if you want something to get the work done, here it is.
If you don't need any information at all from the skipped line, you can advance the file manually by a line before continuing:
with open(filename,'r') as f:
    for line in f:
        if line.rstrip('\n') == 'ABCD':
            next(f)  # The next iteration of the for loop will skip a line
            continue
If the only thing you need from this file is that one line, there's no need to continue at all. Just skip a line, grab the next line, do whatever you need to do with it, and break out of the for loop, all from within that if block.
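A sketch of that last suggestion, written as a function so it can be tried on any iterable of lines. The name number_after_signature is made up for this example, and it returns None when the signature is missing or the file ends too early, which is one possible choice among several.

```python
import io


def number_after_signature(lines, signature='ABCD'):
    """Return the stripped line two lines after the signature line, or None."""
    lines = iter(lines)  # a no-op for file objects, needed for lists
    for line in lines:
        if line.rstrip('\n') == signature:
            next(lines, None)           # skip 'some random stuff'
            wanted = next(lines, None)  # the line holding the number
            return None if wanted is None else wanted.strip()
    return None


sample = io.StringIO('....\nABCD\nsome random stuff\n3.14\n....\n')
found = number_after_signature(sample)
```

With a real file this would be called as number_after_signature(f) inside the with open(filename) block, and the result converted with float() or int() as needed.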
I prefer @Jim's use of next(), but another option is to just use a flag:
with open(filename,'r') as f:
    skip_line = False
    for line in f:
        if line.rstrip('\n') == 'ABCD':
            skip_line = True
            continue
        if skip_line:
            skip_line = False
        else:
            print(line)
I'm trying to write code that looks for a specific text in a file and gets the line after it.
f = open('programa.txt','r')
for line in f:
    if (line == "[Height of the board]\n"):
        ## skip to next line and saves its content
    print(line)
Set a flag so you know to grab the next line.
f = open('programa.txt','r')
grab_next = False
for line in f:
    if grab_next:
        print(line)
    grab_next = line == "[Height of the board]\n"
File objects are iterators in Python; while the for loop uses the iterator protocol implicitly, you can invoke it manually yourself when you need to skip ahead:
with open('programa.txt') as f:
    for line in f:
        if line == "[Height of the board]\n":
            # skip to next line and saves its content
            line = next(f)
        print(line)
Your example code is unclear on where to store the next line, so I've stored it back to line, making the original line header disappear. If the goal was to print only that line and break, you could use:
with open('programa.txt') as f:
    for line in f:
        if line == "[Height of the board]\n":
            # skip to next line and saves its content
            importantline = next(f)
            print(importantline)
            break
Problems like this are almost always simpler when you look back rather than trying to look ahead. After all, finding out the last line is trivial; you just store it in a variable! In this case, you want to save the current line if the previous line was the header:
f = open('programa.txt', 'r')
last = ""
for line in f:
    if last == "[Height of the board]\n":
        height = int(line.strip())  # for example
        break  # exit the loop once found (optional)
    last = line
I'd like to read to a dictionary all of the lines in a text file that come after a particular string. I'd like to do this over thousands of text files.
I'm able to identify and print out the particular string ('Abstract') using the following code (gotten from this answer):
for files in filepath:
    with open(files, 'r') as f:
        for line in f:
            if 'Abstract' in line:
                print(line)
But how do I tell Python to start reading the lines that only come after the string?
Just start another loop when you reach the line you want to start from:
for files in filepath:
    with open(files, 'r') as f:
        for line in f:
            if 'Abstract' in line:
                for line in f:  # now you are at the lines you want
                    # do work
A file object is its own iterator, so when we reach the line with 'Abstract' in it we continue our iteration from that line until we have consumed the iterator.
A simple example:
gen = (n for n in range(8))
for x in gen:
    if x == 3:
        print('Starting second loop')
        for x in gen:
            print('In second loop', x)
    else:
        print('In first loop', x)
Produces:
In first loop 0
In first loop 1
In first loop 2
Starting second loop
In second loop 4
In second loop 5
In second loop 6
In second loop 7
You can also use itertools.dropwhile to consume the lines up to the point you want:
from itertools import dropwhile
for files in filepath:
    with open(files, 'r') as f:
        dropped = dropwhile(lambda _line: 'Abstract' not in _line, f)
        next(dropped, '')
        for line in dropped:
            print(line)
Use a boolean to ignore lines up to that point:
for files in filepath:
    found_abstract = False  # reset the flag for every file
    with open(files, 'r') as f:
        for line in f:
            if 'Abstract' in line:
                found_abstract = True
            if found_abstract:
                #do whatever you want
You can use itertools.dropwhile and itertools.islice here, a pseudo-example:
from itertools import dropwhile, islice
for fname in filepaths:
    with open(fname) as fin:
        start_at = dropwhile(lambda L: 'Abstract' not in L.split(), fin)
        for line in islice(start_at, 1, None):  # ignore the line still with Abstract in
            print(line)
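Here is a runnable variant of the dropwhile/islice combination, as a rough sketch. The helper name lines_after_marker is made up, and io.StringIO stands in for an opened file. Note that this version tests 'Abstract' as a substring of the line, whereas the L.split() form above only matches when Abstract appears as a whole whitespace-separated word (so a heading like "Abstract:" would not match it).

```python
import io
from itertools import dropwhile, islice


def lines_after_marker(stream, marker='Abstract'):
    """Return the lines that follow the first line containing marker."""
    # dropwhile discards lines until the predicate first becomes false,
    # i.e. until the marker line, which is the first item it yields.
    start_at = dropwhile(lambda line: marker not in line, stream)
    # islice(..., 1, None) then skips that marker line itself.
    return [line.rstrip('\n') for line in islice(start_at, 1, None)]


after = lines_after_marker(io.StringIO('Title\nAbstract\nfirst\nsecond\n'))
```

If the marker never appears, dropwhile consumes the whole stream and the result is an empty list.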
To me, the following code is easier to understand.
with open(file_name, 'r') as f:
    while 'Abstract' not in next(f):
        pass
    for line in f:
        #line will be now the next line after the one that contains 'Abstract'
Just to clarify, your code already "reads" all the lines. To start "paying attention" to lines after a certain point, you can just set a boolean flag to indicate whether or not lines should be ignored, and check it at each line.
pay_attention = False
for line in f:
    if pay_attention:
        print(line)
    else:  # We haven't found our trigger yet; see if it's in this line
        if 'Abstract' in line:
            pay_attention = True
If you don't mind a little more rearranging of your code, you can also use two partial loops instead: one loop that terminates once you've found your trigger phrase ('Abstract'), and one that reads all following lines. This approach is a little cleaner (and a very tiny bit faster).
for skippable_line in f:  # First skim over all lines until we find 'Abstract'.
    if 'Abstract' in skippable_line:
        break
for line in f:  # The file's iterator starts up again right where we left it.
    print(line)
The reason this works is that the file object returned by open behaves like a generator, rather than, say, a list: it only produces values as they are requested. So when the first loop stops, the file is left with its internal position set at the beginning of the first "unread" line. This means that when you enter the second loop, the first line you see is the first line after the one that triggered the break.
Making a guess as to how the dictionary is involved, I'd write it this way:
lines = dict()
for filename in filepath:
    with open(filename, 'r') as f:
        for line in f:
            if 'Abstract' in line:
                break
        lines[filename] = tuple(f)
So for each file, your dictionary contains a tuple of lines.
This works because the loop reads up to and including the line you identify, leaving the remaining lines in the file ready to be read from f.
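To see this behaviour on real files, including a file that never contains the marker, here is a small self-contained sketch. The function name collect_lines_after and the temporary file names are invented for the demonstration.

```python
import os
import tempfile


def collect_lines_after(paths, marker='Abstract'):
    """Map each path to the list of lines that follow its first marker line."""
    result = {}
    for path in paths:
        with open(path) as f:
            for line in f:
                if marker in line:
                    break
            # Whatever the first loop left unread comes after the marker.
            # If the marker never appeared, f is exhausted and this is empty.
            result[path] = [line.rstrip('\n') for line in f]
    return result


with tempfile.TemporaryDirectory() as tmpdir:
    with_marker = os.path.join(tmpdir, 'a.txt')
    without_marker = os.path.join(tmpdir, 'b.txt')
    with open(with_marker, 'w') as fh:
        fh.write('Title\nAbstract\nline one\nline two\n')
    with open(without_marker, 'w') as fh:
        fh.write('Title\nno marker here\n')
    collected = collect_lines_after([with_marker, without_marker])
    demo = {os.path.basename(k): v for k, v in collected.items()}
```

A file without the marker ends up with an empty list rather than raising, which is probably what you want when running over thousands of files.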
Is there an elegant way of skipping the first line of a file when using the Python fileinput module?
I have a data file with nicely formatted data, but the first line is a header. Using fileinput, I would have to include a check and discard the line if it does not seem to contain data.
The problem is that it would apply the same check to the rest of the file.
With read() you can open the file, read the first line, then loop over the rest of the file. Is there a similar trick with fileinput?
Is there an elegant way to skip processing of the first line?
Example code:
import fileinput
# how to skip first line elegantly?
for line in fileinput.input(["file.dat"]):
    data = process_line(line)
    output(data)
lines = iter(fileinput.input(["file.dat"]))
next(lines)  # extract and discard first line
for line in lines:
    data = process_line(line)
    output(data)
or use the itertools.islice way if you prefer
import itertools
finput = fileinput.input(["file.dat"])
lines = itertools.islice(finput, 1, None) # cuts off first line
dataset = (process_line(line) for line in lines)
results = [output(data) for data in dataset]
Since everything used here is a generator or an iterator, no intermediate list is built.
The fileinput module contains a bunch of handy functions, one of which seems to do exactly what you're looking for:
for line in fileinput.input(["file.dat"]):
    if not fileinput.isfirstline():
        data = process_line(line)
        output(data)
fileinput module documentation
It's right in the docs: http://docs.python.org/library/fileinput.html#fileinput.isfirstline
One option is to use openhook:
The openhook, when given, must be a function that takes two arguments,
filename and mode, and returns an accordingly opened file-like object.
You cannot use inplace and openhook together.
One could create a helper function skip_header and use it as the openhook, something like:
import fileinput
files = ['file_1', 'file_2']
def skip_header(filename, mode):
    f = open(filename, mode)
    next(f)
    return f
for line in fileinput.input(files=files, openhook=skip_header):
    # do something
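A runnable version of the openhook idea, as a sketch: it writes two small temporary files (their names and contents are made up) and collects the data lines from both. The hook passes a default to next() so that an empty file does not raise StopIteration.

```python
import fileinput
import os
import tempfile


def skip_header(filename, mode):
    """Open the file and consume its first line before fileinput reads it."""
    f = open(filename, mode)
    next(f, None)  # the default avoids StopIteration on an empty file
    return f


with tempfile.TemporaryDirectory() as tmpdir:
    paths = []
    for name, body in [('file_1', 'header\n1\n2\n'), ('file_2', 'header\n3\n')]:
        path = os.path.join(tmpdir, name)
        with open(path, 'w') as fh:
            fh.write(body)
        paths.append(path)

    with fileinput.input(files=paths, openhook=skip_header) as f:
        data_lines = [line.rstrip('\n') for line in f]
```

Because the hook runs once per file, the header of every file in the list is skipped, not just the first one. As quoted above, this cannot be combined with inplace=True.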
Do two loops where the first one calls break immediately.
with fileinput.input(files=files, mode='r', inplace=True) as f:  # 'rU' is no longer accepted by recent Python versions
    for line in f:
        # add print() here if you only want to empty the line
        break
    for line in f:
        process(line)
Let's say you want to remove or empty all of the first 5 lines.
with fileinput.input(files=files, mode='r', inplace=True) as f:
    for line in f:
        # add print() here if you only want to empty the first 5 lines
        if f.filelineno() == 5:  # public method instead of the private _filelineno
            break
    for line in f:
        process(line)
But if you only want to get rid of the first line, just use next before the loop to remove the first line.
with fileinput.input(files=files, mode='r', inplace=True) as f:
    next(f)
    for line in f:
        process(line)
with open(file) as j:  # open file as j
    for i in j.readlines()[1:]:  # start reading j from second line.
        # ...