How to define length of lines to read in from a file - python

I am reading lines from a file in Python. Here is my code:
with open('words','rb') as f:
    for line in f:
        pass  # process each line here
Is there a way to limit the number of lines I read? Say, for example, only the first 1000 lines in the file?

You can use enumerate():
with open('words','rb') as f:
    for i, line in enumerate(f):
        if i >= 1000:
            break
        # do work for first 1000 lines

Use a counter variable, i in the example below, and increment it on each iteration. While the value is below 1000 (that is, for the first 1000 lines), do your work; once it reaches 1000, stop reading.
i = 0
with open('words','rb') as f:
    for line in f:
        if i >= 1000:
            break  # stop after the first 1000 lines
        # do stuff
        i += 1
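Another option, shown here as a minimal sketch, is itertools.islice, which stops the iteration after a given number of items so that no manual counter is needed. The helper name first_lines and the in-memory sample are hypothetical and only there for illustration; io.StringIO stands in for an open file.

```python
import io
from itertools import islice

def first_lines(lines, limit=1000):
    """Return at most `limit` lines from an iterable of lines (e.g. an open file)."""
    # islice stops requesting lines once `limit` of them have been produced.
    return list(islice(lines, limit))

# io.StringIO stands in for a real file; an object returned by open() works the same way.
sample = io.StringIO("a\nb\nc\nd\n")
print(first_lines(sample, 2))  # ['a\n', 'b\n']
```

With a real file this would be used as first_lines(f) inside a with open('words') as f: block.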

Related

large text editing, paste replace

I have a large text file with over ~200 million lines. It is split into blocks of approximately 50000 lines. What I need to do is replace lines 10-100 from all the blocks with lines 10-100 from the first block. Any ideas how to go about this?
Thanks in advance
Use a list. First read the lines you want to use from the first block into a list. Next, read each of the other files in turn, line by line, and write them out to a new file; if the line number is between 10 and 100, use the line from your list instead. An example that achieves this, with each block stored in its own file:
fnames = ["file1.txt", "file2.txt", "file3.txt"]
sub_list_start = 9   # zero-based index of line 10
sub_list_end = 100   # one past the zero-based index of line 100
file1_line_10_to_100 = []
with open(fnames[0]) as f:
    for i, line in enumerate(f):
        if i >= sub_list_end:
            break
        if i >= sub_list_start:
            file1_line_10_to_100.append(line)
for fname in fnames[1:]:
    with open(fname) as f:
        with open(fname + '.new', 'w') as f_out:
            for i, line in enumerate(f):
                if sub_list_start <= i < sub_list_end:
                    f_out.write(file1_line_10_to_100[i - sub_list_start])
                else:
                    f_out.write(line)
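If the blocks live in one file rather than in separate files, the same idea can be applied in a single pass. The sketch below is only an illustration under a strong assumption: every block has exactly block_size lines (the question says "approximately", so real block boundaries may need to be detected differently). The function name replace_in_blocks and its parameters are hypothetical.

```python
def replace_in_blocks(lines, block_size, start, end):
    """Yield `lines`, copying offsets start..end-1 of the first block into every later block."""
    saved = []  # lines remembered from the first block
    for i, line in enumerate(lines):
        block, offset = divmod(i, block_size)
        if start <= offset < end:
            if block == 0:
                saved.append(line)            # remember the first block's line
            else:
                line = saved[offset - start]  # substitute it in later blocks
        yield line

# Tiny demonstration: 3 blocks of 5 lines, replacing offsets 1 and 2 of each block.
sample = ["%d-%d" % (b, o) for b in range(3) for o in range(5)]
print(list(replace_in_blocks(sample, 5, 1, 3))[5:10])  # ['1-0', '0-1', '0-2', '1-3', '1-4']
```

For the file in the question, one would pass the open input file as lines, with block_size=50000, start=9 and end=100, and write each yielded line to a new output file. Only the 91 saved lines are held in memory.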

How to only read lines in a text file after a certain string?

I'd like to read to a dictionary all of the lines in a text file that come after a particular string. I'd like to do this over thousands of text files.
I'm able to identify and print out the particular string ('Abstract') using the following code (gotten from this answer):
for files in filepath:
    with open(files, 'r') as f:
        for line in f:
            if 'Abstract' in line:
                print(line)
But how do I tell Python to start reading the lines that only come after the string?
Just start another loop when you reach the line you want to start from:
for files in filepath:
    with open(files, 'r') as f:
        for line in f:
            if 'Abstract' in line:
                for line in f: # now you are at the lines you want
                    pass # do work
A file object is its own iterator, so when we reach the line with 'Abstract' in it, the inner loop continues the iteration from the next line until the iterator is consumed.
A simple example:
gen = (n for n in range(8))
for x in gen:
    if x == 3:
        print('Starting second loop')
        for x in gen:
            print('In second loop', x)
    else:
        print('In first loop', x)
Produces:
In first loop 0
In first loop 1
In first loop 2
Starting second loop
In second loop 4
In second loop 5
In second loop 6
In second loop 7
You can also use itertools.dropwhile to consume the lines up to the point you want:
from itertools import dropwhile
for files in filepath:
    with open(files, 'r') as f:
        dropped = dropwhile(lambda _line: 'Abstract' not in _line, f)
        next(dropped, '')
        for line in dropped:
            print(line)
Use a boolean to ignore lines up to that point:
for files in filepath:
    found_abstract = False # reset the flag for each file
    with open(files, 'r') as f:
        for line in f:
            if found_abstract:
                pass # do whatever you want with the lines after the match
            elif 'Abstract' in line:
                found_abstract = True
You can use itertools.dropwhile and itertools.islice here, a pseudo-example:
from itertools import dropwhile, islice
for fname in filepaths:
    with open(fname) as fin:
        start_at = dropwhile(lambda L: 'Abstract' not in L.split(), fin)
        for line in islice(start_at, 1, None): # ignore the line still with Abstract in
            print(line)
To me, the following code is easier to understand.
with open(file_name, 'r') as f:
    while 'Abstract' not in next(f):
        pass
    for line in f:
        # line is now the next line after the one that contains 'Abstract'
        pass
Just to clarify, your code already "reads" all the lines. To start "paying attention" to lines after a certain point, you can just set a boolean flag to indicate whether or not lines should be ignored, and check it at each line.
pay_attention = False
for line in f:
    if pay_attention:
        print(line)
    else: # We haven't found our trigger yet; see if it's in this line
        if 'Abstract' in line:
            pay_attention = True
If you don't mind a little more rearranging of your code, you can also use two partial loops instead: one loop that terminates once you've found your trigger phrase ('Abstract'), and one that reads all following lines. This approach is a little cleaner (and a very tiny bit faster).
for skippable_line in f: # First skim over all lines until we find 'Abstract'.
    if 'Abstract' in skippable_line:
        break
for line in f: # The file's iterator starts up again right where we left it.
    print(line)
The reason this works is that the file object returned by open behaves like a generator, rather than, say, a list: it only produces values as they are requested. So when the first loop stops, the file is left with its internal position set at the beginning of the first "unread" line. This means that when you enter the second loop, the first line you see is the first line after the one that triggered the break.
Making a guess as to how the dictionary is involved, I'd write it this way:
lines = dict()
for filename in filepath:
    with open(filename, 'r') as f:
        for line in f:
            if 'Abstract' in line:
                break
        lines[filename] = tuple(f)
So for each file, your dictionary contains a tuple of lines.
This works because the loop reads up to and including the line you identify, leaving the remaining lines in the file ready to be read from f.
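The two-loop idea used in several of the answers above can be wrapped in a small generator. This is a minimal sketch, not a definitive implementation: the name lines_after is hypothetical, and io.StringIO stands in for an open file.

```python
import io

def lines_after(lines, marker):
    """Yield every line that follows the first line containing `marker`."""
    it = iter(lines)
    for line in it:
        if marker in line:
            break  # the marker line itself is consumed and not yielded
    # Continue on the same iterator, which now sits just past the marker line.
    for line in it:
        yield line

sample = io.StringIO("Title\nAbstract\nfirst\nsecond\n")
print(list(lines_after(sample, "Abstract")))  # ['first\n', 'second\n']
```

If the marker never appears, the first loop exhausts the input and the generator yields nothing.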

How to make python disregard first couple of lines of a text file

I was wondering if it was possible to make python disregard the first 4 lines of my text file. Like if I had a text file which looked like this:
aaa
aaa
aaa
aaa
123412
1232134
Can I make it so python starts working from the numbers?
Use next and a loop:
with open("/path/to/myfile.txt") as myfile:
    for _ in range(4): # Use xrange here if you are on Python 2.x
        next(myfile)
    for line in myfile:
        print(line) # Just to demonstrate
Because file objects are iterators in Python, a line will be skipped each time you do next(myfile).
This should do the trick
f = open('file.txt')
for index, line in enumerate(f):
    if index > 3:
        print(line)
Assuming you know the number of lines to discard, you can use this method:
for i, line in enumerate(open('myfile')):
    if i < number_of_lines_to_discard:
        continue
    # do your stuff here
Or, if you just want to disregard non-numeric lines:
import re
for line in open('myfile'):
    if not re.match(r'^[0-9]+$', line): # $ also matches just before a trailing newline
        continue
    # do your stuff here
A more robust solution, not relying on the exact number of lines:
with open(filename) as f:
    for line in f:
        try:
            line = int(line)
        except ValueError:
            continue
        # process line
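When the number of lines to skip is known, itertools.islice can do the skipping without an explicit counter. The sketch below is illustrative only; skip_lines is a hypothetical helper, and io.StringIO stands in for the text file from the question.

```python
import io
from itertools import islice

def skip_lines(lines, count):
    """Return an iterator over `lines` that omits the first `count` items."""
    return islice(lines, count, None)

# io.StringIO stands in for the file shown in the question.
sample = io.StringIO("aaa\naaa\naaa\naaa\n123412\n1232134\n")
for line in skip_lines(sample, 4):
    print(line.rstrip("\n"))  # prints 123412, then 1232134
```

If the input has fewer lines than count, the iterator is simply empty rather than raising an error, unlike repeated calls to next() without a default.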

Ways to read/edit multiple lines in python

What I'm trying to do is take 4 lines from a file that look like this:
#blablabla
blablabla #this string needs to match the amount of characters in line 4
!blablabla
blablabla #there is a string here
This pattern repeats a few hundred times.
I read the entire thing line by line, make a change to the fourth line, and then want to match the second line's character count to that of the fourth line.
I can't figure out how to "backtrack" and change the second line after making changes to the fourth.
with fileC as inputA:
    for line1 in inputA:
        line2 = next(inputA)
        line3 = next(inputA)
        line4 = next(inputA)
is what I'm currently using, because it lets me handle 4 lines at the same time, but there has to be a better way, as it causes all sorts of problems when writing the file out. What could I use as an alternative?
You could do:
with open(fileC, 'r') as f:
    lines = f.readlines() # readlines creates a list of the lines
To access line 4 and do something with it, you would use:
lines[3] # as lines is a list
And for line 2:
lines[1] # etc.
You could then write your lines back into a file if you wish.
EDIT:
Regarding your comment, perhaps something like this:
def change_lines(fileC):
    with open(fileC, 'r') as f:
        while True:
            lines = []
            for i in range(4):
                try:
                    lines.append(next(f)) # next(f) returns the next line in the file
                except StopIteration: # this will happen if you reach end of file before finding 4 more lines.
                    # decide what you want to do here
                    return
            # otherwise this will happen
            lines[1] = lines[3] # or whatever you want to do here
            # maybe write them to a new file
            # remember you're still within the while loop here
EDIT:
Since your file divides into fours evenly, this works:
def change_lines(fileC):
    with open(fileC, 'r') as f:
        while True:
            lines = []
            for i in range(4):
                try:
                    lines.append(next(f))
                except StopIteration:
                    return
            # do something with lines here
            # and write to new file etc.
Another way to do it:
import sys
from itertools import islice
def read_in_chunks(file_path, n):
    with open(file_path) as fh:
        while True:
            lines = list(islice(fh, n))
            if lines:
                yield lines
            else:
                break
for lines in read_in_chunks(sys.argv[1], 4):
    print(lines)
Also relevant is the grouper() recipe in the itertools module. In that case, you would need to filter out the None values before yielding them to the caller.
You could read the file with .readlines(), then index whichever line you want to change and write the result back to a file:
rf = open('/path/to/file')
file_lines = rf.readlines()
rf.close()
file_lines[1] = file_lines[3] # trim/edit however you'd like
wf = open('/path/to/write', 'w')
wf.writelines(file_lines)
wf.close()
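Putting the chunking idea together with the "edit line 4, then fix line 2" step might look like the sketch below. It is only an illustration under stated assumptions: the file divides evenly into groups of four, the lines have already had their newlines stripped, and "matching" means truncating line 2 to the new length of line 4. The names records and truncate_record and the sample data are hypothetical.

```python
from itertools import islice

def records(lines, size=4):
    """Yield consecutive groups of `size` lines as lists (the last may be shorter)."""
    it = iter(lines)
    while True:
        group = list(islice(it, size))
        if not group:
            return
        yield group

def truncate_record(record, max_len):
    """Hypothetical edit: cut line 4 to `max_len` characters, then cut line 2 to match."""
    first, second, third, fourth = record  # assumes exactly 4 lines per record
    fourth = fourth[:max_len]
    second = second[:len(fourth)]  # line 2 now has the same length as line 4
    return [first, second, third, fourth]

sample = ["#id1", "ABCDEFGH", "!id1", "12345678",
          "#id2", "ABCD", "!id2", "1234"]
for record in records(sample):
    print(truncate_record(record, 6))
```

Because all four lines of a record are in hand at once, there is no need to backtrack: line 2 is adjusted after line 4 has been edited, and the whole record can then be written to the output file.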

Python: Skipping lines from file

I have a data file that has 100 lines, and I want to create a dictionary that skips the first two lines and uses enumerated keys with the remaining lines as values.
myfile = open(infile, 'r')
d={}
with myfile as f:
    next(f)
    next(f)
    for line in f:
        pass  # build the dictionary here
This is what I've got. I don't know how to use iteritems(), enumerate(), or itervalues(), but I think I might need them, or maybe not. Can anybody help me?
You could do something like:
from itertools import islice
with open(infile, 'r') as myfile:
    d = dict(enumerate(islice(myfile, 2, None)))
But I wish I understood why you want to skip the first two lines – are you sure you don't want linecache?
This is just off the top of my head, so there will certainly be room for improvement.
myfile = open(infile, 'r') # open the file
d = {} # initialize the dict
counter = 0 # initialize the counter once, before the loop
for line in myfile: # iterate over lines in the file
    if counter > 1: # once the first 2 lines have been skipped
        d[counter - 2] = line # key is the line count, adjusted for the 2 skipped lines
    counter += 1 # increase the counter by 1 before continuing
I can't remember off the top of my head where in the code it would be best to close the file, so do some experimentation and read this and this. Next time, a good place to start would be Google and the Python docs in general.
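As a minimal sketch of the counter approach without manual bookkeeping, enumerate can supply the line index directly. The helper name lines_to_dict is hypothetical, and io.StringIO stands in for the data file.

```python
import io

def lines_to_dict(lines, skip=2):
    """Map 0, 1, 2, ... to the lines that remain after skipping the first `skip`."""
    result = {}
    for index, line in enumerate(lines):
        if index >= skip:
            result[index - skip] = line  # keys start at 0 for the first kept line
    return result

sample = io.StringIO("header 1\nheader 2\nfirst\nsecond\n")
print(lines_to_dict(sample))  # {0: 'first\n', 1: 'second\n'}
```

With a real file, calling lines_to_dict(f) inside a with open(infile) as f: block also settles where to close the file: the with statement closes it automatically when the block ends.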
