Deleting n lines after a specific line of a file in Python

I am trying to remove a specific number of lines from a file. These lines always occur after a specific comment line. Anyways, talk is cheap, here is an example of what I have.
FILE: --
randomstuff
randomstuff2
randomstuff3
# my comment
extrastuff
randomstuff2
extrastuff2
#some other comment
randomstuff4
So, I am trying to remove the section after # my comment. Perhaps there is some way to delete a line in r+ mode?
Here is what I have so far:
with open(file_name, 'a+') as f:
    for line in f:
        if line == my_comment_text:
            f.seek(len(my_comment_text)*-1, 1) # move cursor back to beginning of line
            counter = 4
        if counter > 0:
            del(line) # is there a way to do this?
Not exactly sure how to do this. How do I remove a specific line? I have looked at this possible dup and can't quite figure out how to do it that way either. The answer recommends reading the file and then rewriting it. The problem is that they check for a specific line when they write. I can't do exactly that, plus I don't like the idea of storing the entire file's contents in memory. That would eat up a lot of memory with a large file (since every line has to be stored, rather than one at a time).
Any ideas?

You can use the fileinput module for this and open the file with inplace=True to allow in-place modification:
import fileinput
counter = 0
for line in fileinput.input('inp.txt', inplace=True):
    if not counter:
        if line.startswith('# my comment'):
            counter = 4  # drop the comment line and the 4 lines after it
        else:
            # with inplace=True, standard output is redirected into the file
            print(line, end='')
    else:
        counter -= 1
Edit per your comment "Or until a blank line is found":
import fileinput
ignore = False
for line in fileinput.input('inp.txt', inplace=True):
    if not ignore:
        if line.startswith('# my comment'):
            ignore = True
        else:
            print(line, end='')
    # a blank line ends the skipped section (the blank line itself is dropped)
    if ignore and line.isspace():
        ignore = False

You can make a small modification to your code and stream the content from one file to the other very easily.
with open(file_name, 'r') as f:
    with open(second_file_name, 'w') as w:
        counter = 0
        for line in f:
            if line == my_comment_text:
                counter = 3  # skip the comment line and the next 3 lines
            elif counter > 0:
                counter -= 1
            else:
                w.write(line)
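If the goal is still to end up with the original file modified, the same streaming idea can be combined with a temporary file that replaces the original once the copy is done. The following is only a sketch: the function name `drop_lines_after` and its parameters are made up for illustration, the marker is matched with `startswith`, and the marker line itself is kept. Only one line is held in memory at a time.

```python
import os
import tempfile


def drop_lines_after(path, marker, n):
    """Remove the n lines that follow each line starting with `marker`.

    Hypothetical helper: the marker line itself is kept.
    """
    directory = os.path.dirname(os.path.abspath(path))
    remaining = 0  # how many upcoming lines still have to be dropped
    # Write to a temporary file in the same directory so that os.replace
    # can swap it in without crossing filesystems.
    with open(path, "r") as src, tempfile.NamedTemporaryFile(
        "w", dir=directory, delete=False
    ) as dst:
        for line in src:
            if line.startswith(marker):
                remaining = n
                dst.write(line)
            elif remaining > 0:
                remaining -= 1
            else:
                dst.write(line)
    # Replace the original only after the copy has been fully written.
    os.replace(dst.name, path)
```

For the sample file in the question, `drop_lines_after(file_name, '# my comment', 3)` would leave `# my comment` immediately followed by `#some other comment`.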

I like the answer from @Ashwini. I was also working on a solution, and something like this should work if you are OK with writing a new file containing the filtered lines:
def rewriteByRemovingSomeLines(inputFile, outputFile):
    unDesiredLines = []
    count = 0
    skipping = False
    # First pass: record the indices of the lines to drop
    fhIn = open(inputFile, 'r')
    line = fhIn.readline()
    while line:
        if line.startswith('#I'):
            unDesiredLines.append(count)
            skipping = True
            while skipping:
                line = fhIn.readline()
                count = count + 1
                # stop at end of file, a blank line, or the next comment
                if not line or line == '\n' or line.startswith('#'):
                    skipping = False
                else:
                    unDesiredLines.append(count)
        count = count + 1
        line = fhIn.readline()
    fhIn.close()
    # Second pass: write the desired lines to a new file
    fhIn = open(inputFile, 'r')
    count = 0
    fhOut = open(outputFile, 'w')
    for line in fhIn:
        if not (count in unDesiredLines):
            fhOut.write(line)
        count = count + 1
    fhIn.close()
    fhOut.close()

Related

Removing duplicates from text file using python

I have this text file and let's say it contains 10 lines.
Bye
Hi
2
3
4
5
Hi
Bye
7
Hi
Every time it says "Hi" and "Bye" I want it to be removed except for the first time it was said.
My current code is below (yes, filename actually points to a file; I just didn't include that part here):
text_file = open(filename)
for i, line in enumerate(text_file):
    if i == 0:
        var_Line1 = line
    if i == 1:
        var_Line2 = line
    if i > 1:
        if line == var_Line2:
            del line
text_file.close()
It does detect the duplicates, but it takes a very long time given the number of lines, and I'm not sure how to delete them and save the result.
You could use dict.fromkeys to remove duplicates and preserve order efficiently:
with open(filename, "r") as f:
    lines = dict.fromkeys(f.readlines())
with open(filename, "w") as f:
    f.writelines(lines)
Idea from Raymond Hettinger
Using a set & some basic filtering logic:
with open('test.txt') as f:
    seen = set() # keep track of the lines already seen
    deduped = []
    for line in f:
        line = line.rstrip()
        if line not in seen: # if not seen already, write the lines to result
            deduped.append(line)
            seen.add(line)
# re-write the file with the de-duplicated lines
with open('test.txt', 'w') as f:
    f.writelines([l + '\n' for l in deduped])
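The set-based idea can also be written as a generator, so that the filtering is separate from the file handling and lines are produced one at a time. This is a sketch only; the name `unique_lines` is invented for illustration, and lines are compared without their trailing newline so that a final line lacking one is still recognised as a duplicate.

```python
def unique_lines(lines):
    """Yield each line the first time it appears, preserving order."""
    seen = set()
    for line in lines:
        key = line.rstrip("\n")  # compare without the trailing newline
        if key not in seen:
            seen.add(key)
            yield line


# Sample data modelled on the question; the last line has no newline.
sample = ["Bye\n", "Hi\n", "2\n", "Hi\n", "Bye\n", "7\n", "Hi"]
print(list(unique_lines(sample)))  # ['Bye\n', 'Hi\n', '2\n', '7\n']
```

Because it accepts any iterable of lines, the generator can be fed an open file object directly and its output passed to `writelines` on a second file.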

Python program to number rows

I have a file with data like this:
>1_DL_2021.1123
>2_DL_2021.1206
>3_DL_2021.1202
>3_DL_2021.1214
>4_DL_2021.1214
>4_DL_2021.1214
>6_DL_2021.1214
>7_DL_2021.1214
>8_DL_2021.1214
As you can see, the data is not numbered properly and needs to be renumbered.
What I'm aiming for is this:
>1_DL_2021.1123
>2_DL_2021.1206
>3_DL_2021.1202
>4_DL_2021.1214
>5_DL_2021.1214
>6_DL_2021.1214
>7_DL_2021.1214
>8_DL_2021.1214
>9_DL_2021.1214
The file has a lot of other content between these lines that start with the > sign. I want only the lines starting with > to be affected.
Could someone please help me out with this?
There are 563 such lines, so doing it manually is out of the question.
Assuming the input data file is "input.txt", you can achieve what you want with this:
import re
with open("input.txt", "r") as f:
    a = f.readlines()
regex = re.compile(r"^>\d+_DL_2021\.\d+\n$")
counter = 1
for i, line in enumerate(a):
    if regex.match(line):
        tokens = line.split("_")
        tokens[0] = f">{counter}"
        a[i] = "_".join(tokens)
        counter += 1
with open("input.txt", "w") as f:
    f.writelines(a)
This searches for lines matching the regex ^>\d+_DL_2021\.\d+\n$, splits each match on _, replaces the first (0th) element with the running counter, and then increments the counter. At the end it writes the updated lines back to "input.txt".
sudden_appearance already provided a good answer.
In case you don't like regex too much you can use this code instead:
new_lines = []
with open('test_file.txt', 'r') as f:
    c = 1
    for line in f:
        if line[0] == '>':
            after_dash = line.split('_',1)[1]
            new_line = '>' + str(c) + '_' + after_dash
            c += 1
            new_lines.append(new_line)
        else:
            new_lines.append(line)
with open('test_file.txt', 'w') as f:
    f.writelines(new_lines)
Also you can have a look at this split tutorial for more information about how to use split.
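A middle ground between the two answers is to let a regular expression replace only the leading number, which leaves the rest of the line untouched. The sketch below makes some assumptions: the name `renumber` is invented, header lines are taken to start with `>`, digits, and `_`, and the `ACGT` line in the sample is made-up filler standing in for the "other stuff" between headers.

```python
import re

HEADER = re.compile(r"^>\d+_")  # leading ">", the old number, and "_"


def renumber(lines, start=1):
    """Yield lines with ">N_" headers renumbered consecutively from `start`."""
    counter = start
    for line in lines:
        if HEADER.match(line):
            # Replace only the prefix; everything after the first "_" is kept.
            line = HEADER.sub(f">{counter}_", line, count=1)
            counter += 1
        yield line


sample = [">1_DL_2021.1123\n", "ACGT\n", ">3_DL_2021.1202\n", ">3_DL_2021.1214\n"]
print("".join(renumber(sample)), end="")
```

Lines that do not match the header pattern pass through unchanged, so the counter only advances on `>` lines.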

Reading a text file from a certain point (python)

I'm trying to make code that can find a specific word in a file and start reading from there until it reads the same word again. In this case the word is "story". The code counts up the lines until the word, and then it starts counting again from 0 in the second loop. I have tried to use functions and global variables, but I keep getting the same number twice and I don't know why.
file = open("testing_area.txt", "r")
line_count = 0
counting = line_count
for line in file.readlines()[counting:]:
    if line != "\n":
        line_count = line_count + 1
        if line.startswith('story'):
            #line_count += 1
            break
print(line_count)
for line in file.readlines()[counting:]:
    if line != "\n":
        line_count = line_count + 1
        if line.startswith('story'):
            #line_count += 1
            break
print(line_count)
file.close()
Output:
6
6
Expected output:
6
3
This is the text file:
text
text
text
text
text
story
text
text
story
The code can be simplified to:
with open("testing_area.txt", "r") as file: # Context manager preferred for file open
    first, second = None, None # index of first and second occurrence of 'story'
    for line_count, line in enumerate(file, start = 1): # provides line index and content
        if line.startswith('story'): # no need to check separately for blank lines
            if first is None:
                first = line_count # first is None, so this must be the first
            else:
                second = line_count # previously found first, so this is the second
                break # have now found first & second
print(first, second - first) # index of first occurrence and number of lines between first and second
# Output: 6 3
There are several issues here. The first is that, for a given file object, readlines() basically only works once. Imagine a text file open in an editor, with a cursor that starts at the beginning. readline() (singular) reads the next line, moving the cursor down one: readlines() (plural) reads all lines from the cursor's current position to the end. Once you've called it once, there are no more lines left to read. You could solve this by putting something like lines = file.readlines() up at the top, and then looping through the resulting list. (See this section in the docs for more info.)
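The cursor behaviour described above can be seen in a few lines. This is just an illustration: `io.StringIO` stands in for an open text file (an assumption made so the snippet runs without a file on disk), and it exposes the same `readlines()` and `seek()` methods as a real file object.

```python
import io

# io.StringIO stands in for an open text file here.
f = io.StringIO("text\nstory\ntext\n")

first = f.readlines()   # reads everything from the cursor to the end
second = f.readlines()  # the cursor is already at the end, so nothing is left
print(len(first), len(second))  # 3 0

f.seek(0)               # move the cursor back to the start
third = f.readlines()
print(len(third))       # 3
```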
However, you neither reset line_count to 0, nor ever set counting to anything but 0, so the loops still won't do what you intend. You want something more like this:
with open("testing_area.txt") as f:
    lines = f.readlines()
first_count = 0
for line in lines:
    if line != "\n":
        first_count += 1
        if line.startswith('story'):
            break
print(first_count)
second_count = 0
for line in lines[first_count:]:
    if line != "\n":
        second_count += 1
        if line.startswith('story'):
            break
print(second_count)
(This also uses the with keyword, which automatically closes the file even if the program encounters an exception.)
That said, you don't really need two loops in the first place. You're looping through one set of lines, so as long as you reset the line number, you can do it all at once:
line_no = 0
words_found = 0
with open('testing_area.txt') as f:
    for line in f:
        if line == '\n':
            continue
        line_no += 1
        if line.startswith('story'):
            print(line_no)
            line_no = 0
            words_found += 1
            if words_found == 2:
                break
(Using if line == '\n': continue is functionally the same as putting the rest of the loop's code inside if line != '\n':, but personally I like avoiding the extra indentation. It's mostly a matter of personal preference.)
As the question doesn't say that the word only needs to be counted twice, here is a solution that reads through the whole file and prints the count every time "story" is found.
# Using with to open file is preferred as file will be properly closed
with open("testing_area.txt") as f:
    line_count = 0
    for line in f:
        line_count += 1
        if line.startswith("story"):
            print(line_count)
            # reset the line_count if "story" found
            line_count = 0
Output:
6
3
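To make the counting logic easy to test without a file, it can be wrapped in a small function that works on any iterable of lines and returns the counts instead of printing them. This is a sketch: the name `lines_between` is hypothetical, and blank lines are skipped, as in the question's own code.

```python
def lines_between(lines, word):
    """Count non-blank lines up to and including each line that starts
    with `word`, restarting the count after every match."""
    counts = []
    since_last = 0
    for line in lines:
        if line.strip() == "":
            continue  # blank lines are not counted
        since_last += 1
        if line.startswith(word):
            counts.append(since_last)
            since_last = 0
    return counts


# Same layout as the text file in the question.
sample = ["text\n"] * 5 + ["story\n"] + ["text\n"] * 2 + ["story\n"]
print(lines_between(sample, "story"))  # [6, 3]
```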

I want to split a file on a specific word and write that word's line, plus the line above it, to newfile.txt

I want to split a file based on a specific word. When the word is found, I want the line above it, and then I want that line, the line containing the word, and the content that follows written to a new file. Each output file should stop when the line above the next occurrence of the word is reached. Please help.
This is my code:
import collections
import itertools
import sys
count = 0
done = False
with open("file.txt") as in_file:
    before = collections.deque(maxlen=3)
    while not done:
        with open(f"newfile{count}.txt", "w") as out_file:
            while not done:
                try:
                    line = next(in_file).strip()
                except StopIteration:
                    done = True
                    break
                if "X-IronPort-RCPT-TO" in line:
                    out_file.write(line)
                    before.append('\n')
                    break
                else:
                    out_file.writelines(before)
                    out_file.write('\n')
                    out_file.write(line)
        count += 1
Not sure if this is what you want:
with open("file.txt") as in_file:
    lines = in_file.readlines()
for l in range(1, len(lines)):  # start at 1: the first line has no line above it
    if "X-IronPort-RCPT-TO" in lines[l]:
        line_above = lines[l-1]
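The same lookup can be done without loading the whole file, by remembering the previous line while iterating. The sketch below is one possible reading of the question: the name `lines_above` is invented, and the sample header lines (including the `example.com` address) are made-up data.

```python
def lines_above(lines, keyword):
    """Return (line_above, matching_line) pairs for every line containing `keyword`."""
    pairs = []
    previous = None  # there is no line above the first line
    for line in lines:
        if keyword in line and previous is not None:
            pairs.append((previous, line))
        previous = line
    return pairs


# Hypothetical sample data for illustration only.
sample = [
    "Received: from host-a\n",
    "X-IronPort-RCPT-TO: someone@example.com\n",
    "body\n",
]
print(lines_above(sample, "X-IronPort-RCPT-TO"))
```

Since the function accepts any iterable of lines, an open file object can be passed in directly; each pair could then be written to its own output file.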

Locate a specific line in a file based on user input then delete a specific number of lines

I'm trying to delete specific lines in a text file. The way I need to go about it is to prompt the user for a string (a phrase that should exist in the file). The file is then searched, and if the string is there, the text of that line and its line number are both stored.
After the phrase has been found, it and the five following lines are printed out. Now I have to figure out how to delete those six lines without changing any other text in the file, which is my issue.
Any ideas as to how I can delete those six lines?
This was my latest attempt to delete the lines:
file = open('C:\\test\\example.txt', 'a')
locate = "example string"
for i, line in enumerate(file):
    if locate in line:
        line[i] = line.strip()
        i = i+1
        line[i] = line.strip()
        i = i+1
        line[i] = line.strip()
        i = i+1
        line[i] = line.strip()
        i = i + 1
        line[i] = line.strip()
        i = i+1
        line[i] = line.strip()
        break
Usually I would not consider it desirable to overwrite the source file: what if the user makes a mistake? If your project allows, I would write the changes out to a new file.
with open('source.txt', 'r') as ifile:
    with open('output.txt', 'w') as ofile:
        locate = "example string"
        skip_next = 0
        for line in ifile:
            if locate in line:
                skip_next = 5  # the matching line plus the next five = six lines removed
                print(line.rstrip('\n'))
            elif skip_next > 0:
                print(line.rstrip('\n'))
                skip_next -= 1
            else:
                ofile.write(line)
This also handles the phrase occurring multiple times: each match simply restarts the count of lines to remove.
You can find the occurrences, copy the list items between the occurrences to a new list, and then save the new list to the file.
_newData = []
_linesToSkip = 3  # lines dropped per match, counting the matching line itself
with open('data.txt', 'r') as _file:
    data = _file.read().splitlines()
occurrences = [i for i, x in enumerate(data) if "example string" in x]
_lastOcurrence = 0
for ocurrence in occurrences:
    _newData.extend(data[_lastOcurrence : ocurrence])
    _lastOcurrence = ocurrence + _linesToSkip
_newData.extend(data[_lastOcurrence:])
# Save new data into the file
with open('data.txt', 'w') as _file:
    _file.write('\n'.join(_newData) + '\n')
There are a couple of points that you clearly misunderstand here:
.strip() removes whitespace or given characters:
>>> print(str.strip.__doc__)
S.strip([chars]) -> str
Return a copy of the string S with leading and trailing
whitespace removed.
If chars is given and not None, remove characters in chars instead.
Incrementing i doesn't actually do anything:
>>> for i, _ in enumerate('ignore me'):
...     print(i)
...     i += 10
...
0
1
2
3
4
5
6
7
8
You're assigning to the ith element of the line, which should raise an exception (that you neglected to tell us about):
>>> line = 'some text'
>>> line[i] = line.strip()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'str' object does not support item assignment
Ultimately...
You have to write to a file if you want to change its contents. Writing to a file that you're reading from is tricky business. Writing to an alternative file, or just storing the file in memory if it's small enough is a much healthier approach.
search_string = 'example'
lines = []
with open('/tmp/fnord.txt', 'r+') as f: #`r+` so we can read *and* write to the file
    for line in f:
        line = line.strip()
        if search_string in line:
            print(line)
            for _ in range(5):
                print(next(f).strip())
        else:
            lines.append(line)
    f.seek(0) # back to the beginning!
    f.truncate() # goodbye, original lines
    for line in lines:
        print(line, file=f) # python2 requires `from __future__ import print_function`
There is a fatal flaw in this approach, though - if the sought after line is any closer than the 6th line from the end, it's going to have problems. I'll leave that as an exercise for the reader.
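One way to approach that exercise is `itertools.islice`, which takes up to a given number of items from an iterator and simply stops early if the iterator runs out, whereas a bare `next(f)` raises `StopIteration`. This is a sketch only: the function name `remove_match_and_following` is hypothetical, and `io.StringIO` stands in for the open file.

```python
import io
from itertools import islice


def remove_match_and_following(f, search_string, following=5):
    """Return the lines of `f` minus each matching line and up to
    `following` lines after it."""
    kept = []
    for line in f:
        if search_string in line:
            # islice stops quietly at end of file, unlike a bare next(f)
            for _ in islice(f, following):
                pass
        else:
            kept.append(line)
    return kept


# The match is only two lines from the end, so fewer than five lines follow it.
f = io.StringIO("a\nexample\nb\nc\n")
print(remove_match_and_following(f, "example"))  # ['a\n']
```

The returned list can then be written back after `seek(0)` and `truncate()`, as in the code above.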
You are opening your file in append mode by using open with 'a'. Also, you are not closing your file (a bad habit). str.strip() does not delete the line; by default it removes leading and trailing whitespace. Also, this would usually be done in a loop.
This should get you started:
locate = "example string"
n = 0
with open('example.txt', 'r+') as f:
    for i, line in enumerate(f):
        if locate in line:
            n = 6
        if n:
            print(line, end='')
            n -= 1
print("done")
Edit:
Read-modify-write solution:
locate = "example string"
filename = 'example.txt'
removelines = 5
with open(filename) as f:
    lines = f.readlines()
with open(filename, 'w') as f:
    n = 0
    for line in lines:
        if locate in line:
            n = removelines + 1
        if n:
            n -= 1
        else:
            f.write(line)
