remove specific lines if some specific word is found - python

Let's say I have a text file that contains the following words:
a
b
c
d
e>
f
g
h
I>
j
Whenever I find a word that contains >, I would like to remove that line together with the two lines before it.
For this example, the output would be:
a
b
f
j
Is it possible to achieve this? For a simple replacement, I can do this:
with open('Final.txt', 'w') as f2:
    with open('initial.txt', 'r') as f1:
        for line in f1:
            f2.write(line.replace('>', ''))
But I am stuck on how to go back and delete the previous two lines, as well as the line where the replacement happens.

This is one approach using a simple iteration and list slicing.
Ex:
res = []
with open('initial.txt') as infile:
    for line in infile:
        if ">" in line:
            res = res[:-2]
        else:
            res.append(line)
with open('Final.txt', "w") as f2:
    for line in res:
        f2.write(line)
Output:
a
b
f
j
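The list-slicing answer keeps every line in memory until the end. For large files, a streaming variant is possible: hold back only the last two kept lines in a collections.deque and write out older ones as soon as no later marker can reach them. This is a minimal sketch; the function name drop_marked_with_context and its marker/before parameters are hypothetical. One behavioral difference: unlike res[:-2], a marker never reaches back past lines that an earlier marker already removed. On the question's sample both approaches give the same result.

```python
from collections import deque
from typing import Deque, Iterable, Iterator


def drop_marked_with_context(lines: Iterable[str], marker: str = ">", before: int = 2) -> Iterator[str]:
    """Yield lines, skipping each line that contains `marker` and up to `before` lines preceding it."""
    # Hold back the most recent `before` kept lines, since a later marker may still remove them.
    pending: Deque[str] = deque()
    for line in lines:
        if marker in line:
            # Drop the marker line and everything still held back.
            pending.clear()
        else:
            pending.append(line)
            if len(pending) > before:
                # The oldest held line is now too far back for any future marker to reach.
                yield pending.popleft()
    # End of input: nothing else can remove the lines still held back.
    yield from pending
```

Usage with the question's files: with open('initial.txt') as infile, open('Final.txt', 'w') as f2: f2.writelines(drop_marked_with_context(infile))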

Use re.
Here I am assuming that your data is a flat list of lines.
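import re
# data is assumed to be a list of lines that each end with "\n" (e.g. from readlines()).
# ".*>.*" lets the ">" appear anywhere in the marker line, not only at its end.
print(re.sub(r'.*\n.*\n.*>.*\n', '', ''.join(data)))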
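To see why the pattern works, here is a small demo on the question's sample, assuming data is the list that readlines() would return. Because "." never matches a newline, each ".*\n" consumes exactly one whole line, so a match covers two lines plus the line containing ">". Note the limits of this approach: a marker with fewer than two lines above it, or a marker on a final line without a trailing newline, is not matched.

```python
import re

# Assumed input: the question's sample as readlines() would return it (each line keeps its "\n").
data = ["a\n", "b\n", "c\n", "d\n", "e>\n", "f\n", "g\n", "h\n", "I>\n", "j\n"]

# "." never matches a newline, so each ".*\n" consumes exactly one whole line;
# the third line must contain ">" somewhere before its newline.
pattern = re.compile(r".*\n.*\n.*>.*\n")

result = pattern.sub("", "".join(data))
print(result, end="")
```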

Related

Combine two wordlists into one file in Python

I have two wordlists, as per the examples below:
wordlist1.txt
aa
bb
cc
wordlist2.txt
11
22
33
I want to take every line from wordlist2.txt and put it after each line in wordlist1.txt and combine them in wordlist3.txt like this:
aa
11
bb
22
cc
33
.
.
Can you please help me with how to do it? Thanks!

Try to always include what you have tried.
However, this is a great place to start.
def read_file_to_list(filename):
    with open(filename) as file:
        lines = file.readlines()
        lines = [line.rstrip() for line in lines]
        return lines
wordlist1 = read_file_to_list("wordlist1.txt")
wordlist2 = read_file_to_list("wordlist2.txt")
with open("wordlist3.txt", 'w', encoding='utf-8') as f:
    for x, y in zip(wordlist1, wordlist2):
        f.write(x + "\n")
        f.write(y + "\n")
Check the following question for more ideas and understanding: How to read a file line-by-line into a list?
Cheers

Open wordlist1.txt and wordlist2.txt for reading and wordlist3.txt for writing. Then it's as simple as:
with open('wordlist3.txt', 'w') as w3, open('wordlist1.txt') as w1, open('wordlist2.txt') as w2:
    for l1, l2 in zip(map(str.rstrip, w1), map(str.rstrip, w2)):
        print(f'{l1}\n{l2}', file=w3)

Instead of using .splitlines(), you can also iterate over the files directly. Here's the code:
wordlist1 = open("wordlist1.txt", "r")
wordlist2 = open("wordlist2.txt", "r")
wordlist3 = open("wordlist3.txt", "w")
for txt1, txt2 in zip(wordlist1, wordlist2):
    if not txt1.endswith("\n"):
        txt1 += "\n"
    wordlist3.write(txt1)
    wordlist3.write(txt2)
wordlist1.close()
wordlist2.close()
wordlist3.close()
In the first block, we open the files. For the first two, we use "r", which stands for read, as we don't want to change anything in those files. We could omit it, as "r" is the default mode of the open function. For the third one, we use "w", which stands for write. If the file doesn't exist yet, a new file is created.
Next, we use the zip function in the for loop. It returns an iterator of tuples that pair up the items of all the iterables passed to it, and it stops as soon as the shortest one is exhausted. In this loop, each tuple contains one line of wordlist1.txt and the corresponding line of wordlist2.txt. These tuples are unpacked directly into the variables txt1 and txt2.
Next, we use an if statement to check whether the line from wordlist1.txt ends with a newline. The last line of a file might not, so this needs to be checked. We don't check the line from wordlist2.txt: a line from it without a newline can only be its last line, and that will also be the last line written to the resulting file.
Next, we write both lines to wordlist3.txt. Each write adds the text after what has been written so far. However, because the file was opened with "w", any text it contained before it was opened is lost.
Finally, we close the files. This is important, as otherwise buffered output might not be written to disk, and on some systems other applications cannot use the files in the meantime. A with statement closes the files automatically.
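Since zip stops at the shorter file, any extra lines in the longer wordlist are silently dropped. If the files might differ in length, itertools.zip_longest pads the shorter one with None instead. A minimal sketch under that assumption; the helper names interleave_lines and combine_files are hypothetical, and every output line is normalized to end with a single newline:

```python
from itertools import zip_longest
from typing import Iterable, Iterator


def interleave_lines(lines1: Iterable[str], lines2: Iterable[str]) -> Iterator[str]:
    """Yield lines alternately from two sources, each ending with a single newline."""
    # zip_longest pads the shorter source with None instead of stopping early like zip().
    for l1, l2 in zip_longest(lines1, lines2):
        for line in (l1, l2):
            if line is not None:
                # Normalize the ending so a final line without a newline doesn't merge with the next one.
                yield line.rstrip("\n") + "\n"


def combine_files(path1: str, path2: str, out_path: str) -> None:
    """Interleave two files line by line into out_path."""
    # The with statement closes all three files automatically, even if an error occurs.
    with open(path1) as f1, open(path2) as f2, open(out_path, "w") as out:
        out.writelines(interleave_lines(f1, f2))
```

For the question's files: combine_files("wordlist1.txt", "wordlist2.txt", "wordlist3.txt").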

Try this:
with open('wordlist1.txt', 'r') as f1:
    f1_list = f1.read().splitlines()
with open('wordlist2.txt', 'r') as f2:
    f2_list = f2.read().splitlines()
# Interleave the two lists of lines (zip the lists, not the already-closed file objects).
f3_list = [x for t in zip(f1_list, f2_list) for x in t]
with open('wordlist3.txt', 'w') as f3:
    f3.write("\n".join(f3_list))

with open('wordlist1.txt') as w1, \
        open('wordlist2.txt') as w2, \
        open('wordlist3.txt', 'w') as w3:
    for line1, line2 in zip(w1.readlines(), w2.readlines()):
        if not line1.endswith('\n'):
            line1 += '\n'
        if not line2.endswith('\n'):
            line2 += '\n'
        w3.write(line1)
        w3.write(line2)

Here you go :)
with open('wordlist1.txt', 'r') as f:
    file1 = f.readlines()
with open('wordlist2.txt', 'r') as f:
    file2 = f.readlines()
with open('wordlist3.txt', 'w') as f:
    # Stop at the shorter file to avoid an IndexError when the lengths differ.
    for x in range(min(len(file1), len(file2))):
        if not file1[x].endswith('\n'):
            file1[x] += '\n'
        f.write(file1[x])
        if not file2[x].endswith('\n'):
            file2[x] += '\n'
        f.write(file2[x])

Open wordlists 1 and 2 and pair up their lines: join the two lines of each pair with a newline character, then join all the pairs together, again separated by newlines.
# paths
wordlist1 = 'wordlist1.txt'
wordlist2 = 'wordlist2.txt'
wordlist3 = 'wordlist3.txt'
with open(wordlist1, 'r') as fd1, open(wordlist2, 'r') as fd2:
    # splitlines() pairs whole lines; split() would also break apart lines that contain spaces
    out = '\n'.join(f'{l1}\n{l2}' for l1, l2 in zip(fd1.read().splitlines(), fd2.read().splitlines()))
with open(wordlist3, 'w') as fd:
    fd.write(out)

How do I compare a .txt to another and return what is not in it?

I am trying to compare a .txt file to another and return what is not in it.
For example
one.txt
a
b
c
d
two.txt
b
c
d
e
output
e
I have tried using symmetric_difference(), but it returns the lines that are in either file but not in both. With the example, it returns both e and a.
with open('text_one.txt', 'r') as file1:
    with open('text_two.txt', 'r') as file2:
        same = set(file1).symmetric_difference(file2)
same.discard('\n')
with open('output.txt', 'w') as file_out:
    for line in same:
        file_out.write(line)

If you want the items of file2 that are not in file1, just replace this:
same = set(file1).symmetric_difference(file2)
by this:
same = set(file2)-set(file1)

I think you should strip newline characters before comparing the sets of lines; otherwise you can get unexpected output when a line that appears in both files is the last line of one of them and so has no trailing newline. b.difference(a) returns only the items of b that are missing from a.
with open('one.txt') as f1, open('two.txt') as f2, open('output.txt', 'w') as out:
    out.write(
        '\n'.join(
            set(map(str.strip, f2))
            .difference(map(str.strip, f1))))
For the output, I'm assuming you want each missing line on its own line.
output.txt
e
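Both set-based answers lose the original line order, because sets are unordered. If order matters, one option is to build a set from the reference file only for fast lookups and walk the other file in order. A minimal sketch; the function name missing_lines is hypothetical, and lines are compared after str.strip() as suggested above:

```python
from typing import Iterable, List


def missing_lines(reference: Iterable[str], candidates: Iterable[str]) -> List[str]:
    """Return the stripped lines of `candidates` that do not appear in `reference`, in order."""
    # Build the lookup set once; each membership test is then O(1) on average.
    seen = {line.strip() for line in reference}
    result = []
    for line in candidates:
        stripped = line.strip()
        # Skip blank lines, lines present in `reference`, and repeats already reported.
        if stripped and stripped not in seen:
            result.append(stripped)
            seen.add(stripped)
    return result
```

With the question's files: with open('one.txt') as f1, open('two.txt') as f2: print('\n'.join(missing_lines(f1, f2)))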

Changing the contents of a text file and making a new file with same format

I have a big text file with a lot of parts. Every part has 4 lines and next part starts immediately after the last part.
The first line of each part starts with #, the 2nd line is a sequence of characters, the 3rd line is a + and the 4th line is again a sequence of characters.
Small example:
#M00872:462:000000000-D47VR:1:1101:15294:1338 1:N:0:ACATCG
TGCTCGGTGTATGTAAACTTCCGACTTCAACTGTATAGGGATCCAATTTTGACAAAATATTAACGCTTATCGATAAAATTTTGAATTTTGTAACTTGTTTTTGTAATTCTTTAGTTTGTATGTCTGTTGCTATTATGTCTACTATTCTTTCCCCTGCACTGTACCCCCCAATCCCCCCTTTTCTTTTAAAAGTTAACCGATACCGTCGAGATCCGTTCACTAATCGAACGGATCTGTCTCTGTCTCTCTC
+
BAABBADBBBFFGGGGGGGGGGGGGGGHHGHHGH55FB3A3GGH3ADG5FAAFEGHHFFEFHD5AEG1EF511F1?GFH3#BFADGD55F?#GFHFGGFCGG/GHGHHHHHHHDBG4E?FB?BGHHHHHHHHHHHHHHHHHFHHHHHHHHHGHGHGHHHHHFHHHHHGGGGHHHHGGGGHHHHHHHGHGHHHHHHFGHCFGGGHGGGGGGGGFGGEGBFGGGGGGGGGFGGGGFFB9/BFFFFFFFFFF/
I want to change the 2nd and the 4th line of each part and make a new file with the same structure (4 lines for each part). Specifically, I want to keep the first 65 characters of lines 2 and 4 and remove the rest. The expected output for the small example would look like this:
#M00872:462:000000000-D47VR:1:1101:15294:1338 1:N:0:ACATCG
TGCTCGGTGTATGTAAACTTCCGACTTCAACTGTATAGGGATCCAATTTTGACAAAATATTAACG
+
BAABBADBBBFFGGGGGGGGGGGGGGGHHGHHGH55FB3A3GGH3ADG5FAAFEGHHFFEFHD5A
I wrote the following code:
infile = open("file.fastq", "r")
new_line = []
for line_number in len(infile.readlines()):
    if line_number == 2 or line_number == 4:
        new_line.append(infile[line_number])
with open('out_file.fastq', 'w') as f:
    for item in new_line:
        f.write("%s\n" % item)
but it does not return what I want. How to fix it to get the expected output?

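This code will achieve what you want:
from itertools import islice
with open('bio.txt', 'r') as infile, open('mod_bio.txt', 'w') as f:
    while True:
        # read the next 4-line part; an empty list means end of file
        lines_gen = list(islice(infile, 4))
        if not lines_gen:
            break
        a, b, c, d = lines_gen
        # strip the newline before slicing so short lines don't get a second one
        b = b.rstrip('\n')[0:65] + '\n'
        d = d.rstrip('\n')[0:65] + '\n'
        f.write(a + b + c + d)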
How does it work?
We first read 4 lines at a time with islice, matching the 4-line parts you describe.
Then we unpack those lines into the individual variables a, b, c, d and use string slicing on b and d. Finally, we join the strings and write them to the new file.
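The same 4-lines-at-a-time idea can be packaged as a generator, which is easier to test and accepts any iterable of lines. A minimal sketch, assuming every part is exactly 4 lines; the function name trim_fastq_records and the width parameter are hypothetical, and a truncated final part raises ValueError instead of failing on tuple unpacking:

```python
from itertools import islice
from typing import Iterable, Iterator


def trim_fastq_records(lines: Iterable[str], width: int = 65) -> Iterator[str]:
    """Yield 4-line parts with lines 2 and 4 cut to at most `width` characters."""
    # Take one iterator up front; calling islice on a list repeatedly would restart at line 1.
    it = iter(lines)
    while True:
        record = list(islice(it, 4))
        if not record:
            return  # clean end of input
        if len(record) != 4:
            raise ValueError(f"incomplete 4-line part at end of input: {record!r}")
        header, seq, plus, qual = record
        yield header
        # Remove the newline before slicing so short lines are not given a second one.
        yield seq.rstrip("\n")[:width] + "\n"
        yield plus
        yield qual.rstrip("\n")[:width] + "\n"
```

Usage: with open('file.fastq') as infile, open('out_file.fastq', 'w') as out: out.writelines(trim_fastq_records(infile))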

I think itertools.cycle could be handy here:
import itertools
with open("transformed.file.fastq", "w+") as output_file:
    with open("file.fastq", "r") as input_file:
        for i in itertools.cycle((1, 2, 3, 4)):
            line = input_file.readline().strip()
            if not line:
                break
            if i in (2, 4):
                line = line[:65]
            output_file.write("{}\n".format(line))

readlines() returns a list of the lines in your file, so you don't need to build a separate new_line list. Iterate directly over the index/value pairs of that list with enumerate(), and modify the values at the positions you want.
Based on your code, try this:
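with open("file.fastq", "r") as infile:
    new_lines = infile.readlines()
for i, t in enumerate(new_lines):
    # lines 2 and 4 of every 4-line part have indices 1 and 3 (modulo 4)
    if i % 4 in (1, 3):
        # strip the newline before slicing, then add it back
        new_lines[i] = t.rstrip('\n')[:65] + '\n'
with open('out_file.fastq', 'w') as f:
    for item in new_lines:
        f.write(item)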

How would I read only the first word of each line of a text file?

I want to know how to read ONLY the FIRST WORD of each line in a text file. I have tried various pieces of code and modified them, but I can only manage to read whole lines from a text file.
The code I used is as shown below:
QuizList = []
with open('Quizzes.txt', 'r') as f:
    for line in f:
        QuizList.append(line)
line = QuizList[0]
for word in line.split():
    print(word)
This is an attempt to extract only the first word of the first line. To repeat the process for every line, I would do the following:
QuizList = []
with open('Quizzes.txt', 'r') as f:
    for line in f:
        QuizList.append(line)
capacity = len(QuizList)
capacity = capacity - 1
index = 0
while index != capacity:
    line = QuizList[index]
    for word in line.split():
        print(word)
    index = index + 1

You are using split at the wrong point, try:
for line in f:
    QuizList.append(line.split(None, 1)[0])  # add only first word

Changed to a one-liner that's also more efficient with the split, as Jon Clements suggested in a comment.
with open('Quizzes.txt', 'r') as f:
    wordlist = [line.split(None, 1)[0] for line in f]
This is pretty irrelevant to your question, but just so line.split(None, 1) doesn't confuse you: it's a bit more efficient because it splits the line at most once.
From the str.split([sep[, maxsplit]]) docs:
If sep is not specified or is None, a different splitting algorithm is
applied: runs of consecutive whitespace are regarded as a single
separator, and the result will contain no empty strings at the start
or end if the string has leading or trailing whitespace. Consequently,
splitting an empty string or a string consisting of just whitespace
with a None separator returns [].
' 1 2 3 '.split() returns ['1', '2', '3']
and
' 1 2 3 '.split(None, 1) returns ['1', '2 3 '].
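The last sentence of that quote matters in practice: since a blank line splits to [], line.split(None, 1)[0] raises IndexError on it. A minimal sketch that skips blank lines instead; the helper name first_words is hypothetical:

```python
from typing import Iterable, List


def first_words(lines: Iterable[str]) -> List[str]:
    """Return the first whitespace-separated word of each non-blank line."""
    words = []
    for line in lines:
        # maxsplit=1 stops after the first separator, so the rest of the line is not split further.
        parts = line.split(None, 1)
        # split() with no separator returns [] for blank or whitespace-only lines; skip those.
        if parts:
            words.append(parts[0])
    return words
```

Usage: with open('Quizzes.txt') as f: QuizList = first_words(f)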

with open(filename, "r") as f:
    wordlist = [r.split()[0] for r in f]

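I'd go for the str.split and similar approaches, but for completeness here's one that uses a combination of mmap and re, in case you need to extract more complicated data:
import mmap
import re
with open('quizzes.txt', 'rb') as fin:
    # mmap exposes the file as bytes, so the pattern must be a bytes pattern too
    with mmap.mmap(fin.fileno(), 0, access=mmap.ACCESS_READ) as mf:
        # re.M makes ^ match at the start of every line
        wordlist = [w.decode() for w in re.findall(rb'^(\w+)', mf, flags=re.M)]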

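You can read each line one character at a time:
import string
QuizList = []
with open('Quizzes.txt', 'r') as f:
    for line in f:
        for i, c in enumerate(line):
            # Python 3 has no string.letters; string.ascii_letters is the closest equivalent
            if c not in string.ascii_letters:
                print(line[:i])
                break
        else:
            # no non-letter found (e.g. a last line with no trailing newline)
            print(line)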

l = []
with open('task-1.txt', 'rt') as myfile:
    for x in myfile:
        l.append(x)
for i in l:
    print(i.split()[0])

Count number of lines in a txt file with Python excluding blank lines

I wish to count the number of lines in a .txt file which looks something like this:
apple
orange
pear
hippo
donkey
There are blank lines separating the blocks. The result I'm looking for, based on the above sample, is five (lines).
How can I achieve this?
As a bonus, it would be nice to know how many blocks/paragraphs there are. So, based on the above example, that would be two blocks.

non_blank_count = 0
with open('data.txt') as infp:
    for line in infp:
        if line.strip():
            non_blank_count += 1
print('number of non-blank lines found %d' % non_blank_count)
UPDATE: Re-read the question; OP wants to count non-blank lines... (sigh... thanks @RanRag).
(I need a break from the computer ...)

A short way to count the number of non-blank lines could be:
with open('data.txt', 'r') as f:
    lines = f.readlines()
    num_lines = len([l for l in lines if l.strip(' \n') != ''])

I am surprised to see that there isn't a clean pythonic answer yet (as of Jan 1, 2019). Many of the other answers create unnecessary lists, count in a non-pythonic way, loop over the lines of the file in a non-pythonic way, do not close the file properly, do unnecessary things, assume that the end of line character can only be '\n', or have other smaller issues.
Here is my suggested solution:
with open('myfile.txt') as f:
    line_count = sum(1 for line in f if line.strip())
The question does not define what a blank line is. My definition: a line is blank if and only if line.strip() returns the empty string. This may or may not be your definition of a blank line.

sum([1 for i in open("file_name","r").readlines() if i.strip()])

Since blank lines contain only the newline character, it should be faster to avoid calling str.strip, which creates a new string, and instead check whether the line contains only whitespace using str.isspace, skipping it if so:
with open('data.txt') as f:
    non_blank_lines = sum(not line.isspace() for line in f)
Demo:
from io import StringIO
s = '''apple
orange
pear
hippo
donkey'''
non_blank_lines = sum(not line.isspace() for line in StringIO(s))
# 5
You can further combine str.isspace with itertools.groupby to count the number of blocks of contiguous non-blank lines in the file:
from itertools import groupby
no_paragraphs = sum(k for k, _ in groupby(StringIO(s), lambda x: not x.isspace()))
print(no_paragraphs)
# 2

Non-blank line counter:
lines_counter = 0
with open('test_file.txt') as f:
    for line in f:
        if line != '\n':
            lines_counter += 1
Block counter:
para_counter = 0
prev = '\n'
with open('test_file.txt') as f:
    for line in f:
        if line != '\n' and prev == '\n':
            para_counter += 1
        prev = line
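Both counts from the question can also be collected in a single pass with itertools.groupby, treating a line as blank when line.strip() is empty (the same definition used in an earlier answer). A minimal sketch; the function name count_lines_and_blocks is hypothetical, and the sample in the test is hypothetical too, with the blank line placed after "orange":

```python
from itertools import groupby
from typing import Iterable, Tuple


def count_lines_and_blocks(lines: Iterable[str]) -> Tuple[int, int]:
    """Return (number of non-blank lines, number of blocks of non-blank lines)."""
    non_blank = 0
    blocks = 0
    # groupby collapses each run of consecutive lines sharing the same "is non-blank" key into one group.
    for is_text, group in groupby(lines, key=lambda line: line.strip() != ""):
        if is_text:
            blocks += 1
            non_blank += sum(1 for _ in group)
    return non_blank, blocks
```

Usage: with open('data.txt') as f: lines, blocks = count_lines_and_blocks(f)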

This bit of Python code should solve your problem:
with open('data.txt', 'r') as f:
    lines = len(list(filter(lambda x: x.strip(), f)))

This is how I would've done it:
with open("file.txt") as f:
    l = [x for x in f.readlines() if x != "\n"]
print(len(l))
readlines() will make a list of all the lines in the file and then you can just take those lines that have at least something in them.
Looks pretty straightforward to me!

Pretty straightforward, I believe:
f = open('path', 'r')
count = 0
for lines in f:
    if lines.strip():
        count += 1
f.close()
print(count)

My one-liner would be:
print(sum(1 for line in open(path_to_file,'r') if line.strip()))

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.
