Combine two wordlists into one file in Python - python

I have two wordlists, as per the examples below:
wordlist1.txt
aa
bb
cc
wordlist2.txt
11
22
33
I want to take every line from wordlist2.txt and put it after each line in wordlist1.txt and combine them in wordlist3.txt like this:
aa
11
bb
22
cc
33
.
.
Can you please help me with how to do it? Thanks!

Always try to include what you have tried.
However, this is a great place to start.
def read_file_to_list(filename):
    with open(filename) as file:
        lines = file.readlines()
        lines = [line.rstrip() for line in lines]
    return lines
wordlist1 = read_file_to_list("wordlist1.txt")
wordlist2 = read_file_to_list("wordlist2.txt")
with open("wordlist3.txt", 'w', encoding='utf-8') as f:
    for x, y in zip(wordlist1, wordlist2):
        f.write(x + "\n")
        f.write(y + "\n")
Check the following question for more ideas and understanding: How to read a file line-by-line into a list?
Cheers

Open wordlist1.txt and wordlist2.txt for reading and wordlist3.txt for writing. Then it's as simple as:
with open('wordlist3.txt', 'w') as w3, open('wordlist1.txt') as w1, open('wordlist2.txt') as w2:
    for l1, l2 in zip(map(str.rstrip, w1), map(str.rstrip, w2)):
        print(f'{l1}\n{l2}', file=w3)

Instead of using .splitlines(), you can also iterate over the files directly. Here's the code:
wordlist1 = open("wordlist1.txt", "r")
wordlist2 = open("wordlist2.txt", "r")
wordlist3 = open("wordlist3.txt", "w")
for txt1,txt2 in zip(wordlist1, wordlist2):
    if not txt1.endswith("\n"):
        txt1 += "\n"
    wordlist3.write(txt1)
    wordlist3.write(txt2)
wordlist1.close()
wordlist2.close()
wordlist3.close()
In the first block, we open the files. For the first two, we use "r", which stands for read, as we don't want to change anything in the files. We could omit this, as "r" is the default mode of the open function. For the third one, we use "w", which stands for write. If the file doesn't exist yet, a new file is created.
Next, we use the zip function in the for loop. It creates an iterator containing tuples from all iterables provided as arguments. In this loop, it will contain tuples containing each one line of wordlist1.txt and one of wordlist2.txt. These tuples are directly unpacked into the variables txt1 and txt2.
Next we use an if statement to check whether the line of wordlist1.txt ends with a newline. This might not be the case with the last line, so this needs to be checked. We don't check it with the second line, as it is no problem that the last line has no newline because it will also be at the end of the resulting file.
Next, we write the text to wordlist3.txt. Each call to write adds the text after what has been written so far. However, because the file was opened with "w", any text that was in the file before it was opened is lost.
Finally, we close the files. This is important, as otherwise some buffered output might not be written to disk, and on some systems other applications may be unable to use the file in the meantime.
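The point about "w" discarding the previous contents can be checked with a small experiment. This is only a minimal sketch: the helper write_then_read and the file name demo.txt are made up for illustration, and a temporary directory is used so that no real file is touched.

```python
import os
import tempfile

def write_then_read(path, mode, text):
    """Open path with the given mode, write text, and return the full file contents."""
    with open(path, mode) as f:
        f.write(text)
    with open(path) as f:
        return f.read()

# Work in a throwaway directory so no real file is touched.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.txt")          # hypothetical file name
    first = write_then_read(path, "w", "old\n")   # creates the file
    second = write_then_read(path, "w", "new\n")  # "w" truncates: "old" is gone
    third = write_then_read(path, "a", "more\n")  # "a" keeps the existing content

print(repr(second))  # 'new\n'
print(repr(third))   # 'new\nmore\n'
```

If the existing contents of wordlist3.txt should be kept, opening it with "a" instead of "w" would be the mode to try.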

Try this:
with open('wordlist1.txt', 'r') as f1:
    f1_list = f1.read().splitlines()
with open('wordlist2.txt', 'r') as f2:
    f2_list = f2.read().splitlines()
# Pair up the lists (not the already closed file objects) and flatten the pairs
f3_list = [x for t in zip(f1_list, f2_list) for x in t]
with open('wordlist3.txt', 'w') as f3:
    f3.write("\n".join(f3_list))

with open('wordlist1.txt') as w1,\
        open('wordlist2.txt') as w2,\
        open('wordlist3.txt', 'w') as w3:
    for wordlist1, wordlist2 in zip(w1.readlines(), w2.readlines()):
        if wordlist1[-1] != '\n':
            wordlist1 += '\n'
        if wordlist2[-1] != '\n':
            wordlist2 += '\n'
        w3.write(wordlist1)
        w3.write(wordlist2)

Here you go :)
with open('wordlist1.txt', 'r') as f:
    file1 = f.readlines()
with open('wordlist2.txt', 'r') as f:
    file2 = f.readlines()
with open('wordlist3.txt', 'w') as f:
    for x in range(len(file1)):
        if not file1[x].endswith('\n'):
            file1[x] += '\n'
        f.write(file1[x])
        if not file2[x].endswith('\n'):
            file2[x] += '\n'
        f.write(file2[x])

Open wordlists 1 and 2 and pair up their lines. Join each pair with a newline character, then join all the pairs together, again separated by newlines.
# paths (file names taken from the question)
wordlist1 = 'wordlist1.txt'
wordlist2 = 'wordlist2.txt'
wordlist3 = 'wordlist3.txt'
with open(wordlist1, 'r') as fd1, open(wordlist2, 'r') as fd2:
    # split() breaks on any whitespace, so this assumes one word per line
    out = '\n'.join(f'{l1}\n{l2}' for l1, l2 in zip(fd1.read().split(), fd2.read().split()))
with open(wordlist3, 'w') as fd:
    fd.write(out)

Related

How to save each line of a file to a new file (every line a new file) and do that for multiple original files

I have 5 files from which I want to take each line (24 lines in total) and save it to a new file. I managed to find some code that does that, but the way it is written, I have to manually change the number of the original file, the number of the file I want to save to, and the line number every time.
The code:
x1= np.loadtxt("x_p2_40.txt")
x2= np.loadtxt("x_p4_40.txt")
x3= np.loadtxt("x_p6_40.txt")
x4= np.loadtxt("x_p8_40.txt")
x5= np.loadtxt("x_p1_40.txt")
with open("x_p1_40.txt", "r") as file:
    content = file.read()
    first_line = content.split('\n', 1)[0]
with open("1_p_40_x.txt", "a") as f:
    f.write("\n")
with open("1_p_40_x.txt", "a") as fa:
    fa.write(first_line)
print(first_line)
I am a beginner at Python, and I'm not sure how to make a loop for this, because I assume I need a loop?
Thank you!
Since you have multiple files here, you could define their names in a list, and use a list comprehension to open file handles to them all:
input_files = ["x_p2_40.txt", "x_p4_40.txt", "x_p6_40.txt", "x_p8_40.txt", "x_p1_40.txt"]
file_handles = [open(f, "r") for f in input_files]
Since each of these file handles is an iterator that yields a single line every time you iterate over it, you could simply zip() all these file handles to iterate over them simultaneously. Also throw in an enumerate() to get the line numbers:
for line_num, files_lines in enumerate(zip(*file_handles), 1):
    out_file = f"{line_num}_p_40.txt"
    # Remove trailing whitespace on all lines, then add a newline
    files_lines = [f.rstrip() + "\n" for f in files_lines]
    with open(out_file, "w") as of:
        of.writelines(files_lines)
With three files:
x_p2_40.txt:
2_1
2_2
2_3
2_4
x_p4_40.txt:
4_1
4_2
4_3
4_4
x_p6_40.txt:
6_1
6_2
6_3
6_4
I get the following output:
1_p_40.txt:
2_1
4_1
6_1
2_p_40.txt:
2_2
4_2
6_2
3_p_40.txt:
2_3
4_3
6_3
4_p_40.txt:
2_4
4_4
6_4
Finally, since we didn't use a context manager to open the original file handles, remember to close them after we're done:
for fh in file_handles:
    fh.close()
If you have files with an unequal number of lines and you want to create files for all lines, consider using itertools.zip_longest() instead of zip().
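The two remarks above (closing the handles, and using itertools.zip_longest() for files of unequal length) could be combined roughly as follows. This is a sketch under assumptions, not the answer's own code: the function name split_by_line_number and the out_pattern parameter are invented here, and contextlib.ExitStack is swapped in to close the handles automatically.

```python
from contextlib import ExitStack
from itertools import zip_longest

def split_by_line_number(input_files, out_pattern="{}_p_40.txt"):
    """Write line N of every input file into the N-th output file.

    Files that run out of lines early contribute nothing to later
    output files. Returns the list of output file names written.
    """
    written = []
    with ExitStack() as stack:
        # ExitStack closes every handle on exit, even if an error occurs.
        handles = [stack.enter_context(open(name)) for name in input_files]
        for line_num, lines in enumerate(zip_longest(*handles), 1):
            out_name = out_pattern.format(line_num)
            with open(out_name, "w") as out:
                # zip_longest pads exhausted files with None; skip those.
                out.writelines(
                    line.rstrip("\n") + "\n" for line in lines if line is not None
                )
            written.append(out_name)
    return written
```

Calling split_by_line_number(input_files) with the list from the answer would write 1_p_40.txt, 2_p_40.txt and so on, one per line of the longest input file.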
In order to read each of your input files, you can store them in a list and iterate over it with a for loop. Then we add every line to a single list with the function extend() :
inputFiles = ["x_p2_40.txt", "x_p4_40.txt", "x_p6_40.txt", "x_p8_40.txt", "x_p1_40.txt"]
outputFile = "outputfile.txt"
lines = []
for filename in inputFiles:
    with open(filename, 'r') as f:
        lines.extend(f.readlines())
    # Make sure the last line of each file ends with a newline
    if lines and not lines[-1].endswith('\n'):
        lines[-1] += '\n'
Finally, you can write all the lines to your output file:
with open(outputFile, 'w') as f:
    f.write(''.join(lines))

Split and print the word before and after the \ of *n of lines, from a txt to two different txt's

I searched around a bit, but I couldn't find a solution that fits my needs.
I'm new to python, so I'm sorry if what I'm asking is pretty obvious.
I have a .txt file (for simplicity I will call it inputfile.txt) with a list of names of folder\files like this:
camisos\CROWDER_IMAG_1.mov
camisos\KS_HIGHENERGY.mov
camisos\KS_LOWENERGY.mov
What I need is to split the first word (the one before the \) and write it to a txt file (for simplicity I will call it outputfile.txt).
Then take the second (the one after the \) and write it in another txt file.
This is what I did so far:
with open("inputfile.txt", "r") as f:
    lines = f.readlines()
with open("outputfile.txt", "w") as new_f:
    for line in lines:
        text = input()
        print(text.split()[0])
This in my mind should print only the first word in the new txt, but I only got an empty txt file without any error.
Any advice is much appreciated, thanks in advance for any help you could give me.
You can read the file in a list of strings and split each string to create 2 separate lists.
with open("inputfile.txt", "r") as f:
    lines = f.readlines()
X = []
Y = []
for line in lines:
    X.append(line.split('\\')[0] + '\n')
    Y.append(line.split('\\')[1])
with open("outputfile1.txt", "w") as f1:
    f1.writelines(X)
with open("outputfile2.txt", "w") as f2:
    f2.writelines(Y)
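The splitting step above can also be sketched with str.partition, which does not raise an error when a line contains no backslash. The helper name split_folder_and_file and the third sample line are assumptions added for illustration; the first two sample lines come from the question.

```python
def split_folder_and_file(line):
    """Split 'folder\\file' at the first backslash.

    Returns (folder, filename); filename is '' when the line has no backslash.
    """
    folder, _sep, filename = line.rstrip("\n").partition("\\")
    return folder, filename

sample = [
    "camisos\\CROWDER_IMAG_1.mov\n",
    "camisos\\KS_HIGHENERGY.mov\n",
    "no_backslash_here\n",  # hypothetical malformed line
]
pairs = [split_folder_and_file(line) for line in sample]
print(pairs)
```

Each pair can then be written to the two output files, adding "\n" after each part.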

reading .txt file in python

I have a problem with a code in python. I want to read a .txt file. I use the code:
f = open('test.txt', 'r') # We need to re-open the file
data = f.read()
print(data)
I would like to read ONLY the first line from this .txt file. I use
f = open('test.txt', 'r') # We need to re-open the file
data = f.readline(1)
print(data)
But only the first letter of the line is shown on screen.
Could you help me read all the letters of the line? (I mean, read the whole first line of the .txt file.)
with open("file.txt") as f:
    print(f.readline())
This opens the file in a with block (which closes the file automatically when we are done with it) and reads the first line. It is equivalent to:
f = open("file.txt")
print(f.readline())
f.close()
Your attempt with f.readline(1) won't work because the argument is the maximum number of characters to read from the line, so it returns only the first character.
Second method:
with open("file.txt") as f:
    print(f.readlines()[0])
Alternatively, the code above gets a list of all lines and prints only the first one.
To read the fifth line, use
with open("file.txt") as f:
    print(f.readlines()[4])
Or:
with open("file.txt") as f:
    lines = []
    # append() adds each line as one item; += would add it character by character
    for _ in range(5):
        lines.append(f.readline())
    print(lines[-1])
The -1 index refers to the last item of the list.
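Another way to fetch a single line by number, without building a list of every line, is itertools.islice. The sketch below is an illustration only: the helper name nth_line is made up, and io.StringIO stands in for an open file so the example runs on its own.

```python
import io
from itertools import islice

def nth_line(lines, n):
    """Return line number n (1-based) from an iterable of lines, or None if it is too short."""
    return next(islice(lines, n - 1, n), None)

# io.StringIO stands in for an open file here.
fake_file = io.StringIO("one\ntwo\nthree\nfour\nfive\nsix\n")
fifth = nth_line(fake_file, 5)
print(repr(fifth))  # 'five\n'
```

With a real file, the same call would be nth_line(f, 5) inside a with open("file.txt") as f: block.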
Learn more:
with statement
files in python
readline method
Your first try is almost there, you should have done the following:
f = open('my_file.txt', 'r')
line = f.readline()
print(line)
f.close()
A safer approach to read file is:
with open('my_file.txt', 'r') as f:
    print(f.readline())
Both ways will print only the first line.
Your error was that you passed 1 to readline, which limits the read to a size of 1, that is, a single character. Please refer to https://www.w3schools.com/python/ref_file_readline.asp
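The effect of the size argument can be seen in a short experiment. In this sketch io.StringIO and its sample text are stand-ins for test.txt, chosen so the example runs without a file on disk.

```python
import io

# io.StringIO behaves like a text file opened for reading.
f = io.StringIO("hello world\nsecond line\n")

one_char = f.readline(1)   # reads at most 1 character
rest = f.readline()        # the remainder of the same line
next_line = f.readline()   # the following line

print(repr(one_char), repr(rest), repr(next_line))
# 'h' 'ello world\n' 'second line\n'
```

Note that readline(1) did not skip the rest of the first line; the next call simply continued from where the previous read stopped.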
I tried this and it works, after your suggestions:
f = open('test.txt', 'r')
data = f.readlines()[1]
print(data)
Use with open(...) instead:
with open("test.txt") as file:
    line = file.readline()
    print(line)
Keep f.readline() without parameters.
It returns the first line as a string and moves the file position to the start of the second line.
The next time you call f.readline(), it returns the second line and moves on to the next one, and so on.

Read 2 files, add something in between each line

Here's my code:
eeee = input('\nWhat do you want to combine each other with? ')
first = []
second = []
with open('First.txt', 'r') as f:
    for line in f.readlines():
        first.append(line)
with open('Second.txt', 'r') as f:
    for line in f.readlines():
        second.append(line)
with open('NewStuff.txt', 'a') as f:
    for thing in first:
        for thing2 in second:
            f.write(thing + str(eeee) + thing2)
I want to take the first line from file1, add something after it (whatever eeee is entered as), then write the first line from file2, then move on to the second lines and repeat.
You could try something like this, using zip and opening multiple files:
eeee = input('\nWhat do you want to combine each other with? ')
with open('First.txt', 'r') as f1, open('Second.txt', 'r') as f2, open('NewStuff.txt', 'a') as fnew:
    for first, second in zip(f1.readlines(), f2.readlines()):
        fnew.write(first.replace('\n', '') + ' ' + str(eeee) + ' ' + second)
Since you want str(eeee) for all strings in second, you could just add it to the starting of all items of that list.
third = [str(eeee)+i for i in second]
Your code is almost correct, just change this part:
with open('NewStuff.txt', 'a') as f:
    for thing in first:
        for thing2 in second:
            f.write(thing + str(eeee) + thing2)
to:
with open('NewStuff.txt', 'a') as f:
    for thing, thing2 in zip(first, second):
        f.write(thing + str(eeee) + thing2)
Since you're not combining multiple lines of First.txt together, nor multiple lines of Second.txt with each other, I wouldn't use any solution involving readlines() as there's no reason to read either file completely into memory.
The most you need to read at any time is one line of each file. I'd suggest a solution along lines of:
eeee = input('\nWhat do you want to combine each other with? ')
with open('First.txt') as left, open('Second.txt') as right:
    with open('NewStuff.txt', 'a') as output:
        for line in left:
            output.write(line.rstrip('\n') + eeee + right.readline())
And avoid any solution that preserves your calls to readlines() or introduces new ones. Note that we rstrip('\n') the end of the left line so that we end up with a single output line. Now you need to consider what happens if the two input files do not contain the same number of lines.
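One possible way to handle inputs of different lengths is itertools.zip_longest, which pads the shorter input. The sketch below is an assumption about what "handling" could mean, not part of the answer above: the function name join_lines and the missing parameter are invented, and io.StringIO objects stand in for the two open files.

```python
import io
from itertools import zip_longest

def join_lines(left, right, joiner, missing=""):
    """Yield 'left + joiner + right' for each pair of lines.

    When one input is shorter, `missing` is used in place of its lines.
    """
    for l, r in zip_longest(left, right, fillvalue=missing):
        yield l.rstrip("\n") + joiner + r.rstrip("\n") + "\n"

left = io.StringIO("a\nb\nc\n")   # stands in for First.txt
right = io.StringIO("1\n2\n")     # stands in for Second.txt, one line short
result = list(join_lines(left, right, "-"))
print(result)  # ['a-1\n', 'b-2\n', 'c-\n']
```

The generator still reads only one line of each file at a time. If a length mismatch should be treated as an error instead, zip(left, right, strict=True) raises ValueError on Python 3.10 and later.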

Changing the contents of a text file and making a new file with same format

I have a big text file with a lot of parts. Every part has 4 lines and next part starts immediately after the last part.
The first line of each part starts with #, the 2nd line is a sequence of characters, the 3rd line is a + and the 4th line is again a sequence of characters.
Small example:
#M00872:462:000000000-D47VR:1:1101:15294:1338 1:N:0:ACATCG
TGCTCGGTGTATGTAAACTTCCGACTTCAACTGTATAGGGATCCAATTTTGACAAAATATTAACGCTTATCGATAAAATTTTGAATTTTGTAACTTGTTTTTGTAATTCTTTAGTTTGTATGTCTGTTGCTATTATGTCTACTATTCTTTCCCCTGCACTGTACCCCCCAATCCCCCCTTTTCTTTTAAAAGTTAACCGATACCGTCGAGATCCGTTCACTAATCGAACGGATCTGTCTCTGTCTCTCTC
+
BAABBADBBBFFGGGGGGGGGGGGGGGHHGHHGH55FB3A3GGH3ADG5FAAFEGHHFFEFHD5AEG1EF511F1?GFH3#BFADGD55F?#GFHFGGFCGG/GHGHHHHHHHDBG4E?FB?BGHHHHHHHHHHHHHHHHHFHHHHHHHHHGHGHGHHHHHFHHHHHGGGGHHHHGGGGHHHHHHHGHGHHHHHHFGHCFGGGHGGGGGGGGFGGEGBFGGGGGGGGGFGGGGFFB9/BFFFFFFFFFF/
I want to change the 2nd and the 4th line of each part and make a new file with similar structure (4 lines for each part). In fact I want to keep the 1st 65 characters (in lines 2 and 4) and remove the rest of characters. The expected output for the small example would look like this:
#M00872:462:000000000-D47VR:1:1101:15294:1338 1:N:0:ACATCG
TGCTCGGTGTATGTAAACTTCCGACTTCAACTGTATAGGGATCCAATTTTGACAAAATATTAACG
+
BAABBADBBBFFGGGGGGGGGGGGGGGHHGHHGH55FB3A3GGH3ADG5FAAFEGHHFFEFHD5A
I wrote the following code:
infile = open("file.fastq", "r")
new_line=[]
for line_number in len(infile.readlines()):
    if line_number ==2 or line_number ==4:
        new_line.append(infile[line_number])
with open('out_file.fastq', 'w') as f:
    for item in new_line:
        f.write("%s\n" % item)
but it does not return what I want. How to fix it to get the expected output?
This code will achieve what you want -
from itertools import islice
with open('bio.txt', 'r') as infile:
    while True:
        lines_gen = list(islice(infile, 4))
        if not lines_gen:
            break
        a, b, c, d = lines_gen
        b = b[0:65] + '\n'
        d = d[0:65] + '\n'
        with open('mod_bio.txt', 'a+') as f:
            f.write(a + b + c + d)
How does it work?
We use islice to read 4 lines at a time, as you describe.
Then we unpack those lines into the variables a, b, c, d and slice the strings. Finally, we join them back together and append the result to the new file.
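The same idea can be written as a reusable generator, which keeps the chunking separate from the trimming. This is a sketch under assumptions: the names records and trim_record are invented, the sample data is made up and much shorter than real reads, and the input is assumed to consist of complete 4-line parts.

```python
import io
from itertools import islice

def records(lines, size=4):
    """Yield successive lists of `size` lines; the last list may be shorter."""
    it = iter(lines)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk

def trim_record(record, keep=65):
    """Truncate lines 2 and 4 of a complete 4-line record to `keep` characters."""
    header, seq, plus, qual = (line.rstrip("\n") for line in record)
    return [header, seq[:keep], plus, qual[:keep]]

# io.StringIO stands in for the opened input file; keep=5 suits the short sample.
sample = io.StringIO("#id1\nACGTACGT\n+\nBBBBFFFF\n#id2\nTTTT\n+\nGGGG\n")
trimmed = [trim_record(r, keep=5) for r in records(sample)]
print(trimmed)
```

Writing "\n".join(record) + "\n" for each trimmed record to a file opened once in "w" mode would avoid reopening the output file for every part.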
I think some itertools.cycle could be nice here:
import itertools
with open("transformed.file.fastq", "w+") as output_file:
    with open("file.fastq", "r") as input_file:
        for i in itertools.cycle((1, 2, 3, 4)):
            line = input_file.readline().strip()
            if not line:
                break
            if i in (2, 4):
                line = line[:65]
            output_file.write("{}\n".format(line))
readlines() returns a list containing each line of your file. You don't need to build a separate new_line list. Iterate directly over the index-value pairs of the list and modify the values at the positions you want.
Modifying your code, try this:
with open("file.fastq", "r") as infile:
    new_lines = infile.readlines()
for i, t in enumerate(new_lines):
    # Lines 2 and 4 of every 4-line part have index 1 and 3 modulo 4
    if i % 4 in (1, 3):
        # Trim to 65 characters and restore the newline that slicing removes
        new_lines[i] = t.rstrip('\n')[:65] + '\n'
with open('out_file.fastq', 'w') as f:
    for item in new_lines:
        f.write("%s" % item)
