Python linking 2 strings - python

I am working on a Python program where the goal is to create a tool that takes the first word from a line in one file and puts it beside a line from a different file.
This is the code snippet:
lines = open("x.txt", "r").readlines()
lines2 = open("c.txt", "r").readlines()
for line in lines:
    r = line.split()
    line1 = str(r[0])
    for line2 in lines2:
        l2 = line2
        rn = open("b.txt", "r").read()
        os = open("b.txt", "w").write(rn + line1+ "\t" + l2)
but it doesn't work correctly.
I want this tool to take the first word of each line in one file and put it beside the corresponding line from another file, for all lines in the file.
For example:
File: 1.txt :
hello there
hi there
File: 2.txt :
michal smith
takawa sama
I want the result to be :
Output:
hello michal smith
hi takawa sama

By using the zip function, you can loop through both simultaneously. Then you can pull the first word from your greeting and add it to the name to write to the file.
greetings = open("x.txt", "r").readlines()
names = open("c.txt", "r").readlines()
with open("b.txt", "w") as output_file:
    for greeting, name in zip(greetings, names):
        greeting = greeting.split(" ")[0]
        # name still carries its own line ending, so strip it before adding "\n"
        output = "{0} {1}\n".format(greeting, name.rstrip("\n"))
        output_file.write(output)

Yes, as Tigerhawk indicated, you want to use the zip function, which combines the elements at the same index of different iterables into tuples (the i-th tuple holds the i-th element of each iterable).
Example code -
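lines = open("x.txt", "r").readlines()
lines2 = open("c.txt", "r").readlines()
# y keeps its trailing newline, so each entry is already a complete line
newlines = ["{} {}".format(x.split()[0], y) for x, y in zip(lines, lines2)]
with open("b.txt", "w") as opfile:
    # writelines() accepts a list of strings; write() would raise a TypeError here
    opfile.writelines(newlines)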

# Written for Python 2, where itertools provides imap and izip.
from itertools import *
with open('x.txt', 'r') as lines:
    with open('c.txt', 'r') as lines2:
        with open('b.txt', 'w') as result:
            words = imap(lambda x: str(x.split()[0]), lines)
            results = izip(words, lines2)
            for word, line in results:
                result_line = '{0} {1}'.format(word, line)
                result.write(result_line)
This code works without loading the whole files into memory.
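In Python 3, itertools no longer has imap and izip, but the built-in map and zip are lazy there, so the same streaming idea can be sketched with them. The function name join_first_words and the use of io.StringIO in place of real files are assumptions made for illustration.

```python
import io


def join_first_words(first, second, out):
    """Write the first word of each line in `first` followed by the matching line of `second`.

    All three arguments are file-like objects; lines are read lazily.
    """
    # map() and zip() are lazy in Python 3, so only one pair of lines is held at a time.
    words = map(lambda line: line.split()[0], first)
    for word, line in zip(words, second):
        # Normalise the line ending so every output line ends with exactly one newline.
        out.write("{0} {1}\n".format(word, line.rstrip("\n")))


# In-memory stand-ins for x.txt, c.txt and b.txt, using the sample data from the question.
greetings = io.StringIO("hello there\nhi there\n")
names = io.StringIO("michal smith\ntakawa sama\n")
result = io.StringIO()
join_first_words(greetings, names, result)
print(result.getvalue(), end="")
```

A blank line in the first file would make split()[0] raise an IndexError, so real input may need an extra check.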

Related

Combine two wordlist in one file Python

I have two wordlists, as per the examples below:
wordlist1.txt
aa
bb
cc
wordlist2.txt
11
22
33
I want to take every line from wordlist2.txt and put it after each line in wordlist1.txt and combine them in wordlist3.txt like this:
aa
11
bb
22
cc
33
.
.
Can you please help me with how to do it? Thanks!
Always try to include what you have tried.
However, this is a great place to start.
def read_file_to_list(filename):
    with open(filename) as file:
        lines = file.readlines()
        lines = [line.rstrip() for line in lines]
    return lines
wordlist1= read_file_to_list("wordlist1.txt")
wordlist2= read_file_to_list("wordlist2.txt")
with open("wordlist3.txt",'w',encoding = 'utf-8') as f:
    for x,y in zip(wordlist1,wordlist2):
        f.write(x+"\n")
        f.write(y+"\n")
Check the following question for more ideas and understanding: How to read a file line-by-line into a list?
Cheers
Open wordlist1.txt and wordlist2.txt for reading and wordlist3.txt for writing. Then it's as simple as:
with open('wordlist3.txt', 'w') as w3, open('wordlist1.txt') as w1, open('wordlist2.txt') as w2:
    for l1, l2 in zip(map(str.rstrip, w1), map(str.rstrip, w2)):
        print(f'{l1}\n{l2}', file=w3)
Instead of using .splitlines(), you can also iterate over the files directly. Here's the code:
wordlist1 = open("wordlist1.txt", "r")
wordlist2 = open("wordlist2.txt", "r")
wordlist3 = open("wordlist3.txt", "w")
for txt1,txt2 in zip(wordlist1, wordlist2):
    if not txt1.endswith("\n"):
        txt1+="\n"
    wordlist3.write(txt1)
    wordlist3.write(txt2)
wordlist1.close()
wordlist2.close()
wordlist3.close()
In the first block, we open the files. For the first two, we use "r", which stands for read, as we don't want to change anything in those files. We could omit it, as "r" is the default mode of the open function. For the third one, we use "w", which stands for write. If the file doesn't exist yet, it is created.
Next, we use the zip function in the for loop. It creates an iterator of tuples built from all the iterables passed as arguments. In this loop, each tuple contains one line of wordlist1.txt and one line of wordlist2.txt. These tuples are unpacked directly into the variables txt1 and txt2.
Next, we use an if statement to check whether the line from wordlist1.txt ends with a newline. This might not be the case for the last line, so it needs to be checked. We don't check the line from wordlist2.txt, as it is no problem if its last line has no newline: that line also ends up at the end of the resulting file.
Next, we write the text to wordlist3.txt. Each write appends to what has been written so far. However, any text that was in the file before it was opened is lost, because "w" truncates the file.
Finally, we close the files. This is important, as otherwise some buffered output might not be saved, and other applications may not be able to use the files in the meantime.
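The steps above can also be sketched with a with statement, which closes all three files automatically, even if an error occurs. The helper name interleave_files is hypothetical, and the demo writes to a temporary directory so that it does not touch real files.

```python
import os
import tempfile


def interleave_files(path1, path2, out_path):
    """Write the lines of path1 and path2 alternately to out_path."""
    # The with statement closes all three files when the block ends.
    with open(path1) as f1, open(path2) as f2, open(out_path, "w") as out:
        for line1, line2 in zip(f1, f2):
            # Only the last line of a file can lack its trailing newline.
            if not line1.endswith("\n"):
                line1 += "\n"
            out.write(line1)
            out.write(line2)


# Demo in a temporary directory with the sample word lists.
with tempfile.TemporaryDirectory() as tmp:
    p1, p2, p3 = (os.path.join(tmp, name)
                  for name in ("wordlist1.txt", "wordlist2.txt", "wordlist3.txt"))
    with open(p1, "w") as f:
        f.write("aa\nbb\ncc")  # no newline at the end of this file
    with open(p2, "w") as f:
        f.write("11\n22\n33\n")
    interleave_files(p1, p2, p3)
    with open(p3) as f:
        combined = f.read()

print(combined, end="")
```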
Try this:
with open('wordlist1.txt', 'r') as f1:
    f1_list = f1.read().splitlines()
with open('wordlist2.txt', 'r') as f2:
    f2_list = f2.read().splitlines()
# zip the two lists of lines (the file objects f1 and f2 are already closed here)
f3_list = [x for t in zip(f1_list, f2_list) for x in t]
with open('wordlist3.txt', 'w') as f3:
    f3.write("\n".join(f3_list))
with open('wordlist1.txt') as w1,\
     open('wordlist2.txt') as w2,\
     open('wordlist3.txt', 'w') as w3:
    for wordlist1, wordlist2 in zip(w1.readlines(), w2.readlines()):
        if wordlist1[-1] != '\n':
            wordlist1 += '\n'
        if wordlist2[-1] != '\n':
            wordlist2 += '\n'
        w3.write(wordlist1)
        w3.write(wordlist2)
Here you go :)
with open('wordlist1.txt', 'r') as f:
    file1 = f.readlines()
with open('wordlist2.txt', 'r') as f:
    file2 = f.readlines()
with open('wordlist3.txt', 'w') as f:
    for x in range(len(file1)):
        if not file1[x].endswith('\n'):
            file1[x] += '\n'
        f.write(file1[x])
        if not file2[x].endswith('\n'):
            file2[x] += '\n'
        f.write(file2[x])
Open wordlists 1 and 2 and pair up their lines, separate the two lines of each pair with a newline character, then join all the pairs together, again separated by newlines.
# paths (file names assumed from the question)
wordlist1 = 'wordlist1.txt'
wordlist2 = 'wordlist2.txt'
wordlist3 = 'wordlist3.txt'
with open(wordlist1, 'r') as fd1, open(wordlist2, 'r') as fd2:
    # split() breaks on any whitespace, which is fine for one word per line
    out = '\n'.join(f'{l1}\n{l2}' for l1, l2 in zip(fd1.read().split(), fd2.read().split()))
with open(wordlist3, 'w') as fd:
    fd.write(out)
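All of the zip-based answers stop at the end of the shorter list. If the two word lists may differ in length, a variant built on itertools.zip_longest keeps the leftover lines. This is a sketch under that assumption, and interleave is a hypothetical helper name.

```python
from itertools import zip_longest


def interleave(first, second):
    """Return the items of both lists alternately, keeping leftovers of the longer list."""
    missing = object()  # sentinel marking "no item on this side"
    merged = []
    for a, b in zip_longest(first, second, fillvalue=missing):
        if a is not missing:
            merged.append(a)
        if b is not missing:
            merged.append(b)
    return merged


print("\n".join(interleave(["aa", "bb", "cc"], ["11", "22"])))
```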

How to delete specific blank line and concat two line in a text file?

I have an existing text file (test1.txt) like this:
line1
line2
supp-linex
line3
supp-linex
line4
line5
I want to find each line containing "supp" and append it directly to the previous line, like this (other blank lines stay unchanged):
line1
line2linex
line3linex
line4
line5
I don't know much about handling text files, so this is as far as I got:
a_file = open("test1.txt", "r")
lines = a_file.readlines()
a_file.close()
new_file = open("test2.txt", "w")
for line in lines:
    if "supp" in line:
        #del blank and concat line,I dont know how to del and concat in detail
new_file.write(lines)
new_file.close()
Here is a way that does it without a new list
a_file = open("test1.txt", "r")
lines = a_file.readlines()
a_file.close()
new_file = open("test2.txt", "w")
for i, line in enumerate(lines):
    if "supp" in line:
        j = i
        # drop the blank lines directly above the "supp" line
        while lines[j-1] == "\n":
            del(lines[j-1])
            j -= 1
        # note: strip("supp-") removes any of the characters s, u, p, - from both ends
        lines[j-1] = lines[j-1].strip() + line.strip("supp-")
        del(lines[j])
for line in lines:
    new_file.write(line)
new_file.close()
You can use a new list to save the result.
with open("test1.txt") as f:
    a_file = f.read().splitlines()
b_file = []
for line in a_file:
    if line.startswith('supp-'):
        # Removes previous empty lines if possible.
        while b_file and len(b_file[-1]) == 0:
            b_file.pop()
        if b_file:
            # Concatenate previous line
            b_file[-1] += line[5:]
        else:
            # When there's no previous lines, appends as is
            b_file.append(line[5:])
    else:
        b_file.append(line)
with open('test2.txt', 'w') as f:
    f.write('\n'.join(b_file) + '\n')
You could use the re (regular expression) module in Python's standard library to find the pattern and replace it via the module's sub() function. To help understand how it works, think of the contents of the whole text file as a single long string containing this:
"line1\nline2\n\nsupp-linex\n\nline3\n\nsupp-linex\nline4\nline5\n"
The regular expression pattern shown in the code below matches a line of characters followed by a blank line, then another line prefixed with the literal string "supp-". The two captured groups are given the names prev and extra so they can easily be referred to in the replacement text. The substitution is applied to the whole file with one sub() call, and the result is then written out to the second text file.
Note: There's a really good Regular Expression HOWTO in the online documentation.
import re
with open('test1.txt', 'r') as a_file:
    lines = a_file.read()
pattern = r'''(?P<prev>.+)\n\nsupp-(?P<extra>.+)\n'''
replacement = r'''\g<prev>\g<extra>\n'''
with open('test2.txt', 'w') as new_file:
    result = re.sub(pattern, replacement, lines)
    new_file.write(result)
print('fini')
Here's the contents of test2.txt after running:
line1
line2linex
line3linex
line4
line5
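To see the substitution at work without touching any files, the pattern can be applied to the sample string shown above. This is only a sketch: the variable names are chosen for illustration, and the blank lines are assumed to sit exactly where that string puts them.

```python
import re

# The file contents as one string, as in the explanation above.
text = "line1\nline2\n\nsupp-linex\n\nline3\n\nsupp-linex\nline4\nline5\n"

# A line, a blank line, then a line starting with "supp-".
pattern = r"(?P<prev>.+)\n\nsupp-(?P<extra>.+)\n"
replacement = r"\g<prev>\g<extra>\n"

merged = re.sub(pattern, replacement, text)
print(merged, end="")
```

A "supp-" line that directly follows its previous line, with no blank line in between, is not matched by this pattern and would be left as it is.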

Show different lines in files

I have found a script that shows the lines in NEW.txt which do not exist in OLD.txt. It works fine, but the problem is that the script messes up the line order in the output. This is the script:
with open(r'C:\Users\AMB\NEW.txt') as f, open(r'C:\Users\AMB\OLD.txt') as f2:
    lines1 = set(map(str.rstrip, f))
    s = str(lines1.difference(map(str.rstrip, f2)))
    s = s.replace(',', '\n').replace("'", '').replace("{", '').replace("}", '')
print(s)
So let's suppose that this is the OLD.txt content:
aaaaaaaaaaaa
cccccccccccc
eeeeeeeeeeee
And this is the NEW.txt content:
aaaaaaaaaaaa
bbbbbbbbbbbb
cccccccccccc
dddddddddddd
eeeeeeeeeeee
hhhhhhhhhhhh
I would like to get this output:
bbbbbbbbbbbb
dddddddddddd
hhhhhhhhhhhh
But I am getting a random line order, for example:
dddddddddddd
bbbbbbbbbbbb
hhhhhhhhhhhh
(the output is random, and not always the same)
Is there a way to keep the order for output lines in NEW.txt file? Thanks in advance.
You can remove the set and do all the operations using list.
See this link for more help.
Solution:
s = ""
with open(r'C:\Users\AMB\NEW.txt') as f, open(r'C:\Users\AMB\OLD.txt') as f2:
    lines1 = list(map(str.rstrip, f)) #list of lines in f
    lines2 = list(map(str.rstrip, f2)) #list of lines in f2
#lines that occur in only one of the two lists, in order of appearance
diff = [i for i in lines1 + lines2 if i not in lines1 or i not in lines2]
for words in diff:
    s = s + words + '\n' #Appending all lines to form a single string
s = s.rstrip() #remove the trailing newline
print(s)
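If only the lines of NEW.txt that are missing from OLD.txt are wanted, a set can still be used for fast membership tests, while the order comes from iterating over the NEW.txt lines themselves. A minimal sketch, assuming the lines have already been read and stripped; lines_only_in_new is a hypothetical name.

```python
def lines_only_in_new(new_lines, old_lines):
    """Return the lines of new_lines that do not occur in old_lines, in original order."""
    old = set(old_lines)  # used only for membership tests, never iterated
    return [line for line in new_lines if line not in old]


old_lines = ["aaaaaaaaaaaa", "cccccccccccc", "eeeeeeeeeeee"]
new_lines = ["aaaaaaaaaaaa", "bbbbbbbbbbbb", "cccccccccccc",
             "dddddddddddd", "eeeeeeeeeeee", "hhhhhhhhhhhh"]

print("\n".join(lines_only_in_new(new_lines, old_lines)))
```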

Having next items in a list while we run a for loop

I'm a beginner in Python. I'm writing code that, when it finds a specific item, should check some of the neighbouring items and, if they match, print the current value.
I wrote this code, but I can't find the problem:
import re
file1 = open('A.txt', 'r')
file2 = open('B.txt', 'w')
line = file1.readlines()
for index, line in enumerate(file1):
    match = re.search(r'R', line)
    if match:
        for a in range(index, index+2):
            same = re.search(r'T', line.next())
            if same:
                file2.writelines(line)
file2.close()
file1.close()
You weren't very clear in your question, but I suspect that you're trying to do something like this:
You want a line from file A.txt to go into file B.txt if it contains the character R and is followed by a line that contains the character T. Here's a solution for that problem.
import re
file1 = open('A.txt', 'r')
file2 = open('B.txt', 'w')
last_line = file1.readline()
while (current_line := file1.readline()):
    match_last = re.search(r'R', last_line)
    match_current = re.search(r'T', current_line)
    if match_last and match_current:
        file2.writelines(last_line)
    last_line = current_line
file2.close()
file1.close()
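The same look-ahead can be sketched by pairing every line with its successor through zip. This assumes the interpretation above (keep a line containing R when the next line contains T) and that the whole file fits in memory; lines_followed_by is a hypothetical helper.

```python
def lines_followed_by(lines, first, following):
    """Return each line containing `first` whose next line contains `following`."""
    # zip(lines, lines[1:]) yields (line, next_line) for every adjacent pair.
    return [current for current, nxt in zip(lines, lines[1:])
            if first in current and following in nxt]


sample = ["R one\n", "T two\n", "R three\n", "x four\n", "T five\n"]
print(lines_followed_by(sample, "R", "T"))
```

With a real file, the list could come from file1.readlines() and the result could be passed to file2.writelines().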

How can I split a text file into multiple text files using python?

I have a text file with the contents shown below. I want to split this file into multiple files (1.txt, 2.txt, 3.txt...). Each new output file should look like the examples below. The code I tried doesn't split the input file properly. How can I split the input file into multiple files?
My code:
#!/usr/bin/python
with open("input.txt", "r") as f:
    a1=[]
    a2=[]
    a3=[]
    for line in f:
        if not line.strip() or line.startswith('A') or line.startswith('$$'): continue
        row = line.split()
        a1.append(str(row[0]))
        a2.append(float(row[1]))
        a3.append(float(row[2]))
f = open('1.txt','a')
f = open('2.txt','a')
f = open('3.txt','a')
f.write(str(a1))
f.close()
Input file:
A
x
k
..
$$
A
z
m
..
$$
A
B
l
..
$$
Desired output 1.txt
A
x
k
..
$$
Desired output 2.txt
A
z
m
..
$$
Desired output 3.txt
A
B
l
..
$$
Read your input file, write the buffered lines to an output file each time you find a "$$", and increase the output file counter. Code:
with open("input.txt", "r") as f:
    buff = []
    i = 1
    for line in f:
        if line.strip(): #skips the empty lines
            buff.append(line)
        if line.strip() == "$$":
            output = open('%d.txt' % i,'w')
            output.write(''.join(buff))
            output.close()
            i+=1
            buff = [] #buffer reset
EDIT: should be efficient too https://wiki.python.org/moin/PythonSpeed/PerformanceTips#String_Concatenation
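The buffering logic can be isolated from the file handling, which makes it easy to try on a list of lines. This is a sketch under the assumption that every block ends with a line containing only $$; split_blocks is a hypothetical name.

```python
def split_blocks(lines, terminator="$$"):
    """Yield each block of non-empty lines, up to and including the terminator line."""
    buff = []
    for line in lines:
        if not line.strip():
            continue  # skip empty lines between blocks
        buff.append(line)
        if line.strip() == terminator:
            yield "".join(buff)
            buff = []  # start collecting the next block


sample = "A\nx\nk\n..\n$$\n\nA\nz\nm\n..\n$$\n".splitlines(keepends=True)
blocks = list(split_blocks(sample))
for number, block in enumerate(blocks, start=1):
    print("{}.txt would contain {!r}".format(number, block))
```

Each yielded block can then be written to its own numbered file, as in the loop above.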
Try the re.findall() function:
import re
with open('input.txt', 'r') as f:
    data = f.read()
found = re.findall(r'\n*(A.*?\n\$\$)\n*', data, re.M | re.S)
[open(str(i)+'.txt', 'w').write(found[i-1]) for i in range(1, len(found)+1)]
Minimalistic approach for the first 3 occurrences:
import re
found = re.findall(r'\n*(A.*?\n\$\$)\n*', open('input.txt', 'r').read(), re.M | re.S)
[open(str(found.index(f)+1)+'.txt', 'w').write(f) for f in found[:3]]
Some explanations:
found = re.findall(r'\n*(A.*?\n\$\$)\n*', data, re.M | re.S)
finds all occurrences matching the specified regex and puts them into a list called found.
[open(str(found.index(f)+1)+'.txt', 'w').write(f) for f in found]
iterates (using a list comprehension) over all elements of the found list and, for each element, creates a text file (named "index of the element + 1.txt") and writes that element (occurrence) to it.
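Note that found.index(f) returns the position of the first equal element, so two identical blocks would end up with the same file name. A sketch that numbers the matches with enumerate instead, shown on an in-memory string rather than a real file (the sample data is taken from the question):

```python
import re

data = "A\nx\nk\n..\n$$\nA\nz\nm\n..\n$$\nA\nB\nl\n..\n$$\n"

found = re.findall(r'\n*(A.*?\n\$\$)\n*', data, re.M | re.S)

# enumerate(..., start=1) gives each block its own number, even if two blocks are identical.
numbered = {"{}.txt".format(i): block for i, block in enumerate(found, start=1)}
for name, block in numbered.items():
    print(name, repr(block))
```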
Another version, without RegEx's:
blocks_to_read = 3
blk_begin = 'A'
blk_end = '$$'
with open('35916503.txt', 'r') as f:
    fn = 1
    data = []
    write_block = False
    for line in f:
        if fn > blocks_to_read:
            break
        line = line.strip()
        if line == blk_begin:
            write_block = True
        if write_block:
            data.append(line)
        if line == blk_end:
            write_block = False
            with open(str(fn) + '.txt', 'w') as fout:
                fout.write('\n'.join(data))
            data = []
            fn += 1
PS: Personally, I don't like this version, and I would use the one with the regex.
Open 1.txt at the beginning for writing. Write each line to the current output file. Additionally, if line.strip() == '$$', close the old file and open a new one for writing.
The blocks are divided by empty lines. Try this:
import sys
lines = [line for line in sys.stdin.readlines()]
i = 1
o = open("{}.txt".format(i), "w")  # the first output file is 1.txt
for line in lines:
    if len(line.strip()) == 0:
        o.close()
        i = i + 1
        o = open("{}.txt".format(i), "w")
    else:
        o.write(line)
o.close()  # close the last output file
It looks to me as though the condition you should be checking for is a line that contains just the newline (\n) character. When you encounter such a line, write the contents parsed so far, close the file, and open another one for writing.
A very easy way, if you want to split it into 2 files for example, would be:
with open("myInputFile.txt",'r') as file:
    lines = file.readlines()
with open("OutputFile1.txt",'w') as file:
    for line in lines[:int(len(lines)/2)]:
        file.write(line)
with open("OutputFile2.txt",'w') as file:
    for line in lines[int(len(lines)/2):]:
        file.write(line)
Making that dynamic would be:
with open("inputFile.txt",'r') as file:
    lines = file.readlines()
Batch = 10
end = 0
for i in range(1,Batch + 1):
    if i == 1:
        start = 0
    increase = int(len(lines)/Batch)
    end = end + increase
    with open("splitText_" + str(i) + ".txt",'w') as file:
        for line in lines[start:end]:
            file.write(line)
    start = end
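Because int(len(lines)/Batch) rounds down, the loop above drops the trailing lines whenever the line count is not a multiple of Batch. One way to avoid that, sketched here with a hypothetical helper split_evenly, is to hand the remainder out one line at a time to the first chunks.

```python
def split_evenly(lines, parts):
    """Split lines into `parts` consecutive chunks whose sizes differ by at most one."""
    size, extra = divmod(len(lines), parts)
    chunks = []
    start = 0
    for i in range(parts):
        # The first `extra` chunks take one additional line each.
        end = start + size + (1 if i < extra else 0)
        chunks.append(lines[start:end])
        start = end
    return chunks


chunks = split_evenly(["line{}\n".format(n) for n in range(10)], 3)
print([len(chunk) for chunk in chunks])  # → [4, 3, 3]
```

Each chunk can then be written to its own file in the same way as in the loop above.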
