Join the content of files to one file - python

I have two files and I want to join the content of them into one file side-by-side, i.e., line n of the output file should consist of line n of file 1 and line n of file 2. The files have the same number of lines.
What I have so far:
with open('test1.txt', 'r') as f1, open('test2.txt', 'r') as f2:
    with open('joinfile.txt', 'w') as fout:
        fout.write(f1+f2)
but it gives an error:
TypeError: unsupported operand type(s) for +: 'file' and 'file'
What am I doing wrong?

I'd try itertools.chain() and work line by line (you use "r" to open your files, so I assume you are not reading binary files):
from itertools import chain
with open('test1.txt', 'r') as f1, open('test2.txt', 'r') as f2:
    with open('joinfile.txt', 'w') as fout:
        for line in chain(f1, f2):
            fout.write(line)
It works as a generator, so memory problems are unlikely, even for huge files.
Edit
New requirements, new sample:
from itertools import izip_longest
separator = " "
with open('test1.txt', 'r') as f1, open('test2.txt', 'r') as f2:
    with open('joinfile.txt', 'w') as fout:
        for line1, line2 in izip_longest(f1, f2, fillvalue=""):
            line1 = line1.rstrip("\n")
            fout.write(line1 + separator + line2)
I added a separator string which is put between the lines.
izip_longest also works if one file has more lines than the other; the fillvalue "" is then used for the missing line. izip_longest also works as a generator. (In Python 3 the function is named itertools.zip_longest.)
The line line1 = line1.rstrip("\n") is also important: it removes the trailing newline from line1 so that line2 lands on the same output line.
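For Python 3, the same idea can be sketched roughly as follows. The function name join_side_by_side and its parameters are made up for illustration; unlike the snippet above, this variant strips the newline from both halves and writes exactly one back, which is an assumption about the desired output rather than something the question spells out.

```python
from itertools import zip_longest


def join_side_by_side(path1, path2, out_path, separator=" "):
    """Write line n of path1 and line n of path2 as line n of out_path."""
    with open(path1) as f1, open(path2) as f2, open(out_path, "w") as fout:
        for line1, line2 in zip_longest(f1, f2, fillvalue=""):
            # Strip the newline from both halves and add exactly one back,
            # so a missing final newline in either input cannot merge rows.
            fout.write(line1.rstrip("\n") + separator + line2.rstrip("\n") + "\n")
```

If one file is shorter, the missing half is simply empty, so the row still contains the separator.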

You can do it with:
fout.write(f1.read())
fout.write(f2.read())

You are actually trying to concatenate two file objects, but you want to concatenate strings.
Read the file contents first with f.read(). For example:
with open('test1.txt', 'r') as f1, open('test2.txt', 'r') as f2:
    with open('joinfile.txt', 'w') as fout:
        fout.write(f1.read()+f2.read())

I would prefer to use shutil.copyfileobj. You can easily combine it with glob.glob to concatenate a bunch of files matching a pattern.
>>> import shutil
>>> infiles = ["test1.txt", "test2.txt"]
>>> with open("test.out","wb") as fout:
...     for fname in infiles:
...         with open(fname, "rb") as fin:
...             shutil.copyfileobj(fin, fout)
Combining with glob.glob
>>> import glob
>>> with open("test.out","wb") as fout:
...     for fname in glob.glob("test*.txt"):
...         with open(fname, "rb") as fin:
...             shutil.copyfileobj(fin, fout)
Beyond that, if you are on a system where POSIX utilities are available, prefer them:
D:\temp>cat test1.txt test2.txt > test.out
If you are using Windows, you can run the following from the command prompt:
D:\temp>copy/Y test1.txt+test2.txt test.out
test1.txt
test2.txt
1 file(s) copied.
Note
Based on your latest update
Yes it has the same number of lines and I want to join every line of
one file with the other file
with open("test.out","w") as fout:  # text mode, because the joined result is a string
    fout.write('\n'.join(''.join(map(str.strip, e))
                         for e in zip(*(open(fname) for fname in infiles))))
And on a POSIX system, you can do
paste test1.txt test2.txt

Related

Replacing text from one file from another file

The f1.write(line2) works but it does not replace the text in the file, it just adds it to the file. I want the file1 to be identical to file2 if they are different by overwriting the text from file1 with the text from file2
Here is my code:
with open("file1.txt", "r+") as f1, open("file2.txt", "r") as f2:
    for line1 in f1:
        for line2 in f2:
            if line1 == line2:
                print("same")
            else:
                print("different")
                f1.write(line2)
            break
f1.close()
f2.close()
I would read both files, create a new list with the differing elements replaced, and then write the entire list to the file:
with open('file2.txt', 'r') as f:
    content = [line.strip() for line in f]
with open('file1.txt', 'r') as j:
    content_a = [line.strip() for line in j]
for idx, item in enumerate(content_a):
    if content_a[idx] == content[idx]:
        print('same')
        pass
    else:
        print('different')
        content_a[idx] = content[idx]
with open('file1.txt', 'w') as k:
    k.write('\n'.join(content_a))
file1.txt before:
chrx@chrx:~/python/stackoverflow/9.28$ cat file1.txt
this
that
this
that
who #replacing
that
what
blah
code output:
same
same
same
same
different
same
same
same
file1.txt after:
chrx@chrx:~/python/stackoverflow/9.28$ cat file1.txt
this
that
this
that
vash #replaced who
that
what
blah
I want the file1 to be identical to file2
import shutil
with open('file2', 'rb') as f2, open('file1', 'wb') as f1:
    shutil.copyfileobj(f2, f1)
This will be faster as you don't have to read file1.
Your code is not working because you'd have to position the file's current pointer (with f1.seek()) at the correct position before writing the line.
In your code, you're reading a line first, and that positions the pointer after the line just read. When writing, the line data will be written in the file in that point, thus duplicating the line.
Since lines can have different sizes, making this work won't be easy, because even if you position the pointer correctly, if some line is modified to get bigger it would overwrite part of the next line inside the file when you write it. You would end up having to cache at least part of the file contents in memory anyway.
It is better to truncate the file (erase its contents) and write the other file's data directly; then they will be identical. That's what the code in the answer does.
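If you only want to rewrite file1 when the two files actually differ, a minimal sketch could combine filecmp.cmp with shutil.copyfile. The function name sync_file is hypothetical, and this assumes both paths point at regular files that already exist.

```python
import filecmp
import shutil


def sync_file(source, target):
    """Make target identical to source; return True if target was rewritten."""
    # shallow=False forces a comparison of the contents instead of trusting
    # the os.stat() signatures (size and modification time) alone.
    if filecmp.cmp(source, target, shallow=False):
        return False
    shutil.copyfile(source, target)  # truncates target, then copies the bytes
    return True
```

The comparison still reads both files, so this only pays off when skipping an unnecessary write matters more than speed.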

Copying content of file into another file in Python [duplicate]

I would like to copy certain lines of text from one text file to another. In my current script when I search for a string it copies everything afterwards, how can I copy just a certain part of the text? E.g. only copy lines when it has "tests/file/myword" in it?
current code:
#!/usr/bin/env python
f = open('list1.txt')
f1 = open('output.txt', 'a')
doIHaveToCopyTheLine=False
for line in f.readlines():
if 'tests/file/myword' in line:
doIHaveToCopyTheLine=True
if doIHaveToCopyTheLine:
f1.write(line)
f1.close()
f.close()
The oneliner:
open("out1.txt", "w").writelines([l for l in open("in.txt").readlines() if "tests/file/myword" in l])
Recommended with with:
with open("in.txt") as f:
    lines = f.readlines()
    lines = [l for l in lines if "ROW" in l]
    with open("out.txt", "w") as f1:
        f1.writelines(lines)
Using less memory:
with open("in.txt") as f:
    with open("out.txt", "w") as f1:
        for line in f:
            if "ROW" in line:
                f1.write(line)
readlines() reads the entire input file into a list, which wastes memory on large files. Just iterate through the lines in the file. I used 'with' on output.txt so that it is automatically closed when done. I did not use it on 'list1.txt'; in CPython that file is closed once the loop ends and the file object is garbage-collected, though this is an implementation detail.
#!/usr/bin/env python
with open('output.txt', 'a') as f1:
    for line in open('list1.txt'):
        if 'tests/file/myword' in line:
            f1.write(line)
Just a slightly cleaned up way of doing this. This is no more or less performant than ATOzTOA's answer, but there's no reason to do two separate with statements.
with open(path_1, 'a') as file_1, open(path_2, 'r') as file_2:
    for line in file_2:
        if 'tests/file/myword' in line:
            file_1.write(line)
Safe and memory-saving:
with open("out1.txt", "w") as fw, open("in.txt","r") as fr:
    fw.writelines(l for l in fr if "tests/file/myword" in l)
It doesn't create temporary lists (which readlines() and [] would do, a non-starter if the file is huge); everything is done with a generator expression, and the with block ensures that the files are closed on exit.
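If the filter is needed in more than one place, the streaming approach can be wrapped in a small function. This is only a sketch: the name copy_matching_lines, its parameters, and the returned count are made up for illustration.

```python
def copy_matching_lines(src_path, dst_path, needle):
    """Copy only the lines of src_path that contain needle; return the count."""
    copied = 0
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:  # the file object yields one line at a time
            if needle in line:
                dst.write(line)
                copied += 1
    return copied
```

Open dst_path with "a" instead of "w" if, as in the question, the matches should be appended to an existing output file.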
f=open('list1.txt')
f1=open('output.txt','a')
for x in f.readlines():
    f1.write(x)
f.close()
f1.close()
this will work 100% try this once
In Python 3.10 and later, with parenthesized context managers, you can use multiple context managers in one with block:
with (open('list1.txt') as fin, open('output.txt', 'w') as fout):
    fout.write(fin.read())
f = open('list1.txt')
f1 = open('output.txt', 'a')
# doIHaveToCopyTheLine=False
for line in f.readlines():
    if 'tests/file/myword' in line:
        f1.write(line)
f1.close()
f.close()
Now your code will work. Try this one.

Files are not merging : python

I want to merge the contents of two files into one new output file.
I have read other threads about merging file contents and I tried several options, but I only get one file in my output. Here's one of the versions that I tried, and I can't see anything wrong with it.
I only get one file in my output, and even if I switch the positions of file1 and file2 in the list, I still only get file1 in my output.
Here is my code:
filenames = ['file1','file2']
with open('output.data', 'w') as outfile:
    for fname in filenames:
        with open(fname) as infile:
            outfile.write(infile.read())
How can I do this?
My whole code, which leads up to merging these two files:
source1 = open('A','r')
output = open('file1','w')
output.write(',yes.\n'.join(','.join(line) for line in source1.read().split('\n')))
source1 = open('B', 'r')
output = open('file2','w')
output.write(',no.\n'.join(','.join(line) for line in source2.read().split('\n')))
filenames = ['file1','file2']
with open('output.data', 'w') as outfile:
    for fname in filenames:
        with open(fname) as infile:
            outfile.write(infile.read())
After the edit it's clear where your mistake is. You need to close (or flush) the file after writing, before it can be read by the same code.
source1 = open('A','r')
output = open('file1','w')
output.write(',yes.\n'.join(','.join(line) for line in source1.read().split('\n')))
output.close()
source2 = open('B', 'r')
output = open('file2','w')
output.write(',no.\n'.join(','.join(line) for line in source2.read().split('\n')))
output.close()
filenames = ['file1','file2']
with open('output.data', 'w') as outfile:
    for fname in filenames:
        with open(fname) as infile:
            outfile.write(infile.read())
The first file is available because reassigning the variable output to the file object for file2 removes the last reference to the file object for file1, which Python (CPython) then closes automatically.
As @gnibbler suggested, it's best to use with statements to avoid this kind of problem in the future. You should enclose source1, source2, and output in with statements, as you did for the last part.
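To see the buffering effect in isolation, here is a small sketch. The helper read_back is hypothetical; it writes through one handle and reads the same path through a second handle before the first one is closed.

```python
def read_back(path, text, flush_first):
    """Write text to path, then read path via a second handle before closing."""
    writer = open(path, "w")
    try:
        writer.write(text)
        if flush_first:
            writer.flush()  # push the buffered data out to the file
        with open(path) as reader:
            return reader.read()
    finally:
        writer.close()
```

With flush_first=True the reader sees the full text. Without it, a small write typically still sits in the writer's buffer and the reader gets an empty string, although the exact behaviour depends on buffer sizes and the interpreter.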
The first file is being closed/flushed when you rebind output to a new file. This is the behaviour of CPython, but it's not good to rely on it.
Use context managers to make sure that the files are flushed (and closed) properly before you try to read from them:
with open('A','r') as source1, open('file1','w') as output:
    output.write(',yes.\n'.join(','.join(line) for line in source1.read().split('\n')))
with open('B','r') as source2, open('file2','w') as output:
    output.write(',no.\n'.join(','.join(line) for line in source2.read().split('\n')))
filenames = ['file1','file2']
with open('output.data', 'w') as outfile:
    for fname in filenames:
        with open(fname) as infile:
            print("Reading from: " + fname)
            data = infile.read()
            print(len(data))
            outfile.write(data)
There is a fair bit of duplication in the first two blocks. Maybe you can use a function there.
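One way to factor out the duplication is sketched below. The names tag_lines and merge_files are made up; the body of tag_lines reproduces the join expression from the question unchanged, so it keeps the same behaviour, including the fact that the last line only gets a label when the input ends with a newline.

```python
def tag_lines(src_path, dst_path, label):
    """Comma-separate the characters of each line; join lines with ',<label>.'."""
    separator = "," + label + ".\n"
    with open(src_path) as src, open(dst_path, "w") as dst:
        dst.write(separator.join(",".join(line) for line in src.read().split("\n")))


def merge_files(filenames, out_path):
    """Concatenate the given files into out_path, in order."""
    with open(out_path, "w") as outfile:
        for fname in filenames:
            with open(fname) as infile:
                outfile.write(infile.read())
```

With the file names from the question, this would be called as tag_lines('A', 'file1', 'yes'), then tag_lines('B', 'file2', 'no'), then merge_files(['file1', 'file2'], 'output.data'). Each with block closes its files before the next step reads them.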
You can just combine your read and writes into one with statement (if you don't really need the intermediary files); this will also solve your closing problem:
with open('A') as a, open('B') as b, open('out.txt','w') as out:
    for line in a:
        out.write(','.join(line.rstrip('\n')) + ',yes.\n')
    for line in b:
        out.write(','.join(line.rstrip('\n')) + ',no.\n')

make list from text file and compare the lists

The full.txt contains:
www.example.com/a.jpg
www.example.com/b.jpg
www.example.com/k.jpg
www.example.com/n.jpg
www.example.com/x.jpg
The partial.txt contains:
a.jpg
k.jpg
Why does the following code not provide the desired result?
with open ('full.txt', 'r') as infile:
    lines_full=[line for line in infile]
with open ('partial.txt', 'r') as infile:
    lines_partial=[line for line in infile]
with open ('remaining.txt', 'w') as outfile:
    for element in lines_full:
        if element[16:21] not in lines_partial: #element[16:21] means like a.jpg
            outfile.write (element)
The desired remaining.txt should have those elements of full.txt that are not in partial.txt exactly as follows:
www.example.com/b.jpg
www.example.com/n.jpg
www.example.com/x.jpg
You can use the os.path library:
from os import path
with open ('full.txt', 'r') as f:
    lines_full = f.read().splitlines()
with open ('partial.txt', 'r') as f:
    lines_partial = set(f.read().splitlines()) # create set for faster checking
lines_new = [x + '\n' for x in lines_full if path.split(x)[1] not in lines_partial]
with open('remaining.txt', 'w') as f:
    f.writelines(lines_new)
This code will include the newline character at the end of each line, which means it will never match "a.jpg" or "k.jpg" precisely.
with open ('partial.txt', 'r') as infile:
    lines_partial=[line for line in infile]
Change it to
with open ('partial.txt', 'r') as infile:
    lines_partial=[line[:-1] for line in infile]
to get rid of the newline characters (line[:-1] means "without the last character of the line")
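One caveat with line[:-1] is that it also removes the last character of a final line that has no trailing newline. A slightly more defensive sketch uses rstrip("\n") and takes the file name from the text after the last slash instead of a fixed slice. The function name remaining_lines is hypothetical.

```python
def remaining_lines(full_lines, partial_lines):
    """Return the entries of full_lines whose file name is not in partial_lines."""
    # rstrip("\n") only removes a newline if one is present, unlike line[:-1].
    partial = {line.rstrip("\n") for line in partial_lines}
    result = []
    for line in full_lines:
        url = line.rstrip("\n")
        name = url.rsplit("/", 1)[-1]  # text after the last slash, e.g. "a.jpg"
        if name not in partial:
            result.append(url)
    return result
```

Both arguments can be open file objects or plain lists of strings, since the function only iterates over them.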

Insert string at the beginning of each line

How can I insert a string at the beginning of each line in a text file? I have the following code:
f = open('./ampo.txt', 'r+')
with open('./ampo.txt') as infile:
    for line in infile:
        f.insert(0, 'EDF ')
f.close
I get the following error:
'file' object has no attribute 'insert'
Python comes with batteries included:
import fileinput
import sys
for line in fileinput.input(['./ampo.txt'], inplace=True):
    sys.stdout.write('EDF {l}'.format(l=line))
Unlike the solutions already posted, this also preserves file permissions.
You can't modify a file in place like that. Files do not support insertion. You have to read it all in and then write it all out again.
You can do this line by line if you wish. But in that case you need to write to a temporary file and then replace the original. So, for small enough files, it is just simpler to do it in one go like this:
with open('./ampo.txt', 'r') as f:
    lines = f.readlines()
lines = ['EDF '+line for line in lines]
with open('./ampo.txt', 'w') as f:
    f.writelines(lines)
Here's a solution where you write to a temporary file and move it into place. You might prefer this version if the file you are rewriting is very large, since it avoids keeping the contents of the file in memory, as versions that involve .read() or .readlines() will. In addition, if there is any error in reading or writing, your original file will be safe:
from shutil import move
from tempfile import NamedTemporaryFile
filename = './ampo.txt'
tmp = NamedTemporaryFile(delete=False)
with open(filename) as finput:
    with open(tmp.name, 'w') as ftmp:
        for line in finput:
            ftmp.write('EDF '+line)
move(tmp.name, filename)
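A variation on the same idea, sketched under a few assumptions: the temporary file is created in the same directory as the target so that the final rename stays on one filesystem, and os.replace is used for the swap. The function name prefix_lines is made up, and the new file gets the temporary file's permissions rather than the original's.

```python
import os
import tempfile


def prefix_lines(filename, prefix):
    """Prepend prefix to every line of filename via a temporary file."""
    directory = os.path.dirname(os.path.abspath(filename))
    # Create the temporary file next to the target file.
    with tempfile.NamedTemporaryFile("w", dir=directory, delete=False) as tmp:
        try:
            with open(filename) as src:
                for line in src:
                    tmp.write(prefix + line)
        except BaseException:
            # Clean up the partial temporary file; the original is untouched.
            tmp.close()
            os.unlink(tmp.name)
            raise
    os.replace(tmp.name, filename)  # swap the finished copy into place
```

The original file is only replaced after every line has been written successfully.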
For a file not too big:
with open('./ampo.txt', 'rb+') as f:
    x = f.read()
    f.seek(0,0)
    # bytes literals, because the file is opened in binary mode
    f.writelines((b'EDF ', x.replace(b'\n', b'\nEDF ')))
    f.truncate()
Note that, in theory, in this case (the content grows), f.truncate() is not really necessary: the new content overwrites the old content and extends past its end, so nothing of the old content is left over.
That's what I observed on examples.
But I am cautious: I think it's better to include this instruction anyway. When the content shrinks, closing the file does not cut it off at the current position, so trailing characters of the old content remain in the file unless you call f.truncate().
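The effect of truncate() when the content shrinks can be checked with a small experiment. The helper overwrite is hypothetical and assumes the file already exists.

```python
def overwrite(path, new_text, truncate):
    """Overwrite the start of an existing file, optionally cutting off the rest."""
    with open(path, "r+") as f:
        f.write(new_text)  # in "r+" mode, writing starts at position 0
        if truncate:
            f.truncate()  # drop everything after the current position
```

For a file containing "hello world", writing "hi" without truncating leaves "hillo world", while writing "hi" with truncate=True leaves just "hi".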
For a big file, to avoid loading the whole content of the file into RAM at once:
import os
def addsomething(filepath, ss):
    if filepath.rfind('.') > filepath.rfind(os.sep):
        a,_,c = filepath.rpartition('.')
        tempi = a + 'temp.' + c
    else:
        tempi = filepath + 'temp'
    with open(filepath, 'rb') as f, open(tempi,'wb') as g:
        g.writelines(ss + line for line in f)
    os.remove(filepath)
    os.rename(tempi,filepath)
addsomething('./ampo.txt', b'WZE')  # bytes prefix, because the files are opened in binary mode
f = open('./ampo.txt', 'r')
# build a real list; map() is lazy in Python 3 and would not run f.write()
lines = ['EDF ' + l for l in f.readlines()]
f.close()
f = open('./ampo.txt', 'w')
f.writelines(lines)
f.close()
