Files are not merging: python

I want to merge the contents of two files into one new output file.
I have read other threads about merging file contents and I tried several options, but I only get the contents of one file in my output. Here is one of the snippets I tried; I can't see anything wrong with it.
I only get file1 in my output, and even if I swap the positions of file1 and file2 in the list, I still get only file1.
Here is my code:
filenames = ['file1', 'file2']
with open('output.data', 'w') as outfile:
    for fname in filenames:
        with open(fname) as infile:
            outfile.write(infile.read())
How can I do this?
Here is the whole code that leads up to the merging of these two files:
source1 = open('A', 'r')
output = open('file1', 'w')
output.write(',yes.\n'.join(','.join(line) for line in source1.read().split('\n')))

source2 = open('B', 'r')
output = open('file2', 'w')
output.write(',no.\n'.join(','.join(line) for line in source2.read().split('\n')))

filenames = ['file1', 'file2']
with open('output.data', 'w') as outfile:
    for fname in filenames:
        with open(fname) as infile:
            outfile.write(infile.read())

After the edit, it's clear where your mistake is: you need to close (or at least flush) a file after writing to it, before the same code can read it back.
source1 = open('A', 'r')
output = open('file1', 'w')
output.write(',yes.\n'.join(','.join(line) for line in source1.read().split('\n')))
output.close()

source2 = open('B', 'r')
output = open('file2', 'w')
output.write(',no.\n'.join(','.join(line) for line in source2.read().split('\n')))
output.close()

filenames = ['file1', 'file2']
with open('output.data', 'w') as outfile:
    for fname in filenames:
        with open(fname) as infile:
            outfile.write(infile.read())
The reason the first file's contents do show up is that reassigning the variable output to hold the file object for file2 removes the last reference to file1's file object, so Python closes (and flushes) it automatically.
As @gnibbler suggested, it's best to use with statements to avoid this kind of problem in the future: enclose source1, source2, and output in with statements, as you already did for the last part.

The first file is being closed (and therefore flushed) when you rebind output to a new file. This is CPython's reference-counting behaviour, but it's not good to rely on it.
Use context managers to make sure that the files are flushed (and closed) properly before you try to read from them:
with open('A', 'r') as source1, open('file1', 'w') as output:
    output.write(',yes.\n'.join(','.join(line) for line in source1.read().split('\n')))

with open('B', 'r') as source2, open('file2', 'w') as output:
    output.write(',no.\n'.join(','.join(line) for line in source2.read().split('\n')))

filenames = ['file1', 'file2']
with open('output.data', 'w') as outfile:
    for fname in filenames:
        with open(fname) as infile:
            print("Reading from: " + fname)
            data = infile.read()
            print(len(data))
            outfile.write(data)
There is a fair bit of duplication in the first two blocks. Maybe you can use a function there.
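For instance, a minimal sketch of such a helper; the function name transform_file and its suffix parameter are made up for illustration:

def transform_file(src_name, dst_name, suffix):
    # Read the source, comma-join the characters of each line, and join
    # the lines with the given suffix (mirrors the original one-liners).
    with open(src_name) as src, open(dst_name, 'w') as dst:
        dst.write(suffix.join(','.join(line) for line in src.read().split('\n')))

transform_file('A', 'file1', ',yes.\n')
transform_file('B', 'file2', ',no.\n')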

You can also combine your reads and writes into one with statement (if you don't really need the intermediary files); this will also solve your closing problem:
with open('A') as a, open('B') as b, open('out.txt', 'w') as out:
    for line in a:
        out.write(','.join(line.rstrip('\n')) + ',yes.\n')
    for line in b:
        out.write(','.join(line.rstrip('\n')) + ',no.\n')

Related

How to open multiple files listed in a single file in Python, and process them?

I have a single file called bar.txt which contains a list of files, as follows:
bar.txt -
1.txt
2.txt
3.txt
Each file listed inside bar.txt has similar contents:
1.txt -
spec = sadasdsad
2.txt -
spec = dddddd
3.txt -
spec = ppppppppp
How can I open all the files listed inside bar.txt, extract the data from each of them, and store it in another file called foo.txt?
In foo.txt I want the extracted data shown below:
foo.txt -
spec = sadasdsad
spec = dddddd
spec = ppppppppp
outfile = open('bar.txt', "rw")
outfile_1 = open('foo.txt', "w")
for f in outfile:
    f = f.rstrip()
    lines = open(f, 'rw')
    lines = re.findall(".*SPEC.*\\n", lines)
    outfile_1.write(lines)
outfile.close()
Your code is almost correct. I guess you got confused by using the variable f for almost everything, assigning several different things to the same name: first it's a single line of outfile, then the same line stripped, then another opened file, and finally you try to use the same f outside of its scope (outside the for loop). Try using a different variable for each of these.
Also make sure you have correct indentation (e.g. the for-loop indentation is incorrect in your example), and note that re.findall works on a string, not a file-like object, so the second argument of findall should be contentfile.read().
infile = open('bar.txt', "r")
outfile = open('foo.txt', "w")
for f in infile:
    name = f.rstrip()
    contentfile = open(name, 'r')
    #all_matches = re.findall(<define your real pattern here>, contentfile.read())
    result = ''  # build a string from your all_matches here
    outfile.write(result)
    contentfile.close()
outfile.close()
infile.close()
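For instance, a hedged sketch of just the findall step; the pattern below is an assumption based on lines that look like spec = value, not something from the original post:

import re

# Sketch: pull out the "spec = ..." lines from one file.
# The pattern is an assumption about the input format.
with open('1.txt') as contentfile:
    all_matches = re.findall(r"spec\s*=\s*.*", contentfile.read())
print(all_matches)  # e.g. ['spec = sadasdsad']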
I'd do this:
infile = open('bar.txt', "r")
outfile = open('foo.txt', "w")
line = infile.readline()
while line:
    name = line.rstrip()
    contents_file = open(name, 'r')
    contents = contents_file.read()
    contents_file.close()
    outfile.write(contents)
    line = infile.readline()
outfile.close()
infile.close()
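Since the question only needs the spec lines, here is a minimal sketch using with statements; it assumes every file listed in bar.txt exists and that the interesting lines start with spec:

# Sketch with context managers, so nothing is left open on error.
with open('bar.txt') as listing, open('foo.txt', 'w') as out:
    for entry in listing:
        name = entry.strip()
        if not name:
            continue  # skip blank lines in the listing
        with open(name) as content:
            for line in content:
                if line.startswith('spec'):
                    out.write(line)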

python file concatenation and combining files

My main problem is this:
I have a set of files, and I am concatenating them this way in python:
sys.stdout = open("out.dat", "w")
filenames = ['bla.txt', 'bla.txt', 'bla.txt']
with open('out.dat', 'w') as outfile:
    for fname in filenames:
        with open(fname) as infile:
            outfile.write(infile.read())
with open('out.dat') as f:
    print "".join(line.strip() for line in f)
sys.stdout.close()
The bla.txt file looks like
aaa
and the intention is to make it look like
aaaaaaaaa
(3 times the same string, not on a new line each time...)
For some reason, what I do produces output that looks like
aaaaaaaaa
a
I am not sure why this is happening, or whether there is a simpler/more elegant solution.
My second problem is that, eventually, my plan is to have a number of different files (letter triplets, for example) that I could concatenate in all possible combinations: aaabbbccc, aaacccbbb, ..., etc.
Any guidance appreciated! Thank you!
There are some confusing things about your code; I'll leave comments in the respective places:
# Not sure what the reason for this is
sys.stdout = open("out.dat", "w")

filenames = ['bla.txt', 'bla.txt', 'bla.txt']

# This does what you need
with open('out.dat', 'w') as outfile:
    for fname in filenames:
        with open(fname) as infile:
            outfile.write(infile.read())

# Here, you open `out.dat` and write its content back into it -
# because you made `sys.stdout = open("out.dat", "w")` above.
# All these lines could be removed (along with the `sys.stdout`
# assignment above).
with open('out.dat') as f:
    print "".join(line.strip() for line in f)
sys.stdout.close()
The most minimalistic approach I could think of:
# Open the output
with open('out.dat', 'w') as outfile:
    # Iterate over each input
    for infilename in ['bla.txt'] * 3:
        # Open each input and write it to the output
        with open(infilename) as infile:
            outfile.write(infile.read())
As for your error, it should not be happening; could you confirm that the content of bla.txt is exactly aaa?
Nihey Takizawa's post almost answers why you got this error. First, let's see what is going on at each step of the program's execution.
sys.stdout=open("out.dat","w")
This is pretty important: because you replace stdout with a file handle to "out.dat", every function or statement that uses it will write to "out.dat" from now on.
with open('out.dat', 'w') as outfile:
    for fname in filenames:
        with open(fname) as infile:
            outfile.write(infile.read())
After this block, the content of the file "out.dat" is:
aaa
aaa
aaa
...or in other words: aaa\naaa\naaa\n, where \n is a single character standing for a newline. Number of characters: 12 (9 a characters and 3 newline \n characters).
with open('out.dat') as f:
    print "".join(line.strip() for line in f)
Here is the important thing. Remember that, because in step 1 you changed sys.stdout to "out.dat", the print statement now writes its output to "out.dat".
You strip each line and join them, so you write "aaaaaaaaa" to "out.dat":
1   2   3   4   5   6   7   8   9   10  11  12
a   a   a   \n  a   a   a   \n  a   a   a   \n   # content of the file before print
a   a   a   a   a   a   a   a   a   \n           # what you write: 9 'a' chars plus \n,
                                                 # which print adds by default
Note that you've replaced 10 of the 12 characters and then closed the file, so characters 11 and 12 remain the same. The result is your output.
Solution? NEVER mess with things like the sys.stdout file handle unless you know exactly what you're doing.
EDIT: How to fix your code.
I thought that Nihey Takizawa nicely explained how to fix your code, but as I see it, it's not completely correct. Here's a solution:
filenames = ['bla.txt', 'bla.txt', 'bla.txt']
with open('out.dat', 'w') as outfile:
    for fname in filenames:
        with open(fname) as infile:
            outfile.write(infile.read().strip())
Now your out.dat file contains only the string aaaaaaaaa, with no newlines.
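As for your second problem (concatenating a set of files in every possible order), a hedged sketch using itertools.permutations; the filenames and the output naming scheme here are placeholders, not from the original post:

from itertools import permutations

# Sketch: write one output file per ordering of the inputs.
filenames = ['aaa.txt', 'bbb.txt', 'ccc.txt']
for combo in permutations(filenames):
    # e.g. 'aaabbbccc.dat' for the ordering (aaa.txt, bbb.txt, ccc.txt)
    outname = ''.join(name.split('.')[0] for name in combo) + '.dat'
    with open(outname, 'w') as outfile:
        for fname in combo:
            with open(fname) as infile:
                outfile.write(infile.read().strip())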

How do I concatenate multiple CSV files row-wise using python?

I have a dataset of about 10 CSV files. I want to combine those files row-wise into a single CSV file.
What I tried:
import csv

fout = open("claaassA.csv", "a")
# first file:
writer = csv.writer(fout)
for line in open("a01.ihr.60.ann.csv"):
    print line
    writer.writerow(line)
# now the rest:
for num in range(2, 10):
    print num
    f = open("a0" + str(num) + ".ihr.60.ann.csv")
    #f.next()  # skip the header
    for line in f:
        print line
        writer.writerow(line)
    #f.close()  # not really needed
fout.close()
We definitely need more details in the question (ideally examples of the inputs and the expected output).
Given the little information provided, I will assume that you know all files are valid CSV and that they all have the same number of lines (rows). I'll also assume that memory is not a concern (i.e. they are "small" files that fit together in memory). Furthermore, I assume that the line ending is a newline (\n).
If all these assumptions are valid, then you can do something like this:
input_files = ['file1.csv', 'file2.csv', 'file3.csv']
output_file = 'output.csv'

output = None
for infile in input_files:
    with open(infile, 'r') as fh:
        if output:
            for i, l in enumerate(fh.readlines()):
                output[i] = "{},{}".format(output[i].rstrip('\n'), l)
        else:
            output = fh.readlines()

with open(output_file, 'w') as fh:
    for line in output:
        fh.write(line)
There are probably more efficient ways, but this is a quick and dirty way to achieve what I think you are asking for.
The previous answer implicitly assumes we need to do this in Python. If bash is an option, then you could use the paste command. For example:
paste -d, file1.csv file2.csv file3.csv > output.csv
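If bash is not available, a hedged pure-Python sketch of the same column-wise pasting; it assumes all files have the same number of lines, and the filenames mirror the example above:

# Sketch of a `paste -d,` equivalent.
names = ['file1.csv', 'file2.csv', 'file3.csv']
files = [open(name) for name in names]
with open('output.csv', 'w') as out:
    # zip over the file objects yields one line from each file per step
    for rows in zip(*files):
        out.write(','.join(row.rstrip('\n') for row in rows) + '\n')
for f in files:
    f.close()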
I don't fully understand why you use the csv library. Actually, it's enough to fill the output file with the lines from the given files (if they have the same column names and order):
input_path_list = [
    "a01.ihr.60.ann.csv",
    "a02.ihr.60.ann.csv",
    "a03.ihr.60.ann.csv",
    "a04.ihr.60.ann.csv",
    "a05.ihr.60.ann.csv",
    "a06.ihr.60.ann.csv",
    "a07.ihr.60.ann.csv",
    "a08.ihr.60.ann.csv",
    "a09.ihr.60.ann.csv",
]
output_path = "claaassA.csv"

with open(output_path, "w") as fout:
    header_written = False
    for input_path in input_path_list:
        with open(input_path) as fin:
            header = fin.next()
            # write the header once at the beginning and skip the other headers
            if not header_written:
                fout.write(header)
                header_written = True
            # copy all remaining rows
            for line in fin:
                fout.write(line)
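If you do want the csv module after all (for example, to handle quoted fields containing commas or newlines), a hedged sketch of the same idea; it assumes every input has a single-line header, and the file list is shortened here for brevity:

import csv

# Sketch: row-wise concatenation via the csv module, keeping one header.
input_path_list = ["a01.ihr.60.ann.csv", "a02.ihr.60.ann.csv"]
with open("claaassA.csv", "w") as fout:
    writer = csv.writer(fout)
    for i, path in enumerate(input_path_list):
        with open(path) as fin:
            reader = csv.reader(fin)
            header = next(reader)
            if i == 0:                 # write the header only once
                writer.writerow(header)
            writer.writerows(reader)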

"Move" some parts of the file to another file

Let's say I have a file with 48,222 lines. I then give an index value, let's say 21,000.
Is there any way in Python to "move" the contents of the file starting from index 21,000, so that I end up with two files: the original one, now with 21,000 lines, and a new one with the remaining 27,222 lines?
I read this post, which uses partition and roughly describes what I want:
with open("inputfile") as f:
contents1, sentinel, contents2 = f.read().partition("Sentinel text\n")
with open("outputfile1", "w") as f:
f.write(contents1)
with open("outputfile2", "w") as f:
f.write(contents2)
Except that (1) it uses "Sentinel text" as the separator, and (2) it creates two new files and requires me to delete the old one. As of now, the way I do it is like this:
for r in result.keys():  # the filenames are in my dictionary, don't bother with that
    f = open(r)
    lines = f.readlines()
    f.close()
    with open("outputfile1.txt", "w") as fn:
        for line in lines[0:21000]:
            # write each line
    with open("outputfile2.txt", "w") as fn:
        for line in lines[21000:]:
            # write each line
This is quite a lot of manual work. Is there a built-in or more efficient way?
You can also use writelines() and dump the slice of lines from 0 to 20,999 into one file and the slice from 21,000 to the end into another file:
with open("inputfile") as f:
content = f.readlines()
content1 = content[:21000]
content2 = content[21000:]
with open("outputfile1.txt", "w") as fn1:
fn1.writelines(content1)
with open('outputfile2.txt','w') as fn2:
fn2.writelines(content2)
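For a file too large to hold in memory, a hedged sketch that streams the split with itertools.islice instead of calling readlines(); the filenames mirror the ones above:

from itertools import islice

# Stream the split: islice consumes the first 21,000 lines from the
# file iterator, and the second writelines gets whatever remains.
with open("inputfile") as f:
    with open("outputfile1.txt", "w") as out1:
        out1.writelines(islice(f, 21000))
    with open("outputfile2.txt", "w") as out2:
        out2.writelines(f)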

Insert string at the beginning of each line

How can I insert a string at the beginning of each line in a text file? I have the following code:
f = open('./ampo.txt', 'r+')
with open('./ampo.txt') as infile:
    for line in infile:
        f.insert(0, 'EDF ')
f.close
I get the following error:
'file' object has no attribute 'insert'
Python comes with batteries included:
import fileinput
import sys

for line in fileinput.input(['./ampo.txt'], inplace=True):
    sys.stdout.write('EDF {l}'.format(l=line))
Unlike the solutions already posted, this also preserves file permissions.
You can't modify a file in place like that: files do not support insertion. You have to read it all in and then write it all out again.
You can do this line by line if you wish, but in that case you need to write to a temporary file and then replace the original. So, for small enough files, it is simpler to do it in one go like this:
with open('./ampo.txt', 'r') as f:
    lines = f.readlines()

lines = ['EDF ' + line for line in lines]

with open('./ampo.txt', 'w') as f:
    f.writelines(lines)
Here's a solution where you write to a temporary file and move it into place. You might prefer this version if the file you are rewriting is very large, since it avoids keeping the contents of the file in memory, as versions that involve .read() or .readlines() do. In addition, if there is any error while reading or writing, your original file will be safe:
from shutil import move
from tempfile import NamedTemporaryFile

filename = './ampo.txt'
tmp = NamedTemporaryFile(delete=False)

with open(filename) as finput:
    with open(tmp.name, 'w') as ftmp:
        for line in finput:
            ftmp.write('EDF ' + line)

move(tmp.name, filename)
For a file not too big:
with open('./ampo.txt', 'rb+') as f:
    x = f.read()
    f.seek(0, 0)
    f.writelines(('EDF ', x.replace('\n', '\nEDF ')))
    f.truncate()
Note that, IN THEORY, the f.truncate() is not really necessary in THIS case, because the content only grows: the rewritten content is at least as long as the original, so nothing from the original can be left past the end of what was written. There is no "EOF character" that closing a file writes; a file simply keeps its length until it is truncated. That is exactly why truncate() matters when the content shrinks: the file retains its old length, so trailing characters from the original content would remain after the end of the new content. Out of prudence, I think it's better to keep the instruction anyway.
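A small hypothetical demonstration of the shrinking case (the file name demo.txt is made up):

# Rewriting with shorter content and no truncate() leaves the old tail.
with open('demo.txt', 'w') as f:
    f.write('abcdefgh')
with open('demo.txt', 'r+') as f:
    f.write('XY')                    # no truncate() here
print(open('demo.txt').read())       # prints 'XYcdefgh'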
For a big file, to avoid putting the whole content of the file in RAM at once:
import os

def addsomething(filepath, ss):
    # Build a temporary file name next to the original.
    if filepath.rfind('.') > filepath.rfind(os.sep):
        a, _, c = filepath.rpartition('.')
        tempi = a + 'temp.' + c
    else:
        tempi = filepath + 'temp'
    # Stream line by line: prefix each line and write it to the temp file.
    with open(filepath, 'rb') as f, open(tempi, 'wb') as g:
        g.writelines(ss + line for line in f)
    os.remove(filepath)
    os.rename(tempi, filepath)

addsomething('./ampo.txt', 'WZE')
f = open('./ampo.txt', 'r')
lines = map(lambda l: 'EDF ' + l, f.readlines())
f.close()

f = open('./ampo.txt', 'w')
map(lambda l: f.write(l), lines)
f.close()
