How to combine and append 200+ text files into one [duplicate] - python

Suppose we have many text files as follows:
file1:
abc
def
ghi
file2:
ABC
DEF
GHI
file3:
adfafa
file4:
ewrtwe
rewrt
wer
wrwe
How can we make one text file like below:
result:
abc
def
ghi
ABC
DEF
GHI
adfafa
ewrtwe
rewrt
wer
wrwe
Related code may be:
import csv
import glob
files = glob.glob('*.txt')
for file in files:
    with open('result.txt', 'w') as result:
        result.write(str(file)+'\n')
After this? Any help?

You can read the content of each file directly into the write method of the output file handle like this:
import glob
read_files = glob.glob("*.txt")
with open("result.txt", "wb") as outfile:
    for f in read_files:
        with open(f, "rb") as infile:
            outfile.write(infile.read())
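
One caveat with the glob-based approaches: result.txt itself matches *.txt, so if it already exists from an earlier run it is picked up as an input. A minimal sketch that skips the output file and sorts the inputs for a predictable order follows; the helper name concatenate is hypothetical, not something from the answers.

```python
import glob
import os
import shutil


def concatenate(pattern, output_path):
    """Concatenate every file matching pattern into output_path (hypothetical helper)."""
    output_abs = os.path.abspath(output_path)
    # Sort for a predictable order; glob makes no ordering guarantee.
    sources = sorted(
        name for name in glob.glob(pattern)
        if os.path.abspath(name) != output_abs  # never read the output file itself
    )
    with open(output_path, "wb") as outfile:
        for name in sources:
            with open(name, "rb") as infile:
                # Copy in chunks so large files are not loaded into memory at once.
                shutil.copyfileobj(infile, outfile)
    return sources
```

Calling concatenate("*.txt", "result.txt") would then be safe to run repeatedly in the same directory. Note that the sort is lexicographic, so file10.txt comes before file2.txt.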

The fileinput module is designed perfectly for this use case.
import fileinput
import glob
file_list = glob.glob("*.txt")
with open('result.txt', 'w') as file:
    input_lines = fileinput.input(file_list)
    file.writelines(input_lines)

You could try something like this:
import glob
files = glob.glob('*.txt')
with open('result.txt', 'w') as result:
    for file_ in files:
        # Use a with block so each input file is closed after it is copied.
        with open(file_, 'r') as infile:
            for line in infile:
                result.write(line)
This should be straightforward to read.

It is also possible to combine files by calling OS commands. Example:
import subprocess
# shell=True is required so that the shell expands *.csv and handles the > redirection.
subprocess.call("cat *.csv > /path/outputs.csv", shell=True)
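
If you would rather not go through a shell, the same idea can be sketched by expanding the pattern in Python and redirecting the output yourself. This assumes a cat executable is on the PATH; the helper name cat_files is hypothetical.

```python
import glob
import subprocess


def cat_files(pattern, output_path):
    """Concatenate files matching pattern by running the external `cat` command."""
    # Expand the pattern in Python, because no shell is involved to do it.
    sources = sorted(glob.glob(pattern))
    if not sources:
        return sources  # `cat` with no arguments would wait on stdin
    with open(output_path, "wb") as outfile:
        # Passing stdout= takes the place of the shell's `>` redirection.
        subprocess.run(["cat", *sources], stdout=outfile, check=True)
    return sources
```

With check=True a failing cat raises subprocess.CalledProcessError instead of passing silently. The output path should not match the pattern, otherwise cat may be asked to read the file it is writing.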

filenames = ['resultsone.txt', 'resultstwo.txt']
with open('resultsthree', 'w') as outfile:
    for fname in filenames:
        with open(fname) as infile:
            for line in infile:
                outfile.write(line)
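
A detail the line-by-line versions do not handle: if an input file does not end with a newline, its last line is glued to the first line of the next file. A small sketch that guards against this is below; the function name append_files is made up for illustration.

```python
def append_files(filenames, output_path):
    """Write each input file to output_path, ending every file with a newline."""
    with open(output_path, "w") as outfile:
        for fname in filenames:
            last_line = "\n"  # treat an empty file as already terminated
            with open(fname) as infile:
                for line in infile:
                    outfile.write(line)
                    last_line = line
            if not last_line.endswith("\n"):
                # The file did not end with a newline; add one so the next
                # file's first line does not get glued onto this one.
                outfile.write("\n")
```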

Related

concatenating files in python

I have files in a directory and I want to concatenate them vertically into a single file.
Input:
file1.txt file2.txt
1 8
2 8
3 9
The output I need:
1
2
3
8
8
9
My script is:
import glob
import numpy as np
for files in glob.glob('*.txt'):
    print(files)
    np.concatenate([files])
But it does not concatenate vertically; instead it produces only the last file of the for loop. Can anybody help? Thanks.
There are a few things wrong with your code.
NumPy seems like overkill for such a simple task, in my opinion. You can use a much simpler approach, for instance:
import glob
result = ""
for file_name in glob.glob("*.txt"):
    with open(file_name, "r") as f:
        for line in f.readlines():
            result += line
print(result)
To save the result in a .txt file, you could do something like:
with open("result.txt", "w") as f:
    f.write(result)
This should work.
import glob
file_names = glob.glob('*.txt')  # list the inputs before output.txt is created
with open("output.txt", "a") as output:
    for file_name in file_names:
        with open(file_name, "r") as infile:
            output.write(infile.read())

Counting the number of lines of text in all files within a folder and subfolders using Python

I have a huge folder with subfolders and multiple .sql files within those subfolders. I want to get the number of lines of code within every .sql file. This is what I've tried:
import os
import glob
os.chdir("path of folder")
names=[]
for fn in glob.glob("*.sql"):
    with open(fn) as f:
        names[fn]=sum(1 for line in f if line.strip() and not line.startswith('#'))
print(names)
But the output I get is [ ]. Could you guys help me with where I'm going wrong?
I know how to count the number of lines of code within a single file using "num_lines". I can't do that manually for each file and need to quicken the process.
The following version of your code works for files in the target directory, but not in subfolders:
import os
import glob
os.chdir("foo")
names = {}  # a dict, so that names[fn] = ... works
for fn in glob.glob("*.sql"):
    with open(fn) as f:
        names[fn] = sum(1 for line in f if line.strip() and not line.startswith('#'))
print(names)
A version with the newer pathlib works recursively too:
#!/usr/bin/env python3
from pathlib import Path
target = Path("foo")
names = {}
for file in target.glob("**/*.sql"):
    with file.open("rt") as f:
        names[f.name] = sum(
            1 for line in f
            if line.strip() and not line.startswith('#')
        )
print(names)
Try this:
from os import listdir
from os.path import isfile, join
sql_folder_path = "full/path/to/sql/folder"
sql_files = [join(sql_folder_path, f) for f in listdir(sql_folder_path) if isfile(join(sql_folder_path, f)) and f.endswith(".sql")]
files_stats = {}
for file in sql_files:
    with open(file) as f:
        files_stats[file]=sum(1 for line in f if line.strip() and not line.startswith('#'))
print(files_stats)
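
For the recursive case, here is a sketch that wraps the pathlib idea in a function and keys the result by path relative to the root, so that two files with the same name in different subfolders stay distinct. The function name count_sql_lines is hypothetical, and the '#' filter is simply carried over from the question (SQL itself usually marks comments with --).

```python
from pathlib import Path


def count_sql_lines(root):
    """Map each .sql file under root (recursively) to its number of
    non-blank lines that do not start with '#'."""
    root = Path(root)
    counts = {}
    for path in sorted(root.rglob("*.sql")):
        with path.open("rt") as f:
            # Key by the path relative to root, written with forward slashes.
            counts[path.relative_to(root).as_posix()] = sum(
                1 for line in f
                if line.strip() and not line.startswith("#")
            )
    return counts
```

The total across all files is then sum(count_sql_lines("foo").values()).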

Replacing commas with dots and saving the change doesn't work well for me

I have 10 files, each with 2 columns and 1000000 rows. I'm trying to replace all commas in my files with dots. I used the following script:
import glob
import os, os.path
list =[]
for filename in glob.glob("inputfile/*"):
    with open(filename, 'r') as searchfile:
        for line in searchfile:
            if ',' in line:
                replace=line.replace(",", ".")
                list.append(replace)
    f = open(filename, 'w')
    for item in list:
        f.write(item)
It's working, but the resulting files have 2 columns and just 365 rows, which means that I lost 999635 rows of my data.
Can you help me, please?
Edit:
Sample of my data:
-0,0222950 0,1429029
-0,0216510 0,1419368
-0,0226171 0,1406487
-0,0222950 0,1393607
Here is one approach: write to a temporary file and, after processing, rename the original file with an OLD_ prefix and give the temporary file the original name.
Example:
import glob
import os
base_path = "inputfile"
for filename in glob.glob(os.path.join(base_path, "*")):
    path, file_name = os.path.split(filename)
    temp_name = os.path.join(path, "temp_{}".format(file_name))
    with open(filename, 'r') as searchfile, open(temp_name, 'w') as searchfile_out:
        for line in searchfile:
            if ',' in line:
                line = line.replace(",", ".")
            searchfile_out.write(line)  # Write every line to the temp file, changed or not
    os.rename(filename, os.path.join(path, "OLD_{}".format(file_name)))  # Keep the old file as a backup
    os.rename(temp_name, filename)  # Give the temp file the original name
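
If you do not need to keep the old file, a variant of the same temp-file idea can be sketched with the standard tempfile module and os.replace, which swaps the new file into place in one step. The helper name replace_in_file and its default arguments are assumptions for illustration.

```python
import os
import tempfile


def replace_in_file(path, old=",", new="."):
    """Rewrite path with every occurrence of old replaced by new."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file next to the original so that os.replace
    # stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, text=True)
    try:
        with os.fdopen(fd, "w") as tmp, open(path, "r") as src:
            for line in src:
                tmp.write(line.replace(old, new))  # write every line, changed or not
        os.replace(tmp_path, path)  # swap the new file into place
    except BaseException:
        os.remove(tmp_path)  # do not leave a stray temp file behind
        raise
```

Because the file is processed one line at a time, a million rows per file is not a memory concern. One thing to be aware of: the rewritten file gets the permissions of the temporary file rather than those of the original.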

How to loop through all CSV files, open each, and perform some operations on each?

I'm trying to loop through all CSV files in a folder, open each, do some find/replace things, then save and close each CSV. Here is my code, which should be close, I think, but apparently something is off because it's not working.
import glob
path = "C:\\Users\\ryans\\OneDrive\\Desktop\\downloads\\Products\\*.csv"
for fname in glob.glob(path):
    print(str(fname))
with open(str(fname)) as f:
    newText = f.read().replace('|', ',').replace(' ', '')
with open(str(fname), "w") as f:
    f.write(newText)
What is wrong here?
You should finish the operation and close the file inside your for loop.
Also note that it is more elegant to use a raw string for a path than to escape each backslash:
import glob
path = r"C:\Users\ryans\OneDrive\Desktop\downloads\Products\*.csv"
for fname in glob.glob(path):
    print(fname)
    with open(fname) as f:  # read first; opening with "w" would empty the file
        newText = f.read().replace('|', ',').replace(' ', '')
    with open(fname, "w") as f:
        f.write(newText)
import glob
path = "path/to/dir/*.csv"
for fname in glob.glob(path):
    print(fname)
    with open(fname) as f:
        newText = f.read().replace('|', ',').replace(' ', '')
    # Each with block closes its file, so no explicit f.close() is needed.
    with open(fname, "w") as f:
        f.write(newText)
Use the pandas library to read the CSV file and replace the value with the intended one:
df['range'] = df['range'].str.replace(',','-')
Here range is the column name.
Then save it as follows:
df.to_csv(file_name, sep=',')
Or, without using a library:
with open(resource, 'r') as f, open("output.txt", "a+") as outputfile:
    for line in f:
        line = line.replace(' ', '-')
        outputfile.write(line)
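
A plain string replace turns every '|' into ',' but cannot cope with fields that already contain a comma. As a sketch of an alternative, the standard csv module can read the pipe-delimited file and write a properly quoted comma-delimited one. The helper name pipe_to_comma is hypothetical, and unlike the question's code it strips only the spaces around each field, not those inside it.

```python
import csv


def pipe_to_comma(src_path, dst_path):
    """Copy a pipe-delimited file to a comma-delimited CSV file."""
    # newline="" is what the csv module documentation recommends for file objects.
    with open(src_path, newline="") as src, open(dst_path, "w", newline="") as dst:
        reader = csv.reader(src, delimiter="|")
        writer = csv.writer(dst)  # the default delimiter is a comma
        for row in reader:
            # Strip surrounding spaces from each field; the writer quotes any
            # field that itself contains a comma.
            writer.writerow([field.strip() for field in row])
```

This writes to a separate destination file; to update files in place, it could be combined with the read-then-write or temp-file patterns shown in the answers.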

Join the content of files to one file

I have two files and I want to join the content of them into one file side-by-side, i.e., line n of the output file should consist of line n of file 1 and line n of file 2. The files have the same number of lines.
What I have until now:
with open('test1.txt', 'r') as f1, open('test2.txt', 'r') as f2:
    with open('joinfile.txt', 'w') as fout:
        fout.write(f1+f2)
but it gives an error saying -
TypeError: unsupported operand type(s) for +: 'file' and 'file'
What am I doing wrong?
I'd try itertools.chain() and work line by line (you use "r" to open your files, so I assume you are not reading binary files):
from itertools import chain
with open('test1.txt', 'r') as f1, open('test2.txt', 'r') as f2:
    with open('joinfile.txt', 'w') as fout:
        for line in chain(f1, f2):
            fout.write(line)
It works as a generator, so memory problems are unlikely, even for huge files.
Edit
New requirements, new sample:
from itertools import zip_longest  # named izip_longest in Python 2
separator = " "
with open('test1.txt', 'r') as f1, open('test2.txt', 'r') as f2:
    with open('joinfile.txt', 'w') as fout:
        for line1, line2 in zip_longest(f1, f2, fillvalue=""):
            line1 = line1.rstrip("\n")
            fout.write(line1 + separator + line2)
I added a separator string, which is put between the lines.
zip_longest also works if one file has more lines than the other; the fillvalue "" is then used for the missing line. zip_longest also works as a generator.
The line line1 = line1.rstrip("\n") is also important: it removes the trailing newline from line1 so that line2 continues on the same output line.
You can do it with:
fout.write(f1.read())
fout.write(f2.read())
You are actually concatenating two file objects; however, you want to concatenate strings.
Read the file contents first with f.read(). For example:
with open('test1.txt', 'r') as f1, open('test2.txt', 'r') as f2:
    with open('joinfile.txt', 'w') as fout:
        fout.write(f1.read()+f2.read())
I would prefer to use shutil.copyfileobj. You can easily combine it with glob.glob to concatenate a bunch of files by patterns
>>> import shutil
>>> infiles = ["test1.txt", "test2.txt"]
>>> with open("test.out","wb") as fout:
...     for fname in infiles:
...         with open(fname, "rb") as fin:
...             shutil.copyfileobj(fin, fout)
...
Combining with glob.glob
>>> import glob
>>> with open("test.out","wb") as fout:
...     for fname in glob.glob("test*.txt"):
...         with open(fname, "rb") as fin:
...             shutil.copyfileobj(fin, fout)
...
Beyond that, if you are on a system where POSIX utilities are available, prefer using them:
D:\temp>cat test1.txt test2.txt > test.out
If you are using Windows, you can issue the following from the command prompt:
D:\temp>copy/Y test1.txt+test2.txt test.out
test1.txt
test2.txt
1 file(s) copied.
Note
Based on your latest update
Yes it has the same number of lines and I want to join every line of
one file with the other file
with open("test.out", "w") as fout:  # text mode, since str is written
    fout.write('\n'.join(''.join(map(str.strip, e))
                         for e in zip(*(open(fname) for fname in infiles))))
And on a POSIX system, you can do:
paste test1.txt test2.txt
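
For a side-by-side join in Python that behaves roughly like paste, here is a sketch built on itertools.zip_longest. The function name paste and the tab default for the separator are assumptions chosen to mirror the command; pass another separator if needed.

```python
from itertools import zip_longest


def paste(path1, path2, output_path, separator="\t"):
    """Join line n of path1 with line n of path2, one pair per output line."""
    with open(path1) as f1, open(path2) as f2, open(output_path, "w") as fout:
        # zip_longest keeps going until the longer file ends; the shorter
        # file contributes empty strings for its missing lines.
        for line1, line2 in zip_longest(f1, f2, fillvalue=""):
            fout.write(line1.rstrip("\n") + separator + line2.rstrip("\n") + "\n")
```

Stripping the newline from both lines and writing one explicitly means every output line is terminated, even when one of the inputs lacks a final newline or is shorter than the other.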
