How to combine and append python 200+ python files into one [duplicate]

This question already has answers here:
How do I concatenate text files in Python?
(12 answers)
Closed 5 years ago.
Suppose we have many text files as follows:
file1:
abc
def
ghi
file2:
ABC
DEF
GHI
file3:
adfafa
file4:
ewrtwe
rewrt
wer
wrwe
How can we make one text file like below:
result:
abc
def
ghi
ABC
DEF
GHI
adfafa
ewrtwe
rewrt
wer
wrwe
Related code may be:
import csv
import glob
files = glob.glob('*.txt')
for file in files:
    with open('result.txt', 'w') as result:
        result.write(str(file)+'\n')
After this? Any help?

You can read the content of each file directly into the write method of the output file handle like this:
import glob
read_files = glob.glob("*.txt")
with open("result.txt", "wb") as outfile:
    for f in read_files:
        with open(f, "rb") as infile:
            outfile.write(infile.read())

The fileinput module is well suited to this use case.
import fileinput
import glob
file_list = glob.glob("*.txt")
with open('result.txt', 'w') as file:
    input_lines = fileinput.input(file_list)
    file.writelines(input_lines)

You could try something like this:
import glob
files = glob.glob('*.txt')
with open('result.txt', 'w') as result:
    for file_ in files:
        # A context manager closes each input file as soon as it has been copied.
        with open(file_, 'r') as infile:
            for line in infile:
                result.write(line)
This should be straightforward to read.

It is also possible to combine files by calling OS commands. Example:
import subprocess
# shell=True is needed so that the shell expands *.csv and handles the > redirection.
subprocess.call("cat *.csv > /path/outputs.csv", shell=True)

filenames = ['resultsone.txt', 'resultstwo.txt']
with open('resultsthree', 'w') as outfile:
    for fname in filenames:
        with open(fname) as infile:
            for line in infile:
                outfile.write(line)
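Two details the snippets above leave open are the order in which glob returns the files and what happens when a file does not end with a newline (its last line would be glued to the first line of the next file). Here is a minimal sketch that handles both, assuming the inputs are plain text files. The function name combine_text_files and its parameters are made up for illustration.

```python
import glob
import os


def combine_text_files(pattern, output_path):
    """Concatenate all files matching pattern into output_path (hypothetical helper)."""
    output_abs = os.path.abspath(output_path)
    # Sort for a predictable order and skip the output file if the pattern matches it.
    inputs = sorted(
        name for name in glob.glob(pattern)
        if os.path.abspath(name) != output_abs
    )
    with open(output_path, "w") as outfile:
        for name in inputs:
            with open(name) as infile:
                last_line = ""
                for line in infile:
                    outfile.write(line)
                    last_line = line
            # Add a newline if the file did not end with one.
            if last_line and not last_line.endswith("\n"):
                outfile.write("\n")
    return inputs
```

A call such as combine_text_files("*.txt", "result.txt") would then build result.txt from the other .txt files in the current directory. The sort is plain string order, so file10.txt comes before file2.txt; a different sort key would be needed for numeric order.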

Related

concatenating files in python

I have files in a directory and I want to concatenate these files vertically to make a single file.
input
file1.txt file2.txt
1 8
2 8
3 9
I need this output:
1
2
3
8
8
9
My script is
import glob
import numpy as np
for files in glob.glob(*.txt):
    print(files)
    np.concatenate([files])
But it does not concatenate vertically; instead it produces only the last file of the for loop. Can anybody help? Thanks.

There are a few things wrong with your code.
NumPy seems like overkill for such a simple task, in my opinion. You can use a much simpler approach, for instance:
import glob
result = ""
for file_name in glob.glob("*.txt"):
    with open(file_name, "r") as f:
        for line in f.readlines():
            result += line
print(result)
To save the result in a .txt file, you could do something like:
with open("result.txt", "w") as f:
    f.write(result)

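This should work.
import glob
# List the inputs before output.txt is created, and skip it on later runs.
file_names = [name for name in glob.glob('*.txt') if name != "output.txt"]
with open("output.txt", "a") as output:
    for file_name in file_names:
        # The context manager closes each input file after it has been read.
        with open(file_name, "r") as infile:
            output.write(infile.read())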

Counting the number of lines of text in all files within a folder and subfolders using Python

I have a huge folder with subfolders and multiple .sql files within those subfolders. I want to get the number of lines of code within every .sql file. This is what I've tried:
import os
import glob
os.chdir("path of folder")
names=[]
for fn in glob.glob("*.sql"):
    with open(fn) as f:
        names[fn]=sum(1 for line in f if line.strip() and not line.startswith('#'))
print(names)
But the output I get is [ ]. Could you guys help me with where I'm going wrong?
I know how to count the number of lines of code within a single file using "num_lines". I can't do that manually for each file and need to quicken the process.

The following version of your code works for files in the target directory, but not in sub-folders:
import os
import glob
os.chdir("foo")
names = {}
for fn in glob.glob("*.sql"):
    with open(fn) as f:
        names[fn] = sum(1 for line in f if line.strip() and not line.startswith('#'))
print(names)
A version using the newer pathlib module works recursively too:
#!/usr/bin/env python3
from pathlib import Path
target = Path("foo")
names = {}
for file in target.glob("**/*.sql"):
    with file.open("rt") as f:
        names[f.name] = sum(
            1 for line in f
            if line.strip() and not line.startswith('#')
        )
print(names)

Try this:
from os import listdir
from os.path import isfile, join
sql_folder_path = "full/path/to/sql/folder"
sql_files = [join(sql_folder_path, f) for f in listdir(sql_folder_path) if isfile(join(sql_folder_path, f)) and f.endswith(".sql")]
files_stats = {}
for file in sql_files:
    with open(file) as f:
        files_stats[file] = sum(1 for line in f if line.strip() and not line.startswith('#'))
print(files_stats)
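The listdir version above only looks at a single folder. A sketch of a recursive variant using os.walk from the standard library follows. The function name count_sql_lines is hypothetical, and it keeps the same counting rule as the answers (non-blank lines that do not start with '#'). SQL comments usually start with '--', so that rule may need adjusting for real .sql files.

```python
import os


def count_sql_lines(root):
    """Return {path: line_count} for every .sql file under root (hypothetical helper)."""
    counts = {}
    for folder, _subfolders, file_names in os.walk(root):
        for file_name in file_names:
            if not file_name.endswith(".sql"):
                continue
            path = os.path.join(folder, file_name)
            with open(path) as f:
                # Same rule as above: skip blank lines and lines starting with '#'.
                counts[path] = sum(
                    1 for line in f
                    if line.strip() and not line.startswith("#")
                )
    return counts
```

The keys are full paths rather than bare file names, so two files with the same name in different subfolders do not overwrite each other.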

Replacing commas with dots and save the change, doesn't work good with me?

I have 10 files, each with 2 columns and 1000000 rows. I'm trying to replace all commas in my files with dots. I used the following script:
import glob
import os, os.path
list =[]
for filename in glob.glob("inputfile/*"):
    with open(filename, 'r') as searchfile:
        for line in searchfile:
            if ',' in line:
                replace=line.replace(",", ".")
                list.append(replace)
    f = open(filename, 'w')
    for item in list:
        f.write(item)
It works, but the resulting files have 2 columns and just 365 rows, which means that I lost 999635 rows of my data.
Can you help me, please?
Edit:
Sample of my data:
-0,0222950 0,1429029
-0,0216510 0,1419368
-0,0226171 0,1406487
-0,0222950 0,1393607

This is one approach: write to a temp file and, after processing, rename the original file with an OLD_ prefix and give the temp file the original name. The old file can be deleted once the result has been checked.
Example:
import glob
import os
base_path = "inputfile/"
# glob.glob returns its list up front, so the temp_ and OLD_ files created below are not picked up.
for filename in glob.glob(os.path.join(base_path, "*")):
    path, file_name = os.path.split(filename)
    temp_name = os.path.join(path, "temp_{}".format(file_name))
    with open(filename, 'r') as searchfile, open(temp_name, 'w') as searchfile_out:
        for line in searchfile:
            # Write every line to the temp file, changed or not, so no rows are lost.
            searchfile_out.write(line.replace(",", "."))
    # Rename only after both files have been closed.
    os.rename(filename, os.path.join(path, "OLD_{}".format(file_name)))  # Rename old file
    os.rename(temp_name, filename)  # Rename temp file to original file
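A variation on the temp-file idea, as a sketch: write the converted text to a temporary file in the same directory and then swap it into place with os.replace, which overwrites the target in one step. The function name replace_commas_in_file is made up for illustration, and no backup copy is kept here.

```python
import os
import tempfile


def replace_commas_in_file(path):
    """Rewrite path with every ',' replaced by '.' (hypothetical helper)."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file next to the original so os.replace stays on one filesystem.
    fd, temp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as out, open(path) as src:
            for line in src:
                # Every line is written, whether or not it contained a comma.
                out.write(line.replace(",", "."))
        os.replace(temp_path, path)
    except BaseException:
        # Clean up the temp file if anything went wrong, then re-raise.
        os.remove(temp_path)
        raise
```

Because the file is processed line by line, memory use stays small even with a million rows per file. To convert a whole folder, the function could be called once for each path returned by glob.glob.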

How to loop through all CSV files, open each, and perform some operations on each?

I'm trying to loop through all CSV files in a folder, open each, do some find/replace things, then save and close each CSV. Here is my code, which should be close, I think, but apparently something is off because it's not working.
import glob
path = "C:\\Users\\ryans\\OneDrive\\Desktop\\downloads\\Products\\*.csv"
for fname in glob.glob(path):
    print(str(fname))
    with open(str(fname)) as f:
        newText = f.read().replace('|', ',').replace(' ', '')
with open(str(fname), "w") as f:
    f.write(newText)
What is wrong here?

You should finish the operation and close the file inside your for loop.
Please also note that it is more elegant to use a raw string for a path rather than escaping each backslash.
import glob
path = r"C:\Users\ryans\OneDrive\Desktop\downloads\Products\*.csv"
for fname in glob.glob(path):
    print(fname)
    # Read first: opening with "w" would truncate the file before it could be read.
    with open(fname) as f:
        newText = f.read().replace('|', ',').replace(' ', '')
    with open(fname, "w") as f:
        f.write(newText)

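import glob
path = "path/to/dir/*.csv"
for fname in glob.glob(path):
    print(fname)
    # Read the old text before reopening the file for writing.
    with open(fname) as f:
        newText = f.read().replace('|', ',').replace(' ', '')
    # The with block closes the file, so no explicit f.close() is needed.
    with open(fname, "w") as f:
        f.write(newText)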

Use the pandas library to read the CSV file and replace the value with the intended one.
import pandas as pd
df = pd.read_csv(file_name)  # file_name is assumed to hold the path of the CSV file
df['range'] = df['range'].str.replace(',', '-')
Here range is the column name.
Save it as follows:
df.to_csv(file_name, sep=',')
Or, without using a library:
# Open in text mode ('r'), because str.replace is applied to each line.
with open(resource, 'r') as f, open("output.txt", "a+") as outputfile:
    for line in f:
        line = line.replace(' ', '-')
        outputfile.write(line)
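For the plain find/replace in the question, putting the read-then-write order into a small function makes the loop easier to get right. Here is a minimal sketch with pathlib, assuming each file fits in memory. The function name clean_csv_files and its folder argument are hypothetical.

```python
from pathlib import Path


def clean_csv_files(folder):
    """Replace '|' with ',' and drop spaces in every .csv file in folder (hypothetical helper)."""
    changed = []
    for csv_path in sorted(Path(folder).glob("*.csv")):
        # Read the whole file first; the handle is closed before the file is rewritten.
        text = csv_path.read_text()
        new_text = text.replace("|", ",").replace(" ", "")
        if new_text != text:
            csv_path.write_text(new_text)
            changed.append(csv_path.name)
    return changed
```

The returned list of changed file names is only there to make the effect visible; files that need no change are left untouched.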

Join the content of files to one file

I have two files and I want to join the content of them into one file side-by-side, i.e., line n of the output file should consist of line n of file 1 and line n of file 2. The files have the same number of lines.
What I have until now:
with open('test1.txt', 'r') as f1, open('test2.txt', 'r') as f2:
    with open('joinfile.txt', 'w') as fout:
        fout.write(f1+f2)
but it gives an error saying -
TypeError: unsupported operand type(s) for +: 'file' and 'file'
What am I doing wrong?

I'd try itertools.chain() and work line by line (you use "r" to open your files, so I assume you do not read binary files):
from itertools import chain
with open('test1.txt', 'r') as f1, open('test2.txt', 'r') as f2:
    with open('joinfile.txt', 'w') as fout:
        for line in chain(f1, f2):
            fout.write(line)
It works as a generator, so no memory problems are likely, even for huge files.
Edit
New requirements, new sample:
from itertools import zip_longest  # named izip_longest in Python 2
separator = " "
with open('test1.txt', 'r') as f1, open('test2.txt', 'r') as f2:
    with open('joinfile.txt', 'w') as fout:
        for line1, line2 in zip_longest(f1, f2, fillvalue=""):
            line1 = line1.rstrip("\n")
            fout.write(line1 + separator + line2)
I added a separator string, which is put between the lines.
zip_longest also works if one file has more lines than the other. The fillvalue "" is then used for the missing line. zip_longest also works as a generator.
The line line1 = line1.rstrip("\n") is also important: it strips the newline from the first file's line so that both lines end up on the same output line.

You can do it with:
fout.write(f1.read())
fout.write(f2.read())

You are actually trying to concatenate two file objects; however, you want to concatenate strings.
Read the file contents first with f.read(). For example:
with open('test1.txt', 'r') as f1, open('test2.txt', 'r') as f2:
    with open('joinfile.txt', 'w') as fout:
        fout.write(f1.read()+f2.read())

I would prefer to use shutil.copyfileobj. You can easily combine it with glob.glob to concatenate a bunch of files by pattern.
>>> import shutil
>>> infiles = ["test1.txt", "test2.txt"]
>>> with open("test.out","wb") as fout:
...     for fname in infiles:
...         with open(fname, "rb") as fin:
...             shutil.copyfileobj(fin, fout)
Combining with glob.glob:
>>> import glob
>>> with open("test.out","wb") as fout:
...     for fname in glob.glob("test*.txt"):
...         with open(fname, "rb") as fin:
...             shutil.copyfileobj(fin, fout)
Beyond that, if you are on a system where you can use POSIX utilities, prefer those:
D:\temp>cat test1.txt test2.txt > test.out
In case you are using Windows, you can issue the following from the command prompt:
D:\temp>copy/Y test1.txt+test2.txt test.out
test1.txt
test2.txt
1 file(s) copied.
Note
Based on your latest update:
"Yes it has the same number of lines and I want to join every line of one file with the other file"
# Text mode ("w") and a single write() call, because the joined result is one string.
with open("test.out", "w") as fout:
    fout.write('\n'.join(''.join(map(str.strip, e))
                         for e in zip(*(open(fname) for fname in infiles))))
And on a POSIX system, you can do:
paste test1.txt test2.txt
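On Python 3, the side-by-side join can also be written as a small function that behaves roughly like paste. This is a sketch rather than a drop-in replacement. The name paste_files and the tab default for the separator are assumptions made for illustration.

```python
from contextlib import ExitStack
from itertools import zip_longest


def paste_files(input_paths, output_path, separator="\t"):
    """Write line n of every input file, joined by separator, as line n of output_path."""
    with ExitStack() as stack:
        # ExitStack closes every input file when the block ends.
        inputs = [stack.enter_context(open(path)) for path in input_paths]
        with open(output_path, "w") as fout:
            # zip_longest pads shorter files with empty strings.
            for lines in zip_longest(*inputs, fillvalue=""):
                fout.write(separator.join(line.rstrip("\n") for line in lines) + "\n")
```

For the two files in the question, a call such as paste_files(["test1.txt", "test2.txt"], "joinfile.txt", separator=" ") would put line n of each file on line n of joinfile.txt. Only one line per file is held in memory at a time.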
