Concatenating files in Python

I have files in a directory and I want to concatenate these files vertically to make a single file.
Input:
file1.txt file2.txt
1 8
2 8
3 9
The output I need:
1
2
3
8
8
9
My script is:
import glob
import numpy as np

for files in glob.glob(*.txt):
    print(files)
    np.concatenate([files])
but it does not concatenate vertically; instead it only produces the last file of the for loop. Can anybody help? Thanks.

There are a few things wrong with your code. NumPy seems a bit of an overkill for such a mundane task, in my opinion. You can use a much simpler approach, for instance:
import glob

result = ""
for file_name in glob.glob("*.txt"):
    with open(file_name, "r") as f:
        for line in f.readlines():
            result += line
print(result)
In order to save the result in a .txt file, you could do something like:
with open("result.txt", "w") as f:
    f.write(result)
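If the files are large, the same approach works without accumulating one big string, by streaming each line straight to the result file; a small sketch of that variant (the helper name and the sorted order are my own choices):

```python
import glob

def concatenate(pattern, out_path):
    """Stream every file matching pattern into out_path, in sorted order."""
    with open(out_path, "w") as out:
        for file_name in sorted(glob.glob(pattern)):
            with open(file_name, "r") as f:
                for line in f:      # one line at a time, no big string in memory
                    out.write(line)

# usage: concatenate("*.txt", "result.txt")
```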

This should work.
import glob

for files in glob.glob('*.txt'):
    with open(files, "r") as fileopen:
        file_contents = fileopen.read()
    with open("output.txt", "a") as output:
        output.write(file_contents)

Related

Replacing commas with dots and saving the change doesn't work for me

I have 10 files, each of which has 2 columns with 1000000 rows. I'm trying to replace all commas in my files with dots. I used the following script:
import glob
import os, os.path

list = []
for filename in glob.glob("inputfile/*"):
    with open(filename, 'r') as searchfile:
        for line in searchfile:
            if ',' in line:
                replace = line.replace(",", ".")
                list.append(replace)
    f = open(filename, 'w')
    for item in list:
        f.write(item)
It's working, but the resulting files have 2 columns and just 365 rows, which means that I lost 999635 rows of my data.
Can you help me, please?
Edit:
A sample of my data:
-0,0222950 0,1429029
-0,0216510 0,1419368
-0,0226171 0,1406487
-0,0222950 0,1393607
This is one approach: write to a temp file, and after processing rename the temp file to the original name, keeping the old file under an OLD_ prefix.
Ex:
import glob
import os, os.path

base_path = "inputfile/"
for filename in glob.glob(os.path.join(base_path, "*")):
    path, file_name = os.path.split(filename)
    with open(filename, 'r') as searchfile, \
         open(os.path.join(path, "temp_{}".format(file_name)), 'w') as searchfile_out:
        for line in searchfile:
            if ',' in line:
                line = line.replace(",", ".")
            searchfile_out.write(line)  # Write every line to the temp file
    os.rename(filename, os.path.join(path, "OLD_{}".format(file_name)))  # Rename old file
    os.rename(os.path.join(path, "temp_{}".format(file_name)), filename)  # Rename temp file to original name
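If keeping the OLD_ copy is not needed, os.replace can swap the temp file over the original in a single step; a minimal sketch of that variant (function name is illustrative):

```python
import os

def replace_commas(filename):
    """Rewrite filename in place, ',' -> '.', via a temp file."""
    temp_name = filename + ".tmp"
    with open(filename, "r") as src, open(temp_name, "w") as dst:
        for line in src:
            dst.write(line.replace(",", "."))
    os.replace(temp_name, filename)  # the temp file overwrites the original
```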

Combine multiple text files into one text file?

I'm trying to combine multiple files into one, where each of them contains one column, and I need to get one file with two columns; then I plot the resulting file as (x,y), as follows:
x    y        result
1    4        1 4
2    5        2 5
3    6        3 6
and run the code for n text files.
How can I do this?
A simple solution, assuming that all files have the same number of elements, in float format:
import numpy as np

filename_list = ['file0.txt', 'file1.txt']  # and so on
columns = []
for filename in filename_list:
    with open(filename) as f:
        x = np.array([float(raw) for raw in f.readlines()])
    columns.append(x)
columns = np.vstack(columns).T
np.savetxt('filename_out.txt', columns)
See also the savetxt documentation to customize the output.
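For instance, savetxt takes fmt, delimiter and header arguments; a small illustration (the file name and values are made up):

```python
import numpy as np

columns = np.array([[1.0, 4.0],
                    [2.0, 5.0],
                    [3.0, 6.0]])
# Six decimal places, tab-separated, with a commented header line.
np.savetxt("filename_out.txt", columns,
           fmt="%.6f", delimiter="\t", header="x\ty")
```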
EDIT:
If you have 100 files in a certain directory (let's call it files_dir), you can use listdir from the os library, but be careful, since listdir returns both directories and files:
import os

filename_list = [os.path.join(files_dir, f) for f in os.listdir(files_dir)
                 if os.path.isfile(os.path.join(files_dir, f))]
Here's a quick-and-dirty solution. I assumed that all files have the exact same number of rows. The function write_files gets an input files, which is a list of file paths (strings).
def write_files(files):
    opened_files = []
    for f in files:
        opened_files.append(open(f, "r"))
    output_file = open("output.txt", "w")
    num_lines = sum(1 for line in opened_files[0])
    opened_files[0].seek(0, 0)
    for i in range(num_lines):
        line = [of.readline().rstrip() for of in opened_files]
        line = " ".join(line)
        line += "\n"
        output_file.write(line)
    for of in opened_files:
        of.close()
    output_file.close()

write_files(["1.txt", "2.txt"])
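The line counting and seek can be avoided by zipping the open handles, which also stops at the shortest file; a sketch of that variant (named differently so it doesn't clash with write_files above, and the output path is a parameter for convenience):

```python
def write_files_zip(files, out_path="output.txt"):
    """Join the i-th line of every input file with spaces, one row per line."""
    handles = [open(f) for f in files]
    try:
        with open(out_path, "w") as out:
            # zip() draws one line from each handle per iteration and
            # stops at the shortest file, so no line counting is needed.
            for rows in zip(*handles):
                out.write(" ".join(r.rstrip("\n") for r in rows) + "\n")
    finally:
        for h in handles:
            h.close()
```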

How to combine and append 200+ text files into one in Python [duplicate]

This question already has answers here:
How do I concatenate text files in Python?
(12 answers)
Closed 5 years ago.
Suppose we have many text files as follows:
file1:
abc
def
ghi
file2:
ABC
DEF
GHI
file3:
adfafa
file4:
ewrtwe
rewrt
wer
wrwe
How can we make one text file like below:
result:
abc
def
ghi
ABC
DEF
GHI
adfafa
ewrtwe
rewrt
wer
wrwe
Related code may be:
import csv
import glob

files = glob.glob('*.txt')
for file in files:
    with open('result.txt', 'w') as result:
        result.write(str(file) + '\n')
After this? Any help?
You can read the content of each file directly into the write method of the output file handle like this:
import glob

read_files = glob.glob("*.txt")
with open("result.txt", "wb") as outfile:
    for f in read_files:
        with open(f, "rb") as infile:
            outfile.write(infile.read())
The fileinput module is designed perfectly for this use case.
import fileinput
import glob

file_list = glob.glob("*.txt")
with open('result.txt', 'w') as file:
    input_lines = fileinput.input(file_list)
    file.writelines(input_lines)
You could try something like this:
import glob

files = glob.glob('*.txt')
with open('result.txt', 'w') as result:
    for file_ in files:
        for line in open(file_, 'r'):
            result.write(line)
Should be straightforward to read.
It is also possible to combine files by incorporating OS commands; note that the wildcard and redirection require shell=True. Example:
import subprocess

subprocess.call("cat *.csv > /path/outputs.csv", shell=True)
filenames = ['resultsone.txt', 'resultstwo.txt']
with open('resultsthree', 'w') as outfile:
    for fname in filenames:
        with open(fname) as infile:
            for line in infile:
                outfile.write(line)
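For plain byte-for-byte concatenation, shutil.copyfileobj from the standard library copies in buffered chunks rather than line by line, which tends to be faster for big files; a sketch (function name is illustrative):

```python
import shutil

def concat(filenames, out_path):
    """Byte-for-byte concatenation using chunked copies."""
    with open(out_path, "wb") as out:
        for name in filenames:
            with open(name, "rb") as f:
                shutil.copyfileobj(f, out)  # buffered chunked copy
```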

Multiple editing of CSV files

I have a small problem operating on CSV files in Python (3.5). Previously I was working with single files and there was no problem, but right now I have >100 files in one folder.
So, my goal is:
To parse all *.csv files in the directory
From each file, delete the first 6 rows; the files consist of the following data:
"nu(Ep), 2.6.8"
"Date: 2/10/16, 11:18:21 AM"
19
Ep,nu
0.0952645,0.123776,
0.119036,0.157720,
...
0.992060,0.374300,
Save each file separately (for example adding "_edited"), so that only the numbers are saved.
As an option: I have data subdivided into two parts for one material, for example Ag(0-1)_s.csv and Ag(1-4)_s.csv (after steps 1-3 they should be like Ag(*)_edited.csv). How can I merge these two files by adding the data from (1-4) to the end of (0-1), saving it in a third file?
My code so far is the following:
import os, sys
import csv
import re
import glob
import fileinput

def get_all_files(directory, extension='.csv'):
    dir_list = os.listdir(directory)
    csv_files = []
    for i in dir_list:
        if i.endswith(extension):
            csv_files.append(os.path.realpath(i))
    return csv_files

csv_files = get_all_files('/Directory/Path/Here')

# Here is the problem with csv's, I don't know how to scan files
# which are in the list "csv_files".
for n in csv_files:
    # print(n)
    lines = []  # empty, because I don't know how to write it properly
                # per each file
    input = open(n, 'r')
    reader = csv.reader(n)
    temp = []
    for i in range(5):
        next(reader)
    # a for loop here regarding rows?
    # for row in n: ???
    input.close()
    # newfilename = "".join(n.split(".csv")) + "edited.csv"
    # newfilename can be used within open() below:
    with open(n + '_edited.csv', 'w') as nf:
        writer = csv.writer(nf)
        writer.writerows(lines)
This is the fastest way I can think of. If you have a solid-state drive, you could throw multiprocessing at this for more of a performance boost:
import glob
import os

for fpath in glob.glob('path/to/directory/*.csv'):
    fname = os.path.basename(fpath).rsplit(os.path.extsep, 1)[0]
    out_path = os.path.join('path/to/directory',
                            fname + "_edited" + os.path.extsep + "csv")
    with open(fpath) as infile, open(out_path, 'w') as outfile:
        for _ in range(6):
            infile.readline()
        for line in infile:
            outfile.write(line)
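The multiprocessing idea mentioned above could look roughly like this, with one worker task per file (paths kept as in the answer; the function name is my own):

```python
import glob
import os
from multiprocessing import Pool

def strip_header(fpath):
    """Copy fpath minus its first 6 rows to <name>_edited.csv next to it."""
    base = os.path.basename(fpath).rsplit(os.path.extsep, 1)[0]
    out_path = os.path.join(os.path.dirname(fpath),
                            base + "_edited" + os.path.extsep + "csv")
    with open(fpath) as infile, open(out_path, "w") as outfile:
        for _ in range(6):          # skip the 6 header rows
            infile.readline()
        for line in infile:
            outfile.write(line)

if __name__ == "__main__":
    with Pool() as pool:            # one task per file, run in parallel
        pool.map(strip_header, glob.glob("path/to/directory/*.csv"))
```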

Saving Filenames with Condition

I'm trying to save the names of files that fulfill a certain condition.
I think the easiest way to do this would make a short Python program that imports and reads the files, checks if the condition is met, and (assuming it is met) then saves the names of the files.
I have data files with just two columns and four rows, something like this:
a: 5
b: 5
c: 6
de: 7
I want to save the names of the files (or part of the name of the files, if that's a simple fix; otherwise I can just sed the file afterwards) for the data files that have the 4th number ([3:1]) greater than 8. I tried importing the files with numpy, but it said it couldn't import the letters in the first column.
Another way I was considering was to do it from the command line, something along the lines of cat *.dat >> something.txt, but I couldn't figure out how to do that.
The code I've tried to write to get this to work is:
import fileinput
import glob
import numpy as np

# Filter to find value > 8
# Globbing value datafiles
file_list = glob.glob("/path/to/*.dat")
# Creating output file containing
f = open('list.txt', 'w')
# Looping over files
for file in file_list:
    # For each file in the directory, isolating the filename
    filename = file.split('/')[-1]
    # Opening the files, checking if value is greater than 8
    a = np.loadtxt("file", delimiter=' ', usecols=1)
    if a[3:0] > 8:
        print >> f, filename
f.close()
When I do this, I get an error that says TypeError: 'int' object is not iterable, but I don't know what that's referring to.
I ended up using
import fileinput
import glob
import numpy as np

# Filter to find value > 8
# Globbing datafiles
file_list = glob.glob("/path/to/*.dat")
# Creating output file containing
f = open('list.txt', 'w')
# Looping over files
for file in file_list:
    # For each file in the directory, isolating the filename
    filename = file.split('/')[-1]
    # Opening the files, checking if value is greater than 8
    a = np.genfromtxt(file)
    if a[3, 1] > 8:
        f.write(filename + "\n")
f.close()
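For what it's worth, genfromtxt succeeds here because it turns the unparsable label column into nan instead of raising, so the numeric column stays addressable; a quick check with the sample data from the question:

```python
import io
import numpy as np

sample = "a: 5\nb: 5\nc: 6\nde: 7\n"
a = np.genfromtxt(io.StringIO(sample))
# The label column parses to nan, the numbers survive in column 1,
# so a[3, 1] is the 4th number from the file.
```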
It is hard to tell exactly what you want, but maybe something like this:
from glob import glob
from re import findall

fpattern = "/path/to/*.dat"

def test(fname):
    with open(fname) as f:
        try:
            return int(findall(r"\d+", f.read())[3]) > 8
        except IndexError:
            pass

matches = [fname for fname in glob(fpattern) if test(fname)]
print(matches)
